3.2.2. Hyperparameter Tuning with SageMaker AMT
💡 First Principle: Hyperparameters are settings you choose before training begins—unlike model parameters (weights) that the training process learns. Because the best hyperparameters depend on your specific data and model, you must search for them systematically. Manual tuning is slow and biased by human intuition; automated tuning explores the space more efficiently.
SageMaker Automatic Model Tuning (AMT) runs multiple training jobs with different hyperparameter combinations and selects the best based on an objective metric you define. It supports four search strategies:
| Strategy | How It Works | Best For | Exam Signal |
|---|---|---|---|
| Random search | Samples randomly from parameter ranges | Initial exploration, many parameters | "Quick exploration," "unknown parameter space" |
| Bayesian optimization | Uses past results to intelligently choose next | Efficient search, expensive training jobs | "Minimize training cost," "intelligent search," "fewer jobs" |
| Grid search | Tests every combination in a grid | Small parameter space, exhaustive search | "Exhaustive," "small number of hyperparameters" |
| Hyperband | Early stopping of poor-performing jobs | Large search with limited budget | "Resource-efficient," "early stopping" |
AMT Configuration Essentials:
- Objective metric: The metric AMT optimizes (e.g., validation:accuracy, validation:rmse)
- Parameter ranges: Continuous, integer, or categorical ranges for each hyperparameter
- Max jobs: Total number of training jobs to run
- Max parallel jobs: How many jobs run simultaneously (more parallel = faster but less Bayesian learning)
Key insight: More parallel jobs speed up the search but reduce the effectiveness of Bayesian optimization, because each job can't learn from the others' results. If budget allows, use fewer parallel jobs with Bayesian optimization for better results. If time is critical, use more parallel jobs with random search.
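These configuration pieces come together in a tuner definition. Here is a minimal sketch using the SageMaker Python SDK's `HyperparameterTuner`; the estimator `xgb` and the S3 URIs are assumed placeholders you would have defined earlier, and the specific ranges are illustrative only:

```python
from sagemaker.tuner import (
    HyperparameterTuner,
    ContinuousParameter,
    IntegerParameter,
)

# Assumes an XGBoost estimator `xgb` is already configured with a role,
# instance type, and output path (not shown here).
tuner = HyperparameterTuner(
    estimator=xgb,
    objective_metric_name="validation:rmse",   # the metric AMT optimizes
    objective_type="Minimize",
    hyperparameter_ranges={
        "eta": ContinuousParameter(0.01, 0.3),
        "max_depth": IntegerParameter(3, 10),
        "subsample": ContinuousParameter(0.5, 1.0),
    },
    strategy="Bayesian",        # learns from completed jobs
    max_jobs=30,                # total training runs
    max_parallel_jobs=3,        # low parallelism preserves Bayesian learning
    early_stopping_type="Auto", # terminate clearly poor jobs early
)

tuner.fit({"train": train_s3_uri, "validation": val_s3_uri})
```

Note the deliberately low `max_parallel_jobs`: each new job is chosen using the results of the jobs that have already finished, which is exactly the tradeoff described above.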
⚠️ Exam Trap: AMT tuning jobs can be expensive—each "job" is a full training run. When a question asks about "cost-effective hyperparameter tuning," look for answers that use Bayesian optimization (fewer total jobs needed) combined with early stopping (terminate poor-performing jobs quickly). Random search with 100 max jobs and 10 parallel jobs is almost never the most cost-effective answer.
Reflection Question: A team wants to tune 5 hyperparameters for an XGBoost model. Each training job takes 45 minutes and costs $8. They have a $500 budget. Which tuning strategy maximizes the quality of the final model within this budget?
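Before weighing strategies, it helps to pin down the hard constraint. A quick sketch of the budget arithmetic (figures taken from the scenario; the strategy choice itself is left to you):

```python
budget = 500           # total budget in dollars
cost_per_job = 8       # cost of one full training run
minutes_per_job = 45   # duration of one training run

# Maximum number of complete training jobs the budget allows
max_jobs = budget // cost_per_job
print(max_jobs)  # 62

# Worst-case wall-clock time if jobs run strictly sequentially, in hours
print(max_jobs * minutes_per_job / 60)  # 46.5
```

With at most 62 jobs to explore a 5-dimensional hyperparameter space, the question becomes which strategy extracts the most signal per job.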