3.2. Training and Refining Models
💡 First Principle: Training is an iterative optimization process—imagine hiking through fog trying to find the lowest valley. You're searching a vast space of possible model configurations for the one that best fits your data without memorizing it. What happens when the learning rate is too high? You overshoot the valley entirely. Too low? You crawl along and get stuck on a plateau for hours. Unlike traditional software, where code either works or breaks, training failures are subtle — the model "runs" but produces garbage. Consider a production scenario where validation loss starts climbing after epoch 50: do you know why it happens, and what to do about it?
Without understanding the training process, you become a button-presser—tweaking parameters with no grasp of their effects. Imagine trying to tune a radio without knowing that frequency determines the station. You'd spin the dial randomly, occasionally landing on something clear. The exam expects you to tune deliberately: understand what each parameter controls, predict the effect of changing it, and diagnose why a training run produced bad results.
Think of training as hiking down a mountain in fog. Each step (a gradient update) moves you lower (reduces loss), but you can only see a few feet ahead (the local gradient). Step too large (high learning rate) and you overshoot the valley. Step too small (low learning rate) and you'll never reach the bottom before nightfall (your compute budget runs out). The batch size determines how much terrain you survey before deciding your next step direction.
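The learning-rate trade-off is easy to see on a toy loss surface. The sketch below (plain Python, no framework assumed) runs fixed-step gradient descent on the loss f(w) = w², whose gradient is 2w, and compares three step sizes:

```python
def gradient_descent(lr, steps=50, w=10.0):
    """Minimize the toy loss f(w) = w**2 (gradient 2*w), starting from w = 10."""
    for _ in range(steps):
        w -= lr * 2 * w  # take one step downhill against the gradient
    return w

too_high = gradient_descent(lr=1.1)    # overshoots: |w| grows every step (divergence)
moderate = gradient_descent(lr=0.4)    # converges to the minimum at w = 0
too_low  = gradient_descent(lr=0.001)  # barely moves: still near the start after 50 steps
```

For this bowl-shaped loss, any learning rate below 1.0 eventually converges; real loss surfaces are not convex, which is why the fog metaphor matters: you only ever see the local slope.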
⚠️ Common Misconception: "More epochs always means a better model." In reality, training too long causes overfitting — the model memorizes training data instead of learning generalizable patterns. The exam tests this by presenting scenarios where training loss decreases but validation loss increases. The correct response is early stopping or regularization, not more compute.
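The exam scenario above—training loss falling while validation loss rises—is exactly what early stopping guards against. A minimal sketch, assuming a hypothetical per-epoch list of validation losses and a `patience` window (how many epochs without improvement to tolerate before stopping):

```python
def best_epoch_with_early_stopping(val_losses, patience=3):
    """Return the epoch whose checkpoint to restore: stop training once
    validation loss has not improved for `patience` consecutive epochs."""
    best_loss, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch

# Hypothetical run: validation loss falls, bottoms out, then climbs (overfitting).
val_losses = [1.00, 0.80, 0.60, 0.50, 0.45, 0.44, 0.47, 0.50, 0.55, 0.60]
stop_at = best_epoch_with_early_stopping(val_losses)  # epoch 5, where loss bottomed at 0.44
```

Restoring the epoch-5 checkpoint is the correct exam response to rising validation loss: stop and roll back, rather than spend more compute on further epochs.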