Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

3.2.3. Regularization, Overfitting, and Underfitting

💡 First Principle: Overfitting is the model memorizing the training data instead of learning generalizable patterns. Underfitting is the model being too simple to capture the patterns. Regularization techniques deliberately reduce the model's capacity to memorize, forcing it to learn patterns that generalize. The exam tests your ability to diagnose which problem is occurring and apply the correct fix.

| Technique | How It Works | Prevents | Exam Signal |
| --- | --- | --- | --- |
| L1 (Lasso) | Adds absolute value of weights to loss | Overfitting; drives unimportant features to zero (feature selection) | "Feature selection," "sparse model" |
| L2 (Ridge/Weight Decay) | Adds squared weights to loss | Overfitting; shrinks all weights (doesn't zero them) | "Weight decay," "shrink weights," "prevent large weights" |
| Dropout | Randomly zeros neurons during training | Overfitting in neural networks | "Neural network overfitting," "random deactivation" |
| Early stopping | Stop training when validation metric stops improving | Overfitting from too many epochs | "Validation loss increasing," "stop training early" |
| Data augmentation | Create synthetic training variations | Overfitting from small datasets | "Small dataset," "limited training data" |
| Feature selection | Remove uninformative features | Overfitting from noisy features | "Too many features," "irrelevant features" |
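The L1 vs. L2 distinction in the table can be seen directly in fitted coefficients. A minimal sketch using scikit-learn (the data here is synthetic and illustrative, not from the exam): only the first two features drive the target, and the L1 penalty zeroes most of the irrelevant coefficients while the L2 penalty merely shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 10 features, but only the first 2 actually drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# L1 penalty: tends to drive irrelevant coefficients exactly to zero.
lasso = Lasso(alpha=0.1).fit(X, y)
# L2 penalty: shrinks all coefficients toward zero but rarely zeroes them.
ridge = Ridge(alpha=1.0).fit(X, y)

lasso_zeros = int(np.sum(np.abs(lasso.coef_) < 1e-6))
ridge_zeros = int(np.sum(np.abs(ridge.coef_) < 1e-6))
print(f"Lasso zeroed {lasso_zeros} coefficients; Ridge zeroed {ridge_zeros}")
```

This is why the exam maps "feature selection" and "sparse model" to L1: the zeroed coefficients amount to automatically discarding uninformative features.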

Catastrophic forgetting is a distinct failure mode when fine-tuning pre-trained models. The model "forgets" its general knowledge as it adapts to new domain-specific data. Mitigation strategies include using a very low learning rate during fine-tuning, freezing early layers, and progressive unfreezing.
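The mitigation strategies above can be sketched in PyTorch. The model here is a hypothetical stand-in for a pre-trained network, and the layer split and learning rates are illustrative assumptions, not prescribed values:

```python
import torch
import torch.nn as nn

# Hypothetical pre-trained model: two "early" feature layers plus a task head.
model = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),   # earliest layer: general features
    nn.Linear(64, 64), nn.ReLU(),   # middle layer
    nn.Linear(64, 5),               # task head: trained on the new domain
)

# Mitigation 1: freeze the early layers so their weights cannot drift
# away from the general knowledge learned during pre-training.
for layer in model[:4]:
    for param in layer.parameters():
        param.requires_grad = False

# Mitigation 2: fine-tune the remaining (trainable) parameters with a
# very low learning rate (1e-5 is an illustrative choice).
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)

# Mitigation 3: progressive unfreezing. After some epochs, re-enable a
# deeper layer and register it with an even smaller learning rate.
for param in model[2].parameters():
    param.requires_grad = True
optimizer.add_param_group({"params": model[2].parameters(), "lr": 1e-6})
```

The combination keeps the earliest layers fixed while letting progressively more of the network adapt, slowly, to the new domain.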

⚠️ Exam Trap: "The model performs well on training data but poorly on test data" is the textbook definition of overfitting, but the exam may describe this indirectly. Watch for phrases like "high training accuracy, low validation accuracy," "model works in development but not production," or "performance degrades on new data." All of these point to overfitting, and the fix involves regularization, not more training.
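The train/validation gap described above is easy to reproduce. A minimal sketch with synthetic data and scikit-learn decision trees (the data and hyperparameters are illustrative assumptions): an unconstrained tree memorizes the noisy training set, while capping its depth, a form of regularization, narrows the gap.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic classification data: labels depend weakly on one feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = (X[:, 0] + rng.normal(scale=1.0, size=400) > 0).astype(int)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Unconstrained tree: enough capacity to memorize the training noise.
overfit = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
# Regularized tree: limited depth reduces the capacity to memorize.
regularized = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

gap_overfit = overfit.score(X_tr, y_tr) - overfit.score(X_val, y_val)
gap_regularized = regularized.score(X_tr, y_tr) - regularized.score(X_val, y_val)
print(f"overfit gap: {gap_overfit:.2f}, regularized gap: {gap_regularized:.2f}")
```

Note that the fix shrinks the gap by lowering training accuracy, not by training longer, which is exactly the trap the exam sets.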

Reflection Question: A neural network achieves 99% training accuracy but only 72% validation accuracy. The dataset has 5,000 samples. Recommend three specific changes, ordered by which you'd try first.

Written by Alvin Varughese
Founder, 15 professional certifications