4.3.4. Automated Testing and Retraining Mechanisms
💡 First Principle: ML pipelines need tests at every stage—not just at the model evaluation step. Data validation tests catch upstream changes, integration tests verify the pipeline works end-to-end, and model quality tests ensure the new version meets performance thresholds. Without automated testing, model updates are gambles.
| Test Type | What It Validates | When It Runs | Example |
|---|---|---|---|
| Data validation | Schema, distributions, completeness | Before training | Feature column count matches expected; no new null columns |
| Unit tests | Preprocessing/inference code logic | On code commit | Encoding function produces correct output |
| Integration tests | Pipeline steps connect correctly | After build | Training job accepts preprocessed data format |
| Model quality tests | Performance meets thresholds | After training | Accuracy > 85%, latency < 100ms |
| A/B tests | New model vs. production | During deployment | New model's business metric ≥ old model's |
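The first row of the table, data validation, is often the simplest to automate. Below is a minimal, framework-agnostic sketch of the two example checks (expected schema matches, no all-null columns); the three column names are hypothetical, and a production pipeline would typically use a dedicated tool (e.g., SageMaker Model Monitor's baseline constraints or a library like Great Expectations) rather than hand-rolled checks:

```python
import pandas as pd

# Hypothetical schema the training job expects.
EXPECTED_COLUMNS = ["age", "income", "score"]

def validate_batch(df: pd.DataFrame) -> list:
    """Return a list of validation failures; an empty list means the batch passes."""
    failures = []
    # Schema check: column names and count must match what training expects.
    if list(df.columns) != EXPECTED_COLUMNS:
        failures.append(f"schema mismatch: {list(df.columns)}")
    # Completeness check: no column may be entirely null.
    for col in df.columns:
        if df[col].isna().all():
            failures.append(f"column {col!r} is all null")
    return failures

good = pd.DataFrame({"age": [34, 52], "income": [40_000, 90_000], "score": [0.2, 0.9]})
bad = good.assign(score=[None, None])

print(validate_batch(good))  # → []
print(validate_batch(bad))   # → ["column 'score' is all null"]
```

Running this check *before* training (as the table's "When It Runs" column says) means a broken upstream feed fails fast instead of producing a silently degraded model.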
Retraining triggers:
- Scheduled: Retrain on a fixed cadence (daily, weekly, monthly)
- Data-driven: Retrain when new data volume exceeds a threshold
- Drift-driven: Retrain when Model Monitor detects data or model drift
- Performance-driven: Retrain when production metrics degrade below a threshold
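The four triggers above can coexist in one dispatcher that decides whether (and why) to kick off a retraining run. This is an illustrative sketch with made-up default thresholds; in AWS the equivalent wiring is usually an EventBridge schedule plus CloudWatch alarms on Model Monitor and production metrics:

```python
from datetime import datetime, timedelta, timezone

def should_retrain(last_trained, new_rows, drift_detected, prod_metric,
                   cadence_days=7, row_threshold=100_000, metric_floor=0.85):
    """Return (retrain?, reason). Any one trigger firing is enough."""
    # Scheduled: fixed cadence since the last training run.
    if datetime.now(timezone.utc) - last_trained >= timedelta(days=cadence_days):
        return True, "scheduled"
    # Data-driven: enough new data has accumulated.
    if new_rows >= row_threshold:
        return True, "data-driven"
    # Drift-driven: a monitor flagged data or model drift.
    if drift_detected:
        return True, "drift-driven"
    # Performance-driven: the production metric fell below its floor.
    if prod_metric < metric_floor:
        return True, "performance-driven"
    return False, "no trigger"
```

Returning the reason, not just a boolean, makes retraining runs auditable: the pipeline log records *why* each run was launched.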
⚠️ Exam Trap: Automated retraining without automated validation is dangerous—you could automatically deploy a worse model. Every retraining pipeline must include a quality gate (metric threshold check) before deployment. If a question describes "automatic retraining and deployment" without mentioning validation, the answer should include adding a validation step.
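The quality gate the trap describes can be as simple as a threshold check between evaluation and deployment. Here is a minimal, framework-agnostic sketch (metric names and thresholds are illustrative); in SageMaker Pipelines this role is typically played by a `ConditionStep` that reads the evaluation report and only proceeds to model registration when the condition holds:

```python
def quality_gate(metrics, minimums=None, maximums=None):
    """Return a list of failures; deploy only if the list is empty.

    minimums: metrics that must be at least a floor (e.g., accuracy).
    maximums: metrics that must be at most a ceiling (e.g., latency).
    """
    failures = []
    for name, floor in (minimums or {}).items():
        value = metrics.get(name, float("-inf"))  # missing metric fails the gate
        if value < floor:
            failures.append(f"{name}={value} below floor {floor}")
    for name, ceiling in (maximums or {}).items():
        value = metrics.get(name, float("inf"))
        if value > ceiling:
            failures.append(f"{name}={value} above ceiling {ceiling}")
    return failures

# A candidate matching the table's example thresholds passes the gate:
ok = quality_gate({"accuracy": 0.90, "latency_ms": 80},
                  minimums={"accuracy": 0.85}, maximums={"latency_ms": 100})
print(ok)  # → []
```

Note that a missing metric counts as a failure: a gate that silently passes when evaluation produced no number is no gate at all.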
Reflection Question: A pipeline automatically retrains a model weekly and deploys it. One week, the new training data contains a bug that produces a model with 40% accuracy (down from 90%). The bad model reaches production before anyone notices. What pipeline change prevents this?