4.3.1. CodePipeline, CodeBuild, and CodeDeploy for ML
First Principle: AWS Developer Tools (CodePipeline, CodeBuild, CodeDeploy) provide the CI/CD backbone. CodePipeline orchestrates the workflow, CodeBuild runs build/test steps, and CodeDeploy manages deployment strategies. For ML, these tools integrate with SageMaker to automate the full model lifecycle.
| Service | Role in ML CI/CD | Typical Actions |
|---|---|---|
| CodePipeline | Orchestrates the end-to-end pipeline | Trigger on code commit → build → test → deploy → monitor |
| CodeBuild | Runs build and test steps | Run unit tests, package model code, validate model metrics |
| CodeDeploy | Manages deployment to endpoints | Blue/green deployment, canary rollouts, automatic rollback |
A typical ML CI/CD pipeline:
- Code pushed to repository (CodeCommit or GitHub)
- CodePipeline triggers CodeBuild
- CodeBuild runs unit tests on preprocessing/inference code
- CodeBuild triggers SageMaker training job
- Pipeline validates metrics against thresholds
- Model registered in SageMaker Model Registry (PendingApproval)
- Manual or automated approval gate
- CodeDeploy updates SageMaker endpoint (blue/green)
- Monitoring validates production performance
- Auto-rollback if metrics degrade
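
The "validates metrics against thresholds" step can be sketched as a small gate that a CodeBuild command might run after training. This is a minimal illustration, not a prescribed implementation: the metric names, the threshold values, and the function names `failed_metrics` and `gate_exit_code` are all hypothetical, and it assumes the training step has already produced its evaluation metrics as a plain dictionary.

```python
# Hypothetical minimum values; real thresholds depend on the model and use case.
THRESHOLDS = {"accuracy": 0.90, "auc": 0.85}


def failed_metrics(metrics, thresholds):
    """Return the sorted names of metrics that are missing or below their minimum."""
    return sorted(
        name
        for name, minimum in thresholds.items()
        if metrics.get(name) is None or metrics[name] < minimum
    )


def gate_exit_code(metrics, thresholds=THRESHOLDS):
    """Return 0 when every metric meets its threshold, 1 otherwise."""
    failures = failed_metrics(metrics, thresholds)
    if failures:
        print(f"Metric gate failed: {failures}")
        return 1
    print("Metric gate passed")
    return 0
```

A build step would pass the result to `sys.exit(...)`. Because CodeBuild marks a build as failed when a command exits with a non-zero status, the pipeline stops before the model is registered.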
⚠️ Exam Trap: CodeDeploy's deployment strategies (blue/green, canary) are separate from SageMaker's traffic shifting capabilities. When deploying to SageMaker endpoints, you can use SageMaker's built-in traffic management (production variants with weight shifting) OR CodeDeploy integration. The question will signal which approach by mentioning either "SageMaker production variants" or "CodeDeploy deployment group."
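
To make the "production variants with weight shifting" side concrete, here is a hedged sketch that builds the request for the SageMaker `UpdateEndpointWeightsAndCapacities` API (exposed in boto3 as `update_endpoint_weights_and_capacities`). The endpoint name, the variant names, and the helper `weight_shift_request` are assumptions made for illustration.

```python
def weight_shift_request(endpoint_name, old_variant, new_variant, new_percent):
    """Build keyword arguments that send new_percent of traffic to new_variant."""
    if not 0 <= new_percent <= 100:
        raise ValueError("new_percent must be between 0 and 100")
    return {
        "EndpointName": endpoint_name,
        "DesiredWeightsAndCapacities": [
            # Weights are relative: each variant's share is weight / sum of weights.
            {"VariantName": old_variant, "DesiredWeight": float(100 - new_percent)},
            {"VariantName": new_variant, "DesiredWeight": float(new_percent)},
        ],
    }


# Assumed usage with boto3 (not executed here; needs AWS credentials):
#   import boto3
#   client = boto3.client("sagemaker")
#   client.update_endpoint_weights_and_capacities(
#       **weight_shift_request("churn-endpoint", "blue", "green", 10)
#   )
```

Calling this repeatedly with a rising `new_percent` gives a manual canary-style shift between two variants on one endpoint, with no CodeDeploy deployment group involved.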
Reflection Question: A team's ML pipeline breaks when a data engineer changes a feature column name upstream. What CI/CD test should catch this before training starts?
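
One possible answer is a schema (data contract) test that runs in the CodeBuild stage before any training job starts. The sketch below is illustrative only: the column names, the dtypes, and the helper `schema_violations` are hypothetical, and it assumes the actual schema has been read into a name-to-dtype dictionary.

```python
# Hypothetical contract for the training data; names and dtypes are examples.
EXPECTED_COLUMNS = {
    "customer_id": "int64",
    "tenure_months": "int64",
    "monthly_charges": "float64",
}


def schema_violations(actual, expected):
    """Compare name-to-dtype mappings and describe every mismatch found."""
    problems = []
    for name, dtype in expected.items():
        if name not in actual:
            problems.append(f"missing column: {name}")
        elif actual[name] != dtype:
            problems.append(f"{name}: expected {dtype}, got {actual[name]}")
    return problems


# With pandas, the actual schema could be derived as:
#   actual = {col: str(dtype) for col, dtype in df.dtypes.items()}
```

If the upstream rename turns `monthly_charges` into `monthly_fee`, the check reports `missing column: monthly_charges`, and a test asserting an empty result fails the build before training begins.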