1.3.3. Mapping AWS Services to the ML Pipeline (SageMaker, Data Wrangler, Model Monitor)
First Principle: AWS provides a portfolio of services that map directly to the stages of the ML lifecycle, enabling teams to build, train, and operate ML solutions efficiently on a unified platform.
Knowing which tool to use at which stage is a key skill for a practitioner.
- Data Prep (EDA, Pre-processing, Feature Engineering):
  - Amazon SageMaker Data Wrangler: A visual tool to explore, clean, and prepare data with minimal code.
- Model Training & Tuning:
  - Amazon SageMaker: The core service for training custom models with built-in algorithms or your own training code; its automatic model tuning feature searches for the best hyperparameters.
- Model Management & Governance:
  - Amazon SageMaker Feature Store: A central repository to store, share, and manage curated features for ML models.
  - Amazon SageMaker Model Registry: A central catalog to version, approve, and manage trained models for deployment.
- Deployment:
  - Amazon SageMaker: Deploys models to real-time endpoints or runs batch transform jobs for offline inference.
- Monitoring:
  - Amazon SageMaker Model Monitor: Automatically detects data drift and model quality degradation in production.
- Orchestration:
  - Amazon SageMaker Pipelines: A service to build, automate, and manage end-to-end ML workflows.
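The core idea behind the monitoring stage, comparing live traffic against a baseline captured at training time, can be sketched in plain Python. This is a conceptual illustration only, not the Model Monitor API; the function name and threshold rule are hypothetical:

```python
import statistics

def detect_drift(baseline, live, threshold=0.5):
    """Flag drift when the live mean shifts more than `threshold`
    baseline standard deviations away from the baseline mean.
    (A conceptual stand-in for Model Monitor's statistical checks.)"""
    base_mean = statistics.mean(baseline)
    base_std = statistics.stdev(baseline)
    shift = abs(statistics.mean(live) - base_mean)
    return shift > threshold * base_std

# Baseline feature values captured during training.
baseline = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8]

# Live traffic close to the baseline: no drift flagged.
stable_live = [10.1, 10.3, 9.9, 10.4]

# Live traffic whose distribution has shifted: drift flagged.
drifted_live = [14.0, 14.5, 13.8, 15.1]

print(detect_drift(baseline, stable_live))   # False
print(detect_drift(baseline, drifted_live))  # True
```

The real service computes a much richer set of statistics (per-feature distributions, missing-value rates, model quality metrics) on a schedule against captured endpoint traffic, but the baseline-versus-live comparison above is the underlying pattern.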
Scenario: A team needs to build a custom classification model. They have raw data in Amazon S3 and want to use a managed, integrated workflow on AWS.
Reflection Question: How would you guide them to use Data Wrangler for data prep, SageMaker for training, and Model Monitor for post-deployment monitoring, explaining how these services combine into a cohesive pipeline?
💡 Tip: Think of Amazon SageMaker as the "workbench" for the ML practitioner. It contains specialized tools (like Data Wrangler, Feature Store, Model Monitor) for every phase of the custom model-building process. Use Amazon SageMaker Pipelines to automate and orchestrate these steps into a repeatable workflow.
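The orchestration idea behind the tip above, each stage is a step whose output feeds the next, can be sketched as a chain of plain Python functions. The function names and the trivial threshold classifier are hypothetical illustrations, not the SageMaker Pipelines API:

```python
def prepare_data(raw):
    """Data prep stage (Data Wrangler's role): drop incomplete records."""
    return [r for r in raw if r.get("label") is not None]

def train_model(rows):
    """Training stage (SageMaker's role): fit a trivial threshold classifier
    that splits the positive and negative feature ranges at their midpoint."""
    positives = [r["feature"] for r in rows if r["label"] == 1]
    negatives = [r["feature"] for r in rows if r["label"] == 0]
    return {"threshold": (min(positives) + max(negatives)) / 2}

def predict(model, feature):
    """Deployment stage (real-time endpoint's role): score one record."""
    return 1 if feature >= model["threshold"] else 0

raw = [
    {"feature": 1.0, "label": 0},
    {"feature": 2.0, "label": 0},
    {"feature": 8.0, "label": 1},
    {"feature": 9.0, "label": 1},
    {"feature": 5.0, "label": None},  # incomplete record, dropped in prep
]

# The "pipeline": prep -> train -> deploy/predict, each step consuming
# the previous step's output.
model = train_model(prepare_data(raw))
print(predict(model, 7.5))  # 1
print(predict(model, 3.0))  # 0
```

In the managed service, each function would instead be a pipeline step (a Data Wrangler processing job, a training job, a model registration and deployment), with SageMaker Pipelines tracking the dependency graph and rerunning the chain on demand.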