AWS Certified Machine Learning Engineer - Associate (MLA-C01) Study Guide [160 Minute Read]
A First-Principles Approach to Machine Learning Engineering on AWS
Welcome to the AWS Certified Machine Learning Engineer - Associate (MLA-C01) Study Guide. This guide moves beyond surface-level memorization. It is designed to build a robust mental model of how machine learning systems are built, deployed, and maintained on AWS—understanding the why behind every architectural decision, service selection, and operational trade-off.
Each topic is aligned with the official AWS MLA-C01 Exam Objectives, targeting the specific cognitive skills required for success. Expect scenario-based questions that test your ability to choose the right AWS service for a given ML workflow stage, troubleshoot pipeline failures, and make cost-performance trade-offs—roughly 60% application, 30% analysis, and 10% recall.
Exam Details: 65 questions (50 scored, 15 unscored) — Multiple choice, multiple response, ordering, matching, case study | 170 minutes | Passing score: 720/1000
Prerequisites: At least 1 year of experience using Amazon SageMaker and other AWS services for ML engineering. Familiarity with common ML algorithms, data engineering fundamentals, CI/CD pipelines, and software engineering best practices. Background in a role such as backend developer, DevOps engineer, data engineer, or data scientist is assumed.
Exam Domain Weights
- Domain 1: Data Preparation for ML — 28%
- Domain 2: ML Model Development — 26%
- Domain 3: Deployment and Orchestration of ML Workflows — 22%
- Domain 4: ML Solution Monitoring, Maintenance, and Security — 24%

Domain 1 (Data Preparation) carries the heaviest weight at 28%, reflecting the reality that most ML engineering effort goes into getting data right before models ever train. Combined with Domain 4's 24% on monitoring and security, over half the exam tests your ability to handle what happens around the model—not just the model itself. Prioritize these operational areas alongside model development.
Table of Contents
- Phase 1: First Principles of Machine Learning Engineering
- 1.1. The ML Engineering Mindset
- 1.1.1. Why ML Engineering Differs from Software Engineering
- 1.1.2. The Experiment-to-Production Gap
- 1.2. The ML Lifecycle on AWS
- 1.2.1. Data → Model → Deploy → Monitor: The Four-Stage Loop
- 1.2.2. Where AWS Services Fit in the Lifecycle
- 1.3. The SageMaker Ecosystem
- 1.3.1. SageMaker as the Central Hub
- 1.3.2. SageMaker Components and When to Use Them
- 1.4. Core Trade-offs in ML Systems
- 1.4.1. Cost vs. Performance vs. Latency
- 1.4.2. Managed vs. Custom: When to Build vs. When to Buy
- 1.5. Reflection Checkpoint
- Phase 2: Data Preparation for ML (28%)
- 2.1. Data Ingestion and Storage
- 2.1.1. Data Formats for ML Workloads
- 2.1.2. AWS Storage Options: S3, EFS, FSx, and When to Use Each
- 2.1.3. Streaming Data Ingestion with Kinesis and Kafka
- 2.1.4. Troubleshooting Ingestion Issues
- 2.2. Data Transformation and Feature Engineering
- 2.2.1. Data Cleaning Techniques: Outliers, Missing Data, Deduplication
- 2.2.2. Feature Engineering: Scaling, Binning, and Log Transforms
- 2.2.3. Encoding Techniques: One-Hot, Label, Binary, and Tokenization
- 2.2.4. AWS Transformation Tools: Glue, DataBrew, EMR, and Data Wrangler
- 2.3. Data Integrity and Modeling Readiness
- 2.3.1. Detecting and Mitigating Bias in Training Data
- 2.3.2. Data Quality Validation with Glue Data Quality and DataBrew
- 2.3.3. Data Labeling with Ground Truth and Mechanical Turk
- 2.3.4. Compliance, Encryption, and Data Protection
- 2.4. Reflection Checkpoint
- Phase 3: ML Model Development (26%)
- 3.1. Choosing a Modeling Approach
- 3.1.1. Mapping Business Problems to ML Algorithms
- 3.1.2. SageMaker Built-in Algorithms and When to Apply Them
- 3.1.3. Foundation Models: Bedrock and JumpStart
- 3.1.4. AWS AI Services for Common Business Needs
- 3.2. Training and Refining Models
- 3.2.1. The Training Process: Epochs, Batches, and Steps
- 3.2.2. Hyperparameter Tuning with SageMaker AMT
- 3.2.3. Regularization, Overfitting, and Underfitting
- 3.2.4. Distributed Training and Reducing Training Time
- 3.2.5. Model Versioning with SageMaker Model Registry
- 3.3. Analyzing Model Performance
- 3.3.1. Evaluation Metrics: Confusion Matrix, F1, RMSE, ROC-AUC
- 3.3.2. Model Bias Detection with SageMaker Clarify
- 3.3.3. Debugging Convergence with SageMaker Debugger
- 3.4. Reflection Checkpoint
- Phase 4: Deployment and Orchestration of ML Workflows (22%)
- 4.1. Selecting Deployment Infrastructure
- 4.1.1. Endpoint Types: Real-Time, Serverless, Async, and Batch
- 4.1.2. Compute Selection: CPU vs. GPU and Instance Families
- 4.1.3. Containers: Pre-built vs. BYOC
- 4.1.4. Edge Deployment with SageMaker Neo
- 4.2. Creating and Scripting Infrastructure
- 4.2.1. Auto Scaling SageMaker Endpoints
- 4.2.2. Infrastructure as Code: CloudFormation and CDK
- 4.2.3. Container Management with ECR, ECS, and EKS
- 4.2.4. VPC Configuration for SageMaker Endpoints
- 4.3. CI/CD Pipelines for ML Workflows
- 4.3.1. CodePipeline, CodeBuild, and CodeDeploy for ML
- 4.3.2. SageMaker Pipelines and Workflow Orchestration
- 4.3.3. Deployment Strategies: Blue/Green, Canary, and Linear
- 4.3.4. Automated Testing and Retraining Mechanisms
- 4.4. Reflection Checkpoint
- Phase 5: ML Solution Monitoring, Maintenance, and Security (24%)
- 5.1. Monitoring Model Inference
- 5.1.1. Data Drift and Model Drift Detection
- 5.1.2. SageMaker Model Monitor in Production
- 5.1.3. A/B Testing for Production Models
- 5.2. Monitoring and Optimizing Infrastructure
- 5.2.1. CloudWatch, X-Ray, and Observability Tools
- 5.2.2. Cost Optimization: Spot Instances, Savings Plans, and Rightsizing
- 5.2.3. Troubleshooting Latency and Scaling Issues
- 5.3. Securing ML Resources
- 5.3.1. IAM Roles, Policies, and Least Privilege Access
- 5.3.2. VPCs, Subnets, and Network Security for ML
- 5.3.3. Encryption, KMS, and Data Protection
- 5.3.4. Security Auditing and Compliance Monitoring
- 5.4. Reflection Checkpoint
- Phase 6: Exam Readiness
- 6.1. Exam Strategy and Time Management
- 6.2. Quick Reference Decision Trees and Cheat Sheets
- 6.3. Mixed-Topic Practice Questions
- Phase 7: Glossary
- Phase 8: Conclusion