Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.
6.2. Quick Reference Decision Trees and Cheat Sheets
🎯 Quick Reference: Endpoint Selection
| Question | If Yes → | If No → |
|---|---|---|
| Need real-time predictions (<100 ms)? | Real-time endpoint | Continue ↓ |
| Processing large payloads (>6 MB) or long inference (>60 s)? | Async endpoint | Continue ↓ |
| Traffic is intermittent (<1 req/min) and cold starts are OK? | Serverless endpoint | Continue ↓ |
| Processing large datasets without real-time need? | Batch Transform | Re-evaluate requirements |
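The table above reads as a top-down decision tree in which the first matching row wins. A minimal Python sketch of that order of checks follows. The `Workload` dataclass, its field names, and `select_endpoint` are hypothetical names introduced for illustration, and the thresholds are the rule-of-thumb values from the table, not hard service limits.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of an inference workload."""
    needs_low_latency: bool      # real-time predictions (<100 ms) required
    payload_mb: float            # typical request payload size in MB
    inference_seconds: float     # typical time to process one request
    requests_per_minute: float   # average traffic
    cold_starts_ok: bool         # callers can tolerate cold-start delay
    is_bulk_dataset: bool        # scoring a large dataset with no real-time need


def select_endpoint(w: Workload) -> str:
    """Walk the table rows top to bottom and return the first match."""
    if w.needs_low_latency:
        return "Real-time endpoint"
    if w.payload_mb > 6 or w.inference_seconds > 60:
        return "Async endpoint"
    if w.requests_per_minute < 1 and w.cold_starts_ok:
        return "Serverless endpoint"
    if w.is_bulk_dataset:
        return "Batch Transform"
    return "Re-evaluate requirements"


if __name__ == "__main__":
    nightly_scoring = Workload(
        needs_low_latency=False, payload_mb=1, inference_seconds=2,
        requests_per_minute=30, cold_starts_ok=False, is_bulk_dataset=True,
    )
    print(select_endpoint(nightly_scoring))
```

Because the checks run in table order, a workload that needs low latency is routed to a real-time endpoint even if its payloads are large, which is a signal to revisit the payload design before accepting that answer.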
🎯 Quick Reference: Managed vs. Custom Decision
🎯 Quick Reference: Data Prep Service Selection
| Scenario | Service | Why |
|---|---|---|
| Visual data exploration and transformation with minimal code | SageMaker Data Wrangler | Built-in visualizations, no Spark required |
| Large-scale ETL on petabyte data | AWS Glue | Serverless Spark, scales massively |
| Simple data transformations with recipe-based UI | AWS Glue DataBrew | 250+ built-in transformations, no code |
| Custom Spark processing on large datasets | Amazon EMR | Full Spark/Hadoop cluster control |
| Streaming data transformation | Amazon Kinesis Data Streams + Lambda, or Amazon Managed Service for Apache Flink | Real-time processing |
| Feature storage and reuse across teams | SageMaker Feature Store | Online (real-time) + offline (batch) stores |
🎯 Quick Reference: Monitoring Service Selection
| What to Monitor | Service | Trigger |
|---|---|---|
| Feature distributions vs. baseline | Model Monitor (Data Quality) | Scheduled job |
| Model accuracy vs. baseline | Model Monitor (Model Quality) | Scheduled job (requires ground truth labels) |
| Fairness metrics for protected groups | Model Monitor (Bias Drift) via Clarify | Scheduled job |
| Endpoint latency, CPU, memory | CloudWatch Metrics + Alarms | Threshold-based |
| Error logs, stack traces | CloudWatch Logs + Logs Insights | Query-based |
| Cross-service request tracing | AWS X-Ray | On-demand |
| API call audit trail | AWS CloudTrail | Continuous |
| PII in training data | Amazon Macie | Scheduled scan |
| Resource configuration compliance | AWS Config | Rule-based |
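As an illustration of the threshold-based CloudWatch row, the sketch below builds the arguments for an alarm on the SageMaker `ModelLatency` metric. The helper name `latency_alarm_params`, the alarm naming scheme, and the period and evaluation settings are assumptions chosen for the example. The helper only builds a dictionary, so it runs without AWS credentials.

```python
def latency_alarm_params(endpoint_name: str, variant_name: str,
                         threshold_ms: float) -> dict:
    """Build keyword arguments for CloudWatch put_metric_alarm (sketch)."""
    return {
        # Hypothetical naming scheme; any unique alarm name works.
        "AlarmName": f"{endpoint_name}-model-latency-high",
        "Namespace": "AWS/SageMaker",
        "MetricName": "ModelLatency",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Statistic": "Average",
        "Period": 60,              # seconds per datapoint (assumed)
        "EvaluationPeriods": 5,    # consecutive breaching periods (assumed)
        # ModelLatency is reported in microseconds, so convert from ms.
        "Threshold": threshold_ms * 1000,
        "ComparisonOperator": "GreaterThanThreshold",
    }


if __name__ == "__main__":
    params = latency_alarm_params("my-endpoint", "AllTraffic", threshold_ms=100)
    print(params["Threshold"])
    # With boto3 installed and credentials configured, the alarm could be
    # created with: boto3.client("cloudwatch").put_metric_alarm(**params)
```

The unit conversion is the detail most often missed: a 100 ms target corresponds to a threshold of 100,000 because the metric is emitted in microseconds.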
🎯 Quick Reference: Security Controls
| Protection Layer | Service | What It Controls |
|---|---|---|
| Identity (who) | IAM roles, policies, SageMaker Role Manager | What actions users and services can perform |
| Network (where) | VPC, security groups, network isolation, PrivateLink | What resources can communicate |
| Data (what) | KMS, SSE, Secrets Manager | Whether data is readable if intercepted |
| Audit (when/how) | CloudTrail, Config, Macie | Whether policies are followed |
🎯 Quick Reference: Cost Optimization by Workload
| ML Workload | Pricing Strategy | Additional Savings |
|---|---|---|
| Training jobs (fault-tolerant) | Spot Instances (up to 90% off) | Use managed spot training with checkpointing |
| Production endpoint (24/7 steady) | SageMaker Savings Plans (up to 64% off) | Rightsizing with Inference Recommender |
| Production endpoint (variable) | On-Demand + auto scaling | Scale in during off-hours (scaling to zero requires an async endpoint or inference components) |
| Low-traffic endpoint (<1 req/min) | Serverless endpoint | Pay only for compute time and data processed per request, with no idle cost |
| Batch scoring (weekly) | Batch Transform (On-Demand; managed Spot applies to training jobs only) | No persistent endpoint cost |
| Development/notebooks | On-Demand + lifecycle configs | Auto-stop idle notebooks |
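For the fault-tolerant training row, a minimal sketch of the `CreateTrainingJob` request fields involved in managed spot training with checkpointing is shown below, together with the savings calculation based on billable versus total training seconds. The helper names and the example S3 URI are hypothetical. The dictionary is only a fragment of a full request and would need to be merged with the remaining required fields.

```python
def spot_training_settings(checkpoint_s3_uri: str,
                           max_runtime_s: int,
                           max_wait_s: int) -> dict:
    """Return the CreateTrainingJob fields that enable managed spot training."""
    # The wait time covers both waiting for Spot capacity and running,
    # so it must be at least as large as the runtime limit.
    if max_wait_s < max_runtime_s:
        raise ValueError("max_wait_s must be >= max_runtime_s")
    return {
        "EnableManagedSpotTraining": True,
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_runtime_s,
            "MaxWaitTimeInSeconds": max_wait_s,
        },
        # Checkpoints let an interrupted job resume instead of restarting.
        "CheckpointConfig": {"S3Uri": checkpoint_s3_uri},
    }


def spot_savings_percent(training_seconds: int, billable_seconds: int) -> float:
    """Savings = (1 - billable / training) * 100."""
    return (1 - billable_seconds / training_seconds) * 100


if __name__ == "__main__":
    settings = spot_training_settings(
        "s3://example-bucket/checkpoints/",  # hypothetical bucket
        max_runtime_s=3600,
        max_wait_s=7200,
    )
    print(settings["EnableManagedSpotTraining"])
    print(spot_savings_percent(training_seconds=1000, billable_seconds=250))
```

The two inputs to `spot_savings_percent` correspond to the `TrainingTimeInSeconds` and `BillableTimeInSeconds` values that SageMaker reports for a completed training job.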
Written by Alvin Varughese
Founder • 15 professional certifications