4.1.2. Compute Selection: CPU vs. GPU and Instance Families
💡 First Principle: GPUs excel at parallel matrix operations (neural network inference, image processing). CPUs excel at general-purpose computation (tree-based models, feature lookup). A mismatch hurts in either direction: a GPU for an XGBoost model pays a premium for parallelism the model cannot use, while a CPU for a ResNet model saves on hourly price but may miss latency or throughput targets. The exam tests this alignment.
| Instance Family | Optimized For | When to Use | Example |
|---|---|---|---|
| ml.m5 (General purpose) | Balanced CPU/memory | Small models, preprocessing, general workloads | Lightweight inference, data processing |
| ml.c5 (Compute optimized) | CPU-intensive computation | Tree models (XGBoost), ensemble inference | High-throughput tabular model inference |
| ml.r5 (Memory optimized) | Large memory footprint | Large feature stores, embedding lookups | NLP models with large vocabularies |
| ml.p3/p4 (GPU accelerated) | Parallel computation | Neural network training and inference | Image classification, NLP transformers |
| ml.g4dn/g5 (GPU inference) | Cost-effective GPU inference | Deep learning inference at scale | Real-time image/video processing |
| ml.inf1/inf2 (Inferentia) | ML inference (AWS custom chip) | High-throughput, cost-optimized inference | Serving transformer models at scale |
SageMaker Inference Recommender automates instance selection by running load tests across multiple instance types and recommending the best cost-performance combination for your specific model. Use it instead of guessing—the exam tests whether you know this tool exists.
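To make the tool concrete, here is a minimal sketch of how a request to Inference Recommender might be assembled with boto3. The field names follow the `create_inference_recommendations_job` API as we understand it; verify them against the current boto3 reference before use, since a real Advanced job may need further fields such as a traffic pattern. The job name, ARNs, and candidate instance list are placeholders, and `build_recommender_request` and `submit` are hypothetical helper names.

```python
from typing import Any, Dict, List


def build_recommender_request(
    job_name: str,
    role_arn: str,
    model_package_arn: str,
    instance_types: List[str],
    p95_latency_ms: int,
) -> Dict[str, Any]:
    """Assemble keyword arguments for create_inference_recommendations_job."""
    return {
        "JobName": job_name,
        # "Advanced" load-tests the instance types listed below;
        # "Default" lets the service choose candidates itself.
        "JobType": "Advanced",
        "RoleArn": role_arn,
        "InputConfig": {
            "ModelPackageVersionArn": model_package_arn,
            "EndpointConfigurations": [
                {"InstanceType": t} for t in instance_types
            ],
        },
        # Stop ramping load once P95 model latency exceeds the target.
        "StoppingConditions": {
            "ModelLatencyThresholds": [
                {"Percentile": "P95", "ValueInMilliseconds": p95_latency_ms}
            ]
        },
    }


def submit(request: Dict[str, Any]) -> Dict[str, Any]:
    """Send the request to SageMaker (requires AWS credentials)."""
    import boto3  # imported lazily so the builder runs without AWS access

    client = boto3.client("sagemaker")
    return client.create_inference_recommendations_job(**request)


# Placeholder values for illustration only.
request = build_recommender_request(
    job_name="xgb-instance-comparison",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    model_package_arn=(
        "arn:aws:sagemaker:us-east-1:123456789012:"
        "model-package/example-group/1"
    ),
    instance_types=["ml.m5.xlarge", "ml.c5.xlarge", "ml.g4dn.xlarge"],
    p95_latency_ms=200,
)
```

Calling `submit(request)` would start the job. Building the request separately keeps the candidate list and latency target easy to review before any load test incurs cost.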
⚠️ Exam Trap: The cheapest instance that meets the latency and throughput requirements is the correct answer, not the most powerful. If a question describes an XGBoost model serving 100 requests/second with sub-200ms latency, an ml.c5.xlarge might suffice; an ml.p3.2xlarge would also work but costs more than 10× as much per hour for no benefit. Always match compute to model type.
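The selection rule in this trap can be sketched as a filter followed by a minimum: discard every instance that misses the requirements, then take the cheapest survivor. The prices, latencies, and throughput figures below are made-up illustrative numbers, not quoted AWS rates or real benchmark results, and `Candidate` and `cheapest_meeting_slo` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    instance_type: str
    hourly_usd: float      # illustrative price, not a quoted AWS rate
    p95_latency_ms: float  # hypothetical load-test result
    max_rps: float         # hypothetical sustained requests/second


def cheapest_meeting_slo(
    candidates: List[Candidate],
    max_latency_ms: float,
    min_rps: float,
) -> Optional[Candidate]:
    """Return the lowest-cost candidate that satisfies both requirements."""
    eligible = [
        c for c in candidates
        if c.p95_latency_ms <= max_latency_ms and c.max_rps >= min_rps
    ]
    # default=None covers the case where nothing meets the requirements.
    return min(eligible, key=lambda c: c.hourly_usd, default=None)


# Made-up measurements for an XGBoost endpoint.
candidates = [
    Candidate("ml.m5.large", 0.12, 260.0, 80.0),
    Candidate("ml.c5.xlarge", 0.20, 45.0, 400.0),
    Candidate("ml.p3.2xlarge", 3.80, 30.0, 900.0),
]

choice = cheapest_meeting_slo(candidates, max_latency_ms=200.0, min_rps=100.0)
print(choice.instance_type)  # ml.c5.xlarge
```

With these numbers the GPU instance has the lowest latency, but it is never selected because a cheaper instance already meets the target. Extra headroom beyond the requirement earns nothing.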
Reflection Question: You need to deploy a BERT-based text classification model and an XGBoost tabular model to production. Would you use the same instance type for both? Why or why not?