4.1. Selecting Deployment Infrastructure
💡 First Principle: Deployment is where ML models become products. The infrastructure you choose determines latency, cost, reliability, and scalability—and these trade-offs are entirely separate from model quality. A brilliant model deployed on the wrong infrastructure fails just as badly as a poor model. The exam tests your ability to match deployment patterns to business requirements.
What happens when you serve a real-time recommendation model from a batch pipeline? Users wait minutes for recommendations that should appear in milliseconds—and they leave. What happens when you keep a weekly reporting model on a persistent real-time endpoint? You pay 24/7 for infrastructure that's used for two hours a week. Both are deployment mismatches, and both are tested on the exam.
Think of deployment like choosing a delivery service. Same-day delivery (real-time endpoint) costs more but satisfies urgent needs. Standard shipping (batch) costs less but takes time. A drone (serverless) works for light packages to remote locations but can't handle heavy loads. Each has a purpose—the key is matching the delivery method to the package and urgency.
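As a concrete sketch, the three delivery methods map onto three different SageMaker configurations. The snippet below builds the request shapes as plain dictionaries (so it runs without AWS credentials); names like `my-model` and `s3://my-bucket/...` are hypothetical placeholders, and the instance types and memory size are illustrative choices, not recommendations.

```python
# Same-day delivery: a persistent real-time endpoint (always-on instance).
realtime_variant = {
    "VariantName": "AllTraffic",
    "ModelName": "my-model",            # hypothetical model name
    "InstanceType": "ml.m5.large",      # billed the whole time the endpoint is up
    "InitialInstanceCount": 1,
}

# Drone delivery: a serverless endpoint (scales to zero, pay per request).
serverless_variant = {
    "VariantName": "AllTraffic",
    "ModelName": "my-model",
    "ServerlessConfig": {
        "MemorySizeInMB": 2048,         # 1024-6144, in 1 GB increments
        "MaxConcurrency": 5,
    },
}

# Standard shipping: a batch transform job (spins up, writes results, shuts down).
batch_job = {
    "TransformJobName": "weekly-report",            # hypothetical job name
    "ModelName": "my-model",
    "TransformInput": {"DataSource": {"S3DataSource": {
        "S3DataType": "S3Prefix",
        "S3Uri": "s3://my-bucket/input/",           # hypothetical bucket
    }}},
    "TransformOutput": {"S3OutputPath": "s3://my-bucket/output/"},
    "TransformResources": {"InstanceType": "ml.m5.large", "InstanceCount": 1},
}

# Each dict would be passed to the corresponding boto3 call, e.g.:
#   sm = boto3.client("sagemaker")
#   sm.create_endpoint_config(EndpointConfigName="...",
#                             ProductionVariants=[serverless_variant])
#   sm.create_transform_job(**batch_job)
```

Note the structural difference the exam leans on: the real-time variant pins an instance (cost even when idle), the serverless variant declares only memory and concurrency (scales to zero), and batch transform is a job, not an endpoint at all.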
⚠️ Common Misconception: Serverless endpoints and Lambda-based inference are the same thing. SageMaker Serverless endpoints are managed ML inference endpoints that scale to zero — they handle model loading, container management, and scaling automatically. Lambda functions require you to package the model yourself, with a 250 MB limit on unzipped deployment packages (up to 10 GB if you deploy Lambda as a container image). The exam differentiates between them: serverless endpoints for larger models that need SageMaker integration, Lambda for lightweight models or simple preprocessing.
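The distinction can be encoded as a small decision helper. This is a hypothetical function (`choose_inference_target` is not an AWS API) that captures the exam heuristic under the assumption that traffic is intermittent and scale-to-zero is desired:

```python
# Lambda's unzipped deployment-package limit (zip-based packaging).
LAMBDA_PACKAGE_LIMIT_MB = 250

def choose_inference_target(model_size_mb: float,
                            needs_sagemaker_features: bool) -> str:
    """Hypothetical helper: pick a scale-to-zero deployment target.

    needs_sagemaker_features covers things like managed model loading,
    custom inference containers, and SageMaker monitoring integration.
    """
    if model_size_mb <= LAMBDA_PACKAGE_LIMIT_MB and not needs_sagemaker_features:
        return "lambda"                 # lightweight model, simple invoke path
    return "sagemaker-serverless"       # managed containers, loading, scaling

print(choose_inference_target(40, False))   # small model, no SageMaker needs -> lambda
print(choose_inference_target(900, True))   # large model -> sagemaker-serverless
```

A real decision would also weigh cold-start tolerance and payload limits, but this is the boundary the exam most often probes.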