Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

5.2.2. Cost Optimization: Spot Instances, Savings Plans, and Rightsizing

💡 First Principle: The most expensive ML resource is the one you're paying for but not using. Cost optimization in ML isn't about buying cheaper hardware—it's about matching resource lifecycle to workload lifecycle. Training jobs that finish should release their compute. Endpoints with no traffic should scale to zero. Development notebooks left running overnight are burning money for nothing.

AWS provides multiple pricing mechanisms, and the exam tests your ability to match them to ML workloads:

Spot Instances offer up to 90% savings over On-Demand pricing but can be interrupted with only 2 minutes' notice. This makes them ideal for SageMaker training jobs (Managed Spot Training can checkpoint to Amazon S3 and resume after an interruption) and for non-time-sensitive batch processing. They're not suitable for real-time inference endpoints, where an interruption means dropped predictions.
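In SageMaker, Spot training is enabled per job through Managed Spot Training. The sketch below builds only the spot-related fields of a `create_training_job` request, using field names from the CreateTrainingJob API. The bucket name and time limits are hypothetical. The function does not call AWS, so it runs without credentials.

```python
def spot_training_fields(max_runtime_s: int, max_wait_s: int, checkpoint_s3_uri: str) -> dict:
    """CreateTrainingJob keys that turn on Managed Spot Training.

    Only the spot-related keys are returned. A complete request also needs
    TrainingJobName, AlgorithmSpecification, RoleArn, OutputDataConfig,
    ResourceConfig, and so on.
    """
    # MaxWaitTimeInSeconds bounds training time plus time spent waiting for
    # Spot capacity, so it cannot be smaller than MaxRuntimeInSeconds.
    if max_wait_s < max_runtime_s:
        raise ValueError("max_wait_s must be >= max_runtime_s")
    return {
        "EnableManagedSpotTraining": True,
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_runtime_s,
            "MaxWaitTimeInSeconds": max_wait_s,
        },
        # SageMaker copies files written under LocalPath to S3Uri and restores
        # them when the job resumes. The training script must save and reload
        # its own checkpoints from LocalPath.
        "CheckpointConfig": {
            "S3Uri": checkpoint_s3_uri,
            "LocalPath": "/opt/ml/checkpoints",
        },
    }

# Hypothetical bucket: allow up to 2h of training within a 4h overall window.
fields = spot_training_fields(2 * 3600, 4 * 3600, "s3://example-bucket/checkpoints/run-1/")
# boto3.client("sagemaker").create_training_job(**other_required_params, **fields)
```

The wait-time margin is the trade-off. A larger gap between the two limits gives the job more chances to find Spot capacity, but it also delays the result.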

On-Demand Instances provide guaranteed capacity at full price. Use them for production inference endpoints where availability is critical and for training jobs that must complete within a deadline.

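EC2 Reserved Instances and SageMaker Savings Plans offer significant discounts in exchange for a 1-year or 3-year commitment (AWS cites up to 72% for Reserved Instances and up to 64% for SageMaker Savings Plans, depending on term and payment option). Use them for persistent resources with predictable usage, such as a production endpoint that runs 24/7 with consistent traffic.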
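A commitment is only cheaper if the resource is busy enough. The sketch below uses a deliberately simplified model. It assumes the commitment covers one instance at the discounted rate for every hour of the term, whether or not the instance is used. In reality a Savings Plan is a $/hour spend commitment that can cover other usage. Under this assumption the commitment beats On-Demand only when utilization exceeds 1 minus the discount. The rate and discount below are hypothetical.

```python
def hourly_costs(on_demand_rate: float, discount: float, utilization: float) -> tuple[float, float]:
    """Average cost per hour of the term for ONE instance: (on_demand, committed).

    Simplifying assumption: the commitment is billed for every hour of the
    term, while On-Demand bills only the hours the instance actually runs.
    """
    on_demand = on_demand_rate * utilization
    committed = on_demand_rate * (1 - discount)
    return on_demand, committed

def break_even_utilization(discount: float) -> float:
    # rate * u > rate * (1 - discount)  <=>  u > 1 - discount
    return 1 - discount

# Hypothetical numbers: $1.00/hour On-Demand rate, 40% commitment discount.
for label, u in [("24/7 endpoint", 1.0), ("notebook, ~8h on weekdays", 40 / 168)]:
    od, sp = hourly_costs(1.00, 0.40, u)
    better = "commitment" if sp < od else "On-Demand"
    print(f"{label}: on-demand ${od:.2f}/h, committed ${sp:.2f}/h -> {better}")
```

With a 40% discount, the break-even point is 60% utilization. This is why always-on endpoints suit commitments and part-time development resources do not.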

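Pricing option | Savings vs. On-Demand | Best for | Not for
Spot Instances | Up to 90% | Training jobs (with checkpointing), interruption-tolerant batch and processing workloads | Real-time endpoints, time-critical training
On-Demand | None (baseline price) | Production endpoints, deadline-bound training | Long-running steady workloads (a commitment discount is cheaper)
Savings Plans / Reserved Instances | Up to 64% (SageMaker Savings Plans), up to 72% (EC2 Reserved Instances) | Predictable, consistent workloads | Variable or experimental workloads
Serverless endpoints | Pay per use (compute time and data processed, no charge when idle) | Intermittent traffic with idle periods between requests | High-throughput or latency-sensitive workloads (cold starts add latency)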
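The table's "Best for / Not for" columns can be read as a rule of thumb. The sketch below encodes them as a small decision function. It is not an AWS API. The `Workload` attributes are hypothetical simplifications chosen for this illustration, and real decisions also weigh deadlines, instance availability, and budget.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    # Hypothetical attributes chosen for this illustration only.
    serves_live_requests: bool    # real-time inference endpoint?
    tolerates_interruption: bool  # can checkpoint or retry (training, batch)?
    steady_24x7: bool             # predictable, always-on usage?
    intermittent: bool            # long idle gaps between requests?
    latency_sensitive: bool       # would a cold start be unacceptable?

def suggest_pricing(w: Workload) -> str:
    """Rule-of-thumb mapping that mirrors the pricing table (not an AWS API)."""
    # Interruptible offline work is the classic Spot case.
    if not w.serves_live_requests and w.tolerates_interruption:
        return "Spot"
    # Sparse, latency-tolerant traffic: pay only while requests are served.
    if w.serves_live_requests and w.intermittent and not w.latency_sensitive:
        return "Serverless endpoint"
    # Always-on, predictable usage benefits from a commitment discount.
    if w.steady_24x7:
        return "Savings Plan"
    # Everything else (e.g., deadline-bound training) stays On-Demand.
    return "On-Demand"

weekly_sweep = Workload(False, True, False, False, False)
internal_demo = Workload(True, False, False, True, False)
print(suggest_pricing(weekly_sweep))   # Spot
print(suggest_pricing(internal_demo))  # Serverless endpoint
```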

Beyond pricing, rightsizing is the other major cost lever. SageMaker Inference Recommender automates this by load-testing your model across different instance types and recommending the best cost-performance ratio. AWS Compute Optimizer analyzes historical utilization metrics for resources such as EC2 instances and Auto Scaling groups and recommends instance changes.
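Inference Recommender runs the load tests for you. The decision it supports can be sketched as "the cheapest fleet that still meets the latency target." The snippet below is not the service's actual algorithm. The instance labels, prices, and benchmark numbers are hypothetical placeholders, not measured results. It sizes each candidate fleet to the required throughput and keeps the cheapest one whose p99 latency meets the SLO.

```python
import math
from dataclasses import dataclass

@dataclass
class LoadTestResult:
    instance_type: str     # hypothetical labels, not real instance names
    price_per_hour: float  # hypothetical placeholder price
    p99_latency_ms: float  # measured at the throughput below
    max_rps: float         # sustained requests/second per instance

def instances_needed(r: LoadTestResult, required_rps: float) -> int:
    return math.ceil(required_rps / r.max_rps)

def fleet_cost_per_hour(r: LoadTestResult, required_rps: float) -> float:
    return instances_needed(r, required_rps) * r.price_per_hour

def pick_instance(results, latency_slo_ms: float, required_rps: float):
    """Cheapest fleet whose per-instance p99 latency meets the SLO, or None."""
    eligible = [r for r in results if r.p99_latency_ms <= latency_slo_ms]
    if not eligible:
        return None
    best = min(eligible, key=lambda r: fleet_cost_per_hour(r, required_rps))
    return best.instance_type, instances_needed(best, required_rps)

candidates = [
    LoadTestResult("instance-a", 0.20, 40.0, 50.0),
    LoadTestResult("instance-b", 0.50, 25.0, 200.0),
    LoadTestResult("instance-c", 1.50, 10.0, 900.0),
]
print(pick_instance(candidates, latency_slo_ms=50, required_rps=300))  # ('instance-b', 2)
```

Comparing fleets instead of single instances matters here. The cheapest instance per hour (instance-a) needs six copies to reach 300 requests/second, so it loses to two mid-sized instances.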

Cost monitoring tools complete the picture: AWS Cost Explorer visualizes spending trends, AWS Budgets sends alerts when actual or forecasted spending exceeds a threshold, and resource tags (once activated as cost allocation tags in the Billing console) enable per-project or per-team cost allocation.
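Tag-based allocation shows up directly in Cost Explorer queries. The sketch below builds parameters for the Cost Explorer `GetCostAndUsage` API, grouped by a tag, and sums the response per tag value. The tag key `project` and the dates are hypothetical. The parser assumes tag group keys are returned as `tagKey$tagValue`, with an empty value for untagged spend. Verify that format against the API reference before relying on it.

```python
def cost_by_tag_request(start: str, end: str, tag_key: str) -> dict:
    """Parameters for Cost Explorer GetCostAndUsage, grouped by a cost allocation tag."""
    return {
        "TimePeriod": {"Start": start, "End": end},  # YYYY-MM-DD, End is exclusive
        "Granularity": "MONTHLY",
        "Metrics": ["UnblendedCost"],
        "GroupBy": [{"Type": "TAG", "Key": tag_key}],
    }

def totals_by_tag(response: dict) -> dict[str, float]:
    """Sum UnblendedCost per tag value across all returned time periods."""
    totals: dict[str, float] = {}
    for period in response["ResultsByTime"]:
        for group in period["Groups"]:
            # Assumed key format "<tagKey>$<tagValue>"; empty value = untagged.
            value = group["Keys"][0].split("$", 1)[1] or "(untagged)"
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            totals[value] = totals.get(value, 0.0) + amount
    return totals

params = cost_by_tag_request("2026-01-01", "2026-02-01", "project")
# response = boto3.client("ce").get_cost_and_usage(**params)
# print(totals_by_tag(response))
```

A large "(untagged)" total is itself a useful signal. It usually means some resources were launched without the tag and cannot be attributed to a team.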

⚠️ Exam Trap: "Use Spot Instances to reduce cost" is a common distractor for real-time endpoint questions. Spot Instances can be interrupted, which is unacceptable for serving live predictions. For endpoints, cost optimization comes from rightsizing (the smallest instance type that still meets latency and throughput targets), auto scaling (matching instance count to demand), or Savings Plans (commitment discounts), not Spot.
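Endpoint auto scaling is configured through Application Auto Scaling. The sketch below builds the two requests it needs, `register_scalable_target` and `put_scaling_policy`, for a target-tracking policy on the `SageMakerVariantInvocationsPerInstance` metric. The endpoint name, variant name, capacity limits, target value, and cooldowns are hypothetical choices. Only the request shapes follow the API.

```python
def endpoint_autoscaling_requests(endpoint: str, variant: str, min_instances: int,
                                  max_instances: int, target_invocations: float) -> tuple[dict, dict]:
    """Build Application Auto Scaling requests for one SageMaker endpoint variant."""
    resource_id = f"endpoint/{endpoint}/variant/{variant}"
    dimension = "sagemaker:variant:DesiredInstanceCount"
    register = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "MinCapacity": min_instances,  # floor that bounds cost during quiet periods
        "MaxCapacity": max_instances,  # ceiling that bounds cost during spikes
    }
    policy = {
        "PolicyName": f"{endpoint}-invocations-target",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": dimension,
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Target average invocations per instance per minute.
            "TargetValue": target_invocations,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleInCooldown": 300,  # hypothetical: scale in cautiously
            "ScaleOutCooldown": 60,  # hypothetical: scale out quickly
        },
    }
    return register, policy

register, policy = endpoint_autoscaling_requests("my-endpoint", "AllTraffic", 1, 4, 100.0)
# client = boto3.client("application-autoscaling")
# client.register_scalable_target(**register)
# client.put_scaling_policy(**policy)
```

The asymmetric cooldowns reflect the cost/availability trade-off. Adding capacity quickly protects latency, while removing it slowly avoids thrashing when traffic dips briefly.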

Reflection Question: A company trains three models daily (2 hours each) and runs two real-time endpoints 24/7. Which pricing strategy should they use for each workload, and what tools would help identify if any resource is over-provisioned?

Written by Alvin Varughese
Founder, 15 professional certifications