Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

2.4.1. Compute Optimization: EC2, Placement Groups, and Compute Optimizer

💡 First Principle: The right instance type for the wrong workload wastes money and degrades performance simultaneously. Compute optimization is about alignment: matching instance characteristics (CPU architecture, memory ratio, network bandwidth) to workload requirements.

EC2 Placement Groups solve a different problem: physical placement within a data center affects inter-node latency. For workloads that need high-speed, low-latency communication between instances, physical proximity matters.

| Placement Strategy | Characteristic | Best For |
|---|---|---|
| Cluster | Instances packed close together on the same high-bandwidth network segment in one AZ | HPC, distributed computing, Hadoop, ML training; needs low latency and high throughput |
| Partition | Instances spread across logical partitions (each on separate racks); up to 7 partitions per AZ | Large distributed systems (Cassandra, Kafka, HDFS); limits correlated hardware failures |
| Spread | Each instance on a distinct rack; max 7 running instances per AZ per group | Small critical workloads that must avoid simultaneous failure (e.g., 3-node clusters) |
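The constraints in the table can be checked before launch. The following is a minimal sketch in plain Python, not an AWS API: `Strategy`, `validate_placement`, and the two limit constants are hypothetical names introduced here for illustration, and the limits are the ones stated in the table above.

```python
from collections import Counter
from enum import Enum


class Strategy(Enum):
    CLUSTER = "cluster"
    PARTITION = "partition"
    SPREAD = "spread"


# Limits taken from the table above (hypothetical constant names).
MAX_SPREAD_INSTANCES_PER_AZ = 7
MAX_PARTITIONS_PER_AZ = 7


def validate_placement(strategy, instance_azs, partitions_per_az=None):
    """Return a list of constraint violations; an empty list means the plan fits.

    instance_azs: one AZ name per planned instance, e.g. ["us-east-1a", ...].
    partitions_per_az: planned partition count per AZ (partition strategy only).
    """
    problems = []
    per_az = Counter(instance_azs)

    # Cluster groups cannot span Availability Zones.
    if strategy is Strategy.CLUSTER and len(per_az) > 1:
        problems.append(f"cluster: spans {len(per_az)} AZs, must be exactly 1")

    # Spread groups allow at most 7 instances per AZ.
    if strategy is Strategy.SPREAD:
        for az, count in sorted(per_az.items()):
            if count > MAX_SPREAD_INSTANCES_PER_AZ:
                problems.append(
                    f"spread: {count} instances in {az}, "
                    f"limit is {MAX_SPREAD_INSTANCES_PER_AZ}"
                )

    # Partition groups allow at most 7 partitions per AZ.
    if (
        strategy is Strategy.PARTITION
        and partitions_per_az is not None
        and partitions_per_az > MAX_PARTITIONS_PER_AZ
    ):
        problems.append(
            f"partition: {partitions_per_az} partitions per AZ, "
            f"limit is {MAX_PARTITIONS_PER_AZ}"
        )

    return problems
```

For example, `validate_placement(Strategy.SPREAD, ["us-east-1a"] * 8)` returns one violation, while the same eight instances split across two AZs pass. The check covers only the limits named in this section; it says nothing about whether capacity is actually available at launch time.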

Burstable Instances (T-family): T3, T3a, and T4g instances earn CPU credits at a fixed hourly rate, bank them while CPU usage stays below the baseline, and spend them during bursts. This is cost-effective for workloads with low average CPU but occasional spikes (dev environments, web servers with variable traffic).

  • Unlimited mode (the default for T3, T3a, and T4g): the instance can burst beyond its credit balance, and surplus usage is billed at an extra per-vCPU-hour rate. No throttling, but watch the bill
  • Standard mode: CPU throttles to baseline when credits are exhausted
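The credit arithmetic behind standard mode can be sketched as a small simulation. This is a simplified illustration, not AWS's accounting: it works in whole hours (real accrual is much finer-grained), `BurstableSpec` and `simulate_standard` are hypothetical names, and the t3.micro figures (2 vCPUs, 12 credits per hour, 288 maximum balance) are assumed from AWS's published credit table at the time of writing. One CPU credit is one vCPU at 100% for one minute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BurstableSpec:
    vcpus: int
    credits_per_hour: float
    max_balance: float

    @property
    def baseline_utilization(self) -> float:
        # Utilization per vCPU that spends credits exactly as fast as they are earned.
        return self.credits_per_hour / (self.vcpus * 60)


# Assumed values for t3.micro; verify against current AWS documentation.
T3_MICRO = BurstableSpec(vcpus=2, credits_per_hour=12, max_balance=288)


def simulate_standard(spec, hourly_utilization, start_balance=0.0):
    """Simulate standard mode hour by hour.

    hourly_utilization: average utilization per vCPU for each hour, 0.0 to 1.0.
    Returns a list of (balance_after_hour, throttled) tuples.
    """
    balance = start_balance
    history = []
    for utilization in hourly_utilization:
        demand = utilization * spec.vcpus * 60   # credits the workload wants this hour
        available = balance + spec.credits_per_hour
        throttled = demand > available            # not enough credits: CPU is capped
        spent = min(demand, available)
        balance = min(available - spent, spec.max_balance)
        history.append((balance, throttled))
    return history
```

Under these assumptions, two idle hours followed by an hour at 50% utilization on both vCPUs gives `[(12.0, False), (24.0, False), (0.0, True)]`: the burst wants 60 credits, only 36 are available, so the instance is throttled. In unlimited mode the same hour would run unthrottled and the shortfall would be billed instead.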

AWS Compute Optimizer analyzes CloudWatch metrics for your EC2 instances, Lambda functions, ECS services on Fargate, and Auto Scaling groups, then makes machine learning-based recommendations:

| Recommendation Type | Example |
|---|---|
| Over-provisioned | Your m5.xlarge runs at 8% CPU; right-size to m5.large |
| Under-provisioned | Your c5.large is consistently at 95% CPU; upgrade to c5.xlarge |
| Instance family change | Switch from M-series to C-series for compute-intensive workloads |

Compute Optimizer uses CloudWatch metrics as its data source, specifically the last 14 days by default (up to 3 months with Enhanced Infrastructure Metrics). It needs a minimum amount of metric history (about 30 hours for EC2 at the time of writing) before it can produce recommendations, so a brand-new instance has none yet.
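To make the recommendation categories concrete, here is a deliberately simplified threshold heuristic. It is not how Compute Optimizer works (the service uses machine learning over several metrics); `classify_cpu` and its `min_samples`, `low`, and `high` parameters are hypothetical values chosen only so that the 8% and 95% examples above land in the expected category.

```python
from statistics import mean


def classify_cpu(cpu_percent_samples, min_samples=30, low=20.0, high=90.0):
    """Classify an instance from CPU utilization samples (percent, 0 to 100).

    All thresholds are illustrative assumptions, not Compute Optimizer's rules.
    """
    # Too little history: mirror the "no recommendations yet" case.
    if len(cpu_percent_samples) < min_samples:
        return "insufficient-data"

    average = mean(cpu_percent_samples)
    peak = max(cpu_percent_samples)

    # Sustained high average suggests the instance is too small.
    if average >= high:
        return "under-provisioned"
    # Even the peak is low, so a smaller size would likely cope.
    if peak < low:
        return "over-provisioned"
    return "optimized"
```

With hourly samples over 14 days (336 values), a flat 8% series classifies as `over-provisioned` and a flat 95% series as `under-provisioned`, while five samples from a new instance return `insufficient-data`. A real right-sizing decision would also weigh memory, network, and disk metrics, which this sketch ignores.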

āš ļø Exam Trap: Placement groups have constraints. Cluster placement groups are confined to a single AZ — you can't span AZs. If an EC2 launch fails because no capacity exists in that AZ, the whole cluster placement group launch fails together (insufficient capacity errors are common for cluster groups on larger instance types). Spread placement groups are limited to 7 instances per AZ.

Reflection Question: A distributed machine learning training job needs 64 GPU instances to communicate with each other at maximum network throughput. Which placement group type do you use, and what potential operational risk should you mitigate?

Written by Alvin Varughese
Founder • 15 professional certifications