2.1.4. The CloudWatch Agent: Memory, Disk, and Container Metrics
💡 First Principle: AWS can observe an instance only from the outside, at the hypervisor level, not from inside your operating system. The CloudWatch agent bridges this gap by running inside your EC2 instance (or container) and shipping OS-level metrics that the hypervisor can't see.
The unified CloudWatch agent replaced the older CloudWatch Logs agent and the CloudWatch monitoring scripts. It handles both metrics and logs in a single agent, and it works on EC2, on-premises servers, and containers.
What the CloudWatch Agent Unlocks:
| Metric Type | Examples | Default (No Agent) |
|---|---|---|
| Memory | mem_used_percent, mem_available | ❌ Not published |
| Disk | disk_used_percent, disk_free | ❌ Not published |
| Swap | swap_used_percent | ❌ Not published |
| Per-process | CPU/memory per PID | ❌ Not published |
| Custom StatsD | Any application metric | ❌ Not published |
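The agent takes its settings from a JSON file, and the memory, disk, and swap rows above correspond to sections of that file. The following is a minimal sketch of such a config, built as a Python dict and printed as JSON. The helper name `build_agent_config`, the 60-second interval, and the choice to monitor only the `/` mount are assumptions for illustration; the measurement names follow the form the agent's configuration wizard typically generates, so check them against the agent documentation before relying on them.

```python
import json


def build_agent_config(namespace="CWAgent", interval_seconds=60):
    """Return a minimal CloudWatch agent config as a dict (illustrative only)."""
    return {
        "agent": {"metrics_collection_interval": interval_seconds},
        "metrics": {
            "namespace": namespace,
            # Tag each metric with the instance that produced it.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                # Memory usage, invisible to the hypervisor.
                "mem": {"measurement": ["mem_used_percent"]},
                # Disk usage for the root mount only (assumed path);
                # expected to be published as disk_used_percent.
                "disk": {"measurement": ["used_percent"], "resources": ["/"]},
                # Swap usage.
                "swap": {"measurement": ["swap_used_percent"]},
            },
        },
    }


if __name__ == "__main__":
    # Print the JSON that would be saved as the agent's config file.
    print(json.dumps(build_agent_config(), indent=2))
```

The printed JSON is what would be stored as the agent's configuration; per-process and StatsD collection would need further sections that this sketch leaves out.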
Agent Configuration and Deployment: The agent is configured via a JSON configuration file. The recommended approach for fleet-wide deployment:
- Store the agent config in SSM Parameter Store (a standard parameter named /AmazonCloudWatch-agent)
- Use SSM Run Command or State Manager to install and start the agent across your fleet
- The agent reads its config from Parameter Store at startup
This approach means you never need to SSH into instances to configure monitoring.
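The three steps above can be sketched with boto3. This is a rough outline under stated assumptions, not a definitive rollout script: the helper names are hypothetical, the tag used for targeting is made up, and the SSM document names and parameters (`AWS-ConfigureAWSPackage`, `AmazonCloudWatch-ManageAgent`) are given as I understand them and should be confirmed in the Systems Manager console. The request builders are pure functions so they can be inspected without AWS credentials.

```python
import json


def put_parameter_request(name, config):
    """Build kwargs for ssm.put_parameter that store the agent config."""
    return {
        "Name": name,
        "Type": "String",  # standard String parameter
        "Value": json.dumps(config),
        "Overwrite": True,
    }


def install_agent_request(targets):
    """Build kwargs for ssm.send_command that install the agent package."""
    return {
        "DocumentName": "AWS-ConfigureAWSPackage",
        "Targets": targets,
        "Parameters": {"action": ["Install"], "name": ["AmazonCloudWatchAgent"]},
    }


def configure_agent_request(targets, parameter_name):
    """Build kwargs for ssm.send_command that load the config and start the agent."""
    return {
        "DocumentName": "AmazonCloudWatch-ManageAgent",
        "Targets": targets,
        "Parameters": {
            "action": ["configure"],
            "mode": ["ec2"],
            "optionalConfigurationSource": ["ssm"],
            "optionalConfigurationLocation": [parameter_name],
            "optionalRestart": ["yes"],
        },
    }


def deploy(parameter_name, config, targets):
    """Store the config, then install and configure the agent on the targets."""
    import boto3  # imported lazily so the builders work without boto3 installed

    ssm = boto3.client("ssm")
    ssm.put_parameter(**put_parameter_request(parameter_name, config))
    ssm.send_command(**install_agent_request(targets))
    # send_command is asynchronous: a real rollout should wait for the install
    # to finish (or use a State Manager association) before configuring.
    ssm.send_command(**configure_agent_request(targets, parameter_name))
```

A call such as `deploy("/AmazonCloudWatch-agent", config, [{"Key": "tag:Monitoring", "Values": ["enabled"]}])` would target every managed instance carrying the hypothetical `Monitoring=enabled` tag. The instances also need an instance profile that allows SSM to manage them and allows the agent to read the parameter.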
Container Insights for ECS and EKS:
| Platform | How Container Insights Works |
|---|---|
| Amazon ECS | Enable Container Insights at the cluster level; CloudWatch then automatically collects CPU, memory, and network metrics per task and per container |
| Amazon EKS | Deploy the CloudWatch agent as a DaemonSet; it collects metrics from the kubelet on each node and sends them to CloudWatch |
Container Insights gives you visibility into:
- Cluster-level resource utilization
- Service-level CPU and memory
- Task/pod-level metrics
- Node-level metrics (EKS)
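On ECS, Container Insights is switched on through a cluster setting. The sketch below shows one way to do that with boto3's `update_cluster_settings` call; the helper names and the cluster name in the usage note are hypothetical, and the request builder is kept separate so it can be checked without AWS credentials.

```python
def container_insights_request(cluster, enabled=True):
    """Build kwargs for ecs.update_cluster_settings."""
    return {
        "cluster": cluster,
        "settings": [
            {
                "name": "containerInsights",
                "value": "enabled" if enabled else "disabled",
            }
        ],
    }


def enable_container_insights(cluster):
    """Turn Container Insights on for an existing ECS cluster."""
    import boto3  # imported lazily so the builder works without boto3 installed

    ecs = boto3.client("ecs")
    return ecs.update_cluster_settings(**container_insights_request(cluster))
```

Calling `enable_container_insights("my-cluster")` would apply the setting to a cluster with that assumed name. EKS has no equivalent single setting, since it relies on the DaemonSet deployment described in the table.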
⚠️ Exam Trap: Container Insights is not enabled by default; you must explicitly enable it. For ECS, it's a cluster-level setting. For EKS, it requires deploying the CloudWatch agent DaemonSet. The exam tests whether you know the additional configuration step.
Reflection Question: Your ECS tasks are running out of memory but the default CloudWatch ECS metrics look normal. What is the most likely cause, and what action do you take?