3.2.2.2. Common CloudWatch Metrics and Logs (EC2 CPU, RDS Queue, ALB 5xx)

The exam expects you to know which metrics exist by default and what they indicate.

EC2 metrics (default — no agent required):

CPUUtilization: Percentage of the instance's allocated compute capacity in use, measured at the hypervisor. It can differ from what OS tools such as top report.
NetworkIn/Out: Bytes transferred. Useful for detecting unusual traffic patterns.
StatusCheckFailed_Instance: Instance-level failure (guest OS, software, or network configuration issue that you must fix)
StatusCheckFailed_System: Hardware/hypervisor failure (AWS issue)
⚠️ Memory and disk space utilization are NOT default metrics; both require the CloudWatch Agent
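
As a rough illustration of what closing that gap involves, the sketch below builds a minimal CloudWatch Agent configuration (Linux key names) that collects memory and disk usage. The function name build_agent_config and its parameters are hypothetical; the file location and the way the agent is started depend on your setup and are not shown.

```python
import json


def build_agent_config(disk_paths=("/",), interval_seconds=60):
    """Return a minimal CloudWatch Agent config dict (Linux metric names).

    Hypothetical helper for illustration only. The agent publishes these
    metrics to the CWAgent namespace unless configured otherwise.
    """
    return {
        "metrics": {
            # Tag every data point with the instance ID so alarms can target one host.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "mem": {
                    "measurement": ["mem_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                },
                "disk": {
                    "measurement": ["used_percent"],
                    "resources": list(disk_paths),
                    "metrics_collection_interval": interval_seconds,
                },
            },
        }
    }


if __name__ == "__main__":
    # Print the JSON you would save as the agent's configuration file.
    print(json.dumps(build_agent_config(), indent=2))
```

The point to remember for the exam is unchanged: without this agent, CloudWatch has no view of memory or filesystem usage inside the instance.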

RDS metrics:

CPUUtilization: Database CPU usage
DatabaseConnections: Active connections. Alarm near max to prevent connection exhaustion.
FreeStorageSpace: Remaining disk. Running out causes write failures.
ReadIOPS/WriteIOPS: I/O operations per second. Sustained spikes can indicate inefficient queries or a workload that has outgrown the storage.
ReplicaLag: Seconds behind primary. Critical for read replica health.
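
One way to act on the DatabaseConnections advice is an alarm set at a fraction of the instance's connection limit. The sketch below only builds the parameter dictionary; it assumes you would pass it to boto3's cloudwatch.put_metric_alarm(**params). The helper name, the 80% fraction, and the alarm name pattern are illustrative assumptions, and max_connections must come from your own parameter group.

```python
def connection_alarm_params(db_instance_id, max_connections, fraction=0.8):
    """Build put_metric_alarm parameters for a DatabaseConnections alarm.

    Hypothetical helper: the threshold is a fraction of max_connections,
    which the caller must look up for their engine and instance class.
    """
    threshold = int(max_connections * fraction)
    return {
        "AlarmName": f"{db_instance_id}-connections-high",  # illustrative name
        "Namespace": "AWS/RDS",
        "MetricName": "DatabaseConnections",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "Statistic": "Maximum",
        "Period": 60,             # seconds per data point
        "EvaluationPeriods": 3,   # three consecutive breaching minutes
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    }


if __name__ == "__main__":
    # Example: an instance that allows 500 connections alarms at 400.
    print(connection_alarm_params("orders-db", max_connections=500))
```

Alarming before the limit, rather than at it, leaves time to react before new connections start failing.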

ALB metrics:

RequestCount: Total requests
HTTPCode_Target_5XX_Count: Backend errors (your application's fault)
HTTPCode_ELB_5XX_Count: Load balancer errors (e.g., no healthy targets)
TargetResponseTime: Backend latency — the most important single metric for API health
HealthyHostCount/UnHealthyHostCount: Target health status
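
Raw 5XX counts are hard to judge without traffic volume, so a common approach is to express them as a percentage of RequestCount using CloudWatch metric math. The sketch below builds the query list in the shape expected by the GetMetricData API (for example boto3's get_metric_data). The function name and the load balancer dimension value are hypothetical.

```python
def target_error_rate_queries(load_balancer_dim, period=60):
    """Build MetricDataQueries computing target 5xx as a percentage of requests.

    Hypothetical helper. load_balancer_dim is the LoadBalancer dimension
    value, which looks like "app/<name>/<id>".
    """

    def stat(query_id, metric_name):
        # One raw ALB metric, summed per period, hidden from the final result.
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ApplicationELB",
                    "MetricName": metric_name,
                    "Dimensions": [
                        {"Name": "LoadBalancer", "Value": load_balancer_dim}
                    ],
                },
                "Period": period,
                "Stat": "Sum",
            },
            "ReturnData": False,
        }

    return [
        stat("requests", "RequestCount"),
        stat("target5xx", "HTTPCode_Target_5XX_Count"),
        {
            # Metric math expression referencing the two query Ids above.
            "Id": "error_rate",
            "Expression": "100 * target5xx / requests",
            "Label": "Target 5xx %",
            "ReturnData": True,
        },
    ]


if __name__ == "__main__":
    for query in target_error_rate_queries("app/example-alb/0123456789abcdef"):
        print(query)
```

Swapping HTTPCode_Target_5XX_Count for HTTPCode_ELB_5XX_Count gives the load balancer's own error rate, which keeps the two failure sources separate.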

Lambda metrics:

Invocations: Total calls
Errors: Function failures (exceptions, timeouts)
Throttles: Invocations rejected due to concurrency limits
Duration: Execution time (watch p99, not average)
ConcurrentExecutions: Current parallel executions
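
To see why the Duration advice favors p99 over the average, consider a made-up sample in which 98 invocations take 100 ms and 2 take 3000 ms. The sketch below uses a simple nearest-rank percentile, which is an assumption for illustration and not necessarily how CloudWatch computes its percentile statistics.

```python
import math
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Invented sample: 98 fast invocations and 2 slow ones (milliseconds).
durations_ms = [100] * 98 + [3000] * 2

if __name__ == "__main__":
    print("average:", mean(durations_ms))
    print("p99:", percentile(durations_ms, 99))
```

The average here is 158 ms, which looks healthy, while the p99 is 3000 ms, which is what the slowest callers actually experience and what pushes a function toward its timeout.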

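Exam Trap: HTTPCode_ELB_502_Count (Bad Gateway) means the ALB tried to reach a target and the connection was refused or reset, or the response was malformed. This typically indicates the target is not listening on the configured port or the target crashed. A security group or network ACL that silently drops the ALB's traffic usually shows up as a 504 timeout (HTTPCode_ELB_504_Count) instead. HTTPCode_ELB_503_Count means no targets were available to handle the request, for example a target group with no registered targets. These ELB-level errors are different from target-level 5XX errors and have different root causes.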

Written by Alvin Varughese • Founder • 15 professional certifications