3.2.1.1. Monitoring, Logging & Observability Overview
Monitoring tells you something is wrong. Observability tells you why. The distinction matters: you can monitor a system you don't understand, but you can't diagnose it without observability.
Three pillars of observability (a minimal publishing sketch follows this list):
- Metrics: Numeric measurements over time (CPU utilization, request count, error rate). CloudWatch Metrics is the primary service.
- Logs: Discrete events with context (application errors, access logs, audit trails). CloudWatch Logs, S3, OpenSearch.
- Traces: End-to-end request paths across distributed services. AWS X-Ray captures traces showing latency at each service hop.
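To make the first two pillars concrete, here is a minimal boto3 sketch that publishes one custom metric data point and one structured log event. The namespace, dimension values, log group, and log stream are placeholders chosen for illustration; traces would additionally require instrumenting the application with the X-Ray SDK.

```python
import json
import time

import boto3

cloudwatch = boto3.client("cloudwatch")
logs = boto3.client("logs")

# Metrics pillar: publish a single custom data point.
# "MyApp" and the Environment dimension are illustrative placeholders.
cloudwatch.put_metric_data(
    Namespace="MyApp",
    MetricData=[{
        "MetricName": "OrdersProcessed",
        "Value": 1,
        "Unit": "Count",
        "Dimensions": [{"Name": "Environment", "Value": "prod"}],
    }],
)

# Logs pillar: write one structured event to a log stream.
# The log group /myapp/orders and stream worker-1 are assumed to exist already.
logs.put_log_events(
    logGroupName="/myapp/orders",
    logStreamName="worker-1",
    logEvents=[{
        "timestamp": int(time.time() * 1000),  # milliseconds since epoch
        "message": json.dumps({"level": "ERROR", "orderId": "1234", "detail": "payment timeout"}),
    }],
)
```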
How they work together: A CloudWatch alarm fires on high error rate (metric). You query CloudWatch Logs Insights to find the failing requests (logs). You trace a failing request through X-Ray to discover a downstream service timeout (trace). Without all three, you're guessing.
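As a sketch of the middle step, the Logs Insights query below searches the last hour of a log group for error lines. The log group name and filter pattern are illustrative assumptions, not part of the exam material.

```python
import time
from datetime import datetime, timedelta, timezone

import boto3

logs = boto3.client("logs")

# Start a Logs Insights query over the last hour of an assumed log group.
end = datetime.now(timezone.utc)
start = end - timedelta(hours=1)
query = logs.start_query(
    logGroupName="/aws/lambda/checkout",  # placeholder log group
    startTime=int(start.timestamp()),
    endTime=int(end.timestamp()),
    queryString=(
        "fields @timestamp, @message "
        "| filter @message like /ERROR/ "
        "| sort @timestamp desc "
        "| limit 20"
    ),
)

# Poll until the query leaves the Scheduled/Running states, then print results.
while True:
    results = logs.get_query_results(queryId=query["queryId"])
    if results["status"] not in ("Scheduled", "Running"):
        break
    time.sleep(1)

for row in results["results"]:
    print({field["field"]: field["value"] for field in row})
```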
AWS observability stack (a metric alarm example follows the table):
| Pillar | Service | Purpose |
|---|---|---|
| Metrics | CloudWatch Metrics | Collect, store, and alarm on time-series data |
| Logs | CloudWatch Logs | Centralized log ingestion, search, and retention |
| Traces | AWS X-Ray | Distributed tracing across microservices |
| Dashboards | CloudWatch Dashboards | Unified visualization of metrics and logs |
| Synthetics | CloudWatch Synthetics | Canary scripts that probe endpoints on schedule |
| RUM | CloudWatch RUM | Real user monitoring from browsers |
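The "alarm on time-series data" purpose in the first row is, in practice, a metric alarm. A minimal sketch, assuming an Application Load Balancer metric; the load balancer dimension value and the SNS topic ARN are placeholders:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when the target group returns more than 50 5XX responses per minute
# for three consecutive minutes.  Dimension value and SNS ARN are placeholders.
cloudwatch.put_metric_alarm(
    AlarmName="checkout-5xx-high",
    Namespace="AWS/ApplicationELB",
    MetricName="HTTPCode_Target_5XX_Count",
    Dimensions=[{"Name": "LoadBalancer", "Value": "app/checkout/0123456789abcdef"}],
    Statistic="Sum",
    Period=60,
    EvaluationPeriods=3,
    Threshold=50,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:oncall-alerts"],
)
```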
Exam Trap: CloudWatch collects metrics from most AWS services automatically (EC2 CPU, RDS connections, ALB request count). But memory utilization and disk usage for EC2 are not collected by default — you must install the CloudWatch Agent to publish these as custom metrics. This is one of the most frequently tested facts on the exam.
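In production you would install and configure the CloudWatch Agent rather than hand-roll this, but the sketch below shows what "publishing memory as a custom metric" amounts to. It assumes the third-party psutil package, an illustrative namespace (the agent's default is CWAgent), and a placeholder instance ID.

```python
import boto3
import psutil  # third-party package, assumed installed

cloudwatch = boto3.client("cloudwatch")

# Publish current memory utilization as a custom metric.  The CloudWatch Agent
# does the equivalent of this on a schedule when configured to collect the
# mem_used_percent measurement.
cloudwatch.put_metric_data(
    Namespace="Custom/System",  # placeholder; the agent's default namespace is CWAgent
    MetricData=[{
        "MetricName": "MemoryUtilization",
        "Value": psutil.virtual_memory().percent,
        "Unit": "Percent",
        "Dimensions": [{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder
    }],
)
```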
