3.2.1.1. Monitoring Applications & Infrastructure Overview

Monitoring tells you something is wrong. Observability tells you why. The distinction matters: you can monitor a system you don't understand, but you can't diagnose it without observability.

Three pillars of observability:

Metrics: Numeric measurements over time (CPU utilization, request count, error rate). CloudWatch Metrics is the primary service.
Logs: Discrete events with context (application errors, access logs, audit trails). Stored and analyzed in CloudWatch Logs, Amazon S3, or Amazon OpenSearch Service.
Traces: End-to-end request paths across distributed services. AWS X-Ray captures traces showing latency at each service hop.
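
X-Ray stitches a request's path together with a tracing header that each service forwards to the next hop. The Python sketch below assumes the documented `X-Amzn-Trace-Id` layout (`Root=1-<epoch hex>-<random hex>;Parent=<segment id>;Sampled=<0|1>`). The helper functions are hypothetical and are not part of the X-Ray SDK. Pulling out the `Root` value is what lets you move from a log line to the matching trace.

```python
from datetime import datetime, timezone


def parse_trace_header(header: str) -> dict[str, str]:
    """Split an X-Amzn-Trace-Id header value into its key=value fields."""
    fields: dict[str, str] = {}
    for part in header.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:  # skip empty or malformed segments
            fields[key] = value
    return fields


def trace_start_time(root: str) -> datetime:
    """Decode the request start time embedded in an X-Ray trace ID.

    Trace ID layout: 1-<8 hex digits: Unix epoch seconds>-<24 hex digits: random>
    """
    version, epoch_hex, unique = root.split("-")
    if version != "1" or len(epoch_hex) != 8 or len(unique) != 24:
        raise ValueError(f"unexpected trace ID format: {root!r}")
    return datetime.fromtimestamp(int(epoch_hex, 16), tz=timezone.utc)


# Sample header value in the documented format (not from a real request).
header = "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1"
fields = parse_trace_header(header)
print(fields["Root"])      # the ID to filter logs by and to look up in X-Ray
print(fields["Sampled"])   # "1" means this request was recorded
print(trace_start_time(fields["Root"]).isoformat())
```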

How they work together: A CloudWatch alarm fires on high error rate (metric). You query CloudWatch Logs Insights to find the failing requests (logs). You trace a failing request through X-Ray to discover a downstream service timeout (trace). Without all three, you're guessing.
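
The first step, an alarm on error rate, is commonly configured as "M out of N" datapoints breaching a threshold. CloudWatch exposes these as the `DatapointsToAlarm` and `EvaluationPeriods` settings, so a single noisy period does not page anyone. The sketch below mimics that evaluation locally. The class and function are hypothetical, and it deliberately skips CloudWatch's missing-data handling (`TreatMissingData`).

```python
from dataclasses import dataclass


@dataclass
class AlarmConfig:
    threshold: float          # e.g. error rate in percent
    evaluation_periods: int   # N: how many recent datapoints to examine
    datapoints_to_alarm: int  # M: how many of those must breach


def alarm_state(datapoints: list[float], cfg: AlarmConfig) -> str:
    """Simplified 'M out of N' evaluation using a greater-than comparison."""
    window = datapoints[-cfg.evaluation_periods:]
    if len(window) < cfg.evaluation_periods:
        # Simplification: real alarms consult TreatMissingData here.
        return "INSUFFICIENT_DATA"
    breaching = sum(1 for value in window if value > cfg.threshold)
    return "ALARM" if breaching >= cfg.datapoints_to_alarm else "OK"


cfg = AlarmConfig(threshold=5.0, evaluation_periods=5, datapoints_to_alarm=3)
error_rate = [1.2, 0.8, 6.5, 7.1, 2.0, 9.4]  # percent, one value per period
print(alarm_state(error_rate, cfg))
```

In this example the last five periods contain three values above 5%, so the sketch reports `ALARM`. That is the point where you would pivot to Logs Insights for the failing requests.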

AWS observability stack:

Area	Service	Purpose
Metrics	CloudWatch Metrics	Collect, store, and alarm on time-series data
Logs	CloudWatch Logs	Centralized log ingestion, search, and retention
Traces	AWS X-Ray	Distributed tracing across microservices
Dashboards	CloudWatch Dashboards	Unified visualization of metrics and logs
Synthetics	CloudWatch Synthetics	Canary scripts that probe endpoints on a schedule
RUM	CloudWatch RUM	Real user monitoring: performance and error data from end users' browsers
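
The Metrics and Logs rows are not fully separate. CloudWatch can extract metrics from structured log events written in its Embedded Metric Format (EMF), so one log line carries both the number you alarm on and the context you query. The following minimal sketch uses only the Python standard library. The namespace, service name, and field names are hypothetical. The line must still be ingested by CloudWatch Logs for metrics to be extracted; a Lambda function's standard output is one such path.

```python
import json
import time


def emf_event(namespace: str, service: str, latency_ms: float,
              errors: int, request_id: str) -> str:
    """Build one log line in CloudWatch Embedded Metric Format (EMF).

    The "_aws" block tells CloudWatch which top-level keys to extract as
    metrics; the remaining keys stay in the log event for Logs Insights queries.
    """
    event = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # Unix epoch milliseconds
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [["Service"]],  # dimension keys; values come from the event
                "Metrics": [
                    {"Name": "Latency", "Unit": "Milliseconds"},
                    {"Name": "Errors", "Unit": "Count"},
                ],
            }],
        },
        "Service": service,       # dimension value
        "Latency": latency_ms,    # metric value
        "Errors": errors,         # metric value
        "requestId": request_id,  # context only: not declared as a metric
    }
    return json.dumps(event)


# Hypothetical namespace, service, and request ID.
print(emf_event("MyApp", "checkout", 182.0, 1, "req-0001"))
```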

Exam Trap: CloudWatch collects metrics from most AWS services automatically (EC2 CPU, RDS connections, ALB request count). But EC2 memory utilization and disk space utilization are not collected by default, because they are visible only from inside the guest operating system. You must install the CloudWatch Agent on the instance to publish them as custom metrics. This is one of the most frequently tested facts on the exam.
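
For reference, a minimal CloudWatch Agent configuration for the two missing metrics might look like the one below. It is expressed as a Python dict and printed as the JSON the agent reads. The key names follow the agent's documented config schema as I understand it. The 60-second interval and the choice of all mounted filesystems (`"*"`) are illustrative assumptions, not requirements.

```python
import json

# Minimal CloudWatch Agent config: memory and disk-space utilization, which
# EC2 does not publish on its own. Metrics land in the "CWAgent" namespace.
AGENT_CONFIG = {
    "agent": {"metrics_collection_interval": 60},  # seconds (assumed value)
    "metrics": {
        "namespace": "CWAgent",
        # Tag each datapoint with the instance ID so it lines up with EC2 metrics.
        "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
        "metrics_collected": {
            "mem": {"measurement": ["mem_used_percent"]},
            # "*" = every mounted filesystem (assumed choice; list mount points to narrow it)
            "disk": {"measurement": ["used_percent"], "resources": ["*"]},
        },
    },
}

print(json.dumps(AGENT_CONFIG, indent=2))
```

On the instance, a file like this is typically loaded with `amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:<path> -s`. The instance role also needs permission to publish metrics (the AWS managed policy `CloudWatchAgentServerPolicy` covers this).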

Written by Alvin Varughese • Founder • 15 professional certifications