2.1.2. Alarms, Composite Alarms, and Actions
💡 First Principle: A metric without an alarm is just a number on a dashboard. Alarms transform passive observation into active response: they cross the critical threshold from "watching" to "reacting" automatically, without requiring a human to stare at a dashboard 24/7.
Alarm States: Every CloudWatch alarm is always in one of three states:
| State | Meaning | Typical Cause |
|---|---|---|
| OK | Metric is within threshold | Normal operation |
| ALARM | Metric has breached threshold | Problem detected |
| INSUFFICIENT_DATA | Not enough data points | Service just started, metric not being published |
How Alarms Evaluate: Alarms don't have to trigger on a single data point. They can evaluate a window of data points, which prevents false positives from brief spikes.
Key configuration parameters:
- Period: How long each data point covers (60s, 300s, etc.)
- Evaluation Periods: How many data points to consider
- Datapoints to Alarm: How many of those data points must breach the threshold
With a 60-second period, setting "3 out of 5 datapoints must breach" means a single 1-minute spike won't trigger the alarm, but three breaching data points within the 5-minute window will, whether or not they are consecutive. This reduces alert fatigue from transient noise.
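The "M out of N" rule can be sketched in a few lines of Python. This is a simplified model, not CloudWatch's actual evaluator: it assumes a greater-than comparison, one value per period, and no missing data. The function name `breaches_m_of_n` and the sample values are made up for illustration.

```python
from typing import Sequence


def breaches_m_of_n(values: Sequence[float], threshold: float,
                    evaluation_periods: int, datapoints_to_alarm: int) -> bool:
    """Return True if at least `datapoints_to_alarm` of the most recent
    `evaluation_periods` values are strictly above `threshold`."""
    window = values[-evaluation_periods:]
    breaching = sum(1 for v in window if v > threshold)
    return breaching >= datapoints_to_alarm


# One data point per 60-second period, CPU utilization in percent.
single_spike = [40, 42, 95, 41, 43]   # one breaching minute
sustained = [40, 95, 96, 97, 43]      # three consecutive breaching minutes
scattered = [95, 40, 96, 41, 97]      # three breaching minutes, not consecutive

print(breaches_m_of_n(single_spike, 90, 5, 3))  # False
print(breaches_m_of_n(sustained, 90, 5, 3))     # True
print(breaches_m_of_n(scattered, 90, 5, 3))     # True
```

Note the third case: "3 out of 5" counts breaching data points anywhere in the window. To require consecutive breaches, set Datapoints to Alarm equal to Evaluation Periods.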
Missing Data Treatment: What should the alarm do when no data arrives? Options:
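- missing: missing data points are left out of the evaluation; if every data point in the evaluation range is missing, the alarm moves to INSUFFICIENT_DATA (this is the default setting)
- notBreaching: treat as within threshold (a safe choice for metrics that stop when the resource is deleted)
- breaching: treat as outside threshold (conservative; use for critical health checks)
- ignore: maintain the current alarm state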
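The four options are easiest to compare in the extreme case where every data point in the evaluation range is missing. The sketch below covers only that case; windows that mix present and missing data points follow further rules that are not modeled here. The function name `state_when_all_missing` is hypothetical.

```python
def state_when_all_missing(treatment: str, current_state: str) -> str:
    """Resulting alarm state when no data at all arrived in the evaluation
    range, for each missing-data treatment option."""
    outcomes = {
        "missing": "INSUFFICIENT_DATA",  # nothing to evaluate
        "ignore": current_state,         # keep whatever state the alarm had
        "breaching": "ALARM",            # every gap counts as a breach
        "notBreaching": "OK",            # every gap counts as healthy
    }
    if treatment not in outcomes:
        raise ValueError(f"unknown treatment: {treatment!r}")
    return outcomes[treatment]


for option in ("missing", "ignore", "breaching", "notBreaching"):
    print(option, "->", state_when_all_missing(option, current_state="OK"))
```

For a health-check heartbeat, silence is itself the failure signal, which is why `breaching` fits that case. For a metric that legitimately stops when its resource is deleted, `notBreaching` avoids a stale alarm.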
Alarm Actions define what happens when the alarm transitions state. A single alarm can trigger multiple actions:
| Action Type | Example Use Case |
|---|---|
| SNS Notification | Email/SMS ops team, trigger Lambda |
| EC2 Action | Stop, reboot, recover, or terminate instance |
| Auto Scaling Action | Scale out or scale in |
| Systems Manager OpsItem | Create incident ticket automatically |
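As a sketch of how several actions attach to one alarm, the snippet below builds the keyword arguments for boto3's `put_metric_alarm` without calling AWS. The instance ID, account number, topic name, and the helper `build_cpu_alarm_params` are placeholders invented for illustration; the threshold and window values are arbitrary examples.

```python
from typing import Any, Dict, List


def build_cpu_alarm_params(instance_id: str, alarm_actions: List[str],
                           ok_actions: List[str]) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch put_metric_alarm."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,               # seconds covered by each data point
        "EvaluationPeriods": 3,      # look at the last 3 data points
        "DatapointsToAlarm": 2,      # 2 of those 3 must breach
        "Threshold": 90.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "missing",
        "AlarmActions": alarm_actions,  # run on transition into ALARM
        "OKActions": ok_actions,        # run on transition back to OK
    }


params = build_cpu_alarm_params(
    "i-0123456789abcdef0",  # placeholder instance ID
    alarm_actions=[
        "arn:aws:sns:us-east-1:123456789012:ops-alerts",  # hypothetical SNS topic
        "arn:aws:automate:us-east-1:ec2:reboot",          # EC2 reboot action
    ],
    ok_actions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],
)

# To create the alarm (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
print(params["AlarmActions"])
```

Actions are attached per state transition: `AlarmActions`, `OKActions`, and `InsufficientDataActions` each take their own list of ARNs, so the same SNS topic can announce both the problem and the recovery.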
Composite Alarms combine multiple alarms using AND, OR, and NOT logic. This is critical for reducing alert storms. Instead of 50 individual instance alarms paging the team, a composite alarm can trigger only when the overall service is degraded.
⚠️ Exam Trap: Composite alarms can only reference other CloudWatch alarms, not metrics directly. You must create individual alarms first, then combine them in a composite alarm.
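A composite alarm is defined by a rule expression that names existing alarms. The sketch below assembles such an expression as a string; the alarm names and the helper `all_in_alarm` are hypothetical, and the referenced alarms would have to exist already.

```python
from typing import Iterable


def all_in_alarm(alarm_names: Iterable[str]) -> str:
    """Rule fragment that is true only when every named alarm is in ALARM."""
    return " AND ".join(f'ALARM("{name}")' for name in alarm_names)


# Page only when both symptoms are present and no deployment is under way.
symptoms = all_in_alarm(["api-high-5xx", "api-high-latency"])
rule = f'({symptoms}) AND NOT ALARM("deploy-in-progress")'
print(rule)
# (ALARM("api-high-5xx") AND ALARM("api-high-latency")) AND NOT ALARM("deploy-in-progress")

# To create the composite alarm (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").put_composite_alarm(
#       AlarmName="api-degraded", AlarmRule=rule)
```

The rule only ever mentions alarm names (or ARNs), never a metric, which is the restriction the exam trap describes.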
Reflection Question: An RDS instance publishes a metric every 60 seconds. You want to alarm if CPU exceeds 90% for at least 3 consecutive minutes. What are the correct values for Period, Evaluation Periods, and Datapoints to Alarm?