1.2. From Metrics to Action: The Operations Loop
💡 First Principle: Collecting data is worthless unless it triggers action. The real value of observability is closing the loop: detect → alert → diagnose → remediate → verify. Think of a thermostat: without the feedback loop connecting the temperature measurement to the heater, you just have a thermometer, which gives you data with no action. AWS services are designed to automate as many steps of this loop as possible.
Without automation in that loop, every alert requires a human to respond. At scale, that's unsustainable. A company running 10,000 Lambda functions can't have an on-call engineer manually restarting failed functions; the resolution needs to happen before the engineer finishes reading the alert.
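To make the loop concrete, here is a minimal sketch in plain Python. It simulates the five steps against an in-memory metric and does not call any AWS API. The names `ERROR_THRESHOLD`, `Service`, `run_loop`, and the `restart` remediation are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field

ERROR_THRESHOLD = 5  # hypothetical: errors per minute that count as unhealthy


@dataclass
class Service:
    """Stand-in for a monitored resource; not a real AWS object."""
    name: str
    errors_per_minute: int
    log: list = field(default_factory=list)

    def restart(self) -> None:
        # Hypothetical remediation: assume a restart clears the error condition.
        self.errors_per_minute = 0


def run_loop(service: Service) -> bool:
    """Run one pass of detect -> alert -> diagnose -> remediate -> verify.

    Returns True if the service is healthy when the pass ends.
    """
    # Detect: compare the measured metric against the threshold.
    if service.errors_per_minute <= ERROR_THRESHOLD:
        return True
    # Alert: record that the threshold was breached.
    service.log.append(f"ALERT {service.name}: {service.errors_per_minute} errors/min")
    # Diagnose: in this toy model the only known fix is a restart.
    action = "restart"
    service.log.append(f"DIAGNOSE {service.name}: chose {action}")
    # Remediate: apply the chosen action without waiting for a human.
    service.restart()
    service.log.append(f"REMEDIATE {service.name}: {action} applied")
    # Verify: measure again to confirm the fix actually worked.
    healthy = service.errors_per_minute <= ERROR_THRESHOLD
    service.log.append(f"VERIFY {service.name}: healthy={healthy}")
    return healthy
```

The verify step is what separates a closed loop from a fire-and-forget script: the metric is read again after remediation instead of assuming the action worked.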
Here's how the loop works in AWS:
The exam tests your knowledge at every step of this loop. Which service triggers alerts? (CloudWatch Alarms → SNS). Which service executes automated remediation? (Systems Manager Automation, Lambda). Which service routes events to the right responder? (EventBridge). Understand the loop and the service choices become obvious.
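As an illustration of the detect and alert steps, the sketch below builds the arguments for a CloudWatch alarm that notifies an SNS topic when a Lambda function reports errors. The function name, topic ARN, account ID, and threshold values are placeholders, not values from this guide. The code only constructs a dictionary; assuming boto3 is installed and credentials are configured, it could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`.

```python
def build_error_alarm(function_name: str, topic_arn: str) -> dict:
    """Return keyword arguments for CloudWatch's PutMetricAlarm call.

    The alarm fires when the function logs at least one error in a
    60-second period and sends the state change to the given SNS topic.
    """
    return {
        "AlarmName": f"{function_name}-errors",
        "Namespace": "AWS/Lambda",            # built-in Lambda metrics
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 60,                         # seconds per datapoint
        "EvaluationPeriods": 1,
        "Threshold": 1.0,                     # placeholder threshold
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",   # no invocations is not an error
        "AlarmActions": [topic_arn],          # alert step: notify SNS
    }


# Placeholder names and ARN, for illustration only.
params = build_error_alarm(
    "order-processor",
    "arn:aws:sns:us-east-1:123456789012:ops-alerts",
)
```

Whatever subscribes to that topic (an email address, a Lambda function) determines whether the next step of the loop is a human or an automated remediation.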
⚠️ Exam Trap: SNS and EventBridge are often confused. SNS delivers notifications (fan-out to subscribers). EventBridge routes events to targets based on patterns. A CloudWatch alarm can trigger SNS, and SNS can trigger Lambda; an EventBridge rule can also target an SNS topic, so the two are often used together, but they solve different problems.
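The difference can be sketched in a few lines of plain Python. This is a simplified model, not the real services: `fan_out` mimics SNS delivering one message to every subscriber, while `matches` implements only a small subset of EventBridge pattern matching (lists of exact values and nested fields). The event follows the general shape of a CloudWatch alarm state change event; the rule and target names are hypothetical.

```python
def fan_out(message, subscribers):
    """SNS-style delivery: every subscriber receives the same message."""
    return {name: message for name in subscribers}


def matches(pattern, event):
    """Simplified EventBridge-style matching.

    Each pattern leaf is a list of accepted values; nested dicts are
    matched recursively. Real EventBridge supports many more operators.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


def route(event, rules):
    """EventBridge-style delivery: only targets whose pattern matches."""
    return [target for target, pattern in rules.items() if matches(pattern, event)]


event = {
    "source": "aws.cloudwatch",
    "detail-type": "CloudWatch Alarm State Change",
    "detail": {"alarmName": "order-processor-errors", "state": {"value": "ALARM"}},
}

rules = {
    # Hypothetical target names mapped to event patterns.
    "remediation-lambda": {
        "source": ["aws.cloudwatch"],
        "detail-type": ["CloudWatch Alarm State Change"],
        "detail": {"state": {"value": ["ALARM"]}},
    },
    "instance-recovery": {
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
    },
}

delivered = fan_out("alarm fired", ["email", "pager", "chat"])
routed = route(event, rules)
```

Here `delivered` contains all three subscribers, while `routed` contains only `remediation-lambda`, because the EC2 rule's pattern does not match the alarm event.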
Reflection Question: In the operations loop above, at which step does EventBridge fit? Where does CloudTrail fit?