2.3. Automated Remediation
💡 First Principle: The goal of operations is not to have faster humans; it's to have fewer humans needed in the critical path. Think of it like a circuit breaker: the system detects a fault and responds instantly, unlike a manual process that depends on someone being awake and available at 3 AM. Automated remediation closes the loop between detection and resolution without human intervention, which means lower MTTR (mean time to resolution) and fewer 3 AM pages for your team.
The exam tests your ability to wire together the right services for automated remediation. The fundamental architecture is always: trigger → route → act. CloudWatch alarms and Config rules trigger. EventBridge routes. Lambda, Systems Manager, or EC2 actions act. Understanding which service does which part of the loop, and when to use each, is the core skill.
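To make the loop concrete, here is a minimal sketch of the route and act steps, assuming a CloudWatch alarm on an EC2 instance metric is the trigger. The alarm name `web-status-check-failed`, the choice of a reboot as the remediation, and the optional `ec2` parameter (added only so the handler can be exercised without AWS credentials) are illustrative assumptions, not something the exam or AWS prescribes.

```python
"""Sketch of route + act: an EventBridge pattern and a Lambda handler."""

# "Route" step: an EventBridge event pattern that matches CloudWatch alarm
# state changes into ALARM. The alarm name is a hypothetical example.
ALARM_PATTERN = {
    "source": ["aws.cloudwatch"],
    "detail-type": ["CloudWatch Alarm State Change"],
    "detail": {
        "alarmName": ["web-status-check-failed"],
        "state": {"value": ["ALARM"]},
    },
}


def instance_ids_from_alarm(event):
    """Collect InstanceId dimensions from the alarm's metric configuration."""
    metrics = event.get("detail", {}).get("configuration", {}).get("metrics", [])
    ids = []
    for metric in metrics:
        dims = metric.get("metricStat", {}).get("metric", {}).get("dimensions", {})
        instance_id = dims.get("InstanceId")
        if instance_id and instance_id not in ids:
            ids.append(instance_id)
    return ids


def handler(event, context, ec2=None):
    """"Act" step: reboot the instance(s) behind an alarm that entered ALARM."""
    state = event.get("detail", {}).get("state", {}).get("value")
    if state != "ALARM":
        return {"action": "none", "instances": []}

    instance_ids = instance_ids_from_alarm(event)
    if not instance_ids:
        return {"action": "none", "instances": []}

    if ec2 is None:
        # Imported lazily so the sketch can be tested with a stand-in client.
        import boto3
        ec2 = boto3.client("ec2")

    ec2.reboot_instances(InstanceIds=instance_ids)
    return {"action": "reboot", "instances": instance_ids}
```

For a plain reboot or recover, a CloudWatch alarm's built-in EC2 action may be enough on its own; routing through EventBridge to Lambda is more useful when the remediation needs custom logic or has to fan out to several targets.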