3. Reliability and Business Continuity (22%)
Reliability is the ability of a system to perform its intended function when needed, and to recover quickly when something fails — because something always eventually fails. In AWS, reliability isn't something you bolt on after the fact; it's built into the architecture through a combination of redundancy, automated health detection, and tested recovery procedures.
This domain ties with Monitoring and Deployment at 22% — expect 11 questions. The exam heavily favors scenario-based questions here: "A company needs 99.99% availability for their e-commerce platform during peak season — which combination of services do you recommend?" The answer requires understanding not just what each service does, but how much availability it actually provides.
The three task areas map cleanly to three principles: scale proactively (Task 2.1), design for failure (Task 2.2), and recover predictably (Task 2.3).