3.2. High Availability and Fault Tolerance
š” First Principle: High availability is not a property of individual components Think of it like a power grid: no single generator is 100% reliable, but the grid stays up because multiple generators connect through redundant transmission lines. Unlike a single-AZ deployment where one hardware failure takes down everything, a multi-AZ architecture treats failures as expected events. ā it's a property of the system as a whole. Any single component can fail; the system remains available if failures don't propagate. The architecture of high availability is therefore about isolating failures so that the failure of one component cannot take down the whole system.
The difference between high availability and fault tolerance is degree. HA systems fail over with brief interruption (seconds to minutes). Fault-tolerant systems continue without any interruption ā the failure is completely transparent to users. AWS provides tools for both, and the exam tests which level is appropriate for a given scenario.
At this domain's core is a simple principle: never have a single path for traffic or a single point of failure. Redundant paths, redundant instances, and automated failure detection are the building blocks.