3.3.3. High Availability & Fault Tolerance for Applications
First Principle: Designing applications with inherent high availability and fault tolerance ensures continuous operation and minimizes downtime, even when underlying infrastructure or application components fail.
For developers, building high availability (HA) and fault tolerance into their applications means designing them to withstand and recover gracefully from failures. This is a key aspect of application reliability in the cloud.
- High Availability (HA): (Ensures a system remains operational, minimizing downtime through redundancy and automatic failover.)
- Examples: Deploying application components across multiple Availability Zones (AZs), using Elastic Load Balancing (ELB) to distribute traffic, and utilizing Amazon RDS Multi-AZ deployments.
- Developer Impact: Requires designing stateless application components and awareness of data consistency.
- Fault Tolerance: (The ability of a system to continue operating even if one or more of its components fail.)
- Examples: Implementing message queues (Amazon SQS) to decouple microservices (preventing cascading failures), designing retry logic in application code for transient errors, and using Circuit Breaker patterns.
- Developer Impact: Requires coding for resilience (e.g., error handling, retries), and understanding how queues and events can buffer failures.
- Self-Healing: (The ability of a system to detect and automatically recover from component failures.)
- Examples: EC2 Auto Scaling Groups automatically replacing unhealthy instances, AWS Lambda automatically scaling and self-healing.
Scenario: You're developing a critical e-commerce application. You need to ensure it remains available to customers even if one of its servers fails or if an underlying database experiences an issue.
ā ļø Exam Trap: Multi-AZ deployments improve availability; multi-Region deployments improve disaster recovery. If a question asks about surviving a single AZ failure, multi-AZ is sufficient. Multi-Region is for Regional failures.
