3.1.1. Multi-AZ and Multi-Region Design Patterns
💡 First Principle: Distributing application components across independent failure domains ("Availability Zones" or "Regions") is the fundamental strategy for ensuring continuous operation and data durability against both localized and widespread outages.
Scenario: An architect is designing a globally available web application with users across multiple continents. The application needs to provide low-latency access and be resilient enough to withstand the complete failure of an entire "AWS Region", minimizing downtime and data loss.
Expanding on foundational "HA" concepts, "Multi-AZ" and "Multi-Region" patterns are key for enterprise-grade resilience.
- "Multi-AZ Design": A strategy that distributes resources across physically isolated "Availability Zones" within a single "AWS Region". Each "AZ" consists of one or more discrete data centers with independent power, networking, and cooling.
  - Why: Protects against the failure of a single data center or "AZ" (power, network, cooling).
  - Implementation: "EC2 Auto Scaling Groups" spread instances across "AZs", "ELB" distributes traffic across them, "Amazon RDS Multi-AZ" provides synchronous database replication to a standby, and "Amazon EFS" provides shared file systems spanning "AZs".
  - Practical Relevance: Baseline for almost all production workloads. Provides high availability within a Region.
- "Multi-Region Design": A strategy that distributes application components across geographically separate "AWS Regions".
  - Why: Protects against widespread regional disasters or significant regional service outages. Can also reduce latency by serving users from a nearby Region.
  - Implementation: "Amazon Route 53" (latency-based, geolocation, failover routing), "Amazon DynamoDB Global Tables" (active-active replication), "Amazon S3 Cross-Region Replication", "Cross-Region RDS Read Replicas".
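To make the routing idea concrete, here is a minimal sketch in plain Python of how latency-based routing combined with health checks can be reasoned about. It is a simplified model, not the "Amazon Route 53" API: the `RegionEndpoint` type, the `pick_endpoint` function, and the latency figures are all hypothetical names and values chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegionEndpoint:
    region: str        # Region name, e.g. "us-east-1"
    latency_ms: float  # hypothetical latency measured from the client
    healthy: bool      # result of a hypothetical health check


def pick_endpoint(endpoints: list[RegionEndpoint]) -> Optional[RegionEndpoint]:
    """Return the healthy endpoint with the lowest latency, or None if all are down."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda e: e.latency_ms)


# The closest Region is unhealthy, so traffic shifts to the next-closest one.
endpoints = [
    RegionEndpoint("us-east-1", 20.0, healthy=False),
    RegionEndpoint("eu-west-1", 85.0, healthy=True),
    RegionEndpoint("ap-southeast-1", 210.0, healthy=True),
]
choice = pick_endpoint(endpoints)
print(choice.region if choice else "no healthy Region")  # eu-west-1
```

The point of the sketch is that a Regional failure changes the answer without any change to the application: users are sent to the best Region that still passes its health check.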
Visual: Multi-AZ vs. Multi-Region Deployments
⚠️ Common Pitfall: Underestimating the complexity of data replication and consistency in a multi-region architecture. Most cross-region replication is asynchronous, which means there is a potential for data loss ("RPO" > 0) during a failover.
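The effect of asynchronous replication on "RPO" can be worked through with a small calculation. The sketch below assumes a constant replication lag, which is a simplification (real lag varies with load and network conditions), and the function name and numbers are hypothetical.

```python
from typing import List


def unreplicated_writes(
    write_times_s: List[float], lag_s: float, failure_time_s: float
) -> List[float]:
    """Writes committed on the primary before the failure that had not yet
    reached the replica, assuming a constant replication lag."""
    return [t for t in write_times_s if t <= failure_time_s < t + lag_s]


# One write per second from t=0 to t=9, a 2.5 s lag, and a failure at t=9.
writes = [float(t) for t in range(10)]
lost = unreplicated_writes(writes, lag_s=2.5, failure_time_s=9.0)
data_loss_window_s = 9.0 - min(lost) if lost else 0.0
print(len(lost), data_loss_window_s)  # 3 2.0
```

Under these assumptions, the writes at t=7, 8, and 9 never arrive in the second Region, so the failover loses roughly the last two seconds of data. The achieved "RPO" is bounded by the replication lag at the moment of failure.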
Key Trade-Offs:
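- Data Consistency vs. Latency: Synchronous replication (like "Multi-AZ RDS") acknowledges a write only after the standby has it, so a failover should lose no committed data ("RPO=0"), but each write pays the added round-trip latency. Asynchronous replication (like cross-region "S3 CRR") has minimal impact on write performance but introduces replication lag, so the most recent writes may be missing from the replica after a failover.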
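The latency side of this trade-off can be sketched with simple arithmetic. The model below is an assumption made for illustration: it treats acknowledged write latency as the local commit time plus, for synchronous replication only, one round trip to the replica. The function name and the millisecond figures are hypothetical, not measured AWS values.

```python
def write_ack_latency_ms(
    local_commit_ms: float, replica_rtt_ms: float, synchronous: bool
) -> float:
    """Latency seen by the writer before the write is acknowledged.

    Synchronous: wait for the replica round trip before acknowledging.
    Asynchronous: acknowledge after the local commit; the replica catches up later.
    """
    return local_commit_ms + (replica_rtt_ms if synchronous else 0.0)


# Hypothetical figures: 2 ms local commit, 1 ms cross-AZ RTT, 70 ms cross-Region RTT.
sync_cross_az = write_ack_latency_ms(2.0, 1.0, synchronous=True)
sync_cross_region = write_ack_latency_ms(2.0, 70.0, synchronous=True)
async_cross_region = write_ack_latency_ms(2.0, 70.0, synchronous=False)
print(sync_cross_az, sync_cross_region, async_cross_region)  # 3.0 72.0 2.0
```

With these numbers, synchronous replication is cheap between nearby "AZs" but would dominate write latency across Regions, which is one reason cross-region replication is usually asynchronous.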
Reflection Question: How would you design a "Multi-Region" architecture for a globally available web application using services like "Amazon Route 53", "Amazon DynamoDB Global Tables", and "Amazon S3 Cross-Region Replication" to meet its global availability and disaster recovery requirements, especially considering the inherent trade-offs between data consistency and latency?
