3.1.1. Multi-AZ and Multi-Region Design Patterns
💡 First Principle: Distributing application components across independent failure domains (Availability Zones or Regions) is the fundamental strategy for ensuring continuous operation and data durability against both localized and widespread outages.
Scenario: An architect is designing a globally available web application with users across multiple continents. The application must provide low-latency access and be resilient enough to withstand the complete failure of an entire AWS Region while minimizing downtime and data loss.
Building on foundational high-availability (HA) concepts, Multi-AZ and Multi-Region patterns are the key building blocks of enterprise-grade resilience.
- Multi-AZ Design: A strategy that distributes resources across physically isolated Availability Zones within a single AWS Region. Each AZ consists of one or more discrete data centers with independent power, networking, and connectivity.
  - Why: Protects against the failure of a single data center or AZ (power, network, or cooling outages).
  - Implementation: EC2 Auto Scaling groups spread instances across AZs, Elastic Load Balancing (ELB) distributes traffic among them, Amazon RDS Multi-AZ replicates the database synchronously to a standby in another AZ, and Amazon EFS provides a shared file system that spans AZs.
  - Practical Relevance: The baseline for almost all production workloads; provides high availability within a Region.
- Multi-Region Design: A strategy that distributes application components across geographically separate AWS Regions.
  - Why: Protects against widespread regional disasters or significant Region-wide service disruptions, and places the application closer to users worldwide for low-latency access.
  - Implementation: Amazon Route 53 (latency-based, geolocation, and failover routing policies), Amazon DynamoDB Global Tables (active-active replication), Amazon S3 Cross-Region Replication (CRR), and cross-Region Amazon RDS read replicas.
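The two layers of protection above can be illustrated with a toy routing model: inside a Region, traffic survives as long as at least one AZ is healthy; across Regions, users go to the lowest-latency Region that is still healthy, which is the idea behind Route 53 latency-based routing combined with health checks. This is a conceptual sketch in plain Python, not the Route 53 or ELB API: the `Region` class, the `pick_region` helper, and all latency figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Region:
    """Hypothetical model of one AWS Region and the health of its AZs (not an AWS API)."""
    name: str
    az_health: Dict[str, bool]  # AZ name -> passing health checks?

    def healthy_azs(self) -> List[str]:
        return [az for az, ok in self.az_health.items() if ok]

    def is_healthy(self) -> bool:
        # Multi-AZ: the Region keeps serving while at least one AZ is healthy.
        return bool(self.healthy_azs())


def pick_region(latency_ms: Dict[str, float], regions: Dict[str, Region]) -> Optional[str]:
    """Multi-Region: route to the lowest-latency Region that is still healthy."""
    candidates = [
        (latency, name)
        for name, latency in latency_ms.items()
        if name in regions and regions[name].is_healthy()
    ]
    if not candidates:
        return None  # nothing healthy to route to
    return min(candidates)[1]


# Hypothetical measurements for a user in North America.
regions = {
    "us-east-1": Region("us-east-1", {"use1-az1": True, "use1-az2": False, "use1-az4": True}),
    "eu-west-1": Region("eu-west-1", {"euw1-az1": True, "euw1-az2": True}),
}
latency = {"us-east-1": 20.0, "eu-west-1": 95.0}

print(pick_region(latency, regions))  # us-east-1: one AZ is down, but the Region still serves

# Simulate a full Regional outage: every AZ in us-east-1 fails.
regions["us-east-1"].az_health = {az: False for az in regions["us-east-1"].az_health}
print(pick_region(latency, regions))  # eu-west-1: traffic fails over to the next-closest Region
```

The single-AZ failure is absorbed by the Multi-AZ layer and never reaches the routing decision; only the loss of every AZ in a Region forces a Multi-Region failover.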
Visual: Multi-AZ vs. Multi-Region Deployments
⚠️ Common Pitfall: Underestimating the complexity of data replication and consistency in a multi-Region architecture. Most cross-Region replication is asynchronous, which means there is a potential for data loss (RPO > 0) during a failover.
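To make the RPO > 0 risk concrete, the sketch below lists which writes would be lost if the primary failed while an asynchronous replica was still behind. It assumes a constant replication lag, which is a simplification (real lag varies with load and network conditions), and the `lost_writes` helper is hypothetical, not part of any AWS SDK.

```python
from typing import List


def lost_writes(write_times_s: List[float], replication_lag_s: float, failover_at_s: float) -> List[float]:
    """Return writes the primary acknowledged but the replica had not applied at failover.

    Assumes every write reaches the replica exactly `replication_lag_s` seconds later.
    """
    return [
        t for t in write_times_s
        if t <= failover_at_s                      # the primary accepted the write before failing
        and t + replication_lag_s > failover_at_s  # but the replica had not received it yet
    ]


writes = [float(t) for t in range(10)]  # one write per second, t = 0..9

print(lost_writes(writes, replication_lag_s=2.0, failover_at_s=9.5))  # [8.0, 9.0] -> RPO > 0
print(lost_writes(writes, replication_lag_s=0.0, failover_at_s=9.5))  # [] -> no lag, RPO = 0
```

Under this model the data-loss window equals the replication lag at the moment of failure, which is why asynchronous designs are usually described by a target RPO rather than a guarantee of zero loss.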
Key Trade-Offs:
- Data Consistency vs. Latency: Synchronous replication (like Amazon RDS Multi-AZ) ensures that committed writes are not lost on failover (RPO = 0) but adds latency to every write. Asynchronous replication (like S3 Cross-Region Replication) has minimal impact on write performance but introduces replication lag, so writes not yet replicated can be lost during a failover.
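The trade-off can be expressed as a simple cost model: a synchronous write is acknowledged only after the replica confirms it, so it pays the network round trip to the replica; an asynchronous write is acknowledged locally but exposes the lag window. The `ReplicationSetup` class, both helper functions, and every number below are hypothetical values chosen for illustration, not measured AWS latencies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationSetup:
    """Hypothetical description of one primary/replica pair."""
    synchronous: bool
    local_write_ms: float         # time to persist the write on the primary
    replica_round_trip_ms: float  # network round trip to the replica
    replication_lag_ms: float     # typical delay before an async replica catches up


def commit_latency_ms(s: ReplicationSetup) -> float:
    # Synchronous: the client waits for the replica's confirmation,
    # so every write pays the round trip. Asynchronous: acknowledged locally.
    if s.synchronous:
        return s.local_write_ms + s.replica_round_trip_ms
    return s.local_write_ms


def worst_case_rpo_ms(s: ReplicationSetup) -> float:
    # Synchronous: acknowledged writes already exist on the replica.
    # Asynchronous: anything written inside the lag window may be lost.
    return 0.0 if s.synchronous else s.replication_lag_ms


setups = {
    "sync, replica in another AZ": ReplicationSetup(True, 2.0, 1.0, 0.0),
    "async, replica in another Region": ReplicationSetup(False, 2.0, 70.0, 1000.0),
    "sync, replica in another Region": ReplicationSetup(True, 2.0, 70.0, 0.0),
}
for label, s in setups.items():
    print(f"{label}: commit {commit_latency_ms(s)} ms, worst-case RPO {worst_case_rpo_ms(s)} ms")
```

With these assumed numbers, synchronous replication is cheap when the replica is in a nearby AZ but multiplies write latency when the replica is in a distant Region, which is why cross-Region replication is typically asynchronous.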
Reflection Question: How would you design a Multi-Region architecture for a globally available web application using services such as Amazon Route 53, Amazon DynamoDB Global Tables, and Amazon S3 Cross-Region Replication to meet its global availability and disaster recovery requirements, especially given the inherent trade-offs between data consistency and latency?
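One consistency detail worth weighing when answering: in an active-active setup, the same item can be updated in two Regions before replication catches up. AWS documents last-writer-wins reconciliation for concurrent updates in DynamoDB Global Tables (in the eventually consistent mode), and the sketch below models only that idea. The `Version` class and `last_writer_wins` function are hypothetical, and the tie-break by Region name is an assumption added purely to keep the example deterministic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    written_at: float  # write timestamp in seconds
    region: str        # Region that accepted the write


def last_writer_wins(a: Version, b: Version) -> Version:
    # Keep the later write. Breaking exact ties by Region name is an
    # assumption made here only so every replica picks the same winner.
    return max(a, b, key=lambda v: (v.written_at, v.region))


# The same item is updated concurrently in two Regions before replication catches up.
us = Version("cart_items=3", written_at=100.0, region="us-east-1")
eu = Version("cart_items=5", written_at=100.2, region="eu-west-1")

winner = last_writer_wins(us, eu)
print(winner.value)                        # cart_items=5; the us-east-1 update is discarded
print(winner == last_writer_wins(eu, us))  # True: the merge order does not matter
```

Every Region converges on the same value, but the earlier write disappears without an error, so workloads that cannot tolerate silently overwritten updates may need to route writes for a given item to a single Region.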