Copyright (c) 2025 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

3.1.1. Multi-AZ and Multi-Region Design Patterns

💡 First Principle: Distributing application components across independent failure domains ("Availability Zones" or "Regions") is the fundamental strategy for ensuring continuous operation and data durability against both localized and widespread outages.

Scenario: An architect is designing a globally available web application with users across multiple continents. The application needs to provide low-latency access and be resilient enough to withstand the complete failure of an entire "AWS Region", minimizing downtime and data loss.

Expanding on foundational "HA" concepts, "Multi-AZ" and "Multi-Region" patterns are key for enterprise-grade resilience.

  • "Multi-AZ Design": A strategy that distributes resources across physically isolated "Availability Zones" within a single "AWS Region". Each "AZ" consists of one or more discrete data centers with redundant power, networking, and connectivity.
    • Why: Protects against the failure of a single data center or an entire "AZ" (power, network, cooling).
    • Implementation: "EC2 Auto Scaling Groups" to spread instances across "AZs", "ELB" to distribute traffic across them, "Amazon RDS Multi-AZ" for synchronous database replication, and "Amazon EFS" for shared file systems spanning "AZs".
    • Practical Relevance: Baseline for almost all production workloads. Provides high availability within a single "Region".
  • "Multi-Region Design": A strategy that distributes application components across geographically separate "AWS Regions".
    • Why: Protects against widespread regional disasters or significant regional service outages. It can also reduce latency for a global user base by serving each user from a nearby "Region".
    • Implementation: "Amazon Route 53" (latency-based, geolocation, failover routing), "Amazon DynamoDB Global Tables" (active-active replication), "Amazon S3 Cross-Region Replication", "Cross-Region RDS Read Replicas".
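
The two patterns above can be illustrated with a small, self-contained sketch. It does not call any AWS API. The function names, the RegionEndpoint type, and the latency and health values are all hypothetical, chosen only to show the ideas: spreading instances evenly across "AZs", and sending a user to the lowest-latency healthy "Region" (a simplified stand-in for latency-based routing combined with health checks).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


def spread_across_azs(instance_count: int, azs: List[str]) -> Dict[str, int]:
    """Distribute instances as evenly as possible across the given AZs."""
    if not azs:
        raise ValueError("at least one AZ is required")
    base, extra = divmod(instance_count, len(azs))
    # The first `extra` AZs each receive one additional instance.
    return {az: base + (1 if i < extra else 0) for i, az in enumerate(azs)}


@dataclass
class RegionEndpoint:
    region: str
    latency_ms: float  # hypothetical latency measured from the user's location
    healthy: bool      # hypothetical health-check result


def pick_region(endpoints: List[RegionEndpoint]) -> Optional[str]:
    """Return the lowest-latency healthy Region, or None if none is healthy."""
    healthy = [e for e in endpoints if e.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda e: e.latency_ms).region


# Seven instances over three AZs: no AZ holds more than one extra instance.
placement = spread_across_azs(7, ["us-east-1a", "us-east-1b", "us-east-1c"])
print(placement)  # {'us-east-1a': 3, 'us-east-1b': 2, 'us-east-1c': 2}

# The nearest Region is unhealthy, so traffic goes to the next-closest one.
endpoints = [
    RegionEndpoint("us-east-1", latency_ms=20.0, healthy=False),
    RegionEndpoint("eu-west-1", latency_ms=85.0, healthy=True),
    RegionEndpoint("ap-southeast-1", latency_ms=210.0, healthy=True),
]
print(pick_region(endpoints))  # eu-west-1
```

With an even spread, losing one "AZ" removes at most about a third of the capacity in this example, and losing a whole "Region" only shifts users to a higher-latency endpoint instead of causing an outage.
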
Visual: Multi-AZ vs. Multi-Region Deployments

āš ļø Common Pitfall: Underestimating the complexity of data replication and consistency in a multi-region architecture. Most cross-region replication is asynchronous, which means there is a potential for data loss ("RPO" > 0) during a failover.

Key Trade-Offs:
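  • Data Consistency vs. Latency: Synchronous replication (like "Multi-AZ RDS") acknowledges a write only after the standby has it, so a failover loses no committed data ("RPO=0"), but every write pays the extra latency. Asynchronous replication (like cross-region "S3 CRR") has minimal impact on write performance but introduces replication lag, so the replica can be behind the source when a failover happens.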
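
The trade-off can be expressed as simple arithmetic. The sketch below uses a deliberately simplified model in which a synchronous commit waits for the local write, one network round trip to the replica, and the replica's own write, in sequence. An asynchronous commit waits only for the local write. The function names and the millisecond figures are illustrative assumptions, not measured AWS numbers.

```python
def sync_write_latency_ms(local_ms: int, replica_rtt_ms: int, replica_write_ms: int) -> int:
    """Commit latency when the write must be confirmed by the replica first."""
    # Simplified sequential model: local write, then round trip, then replica write.
    return local_ms + replica_rtt_ms + replica_write_ms


def async_write_latency_ms(local_ms: int) -> int:
    """Commit latency when replication happens in the background."""
    return local_ms


# Hypothetical figures: a replica in a nearby AZ versus one in a distant Region.
same_region = sync_write_latency_ms(local_ms=2, replica_rtt_ms=1, replica_write_ms=2)
cross_region = sync_write_latency_ms(local_ms=2, replica_rtt_ms=70, replica_write_ms=2)

print(same_region)                # 5
print(cross_region)               # 74
print(async_write_latency_ms(2))  # 2
```

Under these assumed numbers, synchronous replication is affordable between nearby "AZs" but costly across distant "Regions", which is one reason cross-region replication is usually asynchronous.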

Reflection Question: How would you design a "Multi-Region" architecture for a globally available web application using services like "Amazon Route 53", "Amazon DynamoDB Global Tables", and "Amazon S3 Cross-Region Replication" to meet its global availability and disaster recovery requirements, especially considering the inherent trade-offs between data consistency and latency?