Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

3.2.3. Multi-AZ Deployments and Fault-Tolerant Architectures

💡 First Principle: An Availability Zone is a discrete data center (or cluster of data centers) with independent power, cooling, and networking. Distributing your workload across multiple AZs means a failure in one data center (fire, flood, power outage) cannot take down your application. Multi-AZ is the minimum viable HA configuration for any production workload.

The key design principle: nothing that matters should be in only one AZ. Every tier — load balancers, application servers, databases, caches — should have instances in at least two AZs.
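The principle above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical inventory that maps each tier to the AZ of every instance in it; the tier names and AZ names are illustrative and not taken from any real account.

```python
from typing import Dict, List

# Hypothetical inventory: tier name -> AZ of each instance in that tier.
deployment: Dict[str, List[str]] = {
    "load_balancer": ["us-east-1a", "us-east-1b"],
    "app_server": ["us-east-1a", "us-east-1b", "us-east-1c"],
    "database": ["us-east-1a"],              # one instance, one AZ
    "cache": ["us-east-1a", "us-east-1a"],   # two nodes, but the same AZ
}


def single_az_tiers(tiers: Dict[str, List[str]], min_azs: int = 2) -> List[str]:
    """Return (sorted) the tiers whose instances span fewer than min_azs distinct AZs."""
    return sorted(name for name, azs in tiers.items() if len(set(azs)) < min_azs)


if __name__ == "__main__":
    # Lists the tiers that would go down with a single AZ failure.
    print(single_az_tiers(deployment))
```

Note that the check counts distinct AZs rather than instances: two cache nodes in the same AZ still fail together.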

RDS Multi-AZ:

RDS Multi-AZ creates a synchronous standby replica in a different AZ. The standby is not accessible for reads — it exists solely for failover.

| RDS Multi-AZ Facts | Detail |
| --- | --- |
| Replication | Synchronous: every write to the primary is committed to the standby before it is acknowledged |
| Failover time | Typically 60–120 seconds (automated) |
| Failover trigger | Primary AZ failure, primary host failure, DB instance failure, manual failover, maintenance |
| DNS update | RDS endpoint DNS is updated automatically; no connection string changes |
| Standby readable? | ❌ No; it's a standby only (unlike read replicas) |

Multi-AZ provides HA; read replicas provide read scaling. These are orthogonal features — you can and often should use both.
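As an illustration of using both features together, here is a hedged sketch that builds the request parameters for two boto3 RDS calls, `modify_db_instance` (to add a standby) and `create_db_instance_read_replica` (to add a read replica). The instance identifiers are hypothetical, and the `apply_changes` helper is an assumption about how you might wire this up, not a recommended rollout procedure.

```python
from typing import Any, Dict


def multi_az_request(db_id: str) -> Dict[str, Any]:
    """Parameters for rds.modify_db_instance: convert an instance to Multi-AZ (HA)."""
    return {
        "DBInstanceIdentifier": db_id,
        "MultiAZ": True,
        "ApplyImmediately": True,  # otherwise applied in the next maintenance window
    }


def read_replica_request(source_id: str, replica_id: str) -> Dict[str, Any]:
    """Parameters for rds.create_db_instance_read_replica: add read capacity."""
    return {
        "DBInstanceIdentifier": replica_id,
        "SourceDBInstanceIdentifier": source_id,
    }


def apply_changes(db_id: str, replica_id: str) -> None:
    """Send both requests. Requires boto3 and AWS credentials (not exercised here)."""
    import boto3  # imported lazily so the builders above work without AWS access

    rds = boto3.client("rds")
    rds.modify_db_instance(**multi_az_request(db_id))
    rds.create_db_instance_read_replica(**read_replica_request(db_id, replica_id))


if __name__ == "__main__":
    # "orders-db" and "orders-db-replica-1" are hypothetical identifiers.
    print(multi_az_request("orders-db"))
    print(read_replica_request("orders-db", "orders-db-replica-1"))
```

The two requests are independent of each other, which mirrors the point that HA and read scaling are separate concerns.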

Aurora Multi-AZ: Aurora stores 6 copies of data across 3 AZs automatically. Failover to an Aurora Replica is typically under 30 seconds. With Aurora, the concept of "standby" is replaced by replicas that are readable and can be promoted to primary.

DynamoDB Multi-AZ: DynamoDB automatically replicates data across 3 AZs in a region. This is built-in and not configurable — DynamoDB is always Multi-AZ.

ELB and Multi-AZ: Load balancers should be deployed across multiple AZs so the load balancer itself isn't a single point of failure. ELB automatically distributes its nodes across the AZs you select.
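To see why spreading targets across AZs matters, consider this toy simulation. It is a simplified round-robin model under stated assumptions, not ELB's actual routing algorithm, and the target and AZ names are hypothetical.

```python
from itertools import cycle
from typing import Dict, List, Set


def healthy_targets(targets_by_az: Dict[str, List[str]], failed_azs: Set[str]) -> List[str]:
    """Return all targets that live in an AZ that has not failed (AZs in sorted order)."""
    return [
        target
        for az, targets in sorted(targets_by_az.items())
        if az not in failed_azs
        for target in targets
    ]


def route(targets_by_az: Dict[str, List[str]], failed_azs: Set[str], n_requests: int) -> List[str]:
    """Assign n_requests round-robin across the surviving targets."""
    pool = healthy_targets(targets_by_az, failed_azs)
    if not pool:
        raise RuntimeError("no healthy targets in any AZ")
    round_robin = cycle(pool)
    return [next(round_robin) for _ in range(n_requests)]


if __name__ == "__main__":
    multi_az = {"us-east-1a": ["app-a1"], "us-east-1b": ["app-b1"]}
    # With us-east-1a down, every request still lands on a target in us-east-1b.
    print(route(multi_az, {"us-east-1a"}, 3))
```

With all targets in a single AZ, the same AZ failure empties the pool and `route` raises instead of serving traffic.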

Fault-Tolerant Architecture Pattern (Three-Tier):

S3 Durability and Availability: S3 Standard is designed for 99.999999999% (11 9s) durability and 99.99% availability. S3 automatically stores data across a minimum of three AZs. S3 One Zone-IA stores data in a single AZ: it is cheaper, but the data does not survive the loss of that AZ.
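It helps to turn these percentages into concrete numbers. The sketch below is plain arithmetic on the figures quoted above; the function names and the 10-million-object example are illustrative assumptions.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float = 365.0) -> float:
    """Minutes of unavailability permitted by an availability target over the period."""
    return (1 - availability_pct / 100) * period_days * 24 * 60


def expected_annual_object_loss(num_objects: int, durability_pct: float) -> float:
    """Expected number of objects lost per year at a given annual durability."""
    return num_objects * (1 - durability_pct / 100)


if __name__ == "__main__":
    # 99.99% availability leaves a budget of roughly 52.6 minutes per year.
    print(round(allowed_downtime_minutes(99.99), 2))
    # 11 9s durability with 10 million objects: about 0.0001 objects per year,
    # in other words roughly one object every 10,000 years.
    print(round(expected_annual_object_loss(10_000_000, 99.999999999), 6))
```

Durability and availability answer different questions: the first is about whether data is ever lost, the second about whether it can be reached right now.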

āš ļø Exam Trap: RDS Multi-AZ failover is not instant — expect 60–120 seconds during which the database is unavailable. Applications must handle connection errors and retry logic during this window. If "zero downtime" is required, the architecture needs Aurora Global Database with an application-level connection routing strategy, or active-active database design.

Reflection Question: A company's application uses a single RDS MySQL instance in us-east-1a. The SLA requires 99.95% availability. The current architecture cannot meet this SLA because the single AZ represents a single point of failure. What is the minimum change to meet the SLA, and what additional change maximizes read performance?

Written by Alvin Varughese
Founder • 15 professional certifications