2.4.1.1. Designing for Relational Database Workloads (Scalability, HA, DR)
💡 First Principle: Relational databases must be architected to maintain transactional integrity (ACID) while providing high availability, effective read scaling, and robust disaster recovery to support critical OLTP workloads.
Scenario: A critical online banking application uses Amazon RDS for PostgreSQL. The application experiences frequent read spikes during business hours, and there's a strict requirement for high availability with minimal downtime in case of database instance failure. Additionally, a disaster recovery plan is needed in a separate AWS Region with minimal data loss.
Relational databases remain central for many applications, especially those requiring strong transactional consistency (ACID properties).
- Scalability:
  - Vertical Scaling: Increasing compute/memory (e.g., moving to a larger RDS instance type). Simple but limited by the largest available instance class.
  - Read Replicas (RDS/Aurora): Asynchronous replication of data to separate instances for read-heavy workloads, offloading the primary. Improves read performance and can be cross-AZ or cross-Region for DR.
  - Aurora Scaling: Aurora automatically scales its storage and supports up to 15 read replicas per cluster; Aurora Serverless adds on-demand compute capacity.
  - Sharding (Application-level): Partitioning data across multiple database instances, managed by the application. Complex to implement but offers extreme horizontal scaling.
- High Availability (HA):
  - Multi-AZ Deployment (RDS/Aurora): Synchronously replicates data to a standby instance in a different AZ. Provides automatic failover (minutes) in case of primary instance failure or AZ outage. No data loss (RPO=0).
- Disaster Recovery (DR):
  - Cross-Region Read Replicas (RDS/Aurora): Asynchronously replicate data to a read replica in a different AWS Region. In a regional disaster, this replica can be promoted to a standalone primary, providing a robust DR strategy. RPO > 0, RTO in minutes.
  - Snapshots: Automated and manual snapshots of RDS/Aurora instances are stored in S3 and can be copied cross-Region. Higher RTO/RPO than active replication.
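The three layers above (Multi-AZ primary, in-Region read replica, cross-Region DR replica) can be sketched as a deployment plan. This is a minimal illustration, not a definitive setup: it only builds the keyword arguments that would be passed to the boto3 RDS client calls `create_db_instance` and `create_db_instance_read_replica`, and it makes no AWS calls. The identifiers (`bank-primary`, `bank-read-1`, `bank-dr-replica`), the instance class, the account ID, the Regions, and the helper names are all hypothetical; networking, encryption, and parameter groups are omitted for brevity.

```python
from typing import Any, Dict, List, Tuple

# (region, RDS API call name, keyword arguments for that call)
Step = Tuple[str, str, Dict[str, Any]]


def primary_params(db_id: str, instance_class: str) -> Dict[str, Any]:
    """Keyword arguments for create_db_instance: a Multi-AZ PostgreSQL primary."""
    return {
        "DBInstanceIdentifier": db_id,
        "Engine": "postgres",
        "DBInstanceClass": instance_class,
        "AllocatedStorage": 100,           # GiB, illustrative value
        "MasterUsername": "appadmin",      # hypothetical user name
        "ManageMasterUserPassword": True,  # RDS stores the password in Secrets Manager
        "MultiAZ": True,                   # synchronous standby in another AZ (HA)
        "BackupRetentionPeriod": 7,        # automated backups must be enabled for replicas
    }


def replica_params(replica_id: str, source: str, instance_class: str) -> Dict[str, Any]:
    """Keyword arguments for create_db_instance_read_replica.

    For a same-Region replica, `source` is the primary's identifier.
    For a cross-Region replica, `source` must be the primary's ARN and the
    call is made with a client created in the destination Region.
    """
    return {
        "DBInstanceIdentifier": replica_id,
        "SourceDBInstanceIdentifier": source,
        "DBInstanceClass": instance_class,
    }


def build_plan(primary_region: str, dr_region: str, account_id: str) -> List[Step]:
    """Order the calls: primary first, then read scaling, then DR."""
    db_id = "bank-primary"        # hypothetical identifier
    instance_class = "db.r6g.large"  # illustrative size
    source_arn = f"arn:aws:rds:{primary_region}:{account_id}:db:{db_id}"
    return [
        (primary_region, "create_db_instance",
         primary_params(db_id, instance_class)),
        (primary_region, "create_db_instance_read_replica",
         replica_params("bank-read-1", db_id, instance_class)),
        (dr_region, "create_db_instance_read_replica",
         replica_params("bank-dr-replica", source_arn, instance_class)),
    ]


plan = build_plan("us-east-1", "eu-west-1", "123456789012")
for region, call, kwargs in plan:
    print(region, call, kwargs["DBInstanceIdentifier"])
```

In a real deployment each step would be executed with something like `getattr(boto3.client("rds", region_name=region), call)(**kwargs)`. During a regional disaster, the DR replica would be promoted with the client's `promote_read_replica` call, after which it no longer replicates from the original primary.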
Visual: RDS Scalability, HA, DR Design
⚠️ Common Pitfall: Using an RDS Read Replica for high availability. A Read Replica is for read scaling. If the primary database fails, you must manually promote the replica, which causes downtime and potential data loss due to asynchronous replication lag. A Multi-AZ deployment is the correct solution for automatic, synchronous failover.
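Why replication lag turns into data loss can be shown with a deliberately simplified model. The sketch below assumes the replica has applied every commit up to `failure_time - replica_lag_seconds` and nothing after it; the function name and the timestamps are hypothetical, and real lag fluctuates with write load.

```python
from typing import List


def writes_lost_on_promotion(
    commit_times: List[float],
    failure_time: float,
    replica_lag_seconds: float,
) -> List[float]:
    """Return the commits that reached the primary but not the replica.

    Simplified model: the replica is exactly `replica_lag_seconds` behind
    the primary at the moment the primary fails.
    """
    cutoff = failure_time - replica_lag_seconds
    return [t for t in commit_times if cutoff < t <= failure_time]


# Commit timestamps on the primary, in seconds (illustrative values).
commits = [100.0, 101.5, 103.0, 104.2]

# Asynchronous read replica that is 2.5 s behind when the primary fails at t=105.
lost_async = writes_lost_on_promotion(commits, failure_time=105.0, replica_lag_seconds=2.5)

# Synchronous standby (Multi-AZ) modeled as zero lag.
lost_sync = writes_lost_on_promotion(commits, failure_time=105.0, replica_lag_seconds=0.0)

print(lost_async)  # [103.0, 104.2]
print(lost_sync)   # []
```

Under this model the promoted replica is missing the two most recent commits (RPO > 0), while the zero-lag case loses none, which is the RPO=0 property of synchronous Multi-AZ replication.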
Key Trade-Offs:
- High Availability (Multi-AZ) vs. Read Scaling (Read Replicas): Multi-AZ is for failover and doesn't improve read performance (in a Multi-AZ DB instance deployment the standby is not readable). Read Replicas are for offloading read traffic and are not a primary HA solution.
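On the application side, this trade-off usually shows up as endpoint routing: writes always go to the primary, and reads are spread across replicas. The following is a minimal sketch under stated assumptions, not a production connection pool. The `EndpointRouter` class, its `needs_fresh_data` flag, and the hostnames are hypothetical; the flag reflects that asynchronous replicas may return slightly stale data, so reads that must see the latest write are sent to the primary.

```python
import itertools
from typing import Iterator, List, Optional


class EndpointRouter:
    """Pick a database endpoint per query: primary for writes, replicas for reads."""

    def __init__(self, writer: str, readers: List[str]) -> None:
        self.writer = writer
        # Round-robin over replicas; None means no replicas are configured.
        self._readers: Optional[Iterator[str]] = (
            itertools.cycle(readers) if readers else None
        )

    def endpoint_for(self, read_only: bool, needs_fresh_data: bool = False) -> str:
        # Writes, and reads that cannot tolerate replica lag, use the primary.
        if not read_only or needs_fresh_data or self._readers is None:
            return self.writer
        return next(self._readers)


router = EndpointRouter(
    writer="primary.db.example.internal",        # hypothetical hostnames
    readers=["replica-1.db.example.internal", "replica-2.db.example.internal"],
)

print(router.endpoint_for(read_only=True))                         # replica-1
print(router.endpoint_for(read_only=True))                         # replica-2
print(router.endpoint_for(read_only=False))                        # primary
print(router.endpoint_for(read_only=True, needs_fresh_data=True))  # primary
```

Because the Multi-AZ standby is never in the reader list, failover behavior and read scaling stay independent concerns, as described above.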
Reflection Question: How would you combine Amazon RDS Read Replicas and Multi-AZ deployment (including cross-Region) to meet the scalability, high availability, and disaster recovery requirements for a critical online banking application that experiences frequent read spikes and demands minimal downtime?