2.4.1.1. Designing for Relational Database Workloads (Scalability, HA, DR)
First Principle: Relational databases must be architected to maintain transactional integrity ("ACID") while providing high availability, effective read scaling, and robust disaster recovery to support critical "OLTP" workloads.
Scenario: A critical online banking application uses "Amazon RDS for PostgreSQL". The application experiences frequent read spikes during business hours, and there's a strict requirement for high availability with minimal downtime in case of database instance failure. Additionally, a disaster recovery plan is needed in a separate "AWS Region" with minimal data loss.
Relational databases remain central for many applications, especially those requiring strong transactional consistency ("ACID" properties).
- Scalability:
  - Vertical Scaling: Increasing compute/memory (e.g., moving to a larger "RDS instance type"). Simple, but capped by the largest available instance size.
  - Read Replicas ("RDS"/"Aurora"): Asynchronous replication of data to separate instances for read-heavy workloads, offloading the primary. Improves read performance and can be cross-"AZ" or cross-"Region" for "DR".
  - Aurora Scaling: "Aurora" automatically scales its storage and supports up to 15 read replicas; "Aurora Serverless" adds on-demand compute capacity.
  - Sharding (Application-level): Partitioning data across multiple database instances, managed by the application. Complex to implement but offers extreme horizontal scaling.
- High Availability ("HA"):
  - "Multi-AZ Deployment (RDS/Aurora)": Synchronously replicates data to a standby instance in a different "AZ". Provides automatic failover (within minutes) in case of primary instance failure or "AZ" outage. No data loss ("RPO=0").
- Disaster Recovery ("DR"):
  - Cross-Region Read Replicas ("RDS"/"Aurora"): Asynchronously replicate data to a read replica in a different "AWS Region". In a regional disaster, this replica can be promoted to a standalone primary, providing a robust "DR" strategy. "RPO" > 0 (roughly the replication lag at the time of failure), "RTO" in minutes.
  - Snapshots: Automated and manual snapshots of "RDS"/"Aurora" instances are stored in "S3" and can be copied cross-"Region". Higher "RTO"/"RPO" than active replication.
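To make the read-scaling idea above concrete, here is a minimal sketch of application-side read/write splitting: writes go to the primary endpoint, and plain reads are spread round-robin across read replicas. The class name `ReadWriteRouter`, the method `endpoint_for`, and the endpoint strings are hypothetical and only for illustration; a real application would more likely rely on its driver, a proxy, or a cluster reader endpoint.

```python
from itertools import cycle
from typing import Sequence


class ReadWriteRouter:
    """Sends writes to the primary and spreads plain reads over replicas."""

    def __init__(self, primary: str, replicas: Sequence[str] = ()) -> None:
        self.primary = primary
        # Fall back to the primary when no replica is configured.
        self._replica_cycle = cycle(list(replicas) or [primary])

    def endpoint_for(self, sql: str, in_transaction: bool = False) -> str:
        """Return the endpoint that should run this statement."""
        words = sql.split()
        verb = words[0].upper() if words else ""
        takes_locks = "FOR UPDATE" in sql.upper()
        # Replicas are asynchronous and may lag, so reads inside a
        # transaction (which may need the transaction's own writes) and
        # locking reads stay on the primary.
        if verb == "SELECT" and not in_transaction and not takes_locks:
            return next(self._replica_cycle)
        return self.primary


# Hypothetical endpoints for the banking scenario.
router = ReadWriteRouter(
    "bank-primary.example.internal",
    ["bank-replica-1.example.internal", "bank-replica-2.example.internal"],
)
write_target = router.endpoint_for("UPDATE accounts SET balance = balance - 10")
read_target = router.endpoint_for("SELECT balance FROM accounts WHERE id = 1")
```

Because replicas lag the primary, a router like this suits reads that tolerate slightly stale data (statements, history views) rather than balance checks that immediately follow a write.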
Visual: RDS Scalability, HA, DR Design
⚠️ Common Pitfall: Using a "Read Replica" for high availability. A "Read Replica" is for read scaling. If the primary database fails, you must manually promote the replica, which causes downtime and potential data loss due to asynchronous replication lag. A "Multi-AZ" deployment is the correct solution for automatic, synchronous failover.
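The data-loss risk in this pitfall can be made tangible with back-of-the-envelope arithmetic. The sketch below assumes you know the replica's lag in seconds (for "RDS" this is typically read from the CloudWatch `ReplicaLag` metric) and an average write rate; the function names and the sample numbers are hypothetical.

```python
def estimated_lost_writes(replica_lag_seconds: float, writes_per_second: float) -> float:
    """Rough count of committed writes missing from a replica at promotion time."""
    if replica_lag_seconds < 0 or writes_per_second < 0:
        raise ValueError("lag and write rate must be non-negative")
    # Everything committed on the primary during the lag window has not
    # reached the replica yet, so it is lost if the replica is promoted now.
    return replica_lag_seconds * writes_per_second


def meets_rpo(replica_lag_seconds: float, rpo_seconds: float) -> bool:
    """True when the current lag is within the recovery point objective."""
    return replica_lag_seconds <= rpo_seconds


# Hypothetical figures: 5 seconds of lag at 200 writes per second.
lost = estimated_lost_writes(5, 200)
ok_for_zero_rpo = meets_rpo(5, 0)
```

With these sample figures, promoting the replica would drop about 1,000 writes, and an "RPO=0" requirement is met only when the lag is zero, which asynchronous replication cannot guarantee.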
Key Trade-Offs:
- High Availability ("Multi-AZ") vs. Read Scaling ("Read Replicas"): "Multi-AZ" is for failover and doesn't improve read performance (in a Multi-AZ DB instance deployment, the standby is not readable). "Read Replicas" are for offloading read traffic and are not a primary "HA" solution.
Reflection Question: How would you combine "Amazon RDS Read Replicas" and "Multi-AZ" deployment (including cross-region) to meet the scalability, high availability, and disaster recovery requirements for a critical online banking application that experiences frequent read spikes and demands minimal downtime?
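One possible starting point for the question above, as a minimal sketch rather than a complete design: the three building blocks expressed as keyword arguments for the boto3 RDS client calls `create_db_instance` and `create_db_instance_read_replica`. The parameter names follow boto3; the helper functions, identifiers, instance class, account ID, and Regions are hypothetical, and required settings such as storage, credentials, and networking are deliberately omitted. The block only builds dictionaries and makes no AWS calls.

```python
from typing import Any, Dict


def multi_az_primary_params(db_id: str, instance_class: str) -> Dict[str, Any]:
    """Subset of keyword arguments for boto3's rds.create_db_instance."""
    return {
        "DBInstanceIdentifier": db_id,
        "Engine": "postgres",
        "DBInstanceClass": instance_class,
        "MultiAZ": True,  # synchronous standby in another AZ, automatic failover
    }


def read_replica_params(replica_id: str, source: str) -> Dict[str, Any]:
    """Subset of keyword arguments for rds.create_db_instance_read_replica.

    For a same-Region replica, `source` is the source instance identifier.
    For a cross-Region replica, `source` is the source instance ARN and the
    call is made against the destination Region.
    """
    return {
        "DBInstanceIdentifier": replica_id,
        "SourceDBInstanceIdentifier": source,
    }


# Hypothetical names for the banking scenario.
primary = multi_az_primary_params("bank-primary", "db.r6g.large")   # HA
local_replica = read_replica_params("bank-replica-1", "bank-primary")  # read scaling
dr_replica = read_replica_params(                                    # DR
    "bank-dr-replica",
    "arn:aws:rds:us-east-1:123456789012:db:bank-primary",
)

# With boto3 installed and credentials configured, these would be passed as:
#   boto3.client("rds", region_name="us-east-1").create_db_instance(**primary, ...)
#   boto3.client("rds", region_name="us-west-2").create_db_instance_read_replica(**dr_replica)
```

Each dictionary maps to one requirement: "Multi-AZ" for automatic failover, the same-Region replica for read spikes, and the cross-Region replica as the promotable "DR" copy.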
