Copyright (c) 2025 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

2.4.1.1. Designing for Relational Database Workloads (Scalability, HA, DR)

💡 First Principle: Relational databases must be architected to maintain transactional integrity ("ACID") while providing high availability, effective read scaling, and robust disaster recovery to support critical "OLTP" workloads.

Scenario: A critical online banking application uses "Amazon RDS for PostgreSQL". The application experiences frequent read spikes during business hours, and there's a strict requirement for high availability with minimal downtime in case of database instance failure. Additionally, a disaster recovery plan is needed in a separate "AWS Region" with minimal data loss.

Relational databases remain central for many applications, especially those requiring strong transactional consistency ("ACID" properties).

  • Scalability:
    • Vertical Scaling: Increasing compute/memory (e.g., moving to a larger "RDS instance type"). Simple, but capped by the largest available instance class.
    • Read Replicas ("RDS"/"Aurora"): Asynchronous replication of data to separate instances for read-heavy workloads, offloading the primary. Improves read performance and can be cross-"AZ" or cross-"Region" for "DR".
    • Aurora Scaling: "Aurora" automatically grows its storage as data is added and supports up to 15 read replicas per cluster. "Aurora Serverless" adds on-demand compute capacity that scales with load.
    • Sharding (Application-level): Partitioning data across multiple database instances, managed by the application. Complex to implement but offers extreme horizontal scaling.
  • High Availability ("HA"):
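    • "Multi-AZ Deployment (RDS/Aurora)": Synchronously replicates data to a standby instance in a different "AZ". Provides automatic failover (typically within one to two minutes for "RDS") if the primary instance fails or its "AZ" has an outage. No committed data is lost ("RPO=0"). "Aurora" reaches the same goal differently: its storage is replicated across multiple "AZs" and failover promotes an existing replica.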
  • Disaster Recovery ("DR"):
    • Cross-Region Read Replicas ("RDS"/"Aurora"): Asynchronously replicate data to a read replica in a different "AWS Region". In a regional disaster, this replica can be promoted to a standalone primary, providing a robust "DR" strategy. "RPO" > 0 (bounded by replication lag at the time of failure), "RTO" in minutes.
    • Snapshots: Automated and manual snapshots of "RDS"/"Aurora" instances are stored in "S3" and can be copied to another "Region". Restoring from a snapshot gives a higher "RTO" and "RPO" than active replication.
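
Of the scaling options listed above, application-level sharding is the one that lives entirely in application code, so a small sketch may help. The example below assumes a hash of the customer ID is used as the shard key; the endpoint names and the `shard_for` helper are hypothetical and not part of any AWS API.

```python
import hashlib

# Hypothetical shard endpoints; real hostnames would come from configuration.
SHARDS = [
    "bank-shard-0.example.internal",
    "bank-shard-1.example.internal",
    "bank-shard-2.example.internal",
]


def shard_for(customer_id: str, shards: list[str] = SHARDS) -> str:
    """Map a shard key to one database endpoint using a stable hash."""
    # hashlib produces the same digest in every process, unlike built-in hash(),
    # so every application server routes a given customer to the same shard.
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    for cid in ("cust-1001", "cust-1002", "cust-1003"):
        print(cid, "->", shard_for(cid))
```

Simple modulo hashing like this remaps most keys whenever the number of shards changes, which is part of why sharding is described as complex to implement.
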
Visual: RDS Scalability, HA, DR Design
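
One way to picture the combined design is as a provisioning plan. The sketch below only builds dictionaries shaped like the keyword arguments of the boto3 RDS client methods `create_db_instance` and `create_db_instance_read_replica`; it does not call AWS. All identifiers, Regions, instance classes, and the account ID are placeholder assumptions, and credentials and networking settings are deliberately omitted.

```python
# Assumed Regions for illustration only.
PRIMARY_REGION = "us-east-1"
DR_REGION = "us-west-2"

# Primary instance: MultiAZ=True requests a synchronous standby in another AZ.
primary = {
    "DBInstanceIdentifier": "bank-primary",
    "Engine": "postgres",
    "DBInstanceClass": "db.r6g.large",
    "AllocatedStorage": 100,
    "MultiAZ": True,
    # Automated backups must be enabled on the source before replicas can be created.
    "BackupRetentionPeriod": 7,
}

# A cross-Region replica refers to its source by ARN (placeholder account ID).
primary_arn = f"arn:aws:rds:{PRIMARY_REGION}:123456789012:db:bank-primary"


def replica_params(identifier: str, source: str, instance_class: str = "db.r6g.large") -> dict:
    """Build keyword arguments for one read replica (hypothetical helper)."""
    return {
        "DBInstanceIdentifier": identifier,
        "SourceDBInstanceIdentifier": source,
        "DBInstanceClass": instance_class,
    }


# (Region the call is made in, client method name, keyword arguments)
plan = [
    (PRIMARY_REGION, "create_db_instance", primary),
    (PRIMARY_REGION, "create_db_instance_read_replica",
     replica_params("bank-read-1", "bank-primary")),
    (DR_REGION, "create_db_instance_read_replica",
     replica_params("bank-dr-replica", primary_arn)),
]

if __name__ == "__main__":
    for region, method, params in plan:
        print(region, method, params["DBInstanceIdentifier"])
```

The plan separates the three concerns: the `MultiAZ` flag covers HA, the in-Region replica covers read scaling, and the replica created in the second Region covers DR.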

āš ļø Common Pitfall: Using a "Read Replica" for high availability. A "Read Replica" is for read scaling. If the primary database fails, you must manually promote the replica, which causes downtime and potential data loss due to asynchronous replication lag. A "Multi-AZ" deployment is the correct solution for automatic, synchronous failover.

Key Trade-Offs:
  • High Availability ("Multi-AZ") vs. Read Scaling ("Read Replicas"): "Multi-AZ" is for failover and doesn't improve read performance (in a standard Multi-AZ instance deployment, the standby cannot serve reads). "Read Replicas" are for offloading read traffic and are not a primary "HA" solution.
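
The trade-off shows up in how an application chooses an endpoint. The following is a minimal sketch, assuming the application knows one writer endpoint and a list of replica endpoints. `EndpointRouter` and the hostnames are hypothetical, and the `SELECT` check is a crude stand-in for real query classification.

```python
import itertools


class EndpointRouter:
    """Send writes to the primary and spread plain reads across replicas."""

    def __init__(self, writer: str, readers: list[str]):
        self.writer = writer
        # Round-robin over replicas; None means every query uses the writer.
        self._readers = itertools.cycle(readers) if readers else None

    def endpoint_for(self, sql: str, require_fresh: bool = False) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        # Replicas are asynchronous, so reads that must see the latest
        # committed write (require_fresh=True) stay on the primary.
        if is_read and not require_fresh and self._readers is not None:
            return next(self._readers)
        return self.writer


router = EndpointRouter(
    writer="bank-primary.example.internal",
    readers=["bank-read-1.example.internal", "bank-read-2.example.internal"],
)

if __name__ == "__main__":
    print(router.endpoint_for("SELECT balance FROM accounts WHERE id = 1"))
    print(router.endpoint_for("UPDATE accounts SET balance = 0 WHERE id = 1"))
```

The Multi-AZ standby does not appear in the reader list because it exposes no endpoint of its own. After a failover, the existing writer endpoint simply resolves to the new primary.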

Reflection Question: How would you combine "Amazon RDS Read Replicas" (including a cross-Region replica) with a "Multi-AZ" deployment to meet the scalability, high availability, and disaster recovery requirements of a critical online banking application that experiences frequent read spikes and demands minimal downtime?