Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

3.1.3.1. Disaster Recovery Concepts (RTO, RPO)

Every DR strategy is a trade-off between cost and recovery speed. The exam tests whether you can match the right strategy to given RTO/RPO constraints.

Recovery Time Objective (RTO): Maximum acceptable time from disaster to full restoration. A 4-hour RTO means the business can tolerate 4 hours of downtime.

Recovery Point Objective (RPO): Maximum acceptable data loss measured in time. A 1-hour RPO means you can lose at most 1 hour of data.
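The two definitions can be checked against a single incident with simple timestamp arithmetic. The sketch below is only an illustration: the function name `meets_objectives` and the incident timestamps are hypothetical, and it assumes the last recoverable copy of the data is a backup taken before the disaster.

```python
from datetime import datetime, timedelta

def meets_objectives(last_backup, disaster, restored, rto, rpo):
    """Return (downtime, data_loss, ok) for one incident."""
    downtime = restored - disaster      # measured against the RTO
    data_loss = disaster - last_backup  # measured against the RPO
    ok = downtime <= rto and data_loss <= rpo
    return downtime, data_loss, ok

# Hypothetical incident checked against a 4-hour RTO and a 1-hour RPO.
downtime, data_loss, ok = meets_objectives(
    last_backup=datetime(2026, 1, 10, 11, 15),
    disaster=datetime(2026, 1, 10, 12, 0),
    restored=datetime(2026, 1, 10, 15, 30),
    rto=timedelta(hours=4),
    rpo=timedelta(hours=1),
)
print(downtime, data_loss, ok)
```

Here the service was down for 3.5 hours and 45 minutes of data were lost, so both objectives are met. Moving the last backup to more than an hour before the disaster would violate the RPO even though the RTO is still satisfied.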

DR strategy spectrum (cost ↔ recovery speed):

Strategy         | RTO             | RPO             | Cost | Description
-----------------|-----------------|-----------------|------|------------
Backup & Restore | Hours           | Hours           | $    | Backups in S3; restore when needed
Pilot Light      | Tens of minutes | Minutes         | $$   | Core infra running (DB replica); scale up on failover
Warm Standby     | Minutes         | Seconds-minutes | $$$  | Scaled-down duplicate running in DR region
Active-Active    | Near-zero       | Near-zero       | $$$$ | Full infrastructure in both regions serving traffic

Key distinction: Pilot Light vs. Warm Standby
  • Pilot Light: Only the data layer runs continuously (RDS replica, S3 replication). Compute and application layers are off. On disaster, you launch compute, update DNS, and promote the replica.
  • Warm Standby: A fully functional but scaled-down copy of production runs continuously. On disaster, scale up the DR environment and redirect traffic. Faster than Pilot Light because compute is already running.
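The difference between the two strategies can be written down as data: which layers are already running, and which failover steps remain. The sketch below only restates the two bullets above; the dictionary layout and the helper `cold_layers` are hypothetical names chosen for illustration.

```python
# Layers each strategy keeps running before a disaster, and the failover
# steps it needs afterwards, taken from the two descriptions above.
DR_PROFILES = {
    "Pilot Light": {
        "running": ["data"],
        "failover_steps": ["launch compute", "update DNS", "promote the replica"],
    },
    "Warm Standby": {
        "running": ["data", "compute", "application"],
        "failover_steps": ["scale up the DR environment", "redirect traffic"],
    },
}

ALL_LAYERS = ["data", "compute", "application"]

def cold_layers(strategy):
    """Layers that are off and must be started before traffic can be served."""
    running = DR_PROFILES[strategy]["running"]
    return [layer for layer in ALL_LAYERS if layer not in running]

for name in DR_PROFILES:
    print(name, "must start:", cold_layers(name) or "nothing")
```

Pilot Light has two cold layers to start, while Warm Standby has none and only needs to scale, which is why its RTO is shorter.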

Exam Trap: If the question specifies an RTO under 5 minutes, Active-Active is typically the only strategy that reliably meets it. Warm Standby is already running but must scale up to production capacity, which takes minutes. Pilot Light must launch new instances, which takes 10 minutes or more. The exam often tests whether you understand these time boundaries: don't choose Warm Standby for a sub-minute RTO.
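Matching a strategy to RTO/RPO constraints amounts to picking the cheapest option whose worst-case recovery figures fit inside the requirement. The sketch below shows that selection logic. The minute values are illustrative assumptions chosen to match the qualitative ranges in this section (they are not AWS guarantees), and `cheapest_strategy` is a hypothetical helper.

```python
from typing import NamedTuple, Optional

class Strategy(NamedTuple):
    name: str
    rto_minutes: float  # assumed worst-case recovery time
    rpo_minutes: float  # assumed worst-case data loss

# Ordered cheapest to most expensive, following the DR spectrum.
# All numbers are assumptions for illustration only.
STRATEGIES = [
    Strategy("Backup & Restore", 24 * 60, 24 * 60),
    Strategy("Pilot Light", 60, 15),
    Strategy("Warm Standby", 10, 5),
    Strategy("Active-Active", 0, 0),  # "near-zero" modelled as 0
]

def cheapest_strategy(rto_minutes: float, rpo_minutes: float) -> Optional[str]:
    """Return the first (cheapest) strategy that satisfies both objectives."""
    for s in STRATEGIES:
        if s.rto_minutes <= rto_minutes and s.rpo_minutes <= rpo_minutes:
            return s.name
    return None

print(cheapest_strategy(240, 60))  # 4-hour RTO, 1-hour RPO
print(cheapest_strategy(4, 60))    # RTO under 5 minutes
```

Under these assumed figures, a 4-hour RTO with a 1-hour RPO resolves to Pilot Light, while an RTO under 5 minutes skips Warm Standby and lands on Active-Active, mirroring the exam trap.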

Written by Alvin Varughese • Founder • 15 professional certifications