Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

4.4.2. Continuity of Operations and Capacity Planning

💡 First Principle: Business continuity planning ensures the organization can continue critical functions during and after a disruption. It bridges the gap between the incident and full recovery by defining what's essential, what's acceptable degradation, and how long each system can be unavailable.

Recovery Time Objective (RTO) — the maximum acceptable time a system can be down. A payment processing system might have an RTO of 15 minutes; an internal wiki might have an RTO of 48 hours.

Recovery Point Objective (RPO) — the maximum acceptable data loss measured in time. An RPO of 1 hour means you can afford to lose up to 1 hour of data. This drives backup frequency — an RPO of 1 hour requires at least hourly backups.
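To make the RTO/RPO distinction concrete, here is a minimal sketch that checks a single outage against both objectives. The function name `meets_objectives` and the timestamps are hypothetical, chosen only for illustration. The sketch assumes that all data written since the last good backup is lost.

```python
from datetime import datetime, timedelta

def meets_objectives(last_backup, failure, restored, rto, rpo):
    """Return (rto_met, rpo_met) for one outage (illustrative helper)."""
    downtime = restored - failure        # how long the system was unavailable
    data_loss = failure - last_backup    # assumption: everything since the last backup is lost
    return downtime <= rto, data_loss <= rpo

# Hypothetical outage: backup at 09:00, failure at 09:45, restored at 10:05.
last_backup = datetime(2026, 1, 5, 9, 0)
failure = datetime(2026, 1, 5, 9, 45)
restored = datetime(2026, 1, 5, 10, 5)

rto_met, rpo_met = meets_objectives(
    last_backup, failure, restored,
    rto=timedelta(minutes=15),   # payment-style RTO from the text
    rpo=timedelta(hours=1),      # 1-hour RPO from the text
)
print(rto_met, rpo_met)  # False True
```

In this example the 20 minutes of downtime exceed the 15-minute RTO, while the 45 minutes of lost data stay within the 1-hour RPO. The two objectives are measured independently, so one can be met while the other is missed.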

Mean Time to Repair (MTTR) — the average time to fix a failed component and restore service. Lower MTTR comes from preparation: documented procedures, spare parts on hand, trained staff, and automated failover. A server with hot-swappable drives and a documented replacement procedure has lower MTTR than one requiring a vendor service call.

Mean Time Between Failures (MTBF) — the average time between component failures. Higher MTBF = more reliable hardware. MTBF helps predict when components will need replacement and how many spares to stock. Together, MTTR and MTBF determine system availability: Availability = MTBF / (MTBF + MTTR). A system with MTBF of 10,000 hours and MTTR of 2 hours has 99.98% availability.
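The availability formula above can be checked with a few lines of Python. This is a small sketch using the figures from the text. The function name `availability` and the 8,760-hour year (365 days) are assumptions made for the example.

```python
def availability(mtbf_hours, mttr_hours):
    """Availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

a = availability(10_000, 2)
print(f"{a:.2%}")  # 99.98%

# Rough expected downtime per year, assuming 8,760 hours in a year.
HOURS_PER_YEAR = 8760
downtime_hours = (1 - a) * HOURS_PER_YEAR
print(round(downtime_hours, 2))  # 1.75
```

Halving MTTR from 2 hours to 1 hour roughly halves the expected downtime, which is why investment in faster repair often pays off as much as more reliable hardware.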

Capacity planning ensures resources meet demand during both normal operations and incidents. Considerations: people (enough trained staff for incident response), technology (sufficient compute/storage/bandwidth for failover), and infrastructure (power, cooling, physical space for recovery operations). Scalability (the ability to grow to meet increased demand) and elasticity (the ability to scale up and down dynamically as demand changes) are capacity concepts most often associated with cloud environments, and both are tested on the exam.
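One way to reason about "sufficient compute for failover" is a simple headroom check: can the remaining nodes carry peak load if one fails? The sketch below is illustrative only. The function name `survives_node_loss`, the requests-per-second units, and the numbers are all assumptions, and it assumes load spreads evenly across identical nodes.

```python
def survives_node_loss(node_capacity, node_count, peak_load, failed_nodes=1):
    """Return True if the surviving nodes can still carry peak load."""
    remaining_capacity = (node_count - failed_nodes) * node_capacity
    return remaining_capacity >= peak_load

# Hypothetical cluster: each node handles 1,000 requests/s, peak load is 3,200 requests/s.
print(survives_node_loss(1000, 4, 3200))  # False: 3 nodes give 3,000 < 3,200
print(survives_node_loss(1000, 5, 3200))  # True: 4 nodes give 4,000 >= 3,200
```

Four nodes handle normal peak load but leave no room for a failure, so the capacity plan in this example would call for a fifth node.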

⚠️ Exam Trap: RTO is about time to restore. RPO is about data loss tolerance. If a question asks "how much data can you afford to lose?", that's RPO. If it asks "how quickly must systems be back online?", that's RTO.

Written by Alvin Varughese
Founder • 15 professional certifications