Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

8.2.1. SIEM Architecture and Log Management

💡 First Principle: Every BCP/DR metric traces back to a business decision: how much downtime and data loss can the business tolerate before the financial, regulatory, or reputational consequences become unacceptable? These tolerances — not IT capabilities — drive the recovery objectives. IT must then design and fund solutions that meet the business's stated tolerances.

Core BCP/DR metrics:
| Metric | Definition | Who Determines It | Relationship |
| --- | --- | --- | --- |
| MTD (Maximum Tolerable Downtime) | The longest the business can survive without a critical function | Business owners | MTD is the ceiling; all recovery plans must complete within it |
| RTO (Recovery Time Objective) | Target time to restore a function after a disruption | IT + Business | RTO < MTD (RTO is the IT target; MTD is the business limit) |
| RPO (Recovery Point Objective) | Maximum acceptable data loss, measured in time | Business owners | RPO drives backup frequency (RPO = 4 hours → back up at least every 4 hours) |
| MTTR (Mean Time To Repair) | Average time to repair a failed component | IT Operations | Operational metric; an input to RTO planning |
| MTBF (Mean Time Between Failures) | Average time between component failures | Vendor / IT Operations | Reliability metric; informs redundancy decisions |
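
The link between RPO and backup frequency can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes periodic backups where the worst-case data loss equals the full interval between backups (it ignores backup duration and replication lag).

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Return True if the backup schedule can satisfy the RPO.

    Assumption: a failure just before the next backup loses one full
    interval of data, so the interval must not exceed the RPO.
    """
    return backup_interval <= rpo


def min_backups_per_day(rpo: timedelta) -> int:
    """Smallest number of evenly spaced backups per day that keeps the
    interval at or below the RPO (ceiling of 24 hours / RPO)."""
    day = timedelta(days=1)
    return -(-day // rpo)  # ceiling division on timedeltas


# RPO of 4 hours: a 4-hour schedule is sufficient, a daily one is not.
four_hours = timedelta(hours=4)
print(meets_rpo(timedelta(hours=4), four_hours))  # True
print(meets_rpo(timedelta(days=1), four_hours))   # False
print(min_backups_per_day(four_hours))            # 6
```

Treating the backup interval as the worst-case loss is a deliberately conservative simplification; a real design would also account for how long each backup takes to complete.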

The critical inequality: RTO < MTD. If a business's MTD for online ordering is 4 hours (after 4 hours, customers move to competitors permanently) and the current DR plan takes 6 hours to restore service (RTO = 6 hours), the organization has a 2-hour gap. Either the DR plan must be improved or the business must formally accept the higher risk. That choice is a governance decision, not a technical one.
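
The online-ordering example can be expressed as a small check. This is a minimal sketch, not a standard tool: the `RecoveryTargets` class and its method names are hypothetical, and the only rule it encodes is the inequality RTO < MTD stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecoveryTargets:
    """Recovery figures for one business function (hypothetical structure)."""
    function: str
    mtd_hours: float  # business limit, set by business owners
    rto_hours: float  # what the current DR plan can actually deliver

    def has_gap(self) -> bool:
        # The plan is acceptable only when RTO is strictly below MTD.
        return self.rto_hours >= self.mtd_hours

    def gap_hours(self) -> float:
        # Hours by which the current RTO overshoots the MTD (0 if none).
        return max(0.0, self.rto_hours - self.mtd_hours)


ordering = RecoveryTargets("online ordering", mtd_hours=4, rto_hours=6)
print(ordering.has_gap())    # True
print(ordering.gap_hours())  # 2.0
```

The check only reports the gap. Whether to fund a faster recovery solution or accept the risk remains a business decision.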

Business Impact Analysis (BIA): The BIA is the foundational input to BCP/DR planning. It identifies:

  1. Critical business functions (which processes are essential to survival?)
  2. Dependencies between functions (which IT systems support which processes?)
  3. Financial impact over time (what is the cost per hour of downtime for each function?)
  4. MTD, RTO, and RPO for each critical function (business owners set MTD and RPO; IT and the business agree on RTO)
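
The four BIA outputs listed above map naturally onto a simple record per business function. The sketch below is one possible shape, not a prescribed BIA format: the `BiaEntry` fields, the sample functions, and all figures are invented for illustration, and downtime cost is assumed to grow linearly, which real impacts often do not.

```python
from dataclasses import dataclass


@dataclass
class BiaEntry:
    """One row of a BIA (hypothetical structure, illustrative values)."""
    function: str          # 1. critical business function
    depends_on: list[str]  # 2. supporting IT systems
    cost_per_hour: float   # 3. financial impact of downtime
    mtd_hours: float       # 4. tolerances for this function
    rto_hours: float
    rpo_hours: float


def downtime_cost(entry: BiaEntry, hours: float) -> float:
    # Assumption: cost accrues linearly with outage duration.
    return entry.cost_per_hour * hours


def recovery_order(entries: list[BiaEntry]) -> list[str]:
    # Functions with the shortest MTD are recovered first.
    return [e.function for e in sorted(entries, key=lambda e: e.mtd_hours)]


bia = [
    BiaEntry("payroll", ["hr-db"], 500.0, 72, 48, 24),
    BiaEntry("online ordering", ["web", "orders-db"], 20000.0, 4, 2, 0.25),
]
print(recovery_order(bia))         # ['online ordering', 'payroll']
print(downtime_cost(bia[1], 4))    # 80000.0
```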

Recovery site strategies:

| Site Type | Description | RTO | Cost | Appropriate For |
| --- | --- | --- | --- | --- |
| Hot site | Fully operational mirror; data replicated in real time or near real time | Minutes to hours | Very High $$ | Critical financial systems; healthcare; safety |
| Warm site | Infrastructure ready; software installed; data must be restored from backup | Hours to days | Medium $ | Most business-critical applications |
| Cold site | Space and power available; no equipment or data | Days to weeks | Low $ | Non-critical functions; cost-constrained orgs |
| Cloud DR | Cloud infrastructure spun up from templates on demand | Minutes to hours | Medium (pay-per-use) | Flexible; increasingly common |
| Reciprocal agreement | Two organizations agree to host each other in a disaster | Varies | Low | Small orgs; last resort (often unreliable) |
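
The trade-off in the table (faster recovery costs more) can be sketched as picking the cheapest site type that still meets the required RTO. Everything numeric here is an assumption made for illustration: the table gives only ranges such as "hours to days", so the worst-case hour figures and the cost ranks below are hypothetical placeholders that an organization would replace with its own estimates.

```python
from typing import Optional

# (site type, ASSUMED worst-case recovery hours, relative cost rank: 1 = cheapest)
# These numbers are illustrative placeholders, not values from the table.
SITES = [
    ("cold site", 24 * 14, 1),
    ("warm site", 72, 2),
    ("cloud DR", 8, 2),
    ("hot site", 4, 3),
]


def cheapest_site_meeting_rto(rto_hours: float) -> Optional[str]:
    """Return the lowest-cost site type whose assumed worst-case recovery
    time fits within the RTO, or None if no listed option is fast enough."""
    candidates = [s for s in SITES if s[1] <= rto_hours]
    if not candidates:
        return None
    # min() keeps the first entry on a cost tie, so list order breaks ties.
    return min(candidates, key=lambda s: s[2])[0]


print(cheapest_site_meeting_rto(400))  # cold site
print(cheapest_site_meeting_rto(10))   # cloud DR
print(cheapest_site_meeting_rto(2))    # None
```

A result of `None` signals that none of the assumed options can meet the target, which is itself a finding to escalate rather than a technical dead end.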

BCP/DR testing types (listed from least to most disruptive):

| Test Type | Description | Disruption | Value |
| --- | --- | --- | --- |
| Checklist review | Review the plan document for completeness | None | Low: only finds documentation gaps |
| Tabletop exercise | Walk through a scenario verbally | None | Medium: finds procedural gaps |
| Parallel test | Bring up DR systems while production stays online | Low | High: confirms DR systems work |
| Simulation | Realistic scenario exercise with actual team actions | Low-Medium | High: finds coordination gaps |
| Full interruption test | Switch completely to DR; production offline | High | Highest: most realistic; significant risk |

Organizations should graduate from checklist reviews to tabletop exercises to parallel tests as the plan matures. Full interruption tests should be rare given the operational risk.

⚠️ Exam Trap: MTD > RTO is required — but this is frequently stated backwards in exam distractors. Remember: MTD is the business limit (the ceiling); RTO is the IT target (must be lower than the ceiling). An RTO that exceeds MTD means IT cannot recover fast enough to prevent unacceptable business impact — this is a gap that must be addressed.

Reflection Question: An organization's BIA identifies that its e-commerce platform has an MTD of 2 hours and an RPO of 15 minutes. The current DR solution is a warm site with daily backups. Based on this information alone, what two gaps does the BIA reveal in the current DR solution, and what technical change would address each gap?

Written by Alvin Varughese, Founder (15 professional certifications)