8.6.1. Backup Strategies and Recovery Sites
💡 First Principle: A backup strategy is only as good as its weakest link, and the weakest link is usually the assumption that backups work. Organizations often discover backup failures only during an actual disaster: tapes are unreadable, backups are incomplete, restoration procedures were never documented, or the backup target sat on the same network as the ransomware-encrypted production systems. The 3-2-1 rule exists to remove single points of failure from the backup chain; regular restore tests are what confirm the copies are actually usable.
Backup types compared:
| Type | What It Copies | Backup Speed | Storage Used | Restore Speed | Restore Complexity |
|---|---|---|---|---|---|
| Full | Everything | Slowest | Most | Fastest (single restore) | Lowest |
| Incremental | Changes since last backup of any type | Fastest | Least | Slowest (chain of incrementals needed) | Highest — must restore full + every incremental in sequence |
| Differential | Changes since last full backup | Medium | Medium | Medium (full + latest differential) | Medium — two restores required |
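The restore-complexity column can be made concrete with a small sketch. The function below is illustrative only: `restore_chain` and the string labels are hypothetical names, and it assumes the backup history is a simple chronological list. It walks backwards from the newest backup and collects every backup set a restore would need.

```python
def restore_chain(history):
    """Return indices of the backups needed to restore to the newest entry.

    `history` is a chronological list of backup kinds (hypothetical labels):
    "full", "incremental", or "differential".
    """
    needed = []
    i = len(history) - 1
    while i >= 0:
        kind = history[i]
        needed.append(i)
        if kind == "full":
            # A full backup stands alone, so the chain ends here.
            return needed[::-1]
        if kind == "differential":
            # Holds all changes since the last full: jump straight to that full.
            i = max((j for j in range(i) if history[j] == "full"), default=-1)
        elif kind == "incremental":
            # Holds changes since the previous backup of any type: step back one.
            i -= 1
        else:
            raise ValueError(f"unknown backup kind: {kind!r}")
    raise ValueError("no full backup to anchor the restore")


week_incremental = ["full"] + ["incremental"] * 3
week_differential = ["full"] + ["differential"] * 3
print(restore_chain(week_incremental))   # [0, 1, 2, 3]
print(restore_chain(week_differential))  # [0, 3]
```

Three days after the full backup, the incremental schedule needs four backup sets to restore while the differential schedule needs two, which matches the table's "Highest" and "Medium" complexity ratings.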
The 3-2-1 backup rule:
- 3 copies of data (production + 2 backups)
- 2 different storage media types (disk + tape, disk + cloud)
- 1 copy offsite (physically separate location or cloud region)
For ransomware resilience, this extends to 3-2-1-1: the additional "1" is an immutable/air-gapped backup copy that cannot be modified or deleted by an attacker who compromises the production network.
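As a rough illustration, the rule can be expressed as a checklist over an inventory of data copies. The `DataCopy` record and `check_3211` function below are hypothetical names invented for this sketch, not part of any backup product, and the production copy is counted as one of the three copies, as in the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    # Hypothetical inventory record for one copy of the data.
    name: str
    media: str             # e.g. "disk", "tape", "cloud"
    offsite: bool
    immutable: bool = False  # immutable or air-gapped


def check_3211(copies):
    """Report which parts of the 3-2-1-1 rule an inventory satisfies."""
    return {
        "3 copies": len(copies) >= 3,
        "2 media types": len({c.media for c in copies}) >= 2,
        "1 offsite": any(c.offsite for c in copies),
        "1 immutable": any(c.immutable for c in copies),
    }


inventory = [
    DataCopy("production", media="disk", offsite=False),
    DataCopy("local backup", media="disk", offsite=False),
    DataCopy("cloud backup", media="cloud", offsite=True),
]
for rule, ok in check_3211(inventory).items():
    print(f"{rule}: {'pass' if ok else 'FAIL'}")
```

This sample inventory passes the classic 3-2-1 checks but fails the final "1": none of its copies is immutable or air-gapped, so an attacker with access to the production network could still reach every backup.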
Recovery sites:
| Site Type | Ready State | Recovery Time | Cost | Best For |
|---|---|---|---|---|
| Cold site | Empty facility with power and connectivity; no hardware | Days to weeks | Lowest | Non-critical systems with long MTD |
| Warm site | Hardware pre-installed; data not current | Hours to days | Moderate | Systems with moderate RTO (8–24 hours) |
| Hot site | Fully operational mirror; near-real-time data replication | Minutes to hours | Highest | Mission-critical systems with short RTO (<4 hours) |
| Cloud DR | Infrastructure-as-code; spin up on demand | Minutes to hours (depends on architecture) | Variable (low standby cost for storage and replication; most compute cost incurred on activation) | Organizations with cloud-native or hybrid infrastructure |
| Mobile site | Portable facility (trailer, container) | Hours to days (after transport) | Moderate | Geographically variable disaster scenarios |
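One way to read the table is as a first-pass lookup from RTO to site type. The sketch below is a simplification under stated assumptions: `suggest_site` is a hypothetical helper, it covers only the cold, warm, and hot tiers, and the table leaves RTOs between 4 and 8 hours unassigned, so this sketch conservatively gives that range to the hot site. Real site selection would also weigh cost, MTD, and geography.

```python
def suggest_site(rto_hours):
    """Suggest a first-pass recovery site tier for a given RTO in hours.

    Thresholds follow the table where it gives them (hot: <4 h, warm: 8-24 h).
    Assigning the 4-8 h gap to "hot" is an assumption made for this sketch.
    """
    if rto_hours <= 0:
        raise ValueError("RTO must be positive")
    if rto_hours < 8:
        return "hot"
    if rto_hours <= 24:
        return "warm"
    return "cold"


for rto in (2, 12, 72):
    print(f"RTO {rto} h -> {suggest_site(rto)} site")
```

Cloud DR and mobile sites are left out on purpose: their recovery time depends on architecture and transport rather than on a fixed tier.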
Reciprocal agreements: Two organizations agree to provide DR capacity to each other. Low cost but risky — the host organization may not have capacity during a widespread disaster that affects both parties. Generally considered a supplementary, not primary, DR strategy.
⚠️ Exam Trap: "Full backups are always better than incremental or differential." Full backups are simpler to restore but consume the most storage and take the longest to complete. In most production environments, a combination strategy (weekly full + daily differential, or weekly full + daily incremental) balances storage efficiency, backup window, and restore speed. The right strategy depends on RPO requirements and the available backup window.
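The trade-off can be checked with simple arithmetic. The sketch below estimates one week of backup storage for three schedules. It is a simplified model with assumed inputs: `weekly_storage_gb` is a hypothetical helper, the full backup size stays constant, and each day changes a fixed amount of data that does not overlap with earlier days, so every differential is one day's worth larger than the previous one.

```python
def weekly_storage_gb(full_gb, daily_change_gb, days=7):
    """Estimate storage for one backup cycle under three schedules.

    Assumes one full backup on day 1 and non-overlapping daily changes.
    """
    later_days = days - 1  # days that follow the full backup
    return {
        # A full backup every day.
        "daily full": days * full_gb,
        # Each incremental holds one day of changes.
        "weekly full + daily incremental": full_gb + later_days * daily_change_gb,
        # The n-th differential holds n days of changes: 1 + 2 + ... + later_days.
        "weekly full + daily differential": (
            full_gb + daily_change_gb * later_days * (later_days + 1) // 2
        ),
    }


# Assumed example figures: a 1000 GB data set with 50 GB of changes per day.
for schedule, gb in weekly_storage_gb(1000, 50).items():
    print(f"{schedule}: {gb} GB")
```

With these assumed figures, daily full backups need 7000 GB for the week, the differential schedule 2050 GB, and the incremental schedule 1300 GB. The incremental schedule pays for its small footprint with the longest restore chain at the end of the week.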
Reflection Question: A financial services company has an RPO of 15 minutes for its trading platform and an RPO of 24 hours for its HR system. Both systems are currently backed up with nightly full backups. Identify the RPO violation for the trading platform, describe the backup architecture change needed to meet the 15-minute RPO, and explain why the same architecture is unnecessary for the HR system.