4.4.1. High Availability and Site Considerations
💡 First Principle: High availability eliminates single points of failure by providing redundant components that take over when primary components fail. Site diversity ensures that location-specific disasters don't take down all operations.
Load balancing distributes workloads across multiple servers. If one server fails, the others absorb the traffic. Active-active configurations run all servers simultaneously and share the load; active-passive configurations keep standby servers idle and ready, and a standby takes over only when an active server fails.
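A minimal sketch of how a load balancer routes around a failed server, contrasting the two modes. `LoadBalancer`, `mark_down`, `mark_up`, and `pick` are hypothetical names chosen for illustration, not a real product's API. The sketch assumes server health is already known (in practice it comes from periodic health checks) and uses plain round-robin for active-active mode.

```python
from itertools import cycle


class LoadBalancer:
    """Health-aware load balancer sketch (hypothetical class, not a real library).

    mode="active-active":  rotate requests round-robin across all healthy servers.
    mode="active-passive": send every request to the first healthy server in
                           priority order; the others wait as standbys.
    """

    def __init__(self, servers, mode="active-active"):
        if mode not in ("active-active", "active-passive"):
            raise ValueError(f"unknown mode: {mode}")
        self.servers = list(servers)      # order = priority for active-passive
        self.mode = mode
        self.healthy = set(self.servers)  # assumed to be driven by health checks
        self._rotation = cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server not in self.servers:
            raise ValueError(f"unknown server: {server}")
        self.healthy.add(server)

    def pick(self):
        """Return the server that should handle the next request."""
        if not self.healthy:
            raise RuntimeError("no healthy servers available")
        if self.mode == "active-passive":
            # The highest-priority healthy server takes all traffic.
            return next(s for s in self.servers if s in self.healthy)
        # Active-active: advance the rotation, skipping failed servers.
        for _ in range(len(self.servers)):
            server = next(self._rotation)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = LoadBalancer(["web1", "web2", "web3"])
lb.mark_down("web2")
print([lb.pick() for _ in range(4)])  # ['web1', 'web3', 'web1', 'web3']
```

In active-active mode the surviving servers simply share web2's former traffic; in active-passive mode a standby receives nothing until the primary is marked down.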
Clustering groups multiple servers that share workloads and fail over automatically. Cluster members monitor each other's health, typically by exchanging heartbeat messages; if one member fails, another assumes its role, often within seconds.
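The heartbeat idea can be sketched as follows, assuming a simple priority-ordered cluster where the active node loses its role after `timeout` seconds of silence. `Cluster`, `heartbeat`, and `check` are hypothetical names. Time is passed in as `now` to keep the sketch deterministic; a real implementation would read a monotonic clock such as `time.monotonic()` and would add quorum or fencing so that two nodes never both claim the active role.

```python
class Cluster:
    """Heartbeat-based failover sketch (hypothetical class).

    Each node periodically reports a heartbeat. If the active node has not
    been heard from within `timeout` seconds, the first live node in
    priority order takes over its role.
    """

    def __init__(self, nodes, timeout):
        self.nodes = list(nodes)                  # priority order
        self.timeout = timeout
        self.last_seen = {n: 0.0 for n in self.nodes}
        self.active = self.nodes[0]

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def is_alive(self, node, now):
        return now - self.last_seen[node] <= self.timeout

    def check(self, now):
        """Run one health check and fail over if the active node is silent."""
        if not self.is_alive(self.active, now):
            live = [n for n in self.nodes if self.is_alive(n, now)]
            if not live:
                raise RuntimeError("no live cluster members")
            self.active = live[0]
        return self.active


cluster = Cluster(["node-a", "node-b"], timeout=3.0)
cluster.heartbeat("node-a", now=1.0)
cluster.heartbeat("node-b", now=1.0)
cluster.heartbeat("node-b", now=4.0)   # node-a has gone silent
print(cluster.check(now=5.0))          # node-b
```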
Redundancy types:
| Type | Description | Example |
|---|---|---|
| Server | Multiple servers performing the same function | Web server cluster |
| Network | Redundant paths and switches | Dual ISPs, link aggregation |
| Storage | RAID, replication | RAID 5, geo-replicated storage |
| Power | UPS, generators, dual feeds | Dual power supplies per server |
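Why redundancy helps can be shown with standard availability arithmetic, assuming component failures are independent: components in series (all must work) multiply their availabilities, while redundant components in parallel fail only if all fail, giving 1 minus the product of their unavailabilities. The availability figures below are illustrative, not measurements. Note that the independence assumption is exactly what a shared building, power feed, or region breaks, which is why site diversity matters.

```python
from math import prod


def series_availability(availabilities):
    """Every component must work: a chain with no redundancy."""
    return prod(availabilities)


def parallel_availability(availabilities):
    """At least one redundant component must work (independent failures assumed)."""
    return 1 - prod(1 - a for a in availabilities)


# Illustrative figures: two redundant servers at 99% each ...
server_pair = parallel_availability([0.99, 0.99])
# ... but both plugged into a single power feed at 99.9%.
with_single_feed = series_availability([server_pair, 0.999])
print(round(server_pair, 6), round(with_single_feed, 6))
```

The redundant pair reaches roughly 99.99%, yet the single power feed caps the whole system below 99.9%: one unduplicated component remains a single point of failure.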
Site considerations:
| Site Type | Description | Recovery Time | Cost |
|---|---|---|---|
| Hot site | Fully operational duplicate | Minutes to hours | $$$ Highest |
| Warm site | Partial infrastructure (some hardware and connectivity); needs current data restored | Hours to days | $$ Medium |
| Cold site | Empty facility (space and power); needs hardware, software, and data | Days to weeks | $ Lowest |
Platform diversity: using different vendors and technologies reduces the risk that a single vulnerability affects all systems. If all servers run the same OS, one exploit could compromise every one of them.
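One way to quantify platform diversity is the worst-case share of the fleet exposed when a single platform has an exploitable flaw. The sketch below assumes a hypothetical inventory mapping each hostname to a platform label and treats every host on the affected platform as exposed.

```python
from collections import Counter


def worst_case_exposure(fleet):
    """Fraction of the fleet exposed if one platform has an exploitable flaw.

    `fleet` maps hostname -> platform label (hypothetical data model).
    """
    if not fleet:
        return 0.0
    counts = Counter(fleet.values())
    return max(counts.values()) / len(fleet)


monoculture = {"web1": "linux", "web2": "linux", "db1": "linux", "dc1": "linux"}
diverse = {"web1": "linux", "web2": "bsd", "db1": "linux", "dc1": "windows"}
print(worst_case_exposure(monoculture), worst_case_exposure(diverse))  # 1.0 0.5
```

Diversity does not remove the vulnerability; it limits how much of the environment one flaw can reach, at the cost of managing more platforms.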
Multi-cloud systems: distributing workloads across multiple cloud providers reduces dependence on any one vendor (vendor lock-in) and limits the impact of a single provider's outage.
Geographic dispersion: placing redundant systems in different physical locations protects against regional disasters. If primary and backup systems are in the same building, or even the same city, a single earthquake, flood, or power grid failure can take both down simultaneously. Best practice is to locate recovery sites far enough from the primary that one event cannot affect both. In cloud environments, a separate availability zone guards against the failure of a single facility, while a separate region guards against regional disasters.
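A placement review can be sketched as a check of which failure domains a primary and its recovery site share. The `Site` record, the three-level region/city/building hierarchy, and all location labels are hypothetical. The comparison is hierarchical: two sites share a city only if they also share a region.

```python
from dataclasses import dataclass

# Failure domains from broadest to narrowest (hypothetical hierarchy).
LEVELS = ("region", "city", "building")


@dataclass(frozen=True)
class Site:
    name: str
    region: str
    city: str
    building: str


def shared_failure_domains(primary, backup):
    """Return the failure-domain levels both sites sit in, broadest first."""
    shared = []
    for level in LEVELS:
        if getattr(primary, level) != getattr(backup, level):
            break
        shared.append(level)
    return shared


hq = Site("hq", region="east", city="city-a", building="b1")
dr_same_city = Site("dr1", region="east", city="city-a", building="b2")
dr_other_region = Site("dr2", region="west", city="city-z", building="b9")
print(shared_failure_domains(hq, dr_same_city))     # ['region', 'city']
print(shared_failure_domains(hq, dr_other_region))  # []
```

An empty result means no single regional event in this model can reach both sites; any shared level is a common-cause risk worth documenting.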
⚠️ Exam Trap: Hot site = fastest recovery, highest cost. Cold site = slowest recovery, lowest cost. Warm site falls in between. The exam tests whether you can match recovery requirements, expressed as a recovery time objective (RTO), to the appropriate site type.
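Matching an RTO to a site type amounts to choosing the cheapest option whose worst-case recovery time still meets the objective. The recovery hours below are illustrative assumptions loosely following the site table, not standard values; real figures come from contracts and recovery testing.

```python
# Hypothetical worst-case recovery times in hours, ordered cheapest first.
SITE_RECOVERY_HOURS = [
    ("cold", 14 * 24),  # assumption: up to two weeks
    ("warm", 2 * 24),   # assumption: up to two days
    ("hot", 1),         # assumption: up to one hour
]


def cheapest_site_for_rto(rto_hours):
    """Return the cheapest site type whose worst-case recovery meets the RTO."""
    for site, worst_case in SITE_RECOVERY_HOURS:
        if worst_case <= rto_hours:
            return site
    raise ValueError(f"no site type meets an RTO of {rto_hours} hours")


for rto in (2, 72, 500):
    print(rto, cheapest_site_for_rto(rto))
```

Checking options from cheapest to most expensive mirrors the exam logic: a hot site is justified only when a shorter RTO rules out the cheaper options.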