4.4.1. High Availability and Site Considerations
💡 First Principle: High availability eliminates single points of failure by providing redundant components that take over when primary components fail. Site diversity ensures that location-specific disasters don't take down all operations.
Load balancing distributes workloads across multiple servers. If one server fails, the others absorb the traffic. Active-active configurations run all servers simultaneously; active-passive configurations keep standby servers ready.
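As a rough illustration of the active-active case, the sketch below rotates requests across a pool and skips any server marked unhealthy. The class name `RoundRobinBalancer`, its methods, and the server names are hypothetical, chosen only for this example; real load balancers also handle health probing, connection draining, and weighting.

```python
class RoundRobinBalancer:
    """Active-active pool: every healthy server takes a share of requests."""

    def __init__(self, servers):
        self.servers = list(servers)      # fixed rotation order
        self.healthy = set(self.servers)  # servers currently eligible for traffic
        self._next = 0                    # index of the next server to try

    def mark_down(self, server):
        """Record a failed health check; the server stops receiving traffic."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Return a recovered server to the rotation."""
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server, skipping any that are down."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = RoundRobinBalancer(["web1", "web2", "web3"])
before = [lb.pick() for _ in range(3)]  # all three servers share the load
lb.mark_down("web2")                    # simulate a server failure
after = [lb.pick() for _ in range(4)]   # remaining servers absorb the traffic
```

After `web2` is marked down, requests alternate between `web1` and `web3`, which is the "others absorb the traffic" behavior described above.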
Clustering groups multiple servers that share workloads and fail over automatically. Cluster members monitor each other's health; if one fails, another assumes its role, typically within seconds.
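The health monitoring described above is often implemented with heartbeats. The following is a minimal sketch under stated assumptions: the `Cluster` class, the node names, and the 3-second timeout are all hypothetical, and time is passed in explicitly so the behavior is easy to follow. Production cluster managers add quorum and fencing to avoid split-brain, which this sketch ignores.

```python
class Cluster:
    """Tracks heartbeats and promotes a standby when the active node goes silent."""

    def __init__(self, nodes, timeout=3.0):
        self.nodes = list(nodes)                       # listed in failover priority order
        self.timeout = timeout                         # seconds of silence before a node is presumed failed
        self.last_seen = {n: 0.0 for n in self.nodes}  # time of each node's last heartbeat
        self.active = self.nodes[0]

    def heartbeat(self, node, now):
        """Record that `node` reported healthy at time `now`."""
        self.last_seen[node] = now

    def is_alive(self, node, now):
        return now - self.last_seen[node] <= self.timeout

    def check(self, now):
        """Fail over to the first live node if the active one missed its heartbeats."""
        if not self.is_alive(self.active, now):
            for candidate in self.nodes:
                if self.is_alive(candidate, now):
                    self.active = candidate
                    break
        return self.active


cluster = Cluster(["node-a", "node-b"], timeout=3.0)
cluster.heartbeat("node-a", now=1.0)
cluster.heartbeat("node-b", now=1.0)
active_before = cluster.check(now=2.0)  # node-a is still within the timeout

cluster.heartbeat("node-b", now=4.0)    # node-a has stopped sending heartbeats
active_after = cluster.check(now=5.0)   # node-a silent for 4 s, so node-b takes over
```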
Redundancy types:
| Type | Description | Example |
|---|---|---|
| Server | Multiple servers for same function | Web server cluster |
| Network | Redundant paths and switches | Dual ISPs, link aggregation |
| Storage | RAID, replication | RAID 5, geo-replicated storage |
| Power | UPS, generators, dual feeds | Dual power supplies per server |
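To see why redundancy helps, it can be useful to estimate availability numerically. The sketch below uses the standard parallel-availability formula, 1 - (1 - a)^n, which assumes component failures are independent (an assumption that shared power, software, or location can violate). The function names and the 99% figure are illustrative, not taken from any particular product.

```python
def parallel_availability(component_availability, count):
    """Availability of `count` redundant components, assuming independent failures.

    The group is down only when every component is down at the same time.
    """
    return 1 - (1 - component_availability) ** count


def downtime_hours_per_year(availability):
    """Expected annual downtime for a given availability (365-day year)."""
    return (1 - availability) * 365 * 24


single = 0.99                            # one server at 99% availability
pair = parallel_availability(single, 2)  # two such servers in parallel

single_downtime = downtime_hours_per_year(single)  # about 87.6 hours per year
pair_downtime = downtime_hours_per_year(pair)      # under one hour per year
```

Under these assumptions, adding a second 99% server cuts expected downtime by a factor of roughly one hundred, which is the quantitative case for the redundancy types in the table.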
Site considerations:
| Site Type | Description | Recovery Time | Cost |
|---|---|---|---|
| Hot site | Fully operational duplicate | Minutes to hours | $$$ Highest |
| Warm site | Partial infrastructure, needs data | Hours to days | $$ Medium |
| Cold site | Empty facility, needs everything | Days to weeks | $ Lowest |
Platform diversity: using different vendors and technologies reduces the risk that a single vulnerability affects all systems. If all servers run the same OS, one exploit can compromise all of them.
Multi-cloud systems: distributing workloads across multiple cloud providers reduces vendor lock-in and limits the impact of a single provider outage.
Geographic dispersion: placing redundant systems in different physical locations protects against regional disasters. If primary and backup systems are in the same building, or even the same city, a single earthquake, flood, or power grid failure can take both down simultaneously. Best practice is to place recovery sites in a different region from the primary. Separate availability zones within one region protect against facility-level failures but may not survive a region-wide disaster.
⚠️ Exam Trap: Hot site = fastest recovery, highest cost. Cold site = slowest recovery, lowest cost. The exam tests whether you can match recovery requirements (RTO) to the appropriate site type.
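Matching an RTO to a site type can be thought of as picking the cheapest option that still recovers in time. The sketch below encodes that reasoning. The worst-case recovery hours assigned to each site type are illustrative assumptions loosely based on the ranges in the table above, not official thresholds, and the names `SiteType`, `SITES`, and `cheapest_site_for` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteType:
    name: str
    worst_case_recovery_hours: float  # illustrative assumption, not an official figure
    relative_cost: int                # 1 = lowest cost, 3 = highest cost


# Assumed upper bounds: hot about 1 hour, warm about 3 days, cold about 2 weeks.
SITES = [
    SiteType("hot", 1, 3),
    SiteType("warm", 3 * 24, 2),
    SiteType("cold", 14 * 24, 1),
]


def cheapest_site_for(rto_hours):
    """Return the lowest-cost site type whose worst-case recovery meets the RTO."""
    candidates = [s for s in SITES if s.worst_case_recovery_hours <= rto_hours]
    if not candidates:
        raise ValueError("no site type in this sketch meets the requested RTO")
    return min(candidates, key=lambda s: s.relative_cost)


tight = cheapest_site_for(4).name          # RTO of 4 hours
moderate = cheapest_site_for(96).name      # RTO of 4 days
relaxed = cheapest_site_for(30 * 24).name  # RTO of 30 days
```

With these assumed thresholds, a 4-hour RTO leaves only the hot site, a 4-day RTO allows a warm site, and a 30-day RTO is satisfied by a cold site. An RTO shorter than the hot site's assumed recovery time raises an error here; in practice, that kind of requirement points toward active-active high availability rather than a recovery site.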
