3.1.3.6. Configuring a Load Balancer to Recover from Backend Failure
Load balancers are the front door for self-healing: they detect backend failures and automatically route traffic to healthy targets. Getting health check configuration right is the difference between seamless recovery and cascading failures.
Health check tuning:
| Parameter | Conservative | Aggressive | When to Use |
|---|---|---|---|
| Interval | 30s | 10s | Aggressive for critical APIs |
| Healthy threshold | 5 | 2 | Low threshold for fast recovery |
| Unhealthy threshold | 2 | 2 | Keep at 2 to avoid false positives |
| Timeout | 5s | 3s | Lower timeout detects stuck processes |
| Path | /health | /health | Always use a deep health check endpoint |
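To see what the two columns mean in practice, the thresholds can be turned into rough reaction times. The sketch below assumes the simple approximation that a state change takes about interval × threshold seconds of consecutive results; it ignores the timeout and check jitter, and the class and method names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthCheckProfile:
    """Hypothetical container for the interval and threshold rows of the table."""
    interval_s: int
    healthy_threshold: int
    unhealthy_threshold: int

    def detection_time_s(self) -> int:
        # Approximate time of consecutive failures before a target is marked unhealthy.
        return self.interval_s * self.unhealthy_threshold

    def recovery_time_s(self) -> int:
        # Approximate time of consecutive successes before a target is marked healthy again.
        return self.interval_s * self.healthy_threshold


CONSERVATIVE = HealthCheckProfile(interval_s=30, healthy_threshold=5, unhealthy_threshold=2)
AGGRESSIVE = HealthCheckProfile(interval_s=10, healthy_threshold=2, unhealthy_threshold=2)

for name, profile in (("conservative", CONSERVATIVE), ("aggressive", AGGRESSIVE)):
    print(f"{name}: detect ~{profile.detection_time_s()}s, "
          f"recover ~{profile.recovery_time_s()}s")
```

Under this approximation the conservative profile needs about 60s to detect a failure and 150s to return a recovered target to service, while the aggressive profile needs about 20s for each.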
Deep health checks verify more than "the process is running." A good /health endpoint checks database connectivity, cache availability, disk space, and downstream service reachability. Return 200 only when all dependencies are healthy.
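A deep check of this kind might be structured as a small aggregator that runs one function per dependency. This is a minimal sketch, not a reference implementation: the check names, the placeholder lambdas, and the choice of 503 for the failure case are assumptions (the only requirement stated above is that 200 is returned when everything is healthy).

```python
import json
import shutil
from typing import Callable, Dict, Tuple


def disk_has_space(path: str = "/", min_free_fraction: float = 0.10) -> bool:
    """Return True when at least min_free_fraction of the disk is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free_fraction


def deep_health(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, str]:
    """Run every dependency check and return (HTTP status, JSON body)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises is treated as a failed dependency.
            results[name] = False
    status = 200 if all(results.values()) else 503  # 503 is an assumed choice
    body = {"status": "ok" if status == 200 else "unhealthy", "checks": results}
    return status, json.dumps(body)


if __name__ == "__main__":
    checks = {
        "disk": disk_has_space,
        # Placeholders: replace with real probes for your own dependencies.
        "database": lambda: True,
        "cache": lambda: True,
        "downstream": lambda: True,
    }
    status, body = deep_health(checks)
    print(status, body)
```

The `/health` handler in whichever web framework is in use would call `deep_health` and send back the returned status and body. Each check should have its own short timeout so the whole response stays inside the load balancer's health check timeout.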
Recovery patterns:
- Target deregistration: Unhealthy target removed from rotation. New requests go to healthy targets.
- Connection draining (deregistration delay): In-flight requests complete before the target is fully removed (configurable: 0-3600s, default 300s).
- ASG replacement: If ASG health check type = ELB, unhealthy instances are terminated and replaced automatically.
- Cross-AZ failover: If all targets in one AZ fail, ALB routes all traffic to remaining AZs.
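How thresholds drive removal from and return to rotation can be illustrated with a small simulation. This is a conceptual model only, not how the ALB is implemented: the `Target` class and `in_rotation` helper are hypothetical, and targets are assumed to start healthy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Target:
    name: str
    healthy_threshold: int = 2
    unhealthy_threshold: int = 2
    healthy: bool = True  # assumption: the target starts in rotation
    _streak: int = 0      # consecutive results that contradict the current state

    def record(self, check_passed: bool) -> bool:
        """Feed one health check result; return the resulting health state."""
        if check_passed == self.healthy:
            # The result agrees with the current state, so any streak is reset.
            self._streak = 0
            return self.healthy
        self._streak += 1
        needed = self.unhealthy_threshold if self.healthy else self.healthy_threshold
        if self._streak >= needed:
            self.healthy = check_passed
            self._streak = 0
        return self.healthy


def in_rotation(targets: List[Target]) -> List[str]:
    """Names of targets that would receive new requests."""
    return [t.name for t in targets if t.healthy]


a, b = Target("a"), Target("b")
for passed in (False, False):   # two consecutive failures on b
    b.record(passed)
print(in_rotation([a, b]))      # ['a']
for passed in (True, True):     # two consecutive successes on b
    b.record(passed)
print(in_rotation([a, b]))      # ['a', 'b']
```

A single failed check followed by a success leaves the target in rotation, which is why a threshold of 2 filters out one-off blips.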
Route 53 health checks for regional failover:
```bash
# Health check that monitors the ALB endpoint
aws route53 create-health-check --caller-reference "$(date +%s)" \
  --health-check-config '{
    "FullyQualifiedDomainName": "api.example.com",
    "Port": 443,
    "Type": "HTTPS",
    "RequestInterval": 10,
    "FailureThreshold": 3,
    "EnableSNI": true
  }'
```
Exam Trap: Route 53 health checks and ALB health checks serve different purposes. ALB health checks manage individual target health within a region. Route 53 health checks manage regional endpoint health for DNS failover. If the exam asks about routing traffic away from an entire region, the answer is Route 53 health checks + failover routing, not ALB health checks.
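The failover routing half of that pairing could be expressed as a Route 53 change batch with one PRIMARY and one SECONDARY alias record. The sketch below only builds the JSON document. The `failover_record` helper is hypothetical, and every ALB DNS name, hosted zone ID, and health check ID is a placeholder to be replaced with real values.

```python
import json
from typing import Optional


def failover_record(name: str, role: str, alb_dns_name: str,
                    alb_hosted_zone_id: str,
                    health_check_id: Optional[str] = None) -> dict:
    """Build one UPSERT change for a failover alias record pointing at an ALB."""
    if role not in ("PRIMARY", "SECONDARY"):
        raise ValueError("role must be PRIMARY or SECONDARY")
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": f"{name}-{role.lower()}",
        "Failover": role,
        "AliasTarget": {
            "HostedZoneId": alb_hosted_zone_id,
            "DNSName": alb_dns_name,
            "EvaluateTargetHealth": True,
        },
    }
    if health_check_id:
        # Attach the Route 53 health check that decides when to fail over.
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}


# All values below are placeholders, not real resources.
change_batch = {
    "Changes": [
        failover_record("api.example.com", "PRIMARY",
                        "PRIMARY_ALB_DNS_NAME", "PRIMARY_ALB_HOSTED_ZONE_ID",
                        health_check_id="PRIMARY_HEALTH_CHECK_ID"),
        failover_record("api.example.com", "SECONDARY",
                        "SECONDARY_ALB_DNS_NAME", "SECONDARY_ALB_HOSTED_ZONE_ID"),
    ]
}

print(json.dumps(change_batch, indent=2))
```

Saved to a file, the output can be passed to `aws route53 change-resource-record-sets` through `--change-batch file://...` together with the `--hosted-zone-id` of the domain's hosted zone.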
