6.2.2. Route 53 Advanced Routing and Health Checks
💡 First Principle: Route 53 routing policies are DNS-layer traffic management: they control which IP address a DNS query returns based on the health, location, latency, or weight you configure. This is global load balancing implemented at the DNS layer, without a load balancer, and it can span multiple regions, multiple cloud providers, and on-premises infrastructure simultaneously.
Health Check Integration:
Route 53 health checks monitor endpoints independently of your in-region health checks. They probe from multiple AWS locations around the world, so even if your ALB's health checks pass (because they run in the same region as the failure), Route 53 can detect a regional issue from external vantage points.
Health checks can monitor:
- Endpoints: HTTP/HTTPS/TCP to a specific IP or domain
- Other health checks: Calculated health checks (AND/OR logic across multiple checks)
- CloudWatch alarms: Health based on any metric you can alarm on
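Each of the three health check kinds above corresponds to a different shape of the `HealthCheckConfig` structure accepted by the Route 53 `CreateHealthCheck` API. The following is a minimal sketch that only builds those dictionaries and makes no AWS call. The helper names, the domain, the child health check IDs, and the alarm name are hypothetical placeholders chosen for illustration.

```python
def endpoint_check(domain, path="/health", port=443):
    """HTTPS check against a domain name (hypothetical endpoint)."""
    return {
        "Type": "HTTPS",
        "FullyQualifiedDomainName": domain,
        "Port": port,
        "ResourcePath": path,
        "RequestInterval": 30,   # seconds between probes from each checker
        "FailureThreshold": 3,   # consecutive failures before "unhealthy"
    }


def calculated_check(child_ids, min_healthy):
    """Healthy when at least `min_healthy` child checks are healthy.

    min_healthy == len(child_ids) behaves like AND; min_healthy == 1 like OR.
    """
    return {
        "Type": "CALCULATED",
        "ChildHealthChecks": list(child_ids),
        "HealthThreshold": min_healthy,
    }


def alarm_check(region, alarm_name):
    """Health follows the state of an existing CloudWatch alarm."""
    return {
        "Type": "CLOUDWATCH_METRIC",
        "AlarmIdentifier": {"Region": region, "Name": alarm_name},
        "InsufficientDataHealthStatus": "LastKnownStatus",
    }


# Hypothetical IDs and names, for illustration only.
children = ["hc-web-placeholder", "hc-api-placeholder", "hc-db-placeholder"]
all_must_pass = calculated_check(children, min_healthy=len(children))  # AND
any_may_pass = calculated_check(children, min_healthy=1)               # OR
web = endpoint_check("app.example.com")
queue_depth = alarm_check("eu-west-1", "queue-depth-high")
```

With boto3, a dictionary like these would be passed as the `HealthCheckConfig` argument of `create_health_check`, together with a unique `CallerReference` string.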
Routing Policy Summary (Full Decision Matrix):
| Policy | Metric Used | DNS Returns | Use Case |
|---|---|---|---|
| Simple | None | The record's value(s); multiple values are returned in random order | One resource, no health check |
| Weighted | Relative weight (0 to 255) | One value (probabilistic) | Canary, A/B testing, migration |
| Latency | AWS-measured latency per region | Lowest-latency region record | Speed optimization |
| Failover | Health check pass/fail | Primary if healthy, else secondary | Active-passive DR |
| Geolocation | Client geographic location | Region/country-specific record | Compliance, localization |
| Geoproximity | Distance + bias | Nearest resource (adjustable) | Fine-grained geographic shifting |
| Multivalue | Health check (optional) | Up to 8 healthy records | Client-side balancing |
| IP-based | Client IP CIDR range | CIDR-matched record | ISP routing, network segmentation |
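To make the "probabilistic" entry for Weighted routing concrete: each record's expected share of responses is its weight divided by the sum of all weights in the group. The sketch below models that arithmetic locally. It is a simulation and not Route 53 itself, and the record names `stable` and `canary` are hypothetical.

```python
import random


def traffic_shares(weights):
    """Return each record's expected share of DNS responses."""
    total = sum(weights.values())
    if total == 0:
        # When every weight is 0, all records are equally likely.
        return {name: 1 / len(weights) for name in weights}
    return {name: w / total for name, w in weights.items()}


def pick_record(weights, rng):
    """Pick one record the way a single weighted DNS answer would."""
    shares = traffic_shares(weights)
    names = list(shares)
    return rng.choices(names, weights=[shares[n] for n in names], k=1)[0]


# A 90/10 canary split; the weights are relative, not percentages.
weights = {"stable": 90, "canary": 10}
shares = traffic_shares(weights)

rng = random.Random(42)  # seeded so the simulation is repeatable
sample = [pick_record(weights, rng) for _ in range(10_000)]
canary_fraction = sample.count("canary") / len(sample)

print(shares)
print(f"simulated canary fraction: {canary_fraction:.3f}")
```

Because the weights are relative, `{"stable": 9, "canary": 1}` produces the same split. Resolver caching means the traffic actually observed only approximates these shares.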
TTL Strategy:
| Situation | Recommended TTL |
|---|---|
| Normal operation | 300 seconds (5 min): balances caching efficiency against update speed |
| Before a planned failover/migration | 60 seconds: reduces caching so DNS changes propagate quickly |
| Stable records (MX, NS) | 86400 seconds (24 hr): these rarely change |
| During an active incident with DNS failover | 60 seconds or lower |
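The TTL choices above matter because TTL adds directly to failover time. A rough estimate, under the simplifying assumption that clients see the change once the health check has failed enough consecutive times and their cached answer has expired, is detection time plus TTL. The function name and the scenario values below are illustrative, and the estimate ignores resolvers that do not honor TTLs.

```python
def worst_case_failover_seconds(request_interval, failure_threshold, ttl):
    """Rough upper bound on how long clients may keep hitting a dead endpoint.

    detection = request_interval * failure_threshold  (health check trips)
    plus ttl  (a resolver may have cached the old answer just before that)
    """
    detection = request_interval * failure_threshold
    return detection + ttl


scenarios = {
    "normal (30s checks, TTL 300)": worst_case_failover_seconds(30, 3, 300),
    "pre-migration (30s checks, TTL 60)": worst_case_failover_seconds(30, 3, 60),
    "fast checks (10s checks, TTL 60)": worst_case_failover_seconds(10, 3, 60),
}

for name, seconds in scenarios.items():
    print(f"{name}: about {seconds} seconds")
```

Under these assumptions, lowering the TTL from 300 to 60 seconds cuts the estimate from 390 to 150 seconds, which is why the TTL is reduced before a planned change and not during it: the old, longer TTL must expire first.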
Alias Records vs. CNAME:
| Feature | Alias Record | CNAME |
|:---|:---:|:---:|
| Zone apex | ✅ Works at example.com | ❌ Not allowed at zone apex |
| Cost | Free (no charge per query to AWS resources) | Standard query cost |
| Health check integration | ✅ Yes | ❌ Limited |
| Points to | Supported AWS resources, or another record in the same hosted zone | Any hostname |
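As an illustration of the zone apex row, the sketch below builds the change batch for an alias `A` record at `example.com` in the shape the Route 53 `ChangeResourceRecordSets` API expects. It only constructs the dictionary and makes no AWS call. The load balancer DNS name and the hosted zone ID are hypothetical placeholders, not real values.

```python
def apex_alias_change(zone_apex, target_dns_name, target_zone_id):
    """Build an UPSERT for an alias A record at the zone apex."""
    return {
        "Comment": "Alias the zone apex to a load balancer",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": zone_apex,
                    "Type": "A",  # alias records use the target's type, not CNAME
                    "AliasTarget": {
                        # Hosted zone ID of the *target* (e.g. the ELB),
                        # not of your own hosted zone.
                        "HostedZoneId": target_zone_id,
                        "DNSName": target_dns_name,
                        "EvaluateTargetHealth": True,
                    },
                    # Note: no "TTL" key; alias records do not take one.
                },
            }
        ],
    }


# Hypothetical placeholder values, for illustration only.
batch = apex_alias_change(
    zone_apex="example.com.",
    target_dns_name="my-alb-placeholder.eu-west-1.elb.amazonaws.com.",
    target_zone_id="ZPLACEHOLDER123",
)
record = batch["Changes"][0]["ResourceRecordSet"]
print(record["Name"], record["Type"], record["AliasTarget"]["DNSName"])
```

With boto3, `batch` would be passed as the `ChangeBatch` argument of `change_resource_record_sets`, alongside the `HostedZoneId` of the zone being edited.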
⚠️ Exam Trap: Route 53 Failover routing requires a health check on the primary record (or, for an alias record, Evaluate Target Health turned on). If you don't configure one, Route 53 treats the primary as always healthy and keeps returning it, even if the primary resource is completely down. The failover policy itself doesn't detect failures; health checks do the detection. This is the most common misconfiguration in DNS failover setups.
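The trap can be seen in a small local model of the failover decision. This is a simplified simulation written for illustration, not Route 53 code: the `FailoverRecord` class, its field names, and the IP addresses are all assumptions. It encodes one rule, that a record with no health check attached is always considered healthy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailoverRecord:
    value: str
    role: str  # "PRIMARY" or "SECONDARY"
    # None means no health check is attached to the record.
    health_check_passing: Optional[bool] = None


def considered_healthy(record: FailoverRecord) -> bool:
    """A record without a health check is always treated as healthy."""
    return record.health_check_passing is None or record.health_check_passing


def resolve(primary: FailoverRecord, secondary: FailoverRecord) -> str:
    """Return the value a failover pair would answer with in this model."""
    if considered_healthy(primary):
        return primary.value
    if considered_healthy(secondary):
        return secondary.value
    # Both unhealthy: this model falls back to the primary.
    return primary.value


secondary = FailoverRecord("198.51.100.20", "SECONDARY")

# Misconfigured: the primary resource is down, but nothing tells DNS.
no_check = FailoverRecord("203.0.113.10", "PRIMARY", health_check_passing=None)
answer_without_check = resolve(no_check, secondary)

# Correct: a failing health check is attached to the primary.
failing = FailoverRecord("203.0.113.10", "PRIMARY", health_check_passing=False)
answer_with_check = resolve(failing, secondary)

print(answer_without_check)
print(answer_with_check)
```

In the first case the model keeps answering with the primary's address even though the resource behind it is down; only the attached, failing health check in the second case moves traffic to the secondary.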
Reflection Question: A global SaaS application serves European users from eu-west-1 and US users from us-east-1. European users must always be served from EU infrastructure for GDPR compliance, but if the EU region fails, traffic should fail over to the US region. Which combination of Route 53 routing policies achieves both requirements?