6.3.4. Hybrid Connectivity Troubleshooting
💡 First Principle: Hybrid connectivity failures are almost always one of three things: routing (traffic doesn't know the path), filtering (something blocks the path), or BGP (routes aren't being exchanged as expected). The diagnostic approach is the same as for any network problem: work from Layer 1 up, or in AWS terms, from the VPN/DX status → BGP routes → route tables → security groups.
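The bottom-up order described above can be sketched as a small check runner. This is only an illustration: the layer names and the `first_failing_layer` helper are made up for this example, and each check is a stub standing in for whatever console or API lookup you would actually perform.

```python
from typing import Callable, Dict, Optional

# Layers in the order the text recommends checking them (names are illustrative).
LAYERS = ["vpn_or_dx_status", "bgp_routes", "route_tables", "security_groups"]


def first_failing_layer(checks: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Run the supplied checks bottom-up; return the first layer that fails, or None."""
    for layer in LAYERS:
        check = checks.get(layer)
        if check is not None and not check():
            return layer
    return None


# Stubbed results: tunnels and BGP look fine, but the route table check fails.
result = first_failing_layer({
    "vpn_or_dx_status": lambda: True,
    "bgp_routes": lambda: True,
    "route_tables": lambda: False,
    "security_groups": lambda: True,
})
print(result)  # route_tables
```

Stopping at the first failing layer mirrors the reasoning in the text: there is little point inspecting security groups while the route to the destination is missing.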
Site-to-Site VPN Troubleshooting Checklist:
| Check | What to Look For |
|---|---|
| Tunnel status | Both tunnels should be UP in the VPN console |
| BGP session | If using BGP, check that BGP is ESTABLISHED |
| Advertised routes | On-premises router is advertising the correct prefixes |
| Route table | VPC route table has routes for on-premises CIDR pointing to VGW |
| Security groups | Instances have rules allowing traffic from on-premises CIDR |
| Customer gateway config | On-premises device config matches AWS-generated configuration |
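As a rough sketch of the first checklist row, the function below classifies a VPN connection from its per-tunnel telemetry. The input dictionary is assumed to look like one entry of `VpnConnections` as returned by boto3's `describe_vpn_connections` (the `VgwTelemetry` list with a `Status` of `UP` or `DOWN`); the IDs and addresses are invented, and the `summarize_tunnels` helper and its labels are not part of any AWS API.

```python
def summarize_tunnels(vpn_connection: dict) -> str:
    """Classify a VPN connection by how many of its tunnels report UP."""
    telemetry = vpn_connection.get("VgwTelemetry", [])
    up = sum(1 for tunnel in telemetry if tunnel.get("Status") == "UP")
    if up == 0:
        return "outage"      # no tunnel can carry traffic
    if up < len(telemetry):
        return "degraded"    # traffic can still flow, but redundancy is lost
    return "healthy"


# Sample data with made-up identifiers and documentation-range IP addresses.
sample = {
    "VpnConnectionId": "vpn-0123456789abcdef0",
    "VgwTelemetry": [
        {"OutsideIpAddress": "203.0.113.10", "Status": "UP", "AcceptedRouteCount": 2},
        {"OutsideIpAddress": "203.0.113.20", "Status": "DOWN", "AcceptedRouteCount": 0},
    ],
}
print(summarize_tunnels(sample))  # degraded
```

With BGP, an `AcceptedRouteCount` of 0 on a tunnel that is UP would be worth a second look, since it suggests the on-premises router is not advertising any prefixes over that tunnel.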
Common VPN Issues:
| Problem | Root Cause |
|---|---|
| Tunnel is DOWN | IKE phase 1 or 2 failure; mismatched encryption settings |
| Tunnel is UP but no traffic | Routing problem: routes not advertised or not in VPC route table |
| One tunnel DOWN, one UP | Usually not a fault by itself (for example, AWS maintenance on one tunnel); configure the on-premises device to fail over to the other tunnel |
| Traffic gets through in one direction only | Asymmetric routing or security group issue |
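For the "tunnel is UP but no traffic" row, one way to reason about it is to ask which VPC route actually matches the on-premises destination. The sketch below does a longest-prefix match over route entries shaped like the `Routes` list from boto3's `describe_route_tables`; the helper names, IDs, and CIDRs are hypothetical, and only IPv4 `DestinationCidrBlock` routes are considered.

```python
import ipaddress
from typing import List, Optional


def matching_route(routes: List[dict], destination_ip: str) -> Optional[dict]:
    """Return the most specific active route that covers destination_ip, if any."""
    ip = ipaddress.ip_address(destination_ip)
    best, best_len = None, -1
    for route in routes:
        cidr = route.get("DestinationCidrBlock")
        if cidr is None or route.get("State") != "active":
            continue  # skip prefix-list/IPv6 entries and blackhole routes
        network = ipaddress.ip_network(cidr)
        if ip in network and network.prefixlen > best_len:
            best, best_len = route, network.prefixlen
    return best


def points_to_vgw(route: Optional[dict]) -> bool:
    """True if the route's target is a virtual private gateway."""
    return route is not None and route.get("GatewayId", "").startswith("vgw-")


# Made-up route table with no entry for the on-premises range.
routes = [
    {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local", "State": "active"},
    {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0aaa", "State": "active"},
]
hit = matching_route(routes, "192.168.10.25")
print(points_to_vgw(hit))  # False
```

Here the on-premises address only matches the default route, so traffic heads to the NAT gateway instead of the VPN. Adding a static route or enabling route propagation for the on-premises CIDR would change the match.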
PrivateLink Troubleshooting:
When a VPC endpoint service (PrivateLink) doesn't work:
- Endpoint service state must be "Available"
- The consumer's connection request must be accepted (or auto-accept enabled)
- DNS resolution: private DNS for the endpoint must resolve to the endpoint ENI's private IP
- Security groups on the endpoint ENI must allow traffic from the consumer's clients
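The DNS item in the list above can be checked mechanically: resolve the service name from inside the consumer VPC and compare the answers with the endpoint's ENI addresses. The following is a minimal sketch using only the standard library; both helper names are invented, and the IP addresses are placeholders you would replace with the ENI addresses shown for your endpoint.

```python
import socket
from typing import Iterable, Set


def resolve_ipv4(hostname: str) -> Set[str]:
    """Resolve a hostname to its IPv4 addresses using the local resolver."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    return {info[4][0] for info in infos}


def resolves_to_endpoint(resolved: Iterable[str], eni_ips: Iterable[str]) -> bool:
    """True if at least one address was resolved and all of them are endpoint ENI IPs."""
    resolved_set = set(resolved)
    return bool(resolved_set) and resolved_set <= set(eni_ips)


# Placeholder ENI addresses for an interface endpoint in two subnets.
eni_ips = {"10.0.1.15", "10.0.2.15"}

print(resolves_to_endpoint({"10.0.1.15"}, eni_ips))     # True
print(resolves_to_endpoint({"198.51.100.7"}, eni_ips))  # False
```

In practice you would pass `resolve_ipv4("<service DNS name>")` as the first argument, run from an instance in the consumer VPC. A public address in the answer suggests private DNS is not in effect for that name.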
Transit Gateway Troubleshooting:
- Check attachment state (attached VPC/VPN must show as "Available")
- Check Transit Gateway route table: a route must exist for each destination
- Check route table associations: each VPC attachment must be associated with the correct TGW route table
- Check route propagation: the VPN/DX gateway attachment must propagate routes into the TGW route table
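The four Transit Gateway checks above can be combined into one pass over a simplified model. Note that the dictionaries below are a deliberately reduced, hypothetical shape chosen for this example, not the response format of any AWS API; you would populate them from the console or from the relevant describe/search calls.

```python
import ipaddress
from typing import Iterable, List


def tgw_path_problems(attachment: dict, route_table: dict,
                      destination_cidr: str,
                      expected_propagators: Iterable[str]) -> List[str]:
    """Return a human-readable problem for each TGW check that fails."""
    problems = []
    if attachment["state"] != "available":
        problems.append(f"attachment {attachment['id']} is {attachment['state']}")
    if attachment.get("associated_route_table") != route_table["id"]:
        problems.append(f"attachment {attachment['id']} is not associated with {route_table['id']}")
    dest = ipaddress.ip_network(destination_cidr)
    covered = any(
        r["state"] == "active" and dest.subnet_of(ipaddress.ip_network(r["cidr"]))
        for r in route_table["routes"]
    )
    if not covered:
        problems.append(f"no active route covers {destination_cidr}")
    for missing in sorted(set(expected_propagators) - set(route_table["propagations"])):
        problems.append(f"{missing} is not propagating into {route_table['id']}")
    return problems


# Made-up IDs: the VPC attachment is fine, but the VPN attachment never propagated.
attachment = {"id": "tgw-attach-vpc", "state": "available",
              "associated_route_table": "tgw-rtb-app"}
route_table = {"id": "tgw-rtb-app",
               "routes": [{"cidr": "10.0.0.0/16", "state": "active"}],
               "propagations": ["tgw-attach-vpc"]}

for problem in tgw_path_problems(attachment, route_table,
                                 "192.168.0.0/16", ["tgw-attach-vpn"]):
    print(problem)
```

In this sample the missing propagation and the missing route are two views of the same fault: once the VPN attachment propagates into the route table, the on-premises prefix should appear as a route.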
⚠️ Exam Trap: A Site-to-Site VPN that shows "one tunnel UP, one tunnel DOWN" is not a problem by itself. AWS VPN connections always have two tunnels, and AWS periodically performs maintenance on one tunnel. Traffic fails over to the second tunnel automatically. You only have a problem if both tunnels are DOWN. Configure your on-premises device to use both tunnels in active-active or active-passive mode.
Reflection Question: An application in a private VPC subnet cannot reach an on-premises database server. The Site-to-Site VPN shows both tunnels as UP. VPC flow logs show no traffic to the on-premises IP. What is the most likely cause, and what is the first configuration item to check?