3.2.2. AWS X-Ray for Distributed Tracing
First Principle: AWS X-Ray provides end-to-end visibility into requests as they traverse distributed applications, enabling developers to precisely identify performance bottlenecks and errors across multiple services.
In this X-Ray trace, it's immediately clear that the external API call (2500 ms) is the bottleneck, not DynamoDB or S3. Without tracing, you'd be guessing.
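To make that concrete, here is a minimal Python sketch of how a bottleneck falls out of trace data. It assumes a simplified trace whose subsegments carry `name`, `start_time`, and `end_time` fields (epoch seconds), as X-Ray segment documents do. The segment name `order-service` and the DynamoDB and S3 timings are made-up illustrative values; only the 2500 ms external call comes from the trace described above.

```python
# Hypothetical, simplified trace. Field names follow the X-Ray segment
# document format, but every name and timing except the 2500 ms external
# call is an illustrative assumption.
trace = {
    "name": "order-service",
    "start_time": 1700000000.000,
    "end_time": 1700000002.700,
    "subsegments": [
        {"name": "DynamoDB", "start_time": 1700000000.010, "end_time": 1700000000.055},
        {"name": "S3", "start_time": 1700000000.060, "end_time": 1700000000.180},
        {"name": "external-api", "start_time": 1700000000.190, "end_time": 1700000002.690},
    ],
}


def duration_ms(node):
    """Elapsed time of a segment or subsegment, rounded to whole milliseconds."""
    return round((node["end_time"] - node["start_time"]) * 1000)


def slowest_subsegment(segment):
    """Return (name, duration in ms) of the slowest direct subsegment."""
    slowest = max(segment["subsegments"], key=duration_ms)
    return slowest["name"], duration_ms(slowest)


bottleneck = slowest_subsegment(trace)
# Fraction of the whole request spent in the slowest call.
share = bottleneck[1] / duration_ms(trace)
print(bottleneck, f"{share:.0%} of total request time")
```

The X-Ray console does this comparison for you visually in the segment timeline; the sketch only shows why per-call timing data removes the guesswork.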
In modern, distributed applications (especially those using microservices or serverless architectures), a single user request might traverse many different services and components. AWS X-Ray helps developers understand how their application and its underlying services are performing.
- Distributed Tracing: Collects data about requests that your application serves and traces them as they flow through various components.
- Service Map: Visualizes the components of your application and their connections, showing latency and error rates for each. This helps identify unhealthy services or bottlenecks.
- Segment Timelines: Provides a detailed breakdown of what each service or component is doing within a trace, showing execution time for each step.
- Integration: Easily integrate with AWS Lambda, Amazon API Gateway, Amazon EC2, Amazon ECS, AWS Elastic Beanstalk, and AWS SDKs (for custom instrumentation).
- Annotation & Metadata: Developers can add custom annotations and metadata to traces for more context during debugging.
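The difference between annotations and metadata is easier to see in the data itself. The sketch below builds a minimal segment document by hand as a plain Python dict, without the X-Ray SDK, purely for illustration. The service name `checkout-service` and the keys `customer_tier`, `cart_items`, and `debug` are hypothetical. Annotations are simple key-value pairs (string, number, or boolean) that X-Ray indexes so traces can be filtered on them, while metadata can hold arbitrary nested objects and is not indexed.

```python
import json
import os
import time


def new_trace_id(now=None):
    """Trace ID: version '1', 8 hex digits of epoch seconds, 24 random hex digits."""
    epoch = int(time.time() if now is None else now)
    return f"1-{epoch:08x}-{os.urandom(12).hex()}"


def build_segment(name, start, end, annotations, metadata):
    """Assemble a minimal segment document as a plain dict."""
    return {
        "name": name,
        "id": os.urandom(8).hex(),  # segment ID: 16 hex digits
        "trace_id": new_trace_id(start),
        "start_time": start,
        "end_time": end,
        "annotations": annotations,  # indexed, usable for filtering traces
        "metadata": metadata,        # not indexed, free-form debugging context
    }


segment = build_segment(
    "checkout-service",  # hypothetical service name
    start=1700000000.0,
    end=1700000000.42,
    annotations={"customer_tier": "gold", "cart_items": 3},
    metadata={"debug": {"cart": ["sku-1", "sku-2", "sku-3"]}},
)
print(json.dumps(segment, indent=2))
```

As a rule of thumb under these assumptions: put values you want to search by (customer tier, order type) in annotations, and put bulky diagnostic detail in metadata.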
Scenario: You're developing a microservices application where a single user request passes through API Gateway, then a Lambda function, and finally interacts with a DynamoDB table. Users report intermittent delays, and you need to pinpoint where the latency is occurring.
CloudWatch vs. X-Ray: When to Use Which
| Question | Use CloudWatch | Use X-Ray |
|---|---|---|
| "What happened?" | ✅ Logs show errors, output | ❌ |
| "How is it performing?" | ✅ Metrics show duration, errors, throttles | ❌ |
| "Alert me when..." | ✅ Alarms on metrics/log patterns | ❌ |
| "Where is the bottleneck?" | ❌ | ✅ Service map + trace analysis |
| "Which downstream call failed?" | ❌ | ✅ Segments and subsegments |
| "How do services interact?" | ❌ | ✅ Service map visualization |
⚠️ Exam Trap: X-Ray requires both the X-Ray SDK in your code AND the X-Ray daemon running on the host. Missing either one means no traces. On Lambda, you don't run a daemon yourself: you enable "active tracing" on the function instead.
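Why both pieces are needed becomes clearer once you see how they talk to each other. The SDK does not call the X-Ray API directly; it hands segment documents to the daemon over UDP (port 2000 by default), and the daemon batches and uploads them. The sketch below imitates that hand-off with the Python standard library only. It is an illustration of the mechanism under those assumptions, not a replacement for the SDK, and the segment values (including the name `example-service`) are placeholders.

```python
import json
import socket

DAEMON_ADDRESS = ("127.0.0.1", 2000)  # the daemon's default UDP listener
HEADER = '{"format": "json", "version": 1}'


def encode_datagram(segment):
    """Header line, a newline, then the segment document as JSON."""
    return (HEADER + "\n" + json.dumps(segment)).encode("utf-8")


def send_segment(segment, address=DAEMON_ADDRESS):
    """Fire-and-forget UDP send; nothing confirms that a daemon received it."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(encode_datagram(segment), address)


# Placeholder segment with the minimum required fields.
segment = {
    "name": "example-service",  # hypothetical
    "id": "70de5b6f19ff9a0a",
    "trace_id": "1-581cf771-a006649127e371903a2de979",
    "start_time": 1478293361.271,
    "end_time": 1478293361.449,
}
payload = encode_datagram(segment)
```

Calling `send_segment(segment)` on a host with no daemon listening would typically return without error and the data would simply be lost. That is the practical meaning of the trap: instrumented code alone produces nothing visible in X-Ray.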
