3.2.5. Root Cause Analysis
First Principle: Root cause analysis transforms a single incident into a systemic improvement. If you don't understand how the attacker gained initial access, the same entry point stays open and you should expect the same attack again.
Amazon Detective is purpose-built for root cause investigation:
- Automatically collects and correlates CloudTrail, VPC Flow Logs, and GuardDuty findings
- Builds investigation graphs showing relationships between entities (users, IPs, resources)
- Visualizes resource behavior over time to identify deviations from baseline
- Answers "when did the compromise begin, what did the attacker do, and how did they get in?"
Root Cause Analysis Framework:
| Phase | Questions | AWS Data Source |
|---|---|---|
| Initial access | How did the attacker get in? (phished credentials, exposed key, vulnerable service) | CloudTrail: first anomalous API call |
| Persistence | How did they maintain access? (new users, roles, keys) | CloudTrail: CreateUser, CreateRole, CreateAccessKey |
| Lateral movement | How did they expand access? (role assumption, cross-account) | CloudTrail: AssumeRole across accounts |
| Impact | What did they do? (data access, resource creation, exfiltration) | CloudTrail data events + Flow Logs |
| Timeline | When did each phase occur? | Detective investigation graph |
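The framework above can be sketched as a small script that sorts CloudTrail-style records by time and labels each one with an investigation phase. This is a minimal illustration, not how Detective works internally: the `PHASE_BY_EVENT` mapping, the function names, and the sample records are all assumptions made for the example, and a real investigation would consider many more event names and fields (source IP, principal, account IDs).

```python
from datetime import datetime

# Hypothetical, deliberately incomplete mapping from CloudTrail eventName
# to the phases in the table above.
PHASE_BY_EVENT = {
    "CreateUser": "persistence",
    "CreateRole": "persistence",
    "CreateAccessKey": "persistence",
    "AssumeRole": "lateral_movement",
    "GetObject": "impact",
    "RunInstances": "impact",
}


def parse_time(value):
    # CloudTrail records carry ISO 8601 UTC timestamps, e.g. 2024-05-01T12:00:00Z.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def build_timeline(events):
    """Return (time, phase, eventName) tuples sorted by time.

    Events with no entry in PHASE_BY_EVENT are labeled 'unclassified'
    so an analyst can review them by hand.
    """
    rows = []
    for event in events:
        phase = PHASE_BY_EVENT.get(event["eventName"], "unclassified")
        rows.append((parse_time(event["eventTime"]), phase, event["eventName"]))
    return sorted(rows)


def first_seen_by_phase(timeline):
    """Map each phase to the earliest time it appears in a sorted timeline."""
    first = {}
    for when, phase, _name in timeline:
        first.setdefault(phase, when)
    return first


# Made-up sample records; only the two fields used above are included.
sample_events = [
    {"eventTime": "2024-05-01T12:20:00Z", "eventName": "GetObject"},
    {"eventTime": "2024-05-01T12:05:00Z", "eventName": "CreateAccessKey"},
    {"eventTime": "2024-05-01T12:10:00Z", "eventName": "AssumeRole"},
    {"eventTime": "2024-05-01T12:00:00Z", "eventName": "GetCallerIdentity"},
]

for when, phase, name in build_timeline(sample_events):
    print(when.isoformat(), phase, name)
```

The earliest record that does not fit the principal's normal behavior is the starting point for the initial-access question. In this sample it falls into the `unclassified` bucket, which is why unmapped events are kept rather than dropped.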
Post-Incident Improvements:
- Document findings in a post-incident review
- Update runbooks based on lessons learned
- Implement new preventive controls for the root cause vulnerability
- Add new detective controls for the attack patterns observed
- Share findings across the organization (sanitized) to prevent recurrence
⚠️ Exam Trap: Detective is for root cause investigation AFTER an incident is detected. It doesn't detect threats (that's GuardDuty). Don't confuse their roles: GuardDuty detects, Detective investigates.
Scenario: After containing a breach, Detective reveals that the root cause was an EC2 instance running with an overly permissive IAM role. The attacker compromised the instance through an unpatched web application, then used the role's S3 permissions to exfiltrate data. Post-incident improvements: implement least-privilege roles, enable Amazon Inspector for vulnerability scanning, and enable GuardDuty S3 Protection.
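To make "overly permissive" concrete, the sketch below flags `Allow` statements in an IAM policy document that use a wildcard action or a wildcard resource. It is a simplified illustration under stated assumptions: the function names, both sample policies, and the bucket name `example-app-bucket` are hypothetical, and the check ignores conditions, `NotAction`, and partial wildcards. A purpose-built tool such as IAM Access Analyzer is the more thorough option in practice.

```python
def as_list(value):
    # IAM policy fields may be a single string or a list of strings.
    return value if isinstance(value, list) else [value]


def broad_statements(policy):
    """Return the Sid of every Allow statement with a wildcard action
    ('*' or 'service:*') or a bare '*' resource."""
    findings = []
    for stmt in as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Allow":
            continue
        actions = as_list(stmt.get("Action", []))
        resources = as_list(stmt.get("Resource", []))
        wildcard_action = any(a == "*" or a.endswith(":*") for a in actions)
        wildcard_resource = "*" in resources
        if wildcard_action or wildcard_resource:
            findings.append(stmt.get("Sid", "<no Sid>"))
    return findings


# Hypothetical policy resembling the role in the scenario: all S3 actions
# on every bucket.
permissive_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "AllS3", "Effect": "Allow", "Action": "s3:*", "Resource": "*"},
    ],
}

# Hypothetical least-privilege replacement: read-only access to one prefix
# of one bucket.
scoped_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadUploads",
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::example-app-bucket/uploads/*",
        },
    ],
}

print(broad_statements(permissive_policy))
print(broad_statements(scoped_policy))
```

With the scoped policy in place, a compromised instance could still read objects under that one prefix, but it could not list or read other buckets, which limits the impact phase of the same attack.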
Reflection Question: How does root cause analysis change your security posture from "responding to individual incidents" to "systematically reducing your attack surface"?