2.2.1. Log Sources, Ingestion, and Storage
First Principle: Every security-relevant action in AWS generates a log somewhere. The challenge isn't generating logs — it's identifying which sources matter, ingesting them reliably, and storing them cost-effectively for the required retention period.
Primary AWS Log Sources:
| Log Source | What It Captures | Default Storage |
|---|---|---|
| CloudTrail | API calls (who did what, when) | Event history (90 days, console only); S3 via a Trail |
| VPC Flow Logs | Network traffic metadata (IPs, ports, bytes) | CloudWatch Logs or S3 |
| CloudWatch Logs | Application logs, OS logs, service logs | CloudWatch Logs |
| S3 access logs | Bucket-level access records | S3 (separate bucket) |
| ELB access logs | Request-level load balancer logs | S3 |
| WAF logs | Web request details with rule match info | S3, CloudWatch Logs, or Kinesis |
| Route 53 Resolver logs | DNS query logs | CloudWatch Logs or S3 |
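To make "network traffic metadata" concrete, here is a minimal sketch of parsing a VPC Flow Log record in the default (version 2) format. The sample record, account ID, and ENI ID are illustrative placeholders, not real resources.

```python
# Field order follows the default VPC Flow Log format (version 2).
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]

def parse_flow_log(line: str) -> dict:
    """Split a space-delimited flow log record into named fields."""
    record = dict(zip(FIELDS, line.split()))
    # Numeric fields are integers unless the record is NODATA/SKIPDATA ("-").
    for key in ("srcport", "dstport", "protocol", "packets", "bytes", "start", "end"):
        if record.get(key, "-") != "-":
            record[key] = int(record[key])
    return record

# Hypothetical record: TCP (protocol 6) traffic that was accepted.
sample = ("2 123456789012 eni-0a1b2c3d 10.0.1.5 10.0.2.9 "
          "443 49152 6 10 8400 1620000000 1620000060 ACCEPT OK")
rec = parse_flow_log(sample)
```

Note that flow logs capture metadata only (addresses, ports, byte counts, accept/reject decisions), never packet payloads.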
Storage Architecture Decisions:
- Hot storage (CloudWatch Logs): Real-time querying, higher cost per GB, automatic indexing
- Warm storage (S3 Standard): Queryable with Athena, moderate cost, requires table/partition setup
- Cold storage (S3 Glacier): Compliance archival, lowest cost, retrieval takes minutes to hours
- Analytics-optimized (Security Lake): OCSF-normalized, Iceberg tables, purpose-built for security
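The warm-to-cold transition is typically expressed as an S3 lifecycle rule. A minimal sketch, assuming an illustrative prefix and day thresholds (tune both to your retention policy):

```python
# Sketch: an S3 lifecycle rule moving log objects from S3 Standard to
# Glacier, then expiring them. Prefix and day counts are placeholders.
lifecycle_rule = {
    "ID": "log-tiering",
    "Filter": {"Prefix": "AWSLogs/"},
    "Status": "Enabled",
    "Transitions": [
        {"Days": 90, "StorageClass": "GLACIER"},  # warm -> cold after 90 days
    ],
    "Expiration": {"Days": 2555},  # ~7 years, then delete
}

# With boto3 this would be applied via:
# s3.put_bucket_lifecycle_configuration(
#     Bucket="example-log-bucket",
#     LifecycleConfiguration={"Rules": [lifecycle_rule]},
# )
```

The rule encodes the tiering decision once, so objects migrate automatically instead of requiring manual moves.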
⚠️ Exam Trap: CloudTrail's free 90-day event history is viewable in the console but NOT suitable for compliance. For compliance and forensics, you must create a Trail that delivers logs to S3 with immutability protections (Object Lock, bucket policy preventing deletion).
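One of the immutability protections mentioned above can be sketched as a bucket policy statement that denies object deletion. The bucket name is a placeholder; in practice you would pair this with S3 Object Lock (compliance mode) for true immutability, since a policy alone can be modified by an administrator:

```python
import json

# Sketch: deny deletion of CloudTrail log objects. Bucket name is
# hypothetical; combine with Object Lock for tamper-proof retention.
deny_delete = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyLogDeletion",
        "Effect": "Deny",
        "Principal": "*",
        "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
        "Resource": "arn:aws:s3:::example-cloudtrail-logs/*",
    }],
}
policy_json = json.dumps(deny_delete)
```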
Scenario: A healthcare company needs to retain CloudTrail logs for 7 years (HIPAA requirement) while also enabling real-time analysis for incident response. You design a dual-destination Trail: S3 with Glacier lifecycle for long-term retention, and CloudWatch Logs for real-time querying.
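The dual-destination design above can be sketched as the parameters for a single trail. Names, ARNs, and the account ID are placeholders; with boto3, `cloudtrail.create_trail(**trail_params)` would apply them:

```python
# Sketch: one trail delivering to both S3 (long-term, with a Glacier
# lifecycle on the bucket) and CloudWatch Logs (real-time analysis).
# All identifiers below are illustrative placeholders.
trail_params = {
    "Name": "org-audit-trail",
    "S3BucketName": "example-cloudtrail-archive",  # 7-year retention tier
    "CloudWatchLogsLogGroupArn": "arn:aws:logs:us-east-1:123456789012:log-group:cloudtrail:*",
    "CloudWatchLogsRoleArn": "arn:aws:iam::123456789012:role/CloudTrailToCWL",
    "IsMultiRegionTrail": True,
    "EnableLogFileValidation": True,  # digest files provide tamper-evidence
}
```

Log file validation is worth enabling here: it produces signed digest files that let forensic investigators prove log integrity, which complements the retention requirement.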
Reflection Question: Why does a well-designed logging architecture use multiple storage tiers rather than a single destination?