2.2.3. Log Storage, Data Lakes, and Third-Party Integration
First Principle: Raw logs scattered across services and accounts are nearly useless during an incident. Centralized, normalized storage transforms raw data into actionable intelligence that can be queried, correlated, and shared with security tools.
Amazon Security Lake is purpose-built for this:
- Automatically collects logs from CloudTrail (management events plus S3 and Lambda data events), VPC Flow Logs, Route 53 Resolver query logs, Security Hub findings, EKS audit logs, and AWS WAF logs
- Normalizes all data into OCSF (Open Cybersecurity Schema Framework) — an open standard for security data
- Stores data in S3 as Apache Parquet objects, registered as partitioned tables (Apache Iceberg in current versions) in the AWS Glue Data Catalog for efficient querying
- Supports subscriber access for third-party SIEM/SOAR tools (Splunk, CrowdStrike, etc.)
- Manages lifecycle policies to transition data to cheaper storage tiers automatically
Why OCSF Matters:
Without normalization, a CloudTrail event, a VPC Flow Log record, and a GuardDuty finding each use a different schema, with different field names for the same concept (source IP, timestamp, account). OCSF provides a common data model, so one query using one set of field names can span all sources. This is critical for:
- Correlating network traffic (Flow Logs) with API calls (CloudTrail) during investigations
- Feeding consistent data to ML-based threat detection tools
- Sharing security data with partners and vendors in a standardized format
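The correlation case above can be sketched in a few lines. This is a minimal illustration, not Security Lake's actual transformation: the two `normalize_*` functions and `activity_from_ip` are hypothetical helpers, and the output is a heavily reduced OCSF-like shape (real OCSF events carry many more required attributes). The field names `class_uid`, `time`, `src_endpoint`, `dst_endpoint`, and `api.operation` follow OCSF naming, and the class IDs are assumed to be those of Network Activity and API Activity.

```python
from datetime import datetime, timezone
from typing import Any

# Assumed OCSF class IDs: Network Activity and API Activity.
NETWORK_ACTIVITY = 4001
API_ACTIVITY = 6003


def normalize_cloudtrail(event: dict[str, Any]) -> dict[str, Any]:
    """Map a raw CloudTrail record onto a reduced OCSF-like shape."""
    ts = datetime.strptime(event["eventTime"], "%Y-%m-%dT%H:%M:%SZ")
    ts = ts.replace(tzinfo=timezone.utc)
    return {
        "class_uid": API_ACTIVITY,
        "time": int(ts.timestamp() * 1000),  # epoch milliseconds
        "src_endpoint": {"ip": event["sourceIPAddress"]},
        "api": {"operation": event["eventName"]},
    }


def normalize_flow_log(line: str) -> dict[str, Any]:
    """Map a default-format (version 2) VPC Flow Log line onto the same shape."""
    f = line.split()
    return {
        "class_uid": NETWORK_ACTIVITY,
        "time": int(f[10]) * 1000,  # 'start' field is epoch seconds
        "src_endpoint": {"ip": f[3], "port": int(f[5])},
        "dst_endpoint": {"ip": f[4], "port": int(f[6])},
    }


def activity_from_ip(events: list[dict[str, Any]], ip: str) -> list[dict[str, Any]]:
    """Return every event from one source IP, oldest first, regardless of origin."""
    matches = [e for e in events if e["src_endpoint"]["ip"] == ip]
    return sorted(matches, key=lambda e: e["time"])


# Made-up sample records (documentation IP range, placeholder account and ENI).
raw_trail = {
    "eventTime": "2024-01-01T12:00:05Z",
    "sourceIPAddress": "198.51.100.7",
    "eventName": "CreateAccessKey",
}
raw_flow = (
    "2 123456789012 eni-0abc 198.51.100.7 10.0.1.5 "
    "44321 443 6 10 840 1704110400 1704110460 ACCEPT OK"
)

events = [normalize_cloudtrail(raw_trail), normalize_flow_log(raw_flow)]
timeline = activity_from_ip(events, "198.51.100.7")
```

Once both records share `src_endpoint.ip` and a millisecond `time`, a single filter and sort builds the timeline; without normalization the same lookup needs one parser and one field mapping per log type.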
Integration Patterns:
| Integration Target | Mechanism | Use Case |
|---|---|---|
| Third-party SIEM | Security Lake subscriber | Centralized SOC operations |
| Custom analytics | Athena queries on Security Lake | Ad-hoc investigation |
| Near-real-time streaming | Amazon Data Firehose (formerly Kinesis Data Firehose) | Low-latency analysis |
| Long-term compliance | S3 lifecycle transition to S3 Glacier storage classes | Regulatory retention |
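As an illustration of the ad-hoc investigation row, the sketch below builds an Athena SQL query over a Security Lake CloudTrail table and optionally submits it. Everything named here is an assumption to verify in your own account: the database and table names, the S3 output location, and the column names (`time_dt`, `api.operation`, `actor.user.uid`, `src_endpoint.ip`), which follow OCSF attribute naming and may differ by source version. `build_api_activity_query` and `run_query` are hypothetical helpers; the only AWS call used is the boto3 Athena client's `start_query_execution`.

```python
import ipaddress
import re


def build_api_activity_query(database: str, table: str, source_ip: str, days: int = 7) -> str:
    """Build SQL listing API calls made from one IP over the last `days` days."""
    for name in (database, table):
        # Identifiers cannot be parameterized, so restrict them to safe characters.
        if not re.fullmatch(r"[A-Za-z0-9_]+", name):
            raise ValueError(f"unsafe identifier: {name!r}")
    ip = ipaddress.ip_address(source_ip)  # raises ValueError if not a valid IP
    if days < 1:
        raise ValueError("days must be at least 1")
    return (
        f'SELECT time_dt, api.operation, actor.user.uid\n'
        f'FROM "{database}"."{table}"\n'
        f"WHERE time_dt > CURRENT_TIMESTAMP - INTERVAL '{int(days)}' DAY\n"
        f"  AND src_endpoint.ip = '{ip}'\n"
        f"ORDER BY time_dt"
    )


def run_query(query: str, database: str, output_s3_uri: str) -> str:
    """Submit the query to Athena and return its execution ID (needs AWS credentials)."""
    import boto3  # imported lazily so the query builder works without boto3

    response = boto3.client("athena").start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3_uri},
    )
    return response["QueryExecutionId"]


# Placeholder names; check the Glue Data Catalog for the real ones.
query = build_api_activity_query(
    database="amazon_security_lake_glue_db_us_east_1",
    table="amazon_security_lake_table_us_east_1_cloud_trail_mgmt_2_0",
    source_ip="198.51.100.7",
)
```

Validating the identifiers and the IP before interpolation keeps the helper safe to call with analyst-supplied input during an incident.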
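⚠️ Exam Trap: Security Lake normalizes data to OCSF, not ASFF. ASFF (AWS Security Finding Format) is the JSON format Security Hub uses for findings; OCSF is the broader, vendor-neutral schema that Security Lake applies to all log and event sources, including Security Hub findings once they land in the lake. The exam may test whether you know which format applies where.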
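One way to keep the two formats apart is by their top-level attributes. The sketch below is a rough heuristic, not an official validator: `detect_format` is a hypothetical helper, and the marker sets are a small subset of each format's required fields (ASFF uses PascalCase names such as `SchemaVersion` and `ProductArn`; OCSF uses snake_case names such as `class_uid` and `category_uid`). The sample records are made up and heavily truncated.

```python
from typing import Any

# A few required top-level attributes from each format (not the full list).
ASFF_MARKERS = {"SchemaVersion", "ProductArn", "AwsAccountId"}
OCSF_MARKERS = {"class_uid", "category_uid", "metadata"}


def detect_format(record: dict[str, Any]) -> str:
    """Guess whether a record looks like an ASFF finding or an OCSF event."""
    keys = set(record)
    if ASFF_MARKERS <= keys:
        return "ASFF"
    if OCSF_MARKERS <= keys:
        return "OCSF"
    return "unknown"


# Truncated, made-up examples of each shape.
security_hub_finding = {
    "SchemaVersion": "2018-10-08",
    "ProductArn": "arn:aws:securityhub:us-east-1::product/aws/guardduty",
    "AwsAccountId": "123456789012",
    "Title": "Example finding",
}
security_lake_event = {
    "class_uid": 6003,
    "category_uid": 6,
    "metadata": {"version": "1.1.0"},
    "time": 1704110405000,
}

formats = [detect_format(security_hub_finding), detect_format(security_lake_event)]
```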
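Scenario: A company's SOC team uses Splunk for security analysis. They need CloudTrail, VPC Flow Logs, and GuardDuty findings in Splunk. Rather than building custom integrations, they enable Security Lake with CloudTrail, VPC Flow Logs, and Security Hub findings as sources (GuardDuty findings reach the lake through Security Hub), then register Splunk as a subscriber. Splunk receives OCSF-normalized data without per-source parsing.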
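The subscriber step in this scenario might look roughly like the following. This sketch only builds the request; it assumes the boto3 `securitylake` client's `create_subscriber` accepts the parameter names shown (`subscriberName`, `subscriberIdentity`, `accessTypes`, `sources`) and the source names `CLOUD_TRAIL_MGMT`, `VPC_FLOW`, and `SH_FINDINGS`, all of which should be checked against the current API reference. The account ID, external ID, and `build_subscriber_request` helper are placeholders invented for illustration.

```python
from typing import Any

# Assumed Security Lake source identifiers for the three feeds in the scenario.
SCENARIO_SOURCES = ["CLOUD_TRAIL_MGMT", "VPC_FLOW", "SH_FINDINGS"]


def build_subscriber_request(
    name: str,
    principal_account_id: str,
    external_id: str,
    source_names: list[str],
    source_version: str = "2.0",
) -> dict[str, Any]:
    """Assemble keyword arguments for a data-access (S3) subscriber."""
    if not (principal_account_id.isdigit() and len(principal_account_id) == 12):
        raise ValueError("principal_account_id must be a 12-digit AWS account ID")
    return {
        "subscriberName": name,
        "subscriberIdentity": {
            "principal": principal_account_id,  # the SIEM vendor's AWS account
            "externalId": external_id,          # shared secret for role assumption
        },
        "accessTypes": ["S3"],  # data access; query access would use Lake Formation
        "sources": [
            {"awsLogSource": {"sourceName": s, "sourceVersion": source_version}}
            for s in source_names
        ],
    }


request = build_subscriber_request(
    name="splunk-soc",
    principal_account_id="123456789012",   # placeholder account ID
    external_id="example-external-id",     # placeholder value
    source_names=SCENARIO_SOURCES,
)

# With credentials configured, the request could then be submitted as:
#   boto3.client("securitylake").create_subscriber(**request)
```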
Reflection Question: How does the shift from proprietary log formats to OCSF reduce operational overhead for security teams that use multiple analysis tools?