2.1.3. Metrics, Alerts, and Dashboards for Anomaly Detection
First Principle: Detection without alerting is just logging. The value of monitoring comes from surfacing anomalies to the right people (or automated systems) fast enough to act before damage escalates.
Amazon GuardDuty provides intelligent threat detection by continuously analyzing:
- CloudTrail management events: unusual API calls, activity in Regions the account does not normally use
- CloudTrail S3 data events: suspicious data access patterns
- VPC Flow Logs: communication with known malicious IPs, cryptocurrency mining traffic
- DNS logs: domain generation algorithm (DGA) detection, command-and-control (C2) communication
- EKS audit logs, runtime activity (EKS, ECS, EC2), and Lambda network activity: container and serverless threat detection
GuardDuty combines threat intelligence with machine learning that establishes behavioral baselines, and it assigns each finding a severity level (Low, Medium, High, or Critical).
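Each finding also carries a numeric severity score, and routing logic usually branches on the label derived from that score. The sketch below is illustrative only. It assumes the score ranges Low 1.0 to 3.9, Medium 4.0 to 6.9, High 7.0 to 8.9, and Critical 9.0 to 10.0, which should be checked against the current GuardDuty documentation. The function name `severity_label` is hypothetical.

```python
def severity_label(score: float) -> str:
    """Map a GuardDuty numeric severity score to its label.

    Assumed ranges (verify against the GuardDuty docs):
    Low 1.0-3.9, Medium 4.0-6.9, High 7.0-8.9, Critical 9.0-10.0.
    """
    if not 1.0 <= score <= 10.0:
        raise ValueError(f"severity score out of range: {score}")
    if score >= 9.0:
        return "Critical"
    if score >= 7.0:
        return "High"
    if score >= 4.0:
        return "Medium"
    return "Low"
```

A responder might page on-call staff for High and Critical findings and send Low and Medium findings to a ticket queue.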
Amazon Macie discovers sensitive data in Amazon S3 and flags buckets that expose it:
- Scans S3 buckets for PII, PHI, financial data, credentials
- Generates findings for publicly accessible buckets and for the sensitive data found in them
- Integrates with Security Hub for centralized visibility
CloudWatch Alarms and Dashboards:
- Metric alarms: trigger on threshold breaches (CPU > 90%, error rate > 5%)
- Anomaly detection: ML-based bands that alert on statistical anomalies without hard-coded thresholds
- Composite alarms: combine multiple alarms to reduce false positives
- Dashboards: visual aggregation of metrics for real-time operational awareness
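To make the anomaly detection and composite alarm bullets concrete, here is a minimal sketch that only builds the request parameters and makes no AWS calls. It assumes the parameter shapes accepted by the CloudWatch `PutMetricAlarm` and `PutCompositeAlarm` APIs as I understand them. The helper names, the period, and the evaluation settings are hypothetical choices, not recommended values.

```python
def anomaly_alarm_params(alarm_name: str, namespace: str, metric_name: str,
                         band_width: float = 2) -> dict:
    """Build keyword arguments for an anomaly-detection metric alarm.

    The alarm fires when the metric rises above the upper edge of the
    ML-generated band, so there is no hard-coded threshold.
    """
    return {
        "AlarmName": alarm_name,
        "ComparisonOperator": "GreaterThanUpperThreshold",
        "EvaluationPeriods": 3,           # illustrative value
        "ThresholdMetricId": "band",      # points at the band expression below
        "TreatMissingData": "notBreaching",
        "Metrics": [
            {
                "Id": "m1",
                "ReturnData": True,
                "MetricStat": {
                    "Metric": {"Namespace": namespace, "MetricName": metric_name},
                    "Period": 300,        # seconds, illustrative value
                    "Stat": "Average",
                },
            },
            {
                "Id": "band",
                "Expression": f"ANOMALY_DETECTION_BAND(m1, {band_width})",
                "Label": "expected range",
            },
        ],
    }


def composite_rule(*alarm_names: str) -> str:
    """Build an AlarmRule that fires only when every named alarm is in ALARM."""
    return " AND ".join(f'ALARM("{name}")' for name in alarm_names)
```

With boto3 installed, the dictionary could be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`, and the rule string as the `AlarmRule` argument of `put_composite_alarm`. Requiring several alarms to agree is what reduces false positives.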
Alert Routing:
EventBridge is the routing engine that connects detection to response — matching finding patterns and dispatching to the appropriate handler.
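As a rough illustration of pattern matching, the sketch below shows an event pattern for the cryptocurrency-mining finding type used in this section's scenario, plus a deliberately simplified local matcher. The `source` and `detail-type` values are the ones GuardDuty findings are generally delivered with. The `matches` function is hypothetical and handles only exact-value lists and nested objects, not the full EventBridge pattern language (prefix, numeric, and `anything-but` filters are omitted).

```python
# Event pattern: each leaf is a list of accepted values for that field.
MINING_PATTERN = {
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Finding"],
    "detail": {"type": ["CryptoCurrency:EC2/BitcoinTool.B!DNS"]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified stand-in for EventBridge matching (exact values only)."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the corresponding event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf: the event value must equal one of the listed values.
            return False
    return True
```

Fields not named in the pattern are ignored, so one rule can target a single finding type while other rules route different findings to other handlers.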
⚠️ Exam Trap: GuardDuty generates findings but doesn't remediate. EventBridge + Lambda (or Step Functions) handle automated response. If a question asks about "automatically responding to a GuardDuty finding," the answer involves EventBridge, not GuardDuty alone.
Scenario: GuardDuty detects cryptocurrency mining on an EC2 instance (finding type: CryptoCurrency:EC2/BitcoinTool.B!DNS). An EventBridge rule matches this finding type and triggers a Lambda function that isolates the instance by applying a quarantine security group.
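The Lambda function in this scenario might look like the following minimal sketch. It assumes the finding carries the instance ID at `detail.resource.instanceDetails.instanceId` and that a quarantine security group already exists, with its ID supplied in a hypothetical `QUARANTINE_SG_ID` environment variable. The optional `ec2_client` parameter is an addition for local testing. In Lambda, the function falls back to a boto3 client.

```python
import os


def extract_instance_id(event: dict) -> str:
    """Pull the EC2 instance ID out of a GuardDuty finding event (assumed path)."""
    return event["detail"]["resource"]["instanceDetails"]["instanceId"]


def isolate_instance(ec2_client, instance_id: str, quarantine_sg_id: str) -> None:
    """Replace all of the instance's security groups with the quarantine group."""
    ec2_client.modify_instance_attribute(
        InstanceId=instance_id,
        Groups=[quarantine_sg_id],
    )


def lambda_handler(event, context, ec2_client=None):
    if ec2_client is None:
        import boto3  # imported lazily so the logic can be tested without AWS
        ec2_client = boto3.client("ec2")
    instance_id = extract_instance_id(event)
    quarantine_sg_id = os.environ["QUARANTINE_SG_ID"]  # hypothetical variable name
    isolate_instance(ec2_client, instance_id, quarantine_sg_id)
    return {"isolated": instance_id, "securityGroup": quarantine_sg_id}
```

The function's execution role would need permission for `ec2:ModifyInstanceAttribute`. A fuller response might also snapshot the instance's volumes and tag it for forensics before or after isolation.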
Reflection Question: Why does the exam emphasize EventBridge as the bridge between detection and response, rather than having security services trigger remediation directly?