2.3.2. Remediating Logging Misconfigurations
First Principle: Finding a misconfiguration is only half the job — the remediation must fix the immediate problem AND prevent recurrence, ideally through automation that detects and corrects the same misconfiguration across all accounts.
CloudWatch Agent Troubleshooting:
- Agent not sending data: Check agent status (amazon-cloudwatch-agent-ctl -a status), verify configuration file format, check IAM instance profile permissions
- Metrics missing: Verify metric namespace and dimensions in configuration, check for typos in metric names
- High CPU from agent: Collect less frequently (use a longer collection interval), filter collected metrics, check for log file rotation issues
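The "verify configuration file format" step above can be partly automated before the agent is restarted. The following is a minimal sketch, not an official validator: it assumes the agent's JSON configuration uses top-level agent, metrics, and logs sections, and the function name check_agent_config and the 10-second threshold are hypothetical choices made for illustration.

```python
import json

# Hypothetical threshold: intervals shorter than this are flagged as a likely
# cause of high agent CPU. Tune it to your own workload.
MIN_INTERVAL_SECONDS = 10


def check_agent_config(text):
    """Return a list of human-readable problems found in an agent config."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        # A malformed file is a common reason the agent sends no data at all.
        return [f"invalid JSON: {exc.msg} at line {exc.lineno}"]
    if not isinstance(config, dict):
        return ["top level must be a JSON object"]

    problems = []
    if "metrics" not in config and "logs" not in config:
        problems.append("no 'metrics' or 'logs' section: nothing to send")

    metrics = config.get("metrics")
    if isinstance(metrics, dict) and not metrics.get("metrics_collected"):
        problems.append("'metrics' has no 'metrics_collected' entries")

    agent = config.get("agent")
    interval = agent.get("metrics_collection_interval") if isinstance(agent, dict) else None
    if isinstance(interval, int) and interval < MIN_INTERVAL_SECONDS:
        problems.append(
            f"collection interval {interval}s is very short; expect higher CPU"
        )
    return problems


if __name__ == "__main__":
    sample = (
        '{"agent": {"metrics_collection_interval": 1},'
        ' "metrics": {"namespace": "MyApp", "metrics_collected": {}}}'
    )
    for problem in check_agent_config(sample):
        print(problem)
```

A check like this only catches structural mistakes. IAM instance profile permissions and typos in metric names still need to be verified against the running agent and its log output.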
Missing Logs Remediation Pattern:
Preventing Recurrence:
- Deploy AWS Config rules that detect disabled logging (e.g., cloud-trail-log-file-validation-enabled, vpc-flow-logs-enabled)
- Use CloudFormation Guard to validate IaC templates include logging configuration
- Create SCPs that prevent disabling logging in production accounts
- Set up CloudWatch alarms on log delivery metrics (e.g., alarm if CloudTrail log delivery stops)
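As an illustration of the SCP control in the list above, the sketch below builds a policy document that denies a few actions commonly used to turn logging off. It is one possible shape under stated assumptions, not a vetted production policy: the function name, the Sid, the choice of actions, and the LoggingAdmin role pattern are all hypothetical, and a real policy should be reviewed against the services you actually use.

```python
import json


def build_logging_guardrail_scp(exempt_role_arn_pattern=None):
    """Build an SCP document that denies actions which disable logging.

    exempt_role_arn_pattern is an optional ARN pattern for a break-glass
    role that is still allowed to perform these actions.
    """
    statement = {
        "Sid": "DenyDisablingLogging",  # hypothetical statement ID
        "Effect": "Deny",
        "Action": [
            "cloudtrail:StopLogging",
            "cloudtrail:DeleteTrail",
            "ec2:DeleteFlowLogs",
        ],
        "Resource": "*",
    }
    if exempt_role_arn_pattern:
        # Deny everyone except principals whose ARN matches the pattern.
        statement["Condition"] = {
            "ArnNotLike": {"aws:PrincipalArn": exempt_role_arn_pattern}
        }
    return {"Version": "2012-10-17", "Statement": [statement]}


if __name__ == "__main__":
    # The role name below is an assumption made for this example.
    policy = build_logging_guardrail_scp("arn:aws:iam::*:role/LoggingAdmin")
    print(json.dumps(policy, indent=2))
```

The resulting JSON would be attached to the organizational unit that holds the production accounts. Because an SCP only blocks the action, it complements rather than replaces the detective controls (Config rules and alarms) listed above.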
⚠️ Exam Trap: The exam prefers solutions that prevent recurrence over one-time fixes. If a question describes a logging misconfiguration, the best answer includes both immediate remediation AND a preventive control (Config rule, SCP, or automation).
Scenario: VPC Flow Logs are disabled in 3 of your 50 accounts. You remediate immediately by enabling them, then deploy a Config rule (vpc-flow-logs-enabled) with auto-remediation across all accounts to prevent recurrence.
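The scenario can be sketched in two steps: find the accounts that need the immediate fix, then describe one organization-wide rule that covers all 50. This is a simplified illustration that makes no AWS calls. The account IDs and status inventory are invented, the rule name org-vpc-flow-logs-enabled is hypothetical, and the request dictionary is shaped like the parameters of the AWS Config PutOrganizationConfigRule API as I understand it, so check the field names against the API reference before relying on them.

```python
def accounts_needing_fix(flow_log_status):
    """Return the account IDs whose VPC Flow Logs are currently disabled."""
    return sorted(acct for acct, enabled in flow_log_status.items() if not enabled)


def org_rule_request():
    """Describe one organization-wide managed rule (the preventive control).

    Assumption: these keys mirror the PutOrganizationConfigRule parameters.
    """
    return {
        "OrganizationConfigRuleName": "org-vpc-flow-logs-enabled",  # hypothetical name
        "OrganizationManagedRuleMetadata": {
            "RuleIdentifier": "VPC_FLOW_LOGS_ENABLED",
        },
    }


def simulated_inventory():
    """Invented inventory: 50 accounts, 3 of them with Flow Logs disabled."""
    disabled = {7, 21, 42}
    return {str(100000000000 + i): i not in disabled for i in range(50)}


if __name__ == "__main__":
    status = simulated_inventory()
    # Step 1: immediate fix applies only to the non-compliant accounts.
    print("Enable Flow Logs now in:", accounts_needing_fix(status))
    # Step 2: prevention applies to every account, including compliant ones.
    print("Deploy to all accounts:", org_rule_request())
```

The point of the split is that the fix targets 3 accounts while the rule targets all 50, so an account that drifts later is caught without another manual audit.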
Reflection Question: Why is "fix and prevent" a better answer than "just fix" on the exam, and how does this reflect AWS's operational philosophy?