2.3.1. Analyzing Resource Configuration and Permissions
First Principle: Most logging failures have one of three causes: missing IAM permissions, incorrect resource configuration, or disabled features. Systematic diagnosis rules out these three areas first, then moves on to network paths and encryption.
Common Logging Failures by Service:
| Service | Common Failure | Root Cause | Fix |
|---|---|---|---|
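| Lambda → CloudWatch | No logs appear | Execution role lacks logs:CreateLogGroup, logs:CreateLogStream, or logs:PutLogEvents | Add CloudWatch Logs permissions to the execution role (for example, the AWSLambdaBasicExecutionRole managed policy) |
| API Gateway | No access/execution logs | Logging not enabled in stage settings, or no CloudWatch Logs role ARN set in the API Gateway account settings | Enable logging on the stage and set a CloudWatch Logs role ARN in the API Gateway account settings |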
| CloudFront | No access logs | Access logging disabled, or S3 bucket ACL doesn't grant CloudFront write | Enable logging, fix S3 bucket ACL |
| Health checks | Healthy targets reported as unhealthy | Security group blocking health check traffic | Allow health check source IPs in security group |
| CloudTrail | Missing data events | Trail records management events only; data events not enabled for the resource type (for example, S3 objects or Lambda functions) | Enable data events for that resource type in the trail configuration |
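To make the Lambda row concrete, here is a minimal sketch of the policy document an execution role needs for logging. The region, account ID, and function name are hypothetical placeholders, and the resource scoping follows the pattern commonly used for Lambda roles; adapt it to your own naming rather than treating it as the only valid layout.

```python
import json


def lambda_logging_policy(region: str, account_id: str, function_name: str) -> dict:
    """Build an IAM policy document that lets one Lambda function write its logs."""
    prefix = f"arn:aws:logs:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Needed only the first time, when the log group does not exist yet.
                "Effect": "Allow",
                "Action": "logs:CreateLogGroup",
                "Resource": f"{prefix}:*",
            },
            {
                # Streams and events live under the function's log group,
                # hence the trailing ":*" on the log group ARN.
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": f"{prefix}:log-group:/aws/lambda/{function_name}:*",
            },
        ],
    }


# Hypothetical values, for illustration only.
policy = lambda_logging_policy("us-east-1", "123456789012", "my-function")
print(json.dumps(policy, indent=2))
```

If any of the three actions is missing, the function still runs but its log output never reaches CloudWatch, which is the "No logs appear" symptom in the table.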
Systematic Troubleshooting Process:
- Verify the feature is enabled — Is logging turned on for the service?
- Check IAM permissions — Does the service role have write access to the log destination?
- Check the destination — Is the S3 bucket/CloudWatch log group accessible? Any resource policies blocking access?
- Check network path — Can the resource reach the logging endpoint? (VPC endpoints may be needed for private subnets)
- Check for encryption conflicts — If the destination uses KMS, does the source service have kms:GenerateDataKey permission?
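The checklist above can be sketched as an ordered series of checks that stops at the first failure. This is only an illustration of the ordering: the `findings` keys (`logging_enabled`, `role_can_write`, and so on) are hypothetical names for facts you would gather yourself from the console, CLI, or API.

```python
from typing import Callable, NamedTuple, Optional


class Check(NamedTuple):
    name: str
    passed: Callable[[dict], bool]


# Same order as the checklist: cheapest and most common causes first.
CHECKS = [
    Check("feature enabled", lambda f: f.get("logging_enabled", False)),
    Check("IAM permissions", lambda f: f.get("role_can_write", False)),
    Check("destination accessible", lambda f: f.get("destination_ok", False)),
    Check("network path", lambda f: f.get("endpoint_reachable", False)),
    # Encryption only matters when the destination actually uses KMS.
    Check(
        "encryption",
        lambda f: not f.get("uses_kms", False) or f.get("kms_allowed", False),
    ),
]


def first_failure(findings: dict) -> Optional[str]:
    """Return the name of the first failing check, or None if all pass."""
    for check in CHECKS:
        if not check.passed(findings):
            return check.name
    return None


# Hypothetical findings: everything is fine except the route to the endpoint.
findings = {
    "logging_enabled": True,
    "role_can_write": True,
    "destination_ok": True,
    "endpoint_reachable": False,
}
print(first_failure(findings))  # network path
```

Stopping at the first failure keeps the diagnosis focused: there is little point in debugging KMS key policies while logging is still switched off.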
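⚠️ Exam Trap: Code in a private subnet with no NAT Gateway and no VPC endpoint for CloudWatch Logs has no route to the CloudWatch Logs endpoint. A logging agent on an EC2 instance, or a VPC-attached Lambda function that calls the CloudWatch Logs API directly through the SDK, keeps running while its log delivery calls time out and the logs are lost. Lambda's built-in logging of stdout/stderr is the exception: the Lambda service delivers those logs outside the function's VPC network path, so they arrive as long as the execution role has the required permissions.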
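One way to provide the missing route is an interface VPC endpoint for CloudWatch Logs. The sketch below only builds the request parameters, so it runs without AWS credentials; the VPC, subnet, and security group IDs are hypothetical, and the helper name `logs_endpoint_request` is made up for this example.

```python
def logs_endpoint_request(region, vpc_id, subnet_ids, security_group_ids) -> dict:
    """Build parameters for an interface VPC endpoint to CloudWatch Logs."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        # Service name for CloudWatch Logs in the given region.
        "ServiceName": f"com.amazonaws.{region}.logs",
        "SubnetIds": list(subnet_ids),
        # These groups must allow inbound HTTPS (443) from the resources
        # that send logs.
        "SecurityGroupIds": list(security_group_ids),
        # Lets the default CloudWatch Logs hostname resolve to the endpoint.
        "PrivateDnsEnabled": True,
    }


# Hypothetical identifiers, for illustration only.
request = logs_endpoint_request(
    "us-east-1", "vpc-0abc1234", ["subnet-0aaa1111"], ["sg-0bbb2222"]
)
print(request["ServiceName"])  # com.amazonaws.us-east-1.logs
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("ec2", region_name="us-east-1").create_vpc_endpoint(**request)`.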
Scenario: API Gateway access logs stopped appearing in CloudWatch three weeks ago. After investigation, you discover that someone replaced the IAM role API Gateway uses for logging, and the new role lacks the logs:PutLogEvents permission.
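A gap like this can be caught before a role change goes live by comparing the new policy against the actions logging needs. The following is a rough pre-check sketch, not a substitute for the IAM policy simulator: it looks only at Allow statements and ignores Deny statements, conditions, resource scoping, and NotAction. The policy contents and the `REQUIRED` list are assumptions for this example.

```python
from fnmatch import fnmatchcase


def allowed_patterns(policy: dict) -> list:
    """Collect lower-cased action patterns from all Allow statements."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    patterns = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        patterns.extend(action.lower() for action in actions)
    return patterns


def missing_actions(policy: dict, required: list) -> list:
    """Return the required actions that no Allow pattern covers."""
    patterns = allowed_patterns(policy)
    # IAM action names are case-insensitive and may contain * wildcards.
    return [
        action
        for action in required
        if not any(fnmatchcase(action.lower(), p) for p in patterns)
    ]


# Assumed minimum for writing to an existing log group.
REQUIRED = ["logs:CreateLogStream", "logs:PutLogEvents"]

# Hypothetical policy attached to the replacement role.
new_role_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:DescribeLogStreams",
            ],
            "Resource": "*",
        }
    ],
}

print(missing_actions(new_role_policy, REQUIRED))  # ['logs:PutLogEvents']
```

Running a check of this kind in the pipeline that manages the role is one way to stop a replacement role from silently dropping a permission.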
Reflection Question: Why is IAM permission the first thing to check when a service stops sending logs, and how do you prevent recurrence?