3.2.1.1. CloudWatch Alarms for Application Events
First Principle: CloudWatch Alarms enable proactive detection of application anomalies, triggering rapid notifications and automated responses to prevent minor issues from escalating.
For developers, setting up CloudWatch Alarms is essential for knowing immediately when something goes wrong with their application. Alarms automatically monitor specified metrics and trigger actions when a threshold is breached.
- Metric Selection: Monitor standard application-related metrics (e.g., API Gateway 5xx errors, Lambda error count, DynamoDB throttled requests) or custom metrics from your application code (e.g., failed transactions, application latency).
- Thresholds: Define the value that, when crossed, puts the alarm into the ALARM state (e.g., "Lambda Error Count > 0 for 5 minutes").
- Evaluation Period: Specify the duration and number of data points over which the metric must breach the threshold before the alarm triggers, preventing false alarms from transient spikes.
- Actions: Configure what happens when the alarm state changes:
  - Amazon SNS Notifications: Send alerts to email, SMS, or other endpoints for human intervention.
  - AWS Lambda Functions: Invoke custom logic for automated remediation (e.g., restart a service, automatically scale resources).
  - Auto Scaling Actions: Automatically add or remove instances based on load.
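To make the interaction between threshold and evaluation period concrete, here is a small Python sketch of an "M out of N datapoints" check. It is a simplified model for illustration only, not CloudWatch's actual evaluation logic (it ignores missing-data handling, for example); the function name `alarm_state` and the sample data are hypothetical.

```python
from typing import Sequence


def alarm_state(
    datapoints: Sequence[float],
    threshold: float,
    evaluation_periods: int,
    datapoints_to_alarm: int,
) -> str:
    """Return a simplified alarm state for the most recent evaluation window.

    Simplified model (an assumption, not the real service logic): look at the
    last `evaluation_periods` datapoints and count how many are strictly
    greater than `threshold`.
    """
    if not 1 <= datapoints_to_alarm <= evaluation_periods:
        raise ValueError("datapoints_to_alarm must be between 1 and evaluation_periods")
    if len(datapoints) < evaluation_periods:
        # Not enough history yet to fill one evaluation window.
        return "INSUFFICIENT_DATA"
    window = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in window if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


# Error counts per 1-minute period (hypothetical data).
spike = [0, 0, 4, 0, 0]       # one transient spike
sustained = [0, 2, 3, 1, 5]   # errors in four of five periods

print(alarm_state(spike, threshold=0, evaluation_periods=5, datapoints_to_alarm=3))      # OK
print(alarm_state(sustained, threshold=0, evaluation_periods=5, datapoints_to_alarm=3))  # ALARM
```

With a 3-of-5 rule, the single spike stays in OK while the sustained run of errors moves to ALARM, which is the false-alarm protection the Evaluation Period bullet describes.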
Scenario: You've deployed a Lambda function that serves an API endpoint. You need to be immediately notified via email if the function starts returning errors, and you want to log details about such errors for later debugging.
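One way to approach the notification half of this scenario is sketched below in Python with boto3. The function name `orders-api-handler`, the SNS topic ARN, and the chosen period and threshold are all hypothetical placeholders; the sketch also assumes the SNS topic already exists and has a confirmed email subscription.

```python
from typing import Any, Dict


def lambda_error_alarm_params(function_name: str, sns_topic_arn: str) -> Dict[str, Any]:
    """Build keyword arguments for CloudWatch's put_metric_alarm call."""
    return {
        "AlarmName": f"{function_name}-errors",
        "AlarmDescription": f"Errors reported by Lambda function {function_name}",
        "Namespace": "AWS/Lambda",       # standard Lambda metric namespace
        "MetricName": "Errors",          # count of failed invocations
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 60,                    # seconds per datapoint (assumed value)
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no invocations means no errors
        "AlarmActions": [sns_topic_arn],     # SNS topic that delivers the email
    }


def create_alarm(params: Dict[str, Any]) -> None:
    """Create or update the alarm; needs AWS credentials and boto3 installed."""
    import boto3  # imported lazily so the builder above runs without AWS access

    boto3.client("cloudwatch").put_metric_alarm(**params)


# Hypothetical identifiers for illustration only.
params = lambda_error_alarm_params(
    "orders-api-handler",
    "arn:aws:sns:us-east-1:123456789012:api-alerts",
)
print(params["AlarmName"])
```

Calling `create_alarm(params)` would submit the alarm. Error details for later debugging come from the function's own log output in CloudWatch Logs, not from the alarm itself.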
⚠️ Exam Trap: CloudWatch Alarms evaluate metrics, not logs. To alert on log content (error patterns), you need a CloudWatch Logs metric filter that creates a custom metric, THEN an alarm on that metric. Two steps, not one.
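The two steps can be sketched in Python with boto3 as follows. This is a minimal illustration under stated assumptions: the log group name, the `ERROR` filter pattern, the `MyApp` namespace, the metric name, and the topic ARN are hypothetical, and the helper names are not part of any AWS API.

```python
from typing import Any, Dict

NAMESPACE = "MyApp"                 # hypothetical custom namespace
METRIC_NAME = "ErrorLogLines"       # hypothetical custom metric name


def metric_filter_params(log_group: str) -> Dict[str, Any]:
    """Step 1: arguments for CloudWatch Logs put_metric_filter."""
    return {
        "logGroupName": log_group,
        "filterName": "error-lines",
        "filterPattern": "ERROR",    # matches log events containing the term ERROR
        "metricTransformations": [
            {
                "metricName": METRIC_NAME,
                "metricNamespace": NAMESPACE,
                "metricValue": "1",  # each matching log event counts as 1
            }
        ],
    }


def log_alarm_params(sns_topic_arn: str) -> Dict[str, Any]:
    """Step 2: arguments for CloudWatch put_metric_alarm on the custom metric."""
    return {
        "AlarmName": "error-log-lines",
        "Namespace": NAMESPACE,
        "MetricName": METRIC_NAME,
        "Statistic": "Sum",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


def apply(filter_params: Dict[str, Any], alarm_params: Dict[str, Any]) -> None:
    """Submit both steps; needs AWS credentials and boto3 installed."""
    import boto3  # imported lazily so the builders run without AWS access

    boto3.client("logs").put_metric_filter(**filter_params)
    boto3.client("cloudwatch").put_metric_alarm(**alarm_params)


filter_params = metric_filter_params("/aws/lambda/orders-api-handler")
alarm_params = log_alarm_params("arn:aws:sns:us-east-1:123456789012:api-alerts")
```

The alarm never reads the log group directly: it only sees the custom metric that the filter publishes, so the namespace and metric name must match between the two steps.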
