
2.1.1. Standard Metrics, Custom Metrics, and Namespaces

šŸ’” First Principle: Metrics are organized by namespace, and the namespace tells you who published the data. AWS services publish into their own namespaces (AWS/EC2, AWS/RDS, AWS/Lambda). Your application publishes into a custom namespace you define. Understanding namespaces is the first step to finding and querying any metric in CloudWatch.
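
As a quick illustration, here is a minimal boto3 sketch that lists metrics by namespace; AWS/EC2 and CPUUtilization are real names that EC2 publishes, and the same call with a namespace you defined would return your custom metrics.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Standard metrics live in AWS-owned namespaces such as AWS/EC2.
paginator = cloudwatch.get_paginator("list_metrics")
for page in paginator.paginate(Namespace="AWS/EC2", MetricName="CPUUtilization"):
    for metric in page["Metrics"]:
        print(metric["Namespace"], metric["MetricName"], metric["Dimensions"])
```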

Standard Metrics are automatically published by AWS services at no additional cost. However, there's a critical gap: EC2 publishes CPU, network, and disk I/O by default, but it does not publish memory utilization or disk space utilization. Why? Because AWS runs the hypervisor, not your operating system. Memory is managed inside the OS, which AWS can't see without an agent.

| EC2 Metric | Published By Default? | Why / Why Not |
|---|---|---|
| CPU Utilization | āœ… Yes | Hypervisor can measure this |
| Network In/Out | āœ… Yes | Hypervisor can measure this |
| Disk Read/Write Ops | āœ… Yes | For instance store volumes; EBS volumes report to AWS/EBS |
| Memory Utilization | āŒ No | Inside the OS; requires the CloudWatch agent |
| Disk Space Used | āŒ No | Inside the OS; requires the CloudWatch agent |
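
Closing the gap in the last two rows means installing and configuring the CloudWatch agent on the instance. As a minimal sketch (the agent supports many more collection options, and your config file location may differ), a configuration that collects memory and disk-space utilization could look like this; CWAgent is the agent's default namespace:

```json
{
  "metrics": {
    "namespace": "CWAgent",
    "metrics_collected": {
      "mem": {
        "measurement": ["mem_used_percent"]
      },
      "disk": {
        "measurement": ["used_percent"],
        "resources": ["*"]
      }
    }
  }
}
```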

Metric Resolution: By default, EC2 publishes metrics at 5-minute intervals (basic monitoring). You can enable detailed monitoring for 1-minute intervals — this costs extra and is required if you want faster Auto Scaling reactions.
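
Enabling detailed monitoring is a single API call. A sketch with boto3, assuming a placeholder instance ID:

```python
import boto3

ec2 = boto3.client("ec2")

# Switch from basic (5-minute) to detailed (1-minute) monitoring.
ec2.monitor_instances(InstanceIds=["i-0123456789abcdef0"])  # placeholder ID

# To revert to basic monitoring when the extra cost is no longer justified:
# ec2.unmonitor_instances(InstanceIds=["i-0123456789abcdef0"])
```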

Custom Metrics are published by your own code using the PutMetricData API. Examples: number of items in a processing queue, user login failures, cache hit rate. You define the namespace, metric name, unit, and value. Custom metrics are billed per metric per month.
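
A minimal boto3 sketch of the queue-depth example; the namespace MyApp, the metric name QueueDepth, and the dimension are illustrative choices, not names CloudWatch prescribes:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Publish one data point for a custom metric. You choose every name here.
cloudwatch.put_metric_data(
    Namespace="MyApp",
    MetricData=[
        {
            "MetricName": "QueueDepth",
            "Dimensions": [{"Name": "Environment", "Value": "prod"}],
            "Value": 42,
            "Unit": "Count",
        }
    ],
)
```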

High-Resolution Custom Metrics can be published at 1-second intervals (versus the standard 1-minute resolution). These are useful for high-frequency monitoring such as Lambda invocation latency or API Gateway response times.
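
The call is the same PutMetricData; in this sketch the only addition is the StorageResolution field (CacheHitRate is again an illustrative name):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_data(
    Namespace="MyApp",
    MetricData=[
        {
            "MetricName": "CacheHitRate",
            "Value": 97.0,
            "Unit": "Percent",
            "StorageResolution": 1,  # 1 = high resolution; 60 (default) = standard
        }
    ],
)
```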

Metric Statistics: When you query a metric over a period, you choose a statistic:

| Statistic | Use Case | Example |
|---|---|---|
| Average | Typical utilization | Average CPU over 5 minutes |
| Sum | Totals | Total number of requests |
| Maximum | Peak detection | Highest latency spike |
| Minimum | Low-water mark | Lowest available memory |
| SampleCount | Count of data points | Number of API calls |
| p99, p95, p50 | Latency percentiles | 99th percentile response time |

āš ļø Exam Trap: For latency monitoring, the exam expects you to know that Average is misleading — it hides tail latency. A p99 of 5 seconds means 1% of users wait 5+ seconds even if average is 200ms. The correct statistic for SLA monitoring is a percentile.

Reflection Question: Your application publishes custom metrics at 1-minute resolution. A new requirement asks you to detect anomalies within 10 seconds. What metric configuration change is needed?
