5.1.2.3. KQL, VM Insights, and Container Insights
Alerts tell you something is wrong; KQL queries and specialized Insights services help you understand why.
KQL (Kusto Query Language) is the query language for Azure Monitor Logs, Log Analytics, and Application Insights. Its pipe-based syntax chains operators: requests | where success == false | summarize count() by bin(timestamp, 1h) shows hourly failure trends.

Azure VM Insights provides comprehensive VM monitoring (performance counters, dependency mapping, and health state) using only the Azure Monitor agent, with no additional agents to install. Container Insights extends this to AKS, collecting container logs, node metrics, and pod-level performance data.

The key diagnostic workflow: start with high-level dashboards (are things healthy?), drill into alerts (what changed?), query logs for root cause (KQL analysis), and trace individual transactions (distributed tracing). Workbooks combine KQL queries with interactive visualizations into reusable diagnostic workflows that the team can share.
KQL is a read-only query language optimized for exploring large datasets. Its pipe syntax chains operations: source | filter | transform | aggregate | sort | limit. Common diagnostic patterns include time-series analysis (summarize count() by bin(timestamp, 5m)), percentile calculations (summarize percentile(duration, 95)), and cross-table joins (requests | join exceptions on operation_Id).
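A minimal sketch of the time-series pattern against the standard Application Insights requests table (the 1-day window and hourly bin size are illustrative choices, not fixed conventions):

```kql
// Hourly failure trend for the last day
requests
| where timestamp > ago(1d)
| where success == false
| summarize failures = count() by bin(timestamp, 1h)
| order by timestamp asc
```

Rendered as a time chart, a sudden step in this trend often marks the deployment or dependency change that started an incident.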
Azure VM Insights collects performance data (CPU, memory, disk, network) and dependency mapping (which VMs communicate with which services) using the Azure Monitor agent. The Map feature visualizes server-to-server dependencies automatically, revealing connections that infrastructure documentation missed.
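VM Insights writes its performance counters to the InsightsMetrics table in Log Analytics, so the same data behind the built-in charts can be queried directly. A sketch ranking VMs by CPU utilization (the Namespace and Name values shown match the standard processor counter, but verify against your workspace schema):

```kql
// Average CPU utilization per VM over the last hour
InsightsMetrics
| where TimeGenerated > ago(1h)
| where Namespace == "Processor" and Name == "UtilizationPercentage"
| summarize avgCpu = avg(Val) by Computer
| order by avgCpu desc
```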
Container Insights monitors AKS clusters at multiple levels: cluster-level (node count, CPU/memory allocation), node-level (individual node health, disk pressure), pod-level (restart counts, OOMKilled events), and container-level (stdout/stderr logs). Prometheus metrics integration allows custom application metrics alongside infrastructure telemetry.
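The pod-level restart data lands in the KubePodInventory table, which makes crash-looping pods easy to surface with a short query. A sketch, assuming the standard ContainerRestartCount column (confirm the column name in your workspace):

```kql
// Pods with the most container restarts in the last hour
KubePodInventory
| where TimeGenerated > ago(1h)
| summarize restarts = max(ContainerRestartCount) by Name, Namespace
| where restarts > 0
| order by restarts desc
```

A pod near the top of this list with rising restarts and OOMKilled events usually needs a memory limit adjustment rather than a code fix.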
Workbooks combine KQL queries with interactive visualizations — parameter dropdowns, time range selectors, conditional formatting — into reusable diagnostic tools. A "Production Health" workbook might combine: error rate trend (line chart), top failing operations (table), deployment markers (vertical lines on timeline), and resource utilization (multi-line chart). Teams share workbooks through Azure Monitor, creating institutional diagnostic knowledge that survives team turnover.
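The error rate trend chart in such a workbook is backed by an ordinary KQL query; a sketch of what it might look like (in a real workbook the hard-coded ago(1h) would be replaced by a time range parameter):

```kql
// Error rate per 5-minute bin, suitable for a workbook line chart
requests
| where timestamp > ago(1h)
| summarize errorRate = 100.0 * countif(success == false) / count()
    by bin(timestamp, 5m)
| order by timestamp asc
```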
Log-based alerts use KQL queries evaluated on a schedule: requests | where toint(resultCode) >= 500 | summarize failures = count() by bin(timestamp, 5m) | where failures > 10 fires when the 5-minute failure count exceeds 10 (resultCode is stored as a string, hence toint()). Action groups define the response: email notification, SMS, a webhook to PagerDuty, or an Azure Function for automated remediation.
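Laid out on multiple lines, the alert query reads as a small pipeline; the thresholds (500, 5m, 10) are the tunable parts:

```kql
// Fires when any 5-minute bin contains more than 10 server errors
requests
| where toint(resultCode) >= 500      // resultCode is a string in Application Insights
| summarize failures = count() by bin(timestamp, 5m)
| where failures > 10
```

Because the alert rule evaluates this on its own schedule, the bin size and the evaluation frequency should be chosen together so windows are neither skipped nor double-counted.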
Diagnostic query patterns in KQL follow a systematic investigation flow. Start broad: requests | where timestamp > ago(1h) | summarize count(), avg(duration) by resultCode. Narrow to failures: exceptions | where timestamp > ago(1h) | summarize count() by type, method. Correlate across tables: requests | where success == false | join (exceptions) on operation_Id. This structured approach prevents the common mistake of jumping into detailed analysis before understanding the scope of the problem.
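The three stages above, written out as runnable queries (each is run separately in Log Analytics; the project column list in the join is an illustrative selection):

```kql
// 1. Broad scope: volume and latency by result code
requests
| where timestamp > ago(1h)
| summarize count(), avg(duration) by resultCode

// 2. Narrow to failures: which exception types, in which methods
exceptions
| where timestamp > ago(1h)
| summarize count() by type, method

// 3. Correlate: attach exception details to the failed requests
requests
| where timestamp > ago(1h) and success == false
| join kind=inner (exceptions | project operation_Id, type, outerMessage)
    on operation_Id
| project timestamp, name, resultCode, type, outerMessage
```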
Workbook time range parameters synchronize all charts to the same investigation window. When an on-call engineer sets the time range to "last 30 minutes around the alert," all visualizations update simultaneously, creating a coherent diagnostic view.