Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

2.3.2. Systems Manager Automation Runbooks

💡 First Principle: The most reliable operational procedure is one that runs itself. Systems Manager Automation runbooks (the current name for Automation documents, one type of SSM document) let you encode multi-step operational procedures as executable code. The same procedure runs the same way every time, whether it is triggered manually, by an alarm, or by a Config rule.

Without runbooks, every incident response is improvised. Two engineers might fix the same problem in different ways, producing different system states. With automation documents, you have repeatable, auditable procedures that can be reviewed, tested, and improved.

SSM Document Types:
Type       | Format    | Use Case
Automation | YAML/JSON | Multi-step workflows with branching logic
Command    | YAML/JSON | Single commands on instances (Run Command)
Session    | YAML/JSON | Interactive shell sessions (Session Manager)
Policy     | YAML/JSON | Compliance and state management (State Manager)

AWS-Provided Runbooks (Predefined): AWS publishes hundreds of pre-built automation documents. Key examples for the exam:

Document                                | What It Does
AWS-RestartEC2Instance                  | Stop and start an EC2 instance
AWS-StopEC2Instance                     | Stop a running instance
AWS-CreateImage                         | Create an AMI from a running instance
AWS-ApplyPatchBaseline                  | Patch instances using Systems Manager Patch Manager (technically a Command document, run through Run Command)
AWS-ConfigureS3BucketLogging            | Enable S3 server access logging
AWS-DisablePublicAccessForSecurityGroup | Remove inbound rules that open SSH (22) and RDP (3389) to all IP addresses
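
To make the idea of running a predefined runbook concrete, here is a minimal Python sketch. It only builds the request for the SSM StartAutomationExecution API for AWS-RestartEC2Instance; the instance ID is a placeholder, and the helper names (build_restart_request, start_restart) are made up for illustration. The boto3 call is imported lazily, so the request builder can be read and run without AWS credentials.

```python
import json


def build_restart_request(instance_id, assume_role_arn=None):
    """Build keyword arguments for SSM StartAutomationExecution.

    Automation parameters are always passed as lists of strings,
    even when a parameter holds a single value.
    """
    parameters = {"InstanceId": [instance_id]}
    if assume_role_arn:
        # Optional role that the automation assumes while it runs.
        parameters["AutomationAssumeRole"] = [assume_role_arn]
    return {
        "DocumentName": "AWS-RestartEC2Instance",
        "Parameters": parameters,
    }


def start_restart(instance_id):
    """Start the runbook for real (needs boto3 and AWS credentials)."""
    import boto3  # lazy import: only required when actually calling AWS

    ssm = boto3.client("ssm")
    response = ssm.start_automation_execution(**build_restart_request(instance_id))
    return response["AutomationExecutionId"]


if __name__ == "__main__":
    # Placeholder instance ID, for illustration only.
    print(json.dumps(build_restart_request("i-0123456789abcdef0"), indent=2))
```

The same request shape works for the other predefined runbooks; only DocumentName and the parameter names change.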

Custom Runbooks: You write custom automation documents when predefined ones don't meet your needs. The document defines:

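  • Steps: Actions that run in sequence by default; a step can branch (aws:branch) or define error handling (onFailure)
  • Actions: What each step does, for example invoke a Lambda function (aws:invokeLambdaFunction), run a script (aws:executeScript), call an AWS API (aws:executeAwsApi), or wait for manual approval (aws:approve)
  • Parameters: Values that callers pass in at execution time (e.g., instance ID, Region)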
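
The three building blocks above can be sketched as a small custom runbook. The example below is one possible layout, not a reference implementation: it builds the runbook as a Python dictionary (schema version 0.3) that stops an instance, creates an AMI, and starts the instance again. The step names, the image naming scheme, and the helper find_broken_step_references are assumptions made for illustration.

```python
import json


def build_backup_and_restart_runbook():
    """Return a custom Automation runbook as a dict (serialize with json.dumps)."""
    return {
        "schemaVersion": "0.3",
        "description": "Stop an instance, create an AMI, then start it again.",
        "assumeRole": "{{ AutomationAssumeRole }}",
        # Parameters: values the caller supplies at execution time.
        "parameters": {
            "InstanceId": {"type": "String", "description": "Instance to back up"},
            "AutomationAssumeRole": {"type": "String", "default": ""},
        },
        # Steps run top to bottom unless onFailure/nextStep redirects them.
        "mainSteps": [
            {
                "name": "stopInstance",
                "action": "aws:changeInstanceState",
                "onFailure": "Abort",
                "inputs": {"InstanceIds": ["{{ InstanceId }}"], "DesiredState": "stopped"},
            },
            {
                "name": "createBackupImage",
                "action": "aws:createImage",
                # If the backup fails, still bring the instance back up.
                "onFailure": "step:startInstance",
                "inputs": {
                    "InstanceId": "{{ InstanceId }}",
                    "ImageName": "backup-{{ InstanceId }}-{{ automation:EXECUTION_ID }}",
                    "NoReboot": True,
                },
            },
            {
                "name": "startInstance",
                "action": "aws:changeInstanceState",
                "isEnd": True,
                "inputs": {"InstanceIds": ["{{ InstanceId }}"], "DesiredState": "running"},
            },
        ],
    }


def find_broken_step_references(runbook):
    """List 'step:<name>' targets in onFailure that do not match any step name."""
    names = {step["name"] for step in runbook["mainSteps"]}
    broken = []
    for step in runbook["mainSteps"]:
        target = step.get("onFailure", "")
        if target.startswith("step:") and target[len("step:"):] not in names:
            broken.append(target)
    return broken


if __name__ == "__main__":
    runbook = build_backup_and_restart_runbook()
    assert find_broken_step_references(runbook) == []
    print(json.dumps(runbook, indent=2))
```

The resulting JSON could then be registered with the SSM CreateDocument API (DocumentType set to Automation, DocumentFormat set to JSON); that call is left out here so the sketch runs without AWS access.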

Rate Control: When running automation against multiple targets simultaneously, rate control prevents accidentally affecting all instances at once:

  • Concurrency: Maximum number (or percentage) of targets the automation runs on at the same time
  • Error threshold: Number (or percentage) of failed targets after which the automation stops starting on additional targets
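
As a rough illustration of rate control, the sketch below builds a StartAutomationExecution request that targets instances by tag and sets both limits. MaxConcurrency and MaxErrors are the API's names for the two settings; each accepts either an absolute number or a percentage, passed as a string. The tag key and value, the default limits, and the helper names are hypothetical.

```python
import re

# A limit is either an absolute count ("5") or a percentage ("10%").
_LIMIT_PATTERN = re.compile(r"^[1-9][0-9]*%?$|^0$")


def is_valid_limit(value):
    """Return True if value looks like an absolute number or a percentage."""
    return bool(_LIMIT_PATTERN.match(value))


def build_rate_controlled_request(document_name, tag_key, tag_value,
                                  max_concurrency="10%", max_errors="1"):
    """Build StartAutomationExecution kwargs for a multi-target run."""
    for limit in (max_concurrency, max_errors):
        if not is_valid_limit(limit):
            raise ValueError(f"invalid rate control value: {limit!r}")
    return {
        "DocumentName": document_name,
        # Runbook parameter that receives each resolved target.
        "TargetParameterName": "InstanceId",
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        "MaxConcurrency": max_concurrency,
        "MaxErrors": max_errors,
    }


if __name__ == "__main__":
    # Hypothetical tag; restart at most 10% of matching instances at a time.
    print(build_rate_controlled_request("AWS-RestartEC2Instance", "Environment", "staging"))
```

A small concurrency value combined with a low error threshold gives a canary-style rollout: a bad change fails on a few targets and the automation stops before reaching the rest.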

Execution Methods: Automation can be triggered by:

  • Manual execution (console/CLI)
  • CloudWatch alarms (commonly by routing the alarm state change through an EventBridge rule that targets the runbook)
  • EventBridge rules (pattern-based)
  • Config remediation (auto-remediation of Config rule violations)
  • Maintenance Windows (scheduled execution during approved windows)
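
For the event-driven triggers in this list, the matching logic lives in an EventBridge event pattern. The sketch below builds a pattern for a CloudWatch alarm entering the ALARM state and checks it against a sample event with a deliberately simplified matcher. The alarm name and the matches helper are illustrative assumptions; real EventBridge matching supports more operators than this subset.

```python
import json


def build_alarm_event_pattern(alarm_name):
    """Event pattern for one CloudWatch alarm transitioning to ALARM."""
    return {
        "source": ["aws.cloudwatch"],
        "detail-type": ["CloudWatch Alarm State Change"],
        "detail": {
            "alarmName": [alarm_name],
            "state": {"value": ["ALARM"]},
        },
    }


def matches(pattern, event):
    """Simplified matcher: nested dicts recurse, lists mean 'any of these values'."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


if __name__ == "__main__":
    pattern = build_alarm_event_pattern("HighCpuAlarm")  # hypothetical alarm name
    sample_event = {
        "source": "aws.cloudwatch",
        "detail-type": "CloudWatch Alarm State Change",
        "detail": {"alarmName": "HighCpuAlarm", "state": {"value": "ALARM"}},
    }
    print(json.dumps(pattern, indent=2))
    print(matches(pattern, sample_event))
```

In a real setup, the serialized pattern would go into an EventBridge rule whose target is the Automation runbook, together with an IAM role that allows EventBridge to start the execution.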

⚠️ Exam Trap: Systems Manager Run Command runs a command or script directly on managed instances. Systems Manager Automation runs multi-step workflows that can involve multiple AWS services, not just instances. The exam distinguishes these: Run Command for "execute a script on this EC2 instance"; Automation for "stop this instance, take a backup, then restart it."
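
The difference also shows up in the request shapes. The sketch below, under stated assumptions, contrasts the arguments for the SSM SendCommand API (Run Command) with those for StartAutomationExecution (Automation). The runbook name Custom-BackupAndRestart is hypothetical, as are the helper names; no AWS call is made.

```python
def build_run_command_request(instance_id, commands):
    """Run Command: send shell commands to an instance (SSM SendCommand kwargs)."""
    return {
        "DocumentName": "AWS-RunShellScript",  # a Command document
        "InstanceIds": [instance_id],
        "Parameters": {"commands": list(commands)},
    }


def build_automation_request(document_name, instance_id):
    """Automation: start a multi-step runbook (StartAutomationExecution kwargs)."""
    return {
        "DocumentName": document_name,  # an Automation runbook
        "Parameters": {"InstanceId": [instance_id]},
    }


if __name__ == "__main__":
    # "Execute a script on this EC2 instance" -> Run Command.
    print(build_run_command_request("i-0123456789abcdef0", ["uptime"]))
    # "Stop, back up, then restart" -> Automation (hypothetical custom runbook).
    print(build_automation_request("Custom-BackupAndRestart", "i-0123456789abcdef0"))
```

With boto3, the first dictionary would be passed to the SSM client's send_command and the second to start_automation_execution.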

Reflection Question: A CloudWatch alarm fires indicating an EC2 instance's CPU has been above 95% for 10 minutes. Design an automation runbook that: (1) creates an AMI snapshot of the instance, (2) attempts an instance type resize, and (3) notifies the team if the resize fails. What Systems Manager features would you use?

Written by Alvin Varughese
Founder, 15 professional certifications