Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.
3.1.3.3. Recovery Procedures
A DR plan without documented, tested procedures is a hope, not a strategy. Recovery procedures must be automated, versioned, and rehearsed.
Automated recovery playbook (SSM Automation):
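```yaml
schemaVersion: '0.3'
description: 'DR Failover to us-west-2'
# Hypothetical parameters for the DNS step; supply real values at run time
parameters:
  HostedZoneId:
    type: String
    description: 'Route 53 hosted zone that contains the application record'
  RecordName:
    type: String
    description: 'Record to repoint, e.g. app.example.com'
  DrEndpoint:
    type: String
    description: 'DNS name of the us-west-2 load balancer'
mainSteps:
  # Break replication and make the replica a standalone, writable DB
  - name: PromoteRDSReplica
    action: aws:executeAwsApi
    inputs:
      Service: rds
      Api: PromoteReadReplica
      DBInstanceIdentifier: dr-replica
  # Block until the promoted instance reports "available"
  - name: WaitForDBAvailable
    action: aws:waitForAwsResourceProperty
    inputs:
      Service: rds
      Api: DescribeDBInstances
      DBInstanceIdentifier: dr-replica
      PropertySelector: '$.DBInstances[0].DBInstanceStatus'
      DesiredValues: ['available']
  # Scale the DR web tier up to handle production traffic.
  # DesiredCapacity must not exceed the group's MaxSize; add MaxSize here
  # if the DR group's current maximum is below 8.
  - name: ScaleUpASG
    action: aws:executeAwsApi
    inputs:
      Service: autoscaling
      Api: UpdateAutoScalingGroup
      AutoScalingGroupName: dr-web-asg
      MinSize: 4
      DesiredCapacity: 8
  # Switch to DR region endpoint (assumes a non-apex CNAME record)
  - name: UpdateDNS
    action: aws:executeAwsApi
    inputs:
      Service: route53
      Api: ChangeResourceRecordSets
      HostedZoneId: '{{ HostedZoneId }}'
      ChangeBatch:
        Changes:
          - Action: UPSERT
            ResourceRecordSet:
              Name: '{{ RecordName }}'
              Type: CNAME
              TTL: 60
              ResourceRecords:
                - Value: '{{ DrEndpoint }}'
```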
Recovery procedure elements:
- Detection: CloudWatch alarms, Route 53 health checks, or AWS Health events trigger the procedure
- Decision: Automated (Route 53 failover) or manual approval (an SSM Automation aws:approve step)
- Execution: Promote DB, scale compute, update routing
- Validation: Synthetic tests (for example, CloudWatch Synthetics canaries) confirm the DR environment is serving traffic correctly
- Communication: SNS notifications to the operations team at each step
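The Decision and Communication elements can be wired into the same runbook as extra steps. The fragment below is a minimal sketch, assuming two hypothetical parameters (OpsTopicArn and ApproverArn): an aws:approve step placed ahead of PromoteRDSReplica, and an SNS Publish step of the kind you would repeat after each major step.

```yaml
# Sketch only: OpsTopicArn and ApproverArn are hypothetical parameters.
parameters:
  OpsTopicArn:
    type: String
    description: 'SNS topic for approvals and status updates (aws:approve requires a topic name prefixed with "Automation")'
  ApproverArn:
    type: String
    description: 'IAM user or role ARN allowed to approve the failover'
mainSteps:
  # Pause until an approver approves or rejects the failover
  - name: ApproveFailover
    action: aws:approve
    timeoutSeconds: 3600
    onFailure: Abort
    inputs:
      NotificationArn: '{{ OpsTopicArn }}'
      Message: 'Approve DR failover to us-west-2?'
      MinRequiredApprovals: 1
      Approvers:
        - '{{ ApproverArn }}'
  # ... PromoteRDSReplica and WaitForDBAvailable steps go here ...
  # Tell the operations team the database promotion has completed
  - name: NotifyDBPromoted
    action: aws:executeAwsApi
    inputs:
      Service: sns
      Api: Publish
      TopicArn: '{{ OpsTopicArn }}'
      Subject: 'DR failover status'
      Message: 'dr-replica promoted and available; scaling dr-web-asg next'
```

With onFailure: Abort, a rejected approval stops the automation before any failover action runs.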
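Exam Trap: Automated failover can cause "split-brain" if the primary region recovers while the DR region is active. The promoted DR database no longer replicates from the primary, so both regions may accept writes and the data diverges. Route 53 failover routing adds to the risk because it returns traffic to the primary as soon as the primary's health check passes again. To prevent this, use Route 53 health check failover with a sufficient failure threshold (3 or more consecutive failed checks) so a transient blip does not trigger failover, and always have a clear failback procedure that resolves any data divergence before traffic returns to the primary.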
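The health check settings from the trap can be expressed as a CloudFormation fragment. This is a minimal sketch, assuming hypothetical names (app.example.com, primary.example.com, dr.example.com, the /health path) and a HostedZoneId parameter: the PRIMARY record is only treated as unhealthy after three consecutive failed 30-second checks.

```yaml
# Sketch only: domain names and the /health path are hypothetical placeholders.
Parameters:
  HostedZoneId:
    Type: AWS::Route53::HostedZone::Id
Resources:
  PrimaryHealthCheck:
    Type: AWS::Route53::HealthCheck
    Properties:
      HealthCheckConfig:
        Type: HTTPS
        FullyQualifiedDomainName: primary.example.com
        ResourcePath: /health
        RequestInterval: 30   # seconds between checks (10 or 30)
        FailureThreshold: 3   # consecutive failed (or passed) checks before status changes
  PrimaryRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneId: !Ref HostedZoneId
      Name: app.example.com
      Type: CNAME
      TTL: '60'
      SetIdentifier: primary
      Failover: PRIMARY
      HealthCheckId: !Ref PrimaryHealthCheck
      ResourceRecords:
        - primary.example.com
  SecondaryRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneId: !Ref HostedZoneId
      Name: app.example.com
      Type: CNAME
      TTL: '60'
      SetIdentifier: secondary
      Failover: SECONDARY
      ResourceRecords:
        - dr.example.com
```

FailureThreshold applies in both directions: the primary must also pass three consecutive checks before Route 53 answers with it again. That dampens flapping, but it does not replace a deliberate failback procedure.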

Written by Alvin Varughese • Founder • 15 professional certifications