Copyright (c) 2025 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

3.3.2.5. Remediating a Non-Desired System State

First Principle: Systems should automatically and efficiently return to their desired state, which prevents vulnerabilities, operational issues, and compliance violations.

A non-desired system state is any deviation of an AWS resource or its configuration from its intended, compliant, or healthy baseline (e.g., a misconfigured security group, an unpatched EC2 instance, or a compliance violation). Detecting and correcting such deviations aligns with the principles of automation and continuous compliance.

Detection: AWS Config continuously records resource configurations and evaluates them against rules, flagging non-compliant resources. Amazon CloudWatch Alarms detect operational health issues, and security services such as Amazon GuardDuty identify threats.
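To make the detection step concrete, here is a minimal sketch of the kind of check a custom AWS Config rule might run against one recorded security group. It is an illustration under stated assumptions, not AWS's implementation: the function name evaluate_security_group is hypothetical, and the field names ipPermissions, ipv4Ranges, and cidrIp are assumed to match the configuration item shape that Config records for security groups.

```python
from typing import Any, Dict


def evaluate_security_group(configuration_item: Dict[str, Any],
                            blocked_port: int = 22) -> str:
    """Return 'COMPLIANT', 'NON_COMPLIANT', or 'NOT_APPLICABLE'.

    A group is treated as non-compliant when any inbound rule exposes
    `blocked_port` to 0.0.0.0/0. Field names are assumptions about the
    configuration item shape, not a verified schema.
    """
    if configuration_item.get("resourceType") != "AWS::EC2::SecurityGroup":
        return "NOT_APPLICABLE"

    config = configuration_item.get("configuration") or {}
    for rule in config.get("ipPermissions", []):
        protocol = rule.get("ipProtocol")
        if protocol == "-1":
            # "-1" means all protocols and all ports.
            covers_port = True
        elif protocol == "tcp":
            from_port = rule.get("fromPort")
            to_port = rule.get("toPort")
            covers_port = (
                from_port is not None
                and to_port is not None
                and from_port <= blocked_port <= to_port
            )
        else:
            covers_port = False

        open_to_world = any(
            ip_range.get("cidrIp") == "0.0.0.0/0"
            for ip_range in rule.get("ipv4Ranges", [])
        )
        if covers_port and open_to_world:
            return "NON_COMPLIANT"

    return "COMPLIANT"
```

In a real custom rule, a Lambda function would report this result back to AWS Config; that reporting step is omitted here to keep the evaluation logic testable on its own.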

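Automated Remediation: For routine, well-understood issues, automation is the preferred fix. AWS Systems Manager Automation documents (runbooks), AWS Lambda functions, and AWS Config auto-remediation can correct a deviation without waiting for a person to act.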
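As a rough sketch of Config auto-remediation, the snippet below builds the request that associates a Config rule with a Systems Manager Automation document through the put_remediation_configurations API. Several things here are assumptions for illustration: the helper names, the retry values, and the choice of the AWS-owned document AWS-DisablePublicAccessForSecurityGroup with its GroupId and AutomationAssumeRole parameters. Verify the document and its parameters in your own account before relying on them.

```python
from typing import Any, Dict, Optional


def build_remediation_configuration(rule_name: str,
                                    automation_role_arn: str,
                                    automatic: bool = True) -> Dict[str, Any]:
    """Build one entry for put_remediation_configurations."""
    configuration: Dict[str, Any] = {
        "ConfigRuleName": rule_name,
        "TargetType": "SSM_DOCUMENT",
        # Assumed AWS-owned Automation document; confirm it fits your case.
        "TargetId": "AWS-DisablePublicAccessForSecurityGroup",
        "Parameters": {
            # RESOURCE_ID tells Config to pass the non-compliant resource's ID.
            "GroupId": {"ResourceValue": {"Value": "RESOURCE_ID"}},
            "AutomationAssumeRole": {
                "StaticValue": {"Values": [automation_role_arn]}
            },
        },
        "Automatic": automatic,
    }
    if automatic:
        # Retry settings only make sense for automatic remediation.
        configuration["MaximumAutomaticAttempts"] = 3   # illustrative value
        configuration["RetryAttemptSeconds"] = 60       # illustrative value
    return configuration


def apply_remediation_configuration(configuration: Dict[str, Any],
                                    config_client: Optional[Any] = None) -> Any:
    """Send the configuration to AWS Config (requires boto3 and credentials)."""
    if config_client is None:
        import boto3  # imported lazily so the builder works without boto3
        config_client = boto3.client("config")
    return config_client.put_remediation_configurations(
        RemediationConfigurations=[configuration]
    )
```

Setting automatic=False keeps the same remediation action available but leaves it to an operator to trigger, which is one way to stage a rollout before enabling full automation.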

Manual Remediation: Some complex or critical issues may require human oversight or manual intervention, especially when automation could have unintended consequences or requires specific business approvals.

Key Aspects of Remediation:
  • Detection: AWS Config, CloudWatch Alarms, Security services.
  • Automated Remediation: Systems Manager Automation, Lambda, Config auto-remediation.
  • Manual Remediation: For complex/critical issues.

Scenario: A DevOps team discovers that a new EC2 instance was launched without the required security agents installed, creating a security vulnerability (a non-desired system state). They need to automatically detect and remediate this.

Reflection Question: How would you use AWS Config to detect this non-compliant EC2 instance and then trigger an automated remediation action using AWS Systems Manager Automation documents (or a Lambda function) to restore the instance to its desired state by installing the missing agents?
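One possible way to approach this question with a Lambda function is sketched below. It assumes an Amazon EventBridge rule forwards AWS Config "Config Rules Compliance Change" events to the function, and that the event carries resourceType, resourceId, and newEvaluationResult.complianceType under detail. The runbook name Custom-InstallSecurityAgent and its InstanceId parameter are hypothetical placeholders for a document your team would author; they are not AWS-provided.

```python
from typing import Any, Dict, Optional

# Hypothetical Automation runbook that installs the required agents.
REMEDIATION_DOCUMENT = "Custom-InstallSecurityAgent"


def handle_compliance_change(event: Dict[str, Any],
                             ssm_client: Optional[Any] = None) -> Optional[str]:
    """Start remediation for a non-compliant EC2 instance.

    Returns the Automation execution ID, or None when no action is needed.
    The event shape is an assumption about Config compliance-change events.
    """
    detail = event.get("detail", {})
    result = detail.get("newEvaluationResult", {})

    # Only act on EC2 instances that have just been marked non-compliant.
    if result.get("complianceType") != "NON_COMPLIANT":
        return None
    if detail.get("resourceType") != "AWS::EC2::Instance":
        return None

    if ssm_client is None:
        import boto3  # available by default in the Lambda Python runtime
        ssm_client = boto3.client("ssm")

    response = ssm_client.start_automation_execution(
        DocumentName=REMEDIATION_DOCUMENT,
        # Automation parameters are passed as lists of strings.
        Parameters={"InstanceId": [detail["resourceId"]]},
    )
    return response["AutomationExecutionId"]


def lambda_handler(event: Dict[str, Any], context: Any) -> Optional[str]:
    """Lambda entry point; delegates to the testable helper above."""
    return handle_compliance_change(event)
```

Passing the client in as an argument keeps the decision logic testable without AWS credentials. The same outcome could be reached without Lambda by attaching the runbook to the Config rule as an automatic remediation action.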

💡 Tip: When designing remediation strategies, consider the balance between full automation for routine, low-impact issues and requiring human approval for critical actions that could disrupt services or data.
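The trade-off in this tip can be written down as an explicit policy. The sketch below is one hypothetical way to do so: the Finding fields and the two mode names are illustrative choices, not an AWS API. In practice the "require_approval" path might map to a Systems Manager Automation runbook that begins with an aws:approve step, which pauses execution until designated approvers respond.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A detected deviation, described by hypothetical policy inputs."""
    rule_name: str
    critical: bool     # True if the resource is business-critical
    disruptive: bool   # True if the fix may interrupt a service or alter data


def remediation_mode(finding: Finding) -> str:
    """Decide whether a fix may run unattended.

    Routine, low-impact fixes run automatically; anything critical or
    potentially disruptive waits for a human decision.
    """
    if finding.critical or finding.disruptive:
        return "require_approval"
    return "automatic"


# Example: closing an open SSH port is routine; rebooting a database is not.
routine = Finding("restricted-ssh", critical=False, disruptive=False)
risky = Finding("db-patch-level", critical=True, disruptive=True)
print(remediation_mode(routine))
print(remediation_mode(risky))
```

Keeping the policy in one small function makes it easy to review with security and business stakeholders before any remediation is automated.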