Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

2.3.3. Lambda-Based Remediation Patterns

💡 First Principle: When remediation logic is too complex for a predefined runbook but too specialized to justify building a full service, Lambda fills the gap. Lambda lets you write custom remediation code in minutes without managing infrastructure, triggered automatically by CloudWatch, EventBridge, Config, or SNS.

Lambda's role in operational automation is custom logic execution. It's not a replacement for Systems Manager Automation (which has richer workflow capabilities) or for EventBridge (which handles routing). Lambda is the "run arbitrary code" step in a remediation pipeline.

Common Lambda Remediation Patterns:
| Pattern | Trigger | Lambda Action |
| --- | --- | --- |
| Tag enforcement | Config rule violation | Add missing required tags to resources |
| Security group remediation | EventBridge (Config change) | Remove overly permissive inbound rules |
| Unused resource cleanup | EventBridge (scheduled) | Stop idle EC2 instances, delete unattached EBS volumes |
| SNS message processing | SNS subscription | Parse alert, create Jira ticket, send Slack message |
| CloudWatch alarm response | CloudWatch alarm → SNS → Lambda | Restart application, flush cache, scale resource |
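To make one row of the table concrete, here is a minimal sketch of the security group remediation pattern. The helper name `open_ssh_permissions` and the `group_id` event key are assumptions made for illustration; real Config and EventBridge events carry the security group ID in a different, nested shape that the handler would need to unpack first. The boto3 calls used are `describe_security_groups` and `revoke_security_group_ingress`.

```python
OPEN_CIDR = "0.0.0.0/0"


def open_ssh_permissions(ip_permissions, port=22):
    """Return revoke-ready entries for rules that expose `port` to 0.0.0.0/0.

    `ip_permissions` is the IpPermissions list of one security group as
    returned by describe_security_groups.
    """
    to_revoke = []
    for perm in ip_permissions:
        proto = perm.get("IpProtocol")
        if proto == "-1":
            covers_port = True  # "-1" means all protocols and all ports
        elif proto == "tcp":
            covers_port = perm.get("FromPort", 0) <= port <= perm.get("ToPort", 65535)
        else:
            covers_port = False

        is_open = any(r.get("CidrIp") == OPEN_CIDR for r in perm.get("IpRanges", []))
        if covers_port and is_open:
            # Revoke only the world-open CIDR; other ranges on the rule stay.
            entry = {"IpProtocol": proto, "IpRanges": [{"CidrIp": OPEN_CIDR}]}
            for key in ("FromPort", "ToPort"):
                if key in perm:
                    entry[key] = perm[key]
            to_revoke.append(entry)
    return to_revoke


def lambda_handler(event, context):
    import boto3  # imported lazily so the helper above can be tested without AWS

    ec2 = boto3.client("ec2")
    group_id = event["group_id"]  # hypothetical key, see the note above
    group = ec2.describe_security_groups(GroupIds=[group_id])["SecurityGroups"][0]
    to_revoke = open_ssh_permissions(group["IpPermissions"])
    if to_revoke:
        ec2.revoke_security_group_ingress(GroupId=group_id, IpPermissions=to_revoke)
    return {"group_id": group_id, "revoked": len(to_revoke)}
```

Note that a matching rule is revoked as a whole for the open CIDR: if a rule opens ports 0-65535 to the world, this sketch removes that entire range rather than carving out port 22 alone.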

Lambda Execution Role: The Lambda function needs an IAM execution role with exactly the permissions required for its remediation task. Follow least privilege:

  • If the function only stops and starts EC2 instances, grant only ec2:StopInstances and ec2:StartInstances, scoped to specific instances where possible
  • Don't attach AdministratorAccess to a Lambda function; that's a security risk waiting to become an incident
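A least-privilege policy for the EC2 example above might look like the following sketch, built as a Python dictionary so it can be serialized to JSON. The function name `build_stop_start_policy` and the instance ARN are placeholders. The CloudWatch Logs statement is an addition not mentioned in the text: it assumes the function should be able to write its own logs, and mirrors the actions granted by the AWS managed basic execution role.

```python
import json


def build_stop_start_policy(instance_arns):
    """Build an IAM policy document allowing stop/start on specific instances only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "StopStartSpecificInstances",
                "Effect": "Allow",
                "Action": ["ec2:StopInstances", "ec2:StartInstances"],
                "Resource": list(instance_arns),
            },
            {
                # Assumption: the function writes logs to CloudWatch Logs.
                "Sid": "WriteFunctionLogs",
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "arn:aws:logs:*:*:*",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and instance ID, for illustration only.
    arn = "arn:aws:ec2:us-east-1:123456789012:instance/i-0123456789abcdef0"
    print(json.dumps(build_stop_start_policy([arn]), indent=2))
```

Listing explicit instance ARNs keeps the blast radius small; in practice a tag-based condition may be easier to maintain than a fixed list.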

Idempotency: Good remediation functions are idempotent — running them twice has the same result as running them once. This matters because event-driven systems can deliver duplicate events. If your Lambda is "delete unattached volumes," make sure it doesn't fail if the volume was already deleted.
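The "delete unattached volumes" case can be sketched as below. The function name `delete_volume_idempotent` is hypothetical, and the client is passed in so the logic can be exercised with a stub. The sketch relies on two details of boto3: `delete_volume(VolumeId=...)` raises an error whose `response["Error"]["Code"]` is `InvalidVolume.NotFound` when the volume no longer exists. Catching `Exception` here is a simplification to avoid importing botocore; production code would more likely catch `botocore.exceptions.ClientError`.

```python
def delete_volume_idempotent(ec2_client, volume_id):
    """Delete an EBS volume, treating 'already gone' as success.

    Returns "deleted" if this call removed the volume and
    "already_deleted" if a previous delivery of the event already did.
    """
    try:
        ec2_client.delete_volume(VolumeId=volume_id)
        return "deleted"
    except Exception as exc:
        # botocore's ClientError exposes the AWS error code on .response
        error = getattr(exc, "response", {}).get("Error", {})
        if error.get("Code") == "InvalidVolume.NotFound":
            return "already_deleted"
        raise  # any other failure is a real error and should surface
```

With this shape, a duplicate event produces a second invocation that returns normally instead of failing and triggering retries or false alarms.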

āš ļø Exam Trap: Lambda has a maximum execution timeout of 15 minutes. If a remediation task takes longer (e.g., a large data migration), Lambda is the wrong choice. Use Step Functions to orchestrate long-running workflows, or Systems Manager Automation with a long timeout.

Reflection Question: Config detects that an EC2 security group has inbound port 22 (SSH) open to 0.0.0.0/0. Design a remediation that: (1) removes the rule, (2) creates a CloudWatch event, and (3) notifies the security team. Which services does each step use?

Written by Alvin Varughese
Founder • 15 professional certifications