3.1.3. Testing and Validating IR Plans
First Principle: An untested IR plan is an assumption, not a capability. Testing reveals gaps in procedures, tool configurations, and team coordination before a real incident exposes them catastrophically.
AWS Fault Injection Service (FIS) (new in C03) enables controlled chaos engineering:
- Inject failures into AWS resources (terminate instances, throttle APIs, disrupt network)
- Test whether your automated responses trigger correctly
- Validate that blast radius minimization works as designed
- Run experiments safely with stop conditions (CloudWatch alarms that halt the experiment when they go into alarm) so an injected fault does not cascade
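To make the pieces above concrete, here is a minimal sketch of what an FIS experiment template could look like, built as a plain Python dictionary. It only constructs and prints the structure; it does not call AWS. The account ID, role name, alarm name, tag key, and the helper `build_experiment_template` are all hypothetical placeholders. The field names follow the FIS experiment template shape as I understand it, so check them against the current FIS documentation before use.

```python
import json


def build_experiment_template(role_arn, alarm_arn, tag_key, tag_value):
    """Return a dict shaped like an FIS experiment template (illustrative only)."""
    return {
        "description": "Stop one tagged EC2 instance to exercise automated response",
        # IAM role that FIS assumes to perform the action (hypothetical ARN passed in).
        "roleArn": role_arn,
        "targets": {
            # Limit blast radius: pick exactly one instance carrying the tag.
            "oneTaggedInstance": {
                "resourceType": "aws:ec2:instance",
                "resourceTags": {tag_key: tag_value},
                "selectionMode": "COUNT(1)",
            }
        },
        "actions": {
            "stopInstance": {
                "actionId": "aws:ec2:stop-instances",
                # Restart the instance after five minutes (ISO 8601 duration).
                "parameters": {"startInstancesAfterDuration": "PT5M"},
                "targets": {"Instances": "oneTaggedInstance"},
            }
        },
        # Stop condition: halt the experiment if this CloudWatch alarm fires.
        "stopConditions": [
            {"source": "aws:cloudwatch:alarm", "value": alarm_arn}
        ],
    }


template = build_experiment_template(
    role_arn="arn:aws:iam::111122223333:role/ExampleFisRole",
    alarm_arn="arn:aws:cloudwatch:us-east-1:111122223333:alarm:ExampleErrorRateAlarm",
    tag_key="ChaosReady",
    tag_value="true",
)
print(json.dumps(template, indent=2))
```

The tag filter and `COUNT(1)` selection keep the experiment narrow, and the alarm-based stop condition is what makes the fault injection "controlled" rather than open-ended.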
AWS Resilience Hub (new in C03) assesses application resilience:
- Define resilience targets (RTO/RPO) for your applications
- Automatically assess whether your architecture meets those targets
- Identify single points of failure in your security architecture
- Recommend improvements based on gaps between targets and actual resilience
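The target-versus-actual comparison described above can be illustrated with a small local sketch. This is not the Resilience Hub API; it only shows the kind of gap check an assessment performs. The `Objective` type, the `find_gaps` helper, the disruption labels, and every number are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objective:
    rto_seconds: int  # recovery time objective
    rpo_seconds: int  # recovery point objective


def find_gaps(targets, estimates):
    """Return (disruption, metric, seconds_over_target) for every missed target."""
    gaps = []
    for disruption, target in targets.items():
        estimate = estimates[disruption]
        if estimate.rto_seconds > target.rto_seconds:
            gaps.append((disruption, "RTO", estimate.rto_seconds - target.rto_seconds))
        if estimate.rpo_seconds > target.rpo_seconds:
            gaps.append((disruption, "RPO", estimate.rpo_seconds - target.rpo_seconds))
    return gaps


# Hypothetical targets: recover within 1 hour, lose at most 15 minutes of data.
targets = {
    "application": Objective(rto_seconds=3600, rpo_seconds=900),
    "availability_zone": Objective(rto_seconds=3600, rpo_seconds=900),
}
# Hypothetical estimates of what the current architecture can achieve.
estimates = {
    "application": Objective(rto_seconds=1800, rpo_seconds=600),
    "availability_zone": Objective(rto_seconds=7200, rpo_seconds=900),
}

gaps = find_gaps(targets, estimates)
for disruption, metric, over in gaps:
    print(f"{disruption}: {metric} misses target by {over} seconds")
```

With these made-up numbers, only the Availability Zone recovery time misses its target, which is the kind of gap a recommendation would then address.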
Testing Methods:
| Method | Description | When to Use |
|---|---|---|
| Tabletop exercise | Walk through scenarios verbally with the team | Quarterly, for plan validation |
| Simulation | Create realistic scenarios in a test account | Semi-annually, for tool validation |
| FIS experiment | Inject real failures in controlled conditions | Monthly, for automation validation |
| Resilience Hub assessment | Automated architecture evaluation | After any infrastructure change |
| Game day | Full-scale, unannounced exercise | Annually, for organizational readiness |
⚠️ Exam Trap: FIS is for testing resilience through fault injection (chaos engineering). It's NOT a security scanning tool. Don't confuse it with Inspector (vulnerability scanning) or GuardDuty (threat detection).
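Scenario: Your automated IR workflow should isolate a compromised EC2 instance within 60 seconds of a GuardDuty finding. In a test account, you trigger the workflow with a test finding (FIS injects faults but does not itself generate GuardDuty findings), run an FIS experiment to put the response path under fault conditions, and measure end-to-end response time. You discover that the Lambda function has a cold start delay of 45 seconds that pushes total response time to 90 seconds, 30 seconds over the target.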
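The arithmetic in this scenario can be sketched as a simple latency budget. Only the 60-second target, the 45-second cold start, and the 90-second total come from the scenario; the stage names and the split of the remaining 45 seconds are assumptions made for illustration.

```python
BUDGET_SECONDS = 60  # target from the scenario

# Measured time per stage, in seconds.
stage_seconds = {
    "finding_delivery": 20,   # assumed
    "lambda_cold_start": 45,  # from the scenario
    "isolation_calls": 25,    # assumed
}


def total_seconds(stages):
    """End-to-end response time is the sum of the sequential stages."""
    return sum(stages.values())


def single_stage_fixes(stages, budget):
    """Return stages whose removal alone would bring the total within budget."""
    total = total_seconds(stages)
    if total <= budget:
        return []
    return [name for name, seconds in stages.items() if total - seconds <= budget]


total = total_seconds(stage_seconds)
overrun = total - BUDGET_SECONDS
fixes = single_stage_fixes(stage_seconds, BUDGET_SECONDS)

print(f"total={total}s, budget={BUDGET_SECONDS}s, overrun={overrun}s")
print(f"stages that alone explain the overrun: {fixes}")
```

Under these assumed stage timings, the cold start is the only stage whose elimination would bring the workflow back under 60 seconds, which is why the test result points at it rather than at the isolation logic.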
Reflection Question: How does FIS enable you to validate automated response time without waiting for (or causing) a real security incident?