AWS-DVA-C02 & AWS CERTIFICATION | Rollbacks for Application Deployments - AWS Certified Developer

3.3.2. Rollbacks for Application Deployments

First Principle: Robust rollback mechanisms provide a clear, tested path to revert an application to a previous stable state, rapidly minimizing the impact of failed or problematic deployments.

For developers, anticipating and planning for rollbacks is as important as the deployment itself. Not every new feature or bug fix will work perfectly in production.

Key Concepts of Rollbacks for Application Deployments:

Purpose: To quickly revert an application to a known good state after a faulty or problematic deployment.
Minimizing Impact: A swift rollback reduces the duration of an outage or the exposure to a bug, minimizing business impact and maintaining user experience.
Automated vs. Manual: Rollbacks can be automated (e.g., AWS CodeDeploy can roll back automatically when a deployment fails or when a CloudWatch alarm associated with the deployment group goes into alarm) or initiated manually.
Immutable Infrastructure Simplifies Rollbacks:
- Concept: Instead of updating existing resources, new instances/containers with the new code are deployed. If a rollback is needed, traffic is simply switched back to the previously running, unchanged environment.
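- AWS Services: AWS CodeDeploy's Blue/Green deployments support this by keeping the original environment available while traffic shifts to the replacement environment. Lambda versions and aliases also facilitate quick rollbacks: published versions are immutable, so repointing an alias at the previous version restores the earlier code.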
Testing Rollbacks: Just as you test deployments, you must test rollback procedures regularly in non-production environments to ensure they work as expected under pressure.
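
The automated rollback described above can be sketched with the AWS SDK for Python. This is a minimal illustration, not a complete deployment setup: it assumes the deployment group and the CloudWatch alarms already exist, and the helper names `build_auto_rollback_settings` and `enable_auto_rollback` are hypothetical. The `codedeploy` argument is expected to be a boto3 CodeDeploy client (for example `boto3.client("codedeploy")`), passed in so the function can be exercised with a stub in a non-production test.

```python
def build_auto_rollback_settings(alarm_names):
    """Build the settings that turn on alarm-driven automatic rollback.

    alarm_names: names of existing CloudWatch alarms (assumed to exist).
    """
    return {
        "autoRollbackConfiguration": {
            "enabled": True,
            # Roll back when the deployment fails or a monitored alarm fires.
            "events": ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"],
        },
        "alarmConfiguration": {
            "enabled": True,
            "alarms": [{"name": name} for name in alarm_names],
        },
    }


def enable_auto_rollback(codedeploy, application, deployment_group, alarm_names):
    """Apply the rollback settings to an existing CodeDeploy deployment group."""
    return codedeploy.update_deployment_group(
        applicationName=application,
        currentDeploymentGroupName=deployment_group,
        **build_auto_rollback_settings(alarm_names),
    )
```

Keeping the settings in a separate pure function makes the rollback configuration easy to review and to check in a test without calling AWS.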

Scenario: You've deployed a new version of your application, and after a few minutes, CloudWatch Alarms indicate a critical increase in errors. You need to revert to the previous stable version immediately.
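
If the application in this scenario is a Lambda function served through an alias, one possible manual rollback is to repoint the alias at the last stable version. The sketch below assumes that setup; the helper name `roll_back_alias` and the example values are hypothetical. The `lambda_client` argument is expected to be a boto3 Lambda client (for example `boto3.client("lambda")`).

```python
def roll_back_alias(lambda_client, function_name, alias_name, stable_version):
    """Point an alias back at a previously published, known-good version.

    stable_version is the version number as a string, e.g. "41".
    """
    return lambda_client.update_alias(
        FunctionName=function_name,
        Name=alias_name,
        FunctionVersion=stable_version,
        # Clear any weighted routing so no traffic still reaches the bad version.
        RoutingConfig={"AdditionalVersionWeights": {}},
    )
```

A call such as `roll_back_alias(client, "orders-api", "live", "41")` leaves the faulty version published but unreferenced, so callers of the alias immediately receive the earlier code.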

Reflection Question: How do robust rollback mechanisms, especially when enabled by immutable infrastructure and services like AWS CodeDeploy's Blue/Green deployments or Lambda versions and aliases, provide a clear, tested path to revert an application to a previous stable state and minimize the impact of problematic deployments?