3.3.3. Rollback Strategies

First Principle: Robust rollback mechanisms provide a clear, tested path to revert an application to a previous stable state, rapidly minimizing the impact of failed or problematic deployments.

For SysOps Administrators, a well-defined and executable rollback strategy is crucial for maintaining application availability and minimizing the impact of unforeseen issues after a deployment.

Key Concepts of Rollbacks for Application Deployments:
  • Purpose: To quickly revert an application to a known good state after a faulty or problematic deployment. This shortens any outage or exposure to a bug, which limits business impact and protects the user experience.
  • Automated vs. Manual: Rollbacks can be automated (e.g., AWS CodeDeploy can roll back automatically when a deployment fails or when a monitored CloudWatch alarm is triggered) or initiated manually.
  • Immutable Infrastructure Simplifies Rollbacks:
    • Concept: Instead of updating existing resources, new instances/containers with the new code are deployed. If a rollback is needed, traffic is simply switched back to the previously running, unchanged environment.
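    • AWS Services: AWS CodeDeploy's Blue/Green deployments support this by keeping the original environment available after traffic shifts (for EC2, until the configured termination wait time elapses). Lambda versions and aliases also allow quick rollbacks, because an alias can be pointed back at a previously published version.
  • Data Rollbacks: Reverting a database is more complex than reverting code. It often involves restoring from a snapshot or using Change Data Capture (CDC) records to reverse changes. When the application writes continuously, a rollback is not always possible without losing data written after the deployment.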
  • Testing Rollbacks: Just as you test deployments, you must test rollback procedures regularly in non-production environments to ensure they work as expected under pressure.
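
The automated rollback and Lambda alias points above can be sketched in Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the helper names (enable_alarm_rollback, repoint_alias) and all application, group, alarm, and function names are hypothetical. The clients are passed in as parameters and are assumed to be boto3 clients for "codedeploy" and "lambda"; the calls used are update_deployment_group and update_alias.

```python
from typing import Any, Dict, Sequence


def enable_alarm_rollback(codedeploy: Any, app: str, group: str,
                          alarm_names: Sequence[str]) -> Dict[str, Any]:
    """Enable alarm-driven automatic rollback on one deployment group.

    `codedeploy` is assumed to be a boto3 CodeDeploy client.
    """
    request = {
        "applicationName": app,
        "currentDeploymentGroupName": group,
        # CloudWatch alarms that CodeDeploy monitors during a deployment.
        "alarmConfiguration": {
            "enabled": True,
            "alarms": [{"name": name} for name in alarm_names],
        },
        # Roll back if the deployment fails or a monitored alarm fires.
        "autoRollbackConfiguration": {
            "enabled": True,
            "events": ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"],
        },
    }
    codedeploy.update_deployment_group(**request)
    return request


def repoint_alias(lambda_client: Any, function_name: str, alias: str,
                  stable_version: str) -> None:
    """Point a Lambda alias back at a previously published version.

    `lambda_client` is assumed to be a boto3 Lambda client.
    """
    lambda_client.update_alias(
        FunctionName=function_name,
        Name=alias,
        FunctionVersion=stable_version,
    )
```

With boto3 installed and credentials configured, a call might look like enable_alarm_rollback(boto3.client("codedeploy"), "my-app", "prod-group", ["HighErrorRate"]). Passing the client in, rather than creating it inside the function, also makes the rollback path easy to rehearse against a stub in a non-production test.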

Scenario: You've deployed a new version of your application using AWS CodeDeploy via a Blue/Green strategy. After shifting traffic, CloudWatch Alarms indicate a critical increase in errors in the new "Green" environment. You need to revert to the previous stable "Blue" version immediately.
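
One way to approach this scenario manually is sketched below, assuming the deployment ID is known and the client passed in is a boto3 CodeDeploy client. The function name revert_to_blue, the returned strings, and the set of states treated as stoppable are illustrative assumptions; get_deployment and stop_deployment are the CodeDeploy calls used.

```python
from typing import Any

# Assumption for this sketch: a deployment in one of these states has not
# finished, so it can still be stopped and rolled back.
STOPPABLE_STATES = {"Created", "Queued", "InProgress", "Baking", "Ready"}


def revert_to_blue(codedeploy: Any, deployment_id: str) -> str:
    """Stop an unfinished deployment and ask CodeDeploy to roll it back.

    Returns a short hint describing what was done or what to do next.
    """
    info = codedeploy.get_deployment(deploymentId=deployment_id)["deploymentInfo"]
    if info["status"] not in STOPPABLE_STATES:
        # The deployment already finished; stopping it is no longer an
        # option, so the previous revision would need to be redeployed.
        return "redeploy-previous-revision"
    codedeploy.stop_deployment(
        deploymentId=deployment_id,
        autoRollbackEnabled=True,  # roll back instead of only stopping
    )
    return "rollback-requested"
```

If the deployment group already has automatic rollback configured for alarm events, CodeDeploy may start the rollback on its own once the alarm fires, and this manual step serves as a fallback.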

Reflection Question: How do robust rollback mechanisms, particularly when enabled by immutable infrastructure and AWS CodeDeploy's Blue/Green deployments, fundamentally provide a clear and tested path to revert an application to a previous stable state, rapidly minimizing the impact of failed deployments?