3.1.4.4. Hotfix Pipelines, Rollback, and Deployment Resilience
Normal deployments follow the full pipeline; hotfixes and failure scenarios require expedited paths that maintain safety without the full ceremony.
Deployment resilience covers the patterns that handle real-world deployment failures. Hotfix pipelines trade testing breadth for speed — running critical unit tests and security scans (~5 minutes) while skipping full regression and staging soak. The key principle: hotfix ≠ skip everything; it means reduced-but-nonzero safety gates with auditable single-approver authorization. Deployment ordering follows dependency topology: database schema first (expand phase), backend API second (new endpoints available), frontend last (can now call new endpoints). Reversing this order causes 100% failure for affected features. When a multi-region deployment fails at Region 3 due to throttling, retry with exponential backoff rather than rolling back healthy Regions 1-2 — transient failures don't warrant discarding successful work. VIP swaps sever WebSocket connections; clients must implement automatic reconnection with backoff.
Hotfix workflows balance urgency with safety. The pipeline skips full regression and staging soak but retains: critical unit tests (2 minutes), security scanning (1 minute), and single-approver authorization (audit trail). Post-deployment, the team backfills the full test suite and verifies the hotfix doesn't regress other functionality.
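The reduced gate set can be sketched as a small gating function. This is a minimal sketch, not a real CI system: `run_critical_unit_tests`, `run_security_scan`, and the returned audit record are hypothetical names introduced here for illustration.

```python
import time

def hotfix_gate(approver: str, run_critical_unit_tests, run_security_scan) -> dict:
    """Run the reduced-but-nonzero safety gates for a hotfix.

    Skips full regression and staging soak, but still requires critical
    unit tests, a security scan, and a single named approver recorded
    for the audit trail.
    """
    if not approver:
        raise PermissionError("hotfix requires a named approver for the audit trail")
    results = {
        "critical_unit_tests": run_critical_unit_tests(),  # ~2 minutes in practice
        "security_scan": run_security_scan(),              # ~1 minute in practice
    }
    if not all(results.values()):
        raise RuntimeError(f"hotfix gate failed: {results}")
    # Auditable single-approver authorization record
    results["approved_by"] = approver
    results["approved_at"] = time.time()
    return results
```

The point of the sketch is the shape of the gate: fewer checks than a full pipeline, but a hard failure if any retained check fails or no approver is recorded.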
Dependency-aware deployment ordering prevents cascading failures in microservice architectures. Map service dependencies as a directed acyclic graph (DAG): if Frontend → API → Database, deploy bottom-up. Database schema migrations use the expand-contract pattern: first deploy the schema expansion (new columns, tables) alongside the old version, then deploy the application code that uses the new schema, finally contract (remove old columns) after all consumers have migrated.
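The bottom-up ordering falls out directly from a topological sort of the dependency DAG. A minimal sketch using Python's standard-library `graphlib`; the service names are the illustrative ones from the text:

```python
from graphlib import TopologicalSorter

# Each service maps to the set of services it depends on:
# Frontend -> API -> Database.
dependencies = {
    "frontend": {"api"},
    "api": {"database"},
    "database": set(),
}

def deploy_order(deps: dict) -> list:
    """Return a bottom-up deployment order: dependencies deploy first."""
    # TopologicalSorter treats the mapped values as predecessors, so
    # static_order() yields each service only after its dependencies.
    return list(TopologicalSorter(deps).static_order())
```

For the chain above, `deploy_order(dependencies)` yields `["database", "api", "frontend"]`, matching the schema-first, frontend-last ordering described earlier. `TopologicalSorter` also raises `CycleError` if the graph has a cycle, which surfaces circular service dependencies before any deployment starts.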
Multi-region deployment patterns handle geographic distribution. Sequential region deployment with health gates: deploy to Region 1 → validate for 10 minutes → deploy to Region 2 → validate → continue. If Region 3 fails due to throttling (a transient error), retry with exponential backoff rather than rolling back healthy regions. Only roll back if the new version itself is causing errors.
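The retry-don't-rollback logic can be sketched as a sequential rollout loop. `TransientError`, `deploy_to`, and `validate` are hypothetical stand-ins for the real provisioning and health-gate calls:

```python
import time

class TransientError(Exception):
    """A retryable failure, e.g. provider throttling."""

def rollout(regions, deploy_to, validate, max_retries=3,
            base_delay=1.0, sleep=time.sleep):
    """Deploy region by region with a health gate; retry transient failures
    with exponential backoff instead of rolling back healthy regions."""
    deployed = []
    for region in regions:
        for attempt in range(max_retries + 1):
            try:
                deploy_to(region)
                validate(region)  # e.g. the 10-minute health gate
                deployed.append(region)
                break
            except TransientError:
                if attempt == max_retries:
                    # Transient failure persisted: stop here, but keep
                    # the already-validated regions as they are.
                    return deployed
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return deployed
```

Note what the loop does not do: a `TransientError` in Region 3 never triggers a rollback of Regions 1 and 2. Only a failure of the version itself (surfaced by `validate`) should escalate to rollback.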
Connection draining before VIP swap or load balancer changes ensures in-flight requests complete before routing shifts. Azure App Service supports connection draining with a configurable timeout. For WebSocket applications, implement client-side reconnection with exponential backoff because VIP swaps sever all active TCP connections instantly.
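The client-side reconnection schedule is typically exponential backoff with jitter, so a fleet of severed clients does not reconnect in lockstep and stampede the new VIP. A minimal sketch of the delay schedule (the actual WebSocket connect call is library-specific and omitted here):

```python
import random

def backoff_delays(base=0.5, cap=30.0, attempts=6, rng=random.random):
    """Exponential backoff with full jitter: each delay is drawn
    uniformly from [0, min(cap, base * 2**n))."""
    return [rng() * min(cap, base * 2 ** n) for n in range(attempts)]
```

A reconnecting client would sleep for each successive delay between connection attempts and reset the schedule once a connection survives for some interval. The jitter term is what spreads reconnections out after a VIP swap severs every connection at the same instant.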
The expand-contract pattern for database migrations ensures zero-downtime schema changes. Expand phase: add new columns or tables while keeping old schema intact — both old and new application versions work. Contract phase: after all application instances use the new schema, remove deprecated columns. This two-phase approach prevents the "deploy database first, break the app" failure mode that single-step migrations cause.
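As a concrete illustration, the two phases might look like the following migration steps; the table and column names (`users`, `full_name`, etc.) are hypothetical examples, not from the original text:

```python
# Phase 1 (expand): additive only. Both the old application version
# (reading first_name/last_name) and the new one (reading full_name)
# keep working against this schema.
EXPAND = [
    "ALTER TABLE users ADD COLUMN full_name TEXT NULL;",
    "UPDATE users SET full_name = first_name || ' ' || last_name;",
]

# Phase 2 (contract): destructive. Run only after every application
# instance has been verified to read full_name exclusively.
CONTRACT = [
    "ALTER TABLE users DROP COLUMN first_name;",
    "ALTER TABLE users DROP COLUMN last_name;",
]
```

The invariant to preserve is that the expand phase contains no destructive statement; anything that can break the old application version belongs in the contract phase, deployed only after all consumers have migrated.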
Deployment health checks use readiness and liveness probes to verify the new version is actually serving traffic correctly. The readiness probe gates when traffic starts flowing to the new instance; the liveness probe restarts instances that become unresponsive. Without health checks, a deployment can appear successful while the application crashes on startup — health checks surface this failure at the infrastructure level.
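The distinction between the two probes is easiest to see in code. A minimal sketch, assuming hypothetical dependency-check inputs; in a real deployment these functions would back HTTP endpoints that the orchestrator's probes call:

```python
def readiness(db_reachable: bool, cache_warm: bool) -> int:
    """Readiness gates traffic: return 200 only when the instance can
    actually serve requests (dependencies reachable, caches warm)."""
    return 200 if (db_reachable and cache_warm) else 503

def liveness(process_responsive: bool) -> int:
    """Liveness gates restarts: return 200 while the process itself is
    healthy; a failing liveness probe causes the instance to be restarted."""
    return 200 if process_responsive else 503
```

An instance that crashes on startup never passes readiness, so it never receives traffic and the deployment is flagged as failed at the infrastructure level rather than appearing successful.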
Post-incident backfill is non-negotiable after a hotfix deploys. The hotfix took a shortcut; now run the full regression suite against the hotfix branch and create a standard release that includes the fix with complete testing.