4.1.3.2. Design for Azure Availability Sets
💡 First Principle: Distributing virtual machines across isolated hardware clusters within a single datacenter provides a crucial layer of redundancy against localized hardware failures and planned maintenance events.
Scenario: You are designing a solution for a critical application running on multiple Virtual Machines. This application cannot be easily refactored to span Availability Zones, but you need to ensure it remains available even if the underlying hardware (e.g., server rack, network switch) within a single datacenter fails, or during planned Azure maintenance.
An Availability Set is a logical grouping of virtual machines that tells Azure which VMs provide redundancy for one another, so that the platform can place them on separate hardware.
Key Design Considerations:
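- Fault Domains: Each Availability Set splits VMs into fault domains, which are groups of hardware that share a common power source and network switch. A set can have up to three fault domains (the maximum varies by region), so a rack-level failure affects only the VMs in one of them.
- Update Domains: VMs are also distributed across update domains (five by default, configurable up to 20), which are groups of VMs that can be rebooted together. During planned maintenance, Azure reboots only one update domain at a time.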
- VM Deployment: All VMs in an Availability Set must reside in the same resource group and virtual network, and a VM can be added to a set only when the VM is created.
- Limitations: Availability Sets provide high availability only within a single Azure datacenter. They do not protect against datacenter-wide or regional outages.
- Use Cases: Ideal for legacy workloads or applications that cannot use Availability Zones, but still require improved uptime within a datacenter.
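The fault-domain and update-domain behavior described above can be made concrete with a small simulation. This is a minimal sketch, not Azure's actual placement logic: it assumes VMs are assigned to domains round-robin, and the names `place_vms` and `survivors`, along with the counts of three fault domains and five update domains, are illustrative choices.

```python
from collections import namedtuple

Placement = namedtuple("Placement", ["vm", "fault_domain", "update_domain"])


def place_vms(vm_count, fault_domains=3, update_domains=5):
    """Assign VMs to domains round-robin (a simplifying assumption)."""
    return [
        Placement(f"vm{i}", i % fault_domains, i % update_domains)
        for i in range(vm_count)
    ]


def survivors(placements, failed_fault_domain=None, updating_domain=None):
    """Return the names of VMs still running when one domain is unavailable."""
    return [
        p.vm
        for p in placements
        if p.fault_domain != failed_fault_domain
        and p.update_domain != updating_domain
    ]


placements = place_vms(6)

# A hardware failure takes out fault domain 0 (vm0 and vm3 in this model).
after_hardware_failure = survivors(placements, failed_fault_domain=0)

# Planned maintenance reboots update domain 1 (only vm1 in this model).
after_maintenance = survivors(placements, updating_domain=1)

print(after_hardware_failure)
print(after_maintenance)
```

In this model, either event removes only a subset of the six VMs, which is the property an Availability Set is designed to provide.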
⚠️ Common Pitfall: Placing a single VM in an Availability Set. An Availability Set provides no benefit unless it contains at least two VMs distributed across different fault and update domains.
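The pitfall can be checked with simple arithmetic. The sketch below is illustrative only: it assumes round-robin placement across two fault domains, and `worst_case_running` is a hypothetical helper, not an Azure API. It computes how many VMs remain if the most heavily loaded fault domain fails.

```python
def worst_case_running(vm_count, fault_domains=2):
    """VMs left running if the fullest fault domain fails.

    Assumes round-robin placement, so the fullest domain holds
    ceil(vm_count / fault_domains) VMs.
    """
    fullest = -(-vm_count // fault_domains)  # ceiling division
    return vm_count - fullest


for n in (1, 2, 3):
    print(n, worst_case_running(n))
```

With one VM, the worst case leaves nothing running; with two or more, at least one VM survives the loss of any single fault domain.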
Key Trade-Offs:
- Availability vs. Control: When using an Availability Set, you cede some control over the exact physical placement of your VMs to Azure, in exchange for the guarantee of hardware isolation.
Reflection Question: How does distributing VMs across isolated fault domains and update domains in an Availability Set minimize downtime from hardware failures and planned maintenance within a single datacenter?