4.2.2. Infrastructure as Code: CloudFormation and CDK
💡 First Principle: CloudFormation defines infrastructure in JSON/YAML templates. CDK defines infrastructure in general-purpose programming languages (such as Python or TypeScript) and synthesizes it into CloudFormation templates. Both achieve the same goal (repeatable, version-controlled infrastructure), but CDK is usually more productive for complex configurations because you can use the language's loops, conditionals, and classes to avoid repetition.
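To make the synthesis idea concrete, here is a minimal stdlib-only Python sketch. It is not the AWS CDK API; the function name and bucket purposes are hypothetical. It shows how an ordinary loop and conditional can emit a CloudFormation template as a dictionary, which is conceptually what CDK does when it synthesizes an app.

```python
import json

# Stdlib-only illustration of "code synthesizes a template"; NOT the AWS CDK API.
# The bucket purposes below are hypothetical examples.
BUCKET_PURPOSES = ["raw-data", "features", "model-artifacts"]


def synthesize_buckets(purposes: list[str], versioned: set[str]) -> dict:
    """Build a CloudFormation template dict with one S3 bucket per purpose."""
    resources = {}
    for purpose in purposes:  # a loop replaces copy-pasted resource blocks
        # Logical IDs must be alphanumeric, so derive one from the purpose name.
        logical_id = "".join(part.title() for part in purpose.split("-")) + "Bucket"
        properties = {}
        if purpose in versioned:  # a plain conditional decides per-resource settings
            properties["VersioningConfiguration"] = {"Status": "Enabled"}
        resources[logical_id] = {"Type": "AWS::S3::Bucket", "Properties": properties}
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


if __name__ == "__main__":
    template = synthesize_buckets(BUCKET_PURPOSES, versioned={"model-artifacts"})
    print(json.dumps(template, indent=2))
```

Adding a fourth bucket is a one-word change to the list rather than another copied template block.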
| Feature | CloudFormation | CDK |
|---|---|---|
| Language | JSON/YAML | Python, TypeScript, Java, C#, Go |
| Abstraction | Low-level resource definitions | L1 constructs mirror CloudFormation resources; L2/L3 constructs add defaults and patterns |
| Best for | Simple stacks, existing templates | Complex stacks, reusable patterns |
| Learning curve | Lower (declarative) | Higher (requires programming) |
| Integration | Native AWS | Synthesizes to CloudFormation |
| Drift detection | Built-in (stack drift detection; AWS Config can also flag drifted stacks) | Same mechanism, because deployed CDK apps are CloudFormation stacks |
For ML infrastructure, CDK constructs can define an entire SageMaker deployment (the pipeline definition that launches training jobs, the model registry group, the endpoint, and its auto scaling) in a reusable class that is instantiated once per environment. CloudFormation can achieve the same result, but typically with more verbose, repetitive template definitions.
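The "reusable class instantiated per environment" pattern can be sketched without the CDK library. The Python below is a hypothetical stand-in for a CDK construct: the class name, environment names, instance sizes, and placeholder URIs/ARNs are all assumptions. It emits the standard AWS::SageMaker::Model, AWS::SageMaker::EndpointConfig, and AWS::SageMaker::Endpoint resource types, one template per environment.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvConfig:
    """Per-environment settings; the values used below are hypothetical."""
    name: str
    instance_type: str
    instance_count: int


class SageMakerEndpointPattern:
    """Hypothetical reusable 'construct' emitting CloudFormation for one environment.

    Mimics what a CDK construct would synthesize (model, endpoint config,
    endpoint); it does not import or use aws_cdk.
    """

    def __init__(self, env: EnvConfig, image_uri: str, model_data_url: str, role_arn: str):
        self.env = env
        self.image_uri = image_uri
        self.model_data_url = model_data_url
        self.role_arn = role_arn

    def resources(self) -> dict:
        prefix = self.env.name.title()  # "prod" -> "Prod", used in logical IDs
        model_id = f"{prefix}Model"
        config_id = f"{prefix}EndpointConfig"
        endpoint_id = f"{prefix}Endpoint"
        return {
            model_id: {
                "Type": "AWS::SageMaker::Model",
                "Properties": {
                    "ExecutionRoleArn": self.role_arn,
                    "PrimaryContainer": {"Image": self.image_uri, "ModelDataUrl": self.model_data_url},
                },
            },
            config_id: {
                "Type": "AWS::SageMaker::EndpointConfig",
                "Properties": {
                    "ProductionVariants": [{
                        "VariantName": "AllTraffic",
                        "ModelName": {"Fn::GetAtt": [model_id, "ModelName"]},
                        "InstanceType": self.env.instance_type,  # differs per environment
                        "InitialInstanceCount": self.env.instance_count,
                        "InitialVariantWeight": 1.0,
                    }],
                },
            },
            endpoint_id: {
                "Type": "AWS::SageMaker::Endpoint",
                "Properties": {"EndpointConfigName": {"Fn::GetAtt": [config_id, "EndpointConfigName"]}},
            },
        }


def synthesize_per_env(envs: list[EnvConfig], image_uri: str, model_data_url: str, role_arn: str) -> dict:
    """Instantiate the same pattern once per environment; one template each."""
    return {
        env.name: {
            "AWSTemplateFormatVersion": "2010-09-09",
            "Resources": SageMakerEndpointPattern(env, image_uri, model_data_url, role_arn).resources(),
        }
        for env in envs
    }


ENVIRONMENTS = [
    EnvConfig("dev", "ml.m5.large", 1),
    EnvConfig("prod", "ml.m5.xlarge", 2),
]

if __name__ == "__main__":
    templates = synthesize_per_env(
        ENVIRONMENTS,
        image_uri="EXAMPLE_ECR_IMAGE_URI",
        model_data_url="s3://example-bucket/model.tar.gz",
        role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    )
    for env_name, template in templates.items():
        print(env_name, json.dumps(template, indent=2))
```

The environment-specific differences live in a small config object, so the resource wiring is written once and cannot drift between dev and prod.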
Why IaC matters for ML specifically: ML pipelines have more moving parts than typical web applications. S3 buckets with versioning, IAM roles with precise permissions, VPC configurations with endpoints, SageMaker pipelines that launch training jobs, model registries, and endpoint configurations must all stay consistent with one another. Recreating this stack by hand in a second Region or environment is error-prone and hard to audit. IaC templates capture the entire stack, put infrastructure changes through code review, and provide the audit trail that Domain 4 security questions test.
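Code review of infrastructure works because a template change is a reviewable diff. In practice, reviewers read the template diff in a pull request, a CloudFormation change set, or `cdk diff` output; the helper below is a deliberately simplified, hypothetical stand-in that only compares logical IDs and raw resource definitions. The example templates (bucket, role, S3 gateway endpoint, placeholder VPC ID) are illustrative assumptions.

```python
import json

# Hypothetical example resources; the VPC ID is a placeholder.
TRAINING_ROLE = {
    "Type": "AWS::IAM::Role",
    "Properties": {
        "AssumeRolePolicyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }],
        }
    },
}

OLD_TEMPLATE = {"Resources": {
    "ArtifactBucket": {"Type": "AWS::S3::Bucket", "Properties": {}},
    "TrainingRole": TRAINING_ROLE,
}}

NEW_TEMPLATE = {"Resources": {
    # Change under review: enable versioning on the artifact bucket.
    "ArtifactBucket": {"Type": "AWS::S3::Bucket",
                       "Properties": {"VersioningConfiguration": {"Status": "Enabled"}}},
    "TrainingRole": TRAINING_ROLE,
    # Change under review: keep S3 traffic on the AWS network via a gateway endpoint.
    "S3GatewayEndpoint": {"Type": "AWS::EC2::VPCEndpoint", "Properties": {
        "VpcId": "vpc-EXAMPLE",
        "ServiceName": {"Fn::Sub": "com.amazonaws.${AWS::Region}.s3"},
        "VpcEndpointType": "Gateway",
    }},
}}


def diff_templates(old: dict, new: dict) -> dict[str, list[str]]:
    """Summarize resource-level changes between two CloudFormation template dicts."""
    old_res = old.get("Resources", {})
    new_res = new.get("Resources", {})
    return {
        "added": sorted(set(new_res) - set(old_res)),
        "removed": sorted(set(old_res) - set(new_res)),
        # Present in both, but the type or properties changed.
        "modified": sorted(lid for lid in set(old_res) & set(new_res) if old_res[lid] != new_res[lid]),
    }


if __name__ == "__main__":
    print(json.dumps(diff_templates(OLD_TEMPLATE, NEW_TEMPLATE), indent=2))
```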
AWS SAM (Serverless Application Model) is a CloudFormation extension for serverless workloads. If an exam scenario describes a Lambda-based inference pipeline, SAM templates simplify the Lambda, API Gateway, and IAM configuration. SAM is not a replacement for CloudFormation; a SAM template is a CloudFormation template that declares the AWS::Serverless-2016-10-31 transform, which CloudFormation expands into standard resources at deployment.
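For consistency with the other sketches, the SAM template below is built as a Python dict and printed as JSON (SAM also accepts YAML). The function name, handler, runtime, code path, API path, and EndpointName parameter are hypothetical. The point is the shape: a Transform line plus an AWS::Serverless::Function whose Api event and inline policy replace separately written Lambda, API Gateway, and IAM resources.

```python
import json

# Hypothetical SAM template for a Lambda-based inference API.
SAM_TEMPLATE = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Transform": "AWS::Serverless-2016-10-31",  # marks this as a SAM template
    "Parameters": {"EndpointName": {"Type": "String"}},  # SageMaker endpoint to call
    "Resources": {
        "InferenceFunction": {
            "Type": "AWS::Serverless::Function",
            "Properties": {
                "Handler": "app.handler",
                "Runtime": "python3.12",
                "CodeUri": "src/",
                # Inline policy scoped to a single endpoint rather than "*".
                "Policies": [{
                    "Version": "2012-10-17",
                    "Statement": [{
                        "Effect": "Allow",
                        "Action": "sagemaker:InvokeEndpoint",
                        "Resource": {"Fn::Sub": "arn:${AWS::Partition}:sagemaker:${AWS::Region}:${AWS::AccountId}:endpoint/${EndpointName}"},
                    }],
                }],
                # An Api event makes SAM create the API Gateway REST API for us.
                "Events": {
                    "InferApi": {"Type": "Api", "Properties": {"Path": "/infer", "Method": "post"}},
                },
            },
        }
    },
}

if __name__ == "__main__":
    print(json.dumps(SAM_TEMPLATE, indent=2))
```

During deployment, the transform expands this one function resource into a Lambda function, its execution role, and an API Gateway REST API; the sam CLI (sam build, sam deploy) is the usual way to deploy such a template.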
⚠️ Exam Trap: CDK generates CloudFormation under the hood. A question asking "which service creates the actual resources" has the answer CloudFormation, even when CDK is used. CDK is a development tool; CloudFormation is the execution engine.
Reflection Question: A team needs to deploy identical ML pipelines (training, endpoint, monitoring) in three AWS Regions. Which IaC approach minimizes duplication and maintenance?