4.2. Creating and Scripting Infrastructure
💡 First Principle: Production ML infrastructure must be reproducible, scalable, and cost-efficient. Manual configuration through the console is fine for experimentation but creates "snowflake infrastructure" in production—unique, undocumented, and impossible to replicate. Infrastructure as Code (IaC) solves this, and auto scaling ensures you pay only for what you use.
Without IaC, what happens when your SageMaker endpoint configuration needs to change across three environments (dev, staging, production)? Someone clicks through the console three times by hand and inevitably makes a mistake in one. Without auto scaling, what happens during a Black Friday traffic spike? Either you over-provisioned (wasting money the other 364 days) or you under-provisioned (dropping requests during the peak).
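The auto scaling half of this maps to AWS Application Auto Scaling: you register the variant's instance count as a scalable target, then attach a target-tracking policy. Here is a minimal sketch of the request payloads (the endpoint name, variant name, policy name, and cooldown values are hypothetical) that would be passed to a boto3 `application-autoscaling` client's `register_scalable_target()` and `put_scaling_policy()` calls:

```python
# Sketch: request payloads for scaling a SageMaker endpoint variant with
# AWS Application Auto Scaling. Endpoint/variant/policy names are hypothetical;
# pass these dicts to a boto3 "application-autoscaling" client via
# register_scalable_target() and put_scaling_policy().

def scalable_target(endpoint_name: str, variant_name: str,
                    min_capacity: int, max_capacity: int) -> dict:
    """Payload registering the variant's instance count as scalable."""
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }

def target_tracking_policy(endpoint_name: str, variant_name: str,
                           invocations_per_instance: float) -> dict:
    """Payload for a target-tracking policy on invocations per instance."""
    return {
        "PolicyName": f"{endpoint_name}-target-tracking",  # hypothetical name
        "ServiceNamespace": "sagemaker",
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": invocations_per_instance,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            # Scale-in slower than scale-out to avoid flapping (values illustrative).
            "ScaleInCooldown": 600,
            "ScaleOutCooldown": 60,
        },
    }

target = scalable_target("churn-endpoint", "AllTraffic", 1, 4)
policy = target_tracking_policy("churn-endpoint", "AllTraffic", 70.0)
```

Note this is reactive scaling: it responds to traffic after it arrives, which is exactly the limitation the misconception below calls out.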
Think of IaC like architectural blueprints. You'd never build three identical buildings by having three separate foremen eyeball the dimensions. You'd use the same blueprint three times. CloudFormation and CDK are your blueprints for ML infrastructure.
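To make the blueprint idea concrete, here is a minimal CloudFormation sketch (the model name, instance type default, and `Environment` parameter are hypothetical) showing how one template can stamp out the same SageMaker endpoint in dev, staging, and production by changing a single parameter:

```yaml
# Sketch: one template, three environments. Deploy with a different
# Environment parameter value each time; names below are hypothetical.
Parameters:
  Environment:
    Type: String
    AllowedValues: [dev, staging, prod]
  InstanceType:
    Type: String
    Default: ml.m5.large

Resources:
  EndpointConfig:
    Type: AWS::SageMaker::EndpointConfig
    Properties:
      EndpointConfigName: !Sub "churn-model-${Environment}"
      ProductionVariants:
        - VariantName: AllTraffic
          ModelName: !Sub "churn-model-${Environment}"  # model created elsewhere
          InstanceType: !Ref InstanceType
          InitialInstanceCount: 1
          InitialVariantWeight: 1.0

  Endpoint:
    Type: AWS::SageMaker::Endpoint
    Properties:
      EndpointName: !Sub "churn-endpoint-${Environment}"
      EndpointConfigName: !GetAtt EndpointConfig.EndpointConfigName
```

Because the template is the single source of truth, a change to the instance type or variant weight is made once, reviewed like code, and rolled out identically to all three stacks.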
⚠️ Common Misconception: Auto scaling means your endpoint handles any traffic spike seamlessly. In reality, scaling takes time — new instances need to download the model from S3, load it into memory, and warm up. If your model is 10 GB, cold-start latency can be minutes. The exam tests this by describing scenarios where users see timeouts during traffic spikes. The fix is pre-warming (scheduled scaling before anticipated peaks), not just reactive target tracking.
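The pre-warming fix also maps to Application Auto Scaling: a scheduled action raises the capacity floor before the anticipated peak, so instances have time to pull the model from S3 and warm up. A minimal sketch of the payload (the names and cron expression are hypothetical) that would be passed to a boto3 `application-autoscaling` client's `put_scheduled_action()`:

```python
# Sketch: scheduled-action payload that pre-warms a SageMaker endpoint
# variant ahead of a known peak. Names and schedule are hypothetical;
# pass the dict to a boto3 "application-autoscaling" client via
# put_scheduled_action().

def prewarm_scheduled_action(endpoint_name: str, variant_name: str,
                             min_capacity: int, cron: str) -> dict:
    """Payload for put_scheduled_action(): raise the floor before a peak."""
    return {
        "ServiceNamespace": "sagemaker",
        "ScheduledActionName": f"{endpoint_name}-prewarm",  # hypothetical name
        "ResourceId": f"endpoint/{endpoint_name}/variant/{variant_name}",
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        # Cron in UTC: fire well before the spike so new instances can
        # download the model and warm up before traffic arrives.
        "Schedule": f"cron({cron})",
        "ScalableTargetAction": {"MinCapacity": min_capacity},
    }

# Hypothetical example: hold at least 10 instances from 05:00 UTC
# on the morning of the anticipated Black Friday peak.
action = prewarm_scheduled_action("churn-endpoint", "AllTraffic",
                                  10, "0 5 24 11 ? *")
```

A matching scheduled action after the peak can lower `MinCapacity` again so the target-tracking policy can scale back in.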