5.3.2. VPCs, Subnets, and Network Security for ML
💡 First Principle: By default, SageMaker training jobs and endpoints run in a SageMaker-managed network with internet access, so traffic to services such as S3 can travel over the public internet. This is convenient for development but unacceptable for production ML systems handling sensitive data. Running SageMaker in VPC mode, with VPC endpoints and no route to the internet, keeps that traffic inside your private network, which limits data exfiltration paths and reduces the attack surface.
When you configure SageMaker to run in VPC mode, training jobs and endpoints launch inside your VPC's subnets. This means they can access private resources (databases, internal APIs) and are governed by your security groups and network ACLs. However, in VPC mode the containers have no route to the public internet unless you add one (for example, a NAT gateway). Without a route out of the VPC, they cannot download packages from public repositories, push logs to CloudWatch, or access S3.
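As a minimal sketch of what "VPC mode" means at the API level, the helper below attaches a `VpcConfig` block (with the `Subnets` and `SecurityGroupIds` fields used by the SageMaker `CreateTrainingJob` request) to a request dictionary. The helper name `with_vpc_config`, the job name, and the resource IDs are hypothetical placeholders, and the request is deliberately incomplete: a real job also needs a role, algorithm specification, data configuration, and so on.

```python
from typing import Any


def with_vpc_config(
    request: dict[str, Any],
    subnet_ids: list[str],
    security_group_ids: list[str],
) -> dict[str, Any]:
    """Return a copy of a CreateTrainingJob-style request with VpcConfig attached."""
    if not subnet_ids or not security_group_ids:
        raise ValueError("VPC mode needs at least one subnet and one security group")
    updated = dict(request)  # shallow copy so the caller's dict is not mutated
    updated["VpcConfig"] = {
        "Subnets": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
    }
    return updated


# Hypothetical placeholder values, for illustration only.
base_request = {"TrainingJobName": "demo-vpc-training"}
vpc_request = with_vpc_config(
    base_request,
    subnet_ids=["subnet-0123456789abcdef0"],
    security_group_ids=["sg-0123456789abcdef0"],
)
# With boto3 installed and credentials configured, a complete request could be
# submitted via boto3.client("sagemaker").create_training_job(**complete_request).
```

A job launched with a `VpcConfig` like this is reachable only through the listed subnets and security groups, and it has no path to the internet or to AWS service endpoints unless the VPC provides one.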
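To resolve this, you configure VPC endpoints that provide private connectivity to AWS services without traversing the internet. Most services are reached through interface endpoints (AWS PrivateLink), while S3 is commonly reached through a gateway endpoint attached to your route tables. Which endpoints you need depends on what the job calls; S3 and CloudWatch Logs are typical for training, and Amazon ECR is often added when images are pulled from inside the VPC.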
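The endpoint setup can be sketched as a list of request dictionaries shaped like the arguments of the EC2 `CreateVpcEndpoint` API (`VpcId`, `ServiceName`, `VpcEndpointType`, and so on). The set of services chosen here (S3, CloudWatch Logs, ECR) is an assumption for illustration, not a definitive list, and the function name, region, and resource IDs are hypothetical.

```python
# Interface-endpoint service suffixes. Which ones a job actually needs is
# workload-specific; this selection is an assumption for illustration.
INTERFACE_SERVICES = ("logs", "ecr.api", "ecr.dkr")


def build_endpoint_requests(region, vpc_id, subnet_ids, security_group_ids, route_table_ids):
    """Build CreateVpcEndpoint-style keyword arguments, one dict per endpoint."""
    requests = [
        {
            # S3 is served here by a gateway endpoint attached to route tables.
            "VpcId": vpc_id,
            "ServiceName": f"com.amazonaws.{region}.s3",
            "VpcEndpointType": "Gateway",
            "RouteTableIds": list(route_table_ids),
        }
    ]
    for suffix in INTERFACE_SERVICES:
        requests.append(
            {
                # Interface endpoints place network interfaces in your subnets.
                "VpcId": vpc_id,
                "ServiceName": f"com.amazonaws.{region}.{suffix}",
                "VpcEndpointType": "Interface",
                "SubnetIds": list(subnet_ids),
                "SecurityGroupIds": list(security_group_ids),
                "PrivateDnsEnabled": True,
            }
        )
    return requests


# Hypothetical placeholder values, for illustration only.
endpoint_requests = build_endpoint_requests(
    region="us-east-1",
    vpc_id="vpc-0123456789abcdef0",
    subnet_ids=["subnet-0123456789abcdef0"],
    security_group_ids=["sg-0123456789abcdef0"],
    route_table_ids=["rtb-0123456789abcdef0"],
)
# With boto3 installed and credentials configured, each dict could be passed as
# boto3.client("ec2").create_vpc_endpoint(**request).
```

Building the requests as plain data first makes it easy to review exactly which services the VPC will expose before anything is created.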
Network isolation goes one step further than VPC mode. When enabled, SageMaker containers cannot make any outbound network calls, not even to other AWS services. SageMaker itself still copies the input data from S3 into the container and uploads the model artifacts on the container's behalf, so your code must not depend on fetching anything at runtime. This is the most restrictive option and is used when regulations require absolute certainty that training data cannot leave the environment.
| Security Level | Configuration | Can Access Internet | Can Access AWS Services | Use When |
|---|---|---|---|---|
| Default | No VPC config | ✅ | ✅ | Development, non-sensitive data |
| VPC Mode | VPC + subnets + security groups | ❌ (unless NAT) | ✅ (via VPC endpoints) | Production with sensitive data |
| Network Isolation | VPC mode + enable_network_isolation=True | ❌ | ❌ (container makes no calls; SageMaker handles S3 input and output for it) | Regulated industries, highly sensitive data |
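To make the three rows concrete, here is a small, hedged sketch that classifies a training-job request dictionary into the table's security levels by inspecting the `VpcConfig` and `EnableNetworkIsolation` fields of the `CreateTrainingJob` request (`enable_network_isolation=True` is the corresponding SageMaker Python SDK argument). The function name and the fourth label, for isolation enabled without a customer VPC, are assumptions added for illustration.

```python
def classify_security_level(request):
    """Map a CreateTrainingJob-style request dict to a security level label."""
    vpc_config = request.get("VpcConfig") or {}
    has_vpc = bool(vpc_config.get("Subnets"))
    isolated = bool(request.get("EnableNetworkIsolation", False))

    if has_vpc and isolated:
        return "Network Isolation"
    if has_vpc:
        return "VPC Mode"
    if isolated:
        # Isolation can be set without a VPC; this label is not in the table.
        return "Network Isolation (no customer VPC)"
    return "Default"


# Hypothetical placeholder values, for illustration only.
vpc_settings = {
    "Subnets": ["subnet-0123456789abcdef0"],
    "SecurityGroupIds": ["sg-0123456789abcdef0"],
}
levels = [
    classify_security_level({}),
    classify_security_level({"VpcConfig": vpc_settings}),
    classify_security_level({"VpcConfig": vpc_settings, "EnableNetworkIsolation": True}),
]
print(levels)
```

A check like this could run in a deployment pipeline to reject jobs that fall below the level a given dataset requires.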
⚠️ Exam Trap: A question describing "ensuring SageMaker training data never leaves the private network" is testing VPC mode + network isolation, not encryption. Encryption protects data at rest and in transit—it doesn't prevent data from being sent to an external destination. Network controls prevent that.
Reflection Question: A healthcare company processing PHI wants to train a model using SageMaker. Regulations require that training data never traverse the public internet and that containers cannot make outbound calls. What configuration achieves this, and which VPC endpoints are required?