1.1.2. Amazon Bedrock vs. Amazon SageMaker AI
💡 First Principle: Amazon Bedrock and Amazon SageMaker solve different problems in the FM lifecycle — Bedrock removes all infrastructure concerns so you can focus on using foundation models, while SageMaker gives you full control when you need to customize, train, or host models on your own infrastructure.
Choosing incorrectly between Bedrock and SageMaker is one of the most common architectural errors — and one of the most heavily tested on the exam. The decision becomes straightforward once you understand what each service actually does:
| Dimension | Amazon Bedrock | Amazon SageMaker AI |
|---|---|---|
| Primary use | Invoke pre-built FMs via API | Train, fine-tune, and host custom models |
| Infrastructure | Fully managed — zero server management | Full control — choose instance types, containers |
| Model source | Curated models from Amazon and partners (Titan, Claude, Llama, etc.) | Any model (Hugging Face, custom, SageMaker JumpStart) |
| Customization | Fine-tuning on supported models via Bedrock | Full fine-tuning, continued pre-training, RLHF |
| Pricing | Per token (on-demand) or provisioned throughput | Per instance-hour |
| Who uses it | App developers integrating FMs | ML engineers training/deploying custom models |
| Startup time | Milliseconds — serverless | Minutes — instance provisioning |
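The pricing row deserves a quick sanity check. Here is a back-of-envelope sketch of why per-token on-demand pricing favors low or bursty volume while a dedicated instance favors sustained high volume. The rates below are illustrative placeholders, not real AWS prices:

```python
# Cost comparison sketch for the pricing row above.
# Rates are illustrative placeholders, NOT actual AWS pricing.

def bedrock_cost(tokens: int, price_per_1k: float = 0.003) -> float:
    """On-demand Bedrock: pay per 1,000 tokens processed."""
    return tokens / 1000 * price_per_1k

def sagemaker_cost(hours: float, price_per_hour: float = 1.50) -> float:
    """SageMaker hosting: pay per instance-hour, whether or not traffic arrives."""
    return hours * price_per_hour

# One month of light traffic (2M tokens) vs. one always-on instance (~730 hours):
print(bedrock_cost(2_000_000))   # 6.0  -> per-token wins at low volume
print(sagemaker_cost(730))       # 1095.0 -> fixed instance cost dominates
```

The break-even point shifts with real rates and instance choice, but the exam-relevant intuition holds: steady, heavy throughput justifies dedicated capacity (provisioned throughput or a SageMaker endpoint); sporadic usage favors on-demand per-token billing.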
The exam's critical nuance: These services are complementary, not competing. The canonical architecture for a domain-specific GenAI application is:
- Fine-tune a model using SageMaker on proprietary data
- Register it in SageMaker Model Registry
- Deploy it via SageMaker endpoint or import into Bedrock for managed invocation
- Build the application layer on Bedrock APIs
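The last step above, the application layer, reduces to calling the Bedrock runtime API. A minimal sketch in Python, with the request body built separately so it can be inspected; the model ID is an example, and the live invocation (commented out) requires AWS credentials and model access in your region:

```python
import json

# Example model ID — substitute one enabled in your account/region.
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"

def build_claude_body(prompt: str, max_tokens: int = 256) -> str:
    """Build the JSON request body for Anthropic models on Bedrock."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

# To actually invoke (requires credentials and model access):
# import boto3
# client = boto3.client("bedrock-runtime", region_name="us-east-1")
# resp = client.invoke_model(modelId=MODEL_ID, body=build_claude_body("Hello"))
# print(json.loads(resp["body"].read())["content"][0]["text"])
```

Note the division of labor: the heavy lifting (training, hosting) happened in SageMaker, but the application code only ever sees a serverless Bedrock API call.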
⚠️ Exam Trap: "We need to use our own data to improve the model" does not always mean SageMaker fine-tuning. If the goal is accurate responses about proprietary data, RAG via Bedrock Knowledge Bases is almost always cheaper, faster, and more maintainable than fine-tuning. Fine-tuning is for changing model behavior and style, not for injecting knowledge.
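To make the RAG alternative concrete, here is a sketch of the request shape for a Bedrock Knowledge Bases `RetrieveAndGenerate` call — one managed API call that retrieves from your documents and generates a grounded answer, no fine-tuning involved. The knowledge base ID and model ARN are hypothetical placeholders, and the live call (commented out) requires credentials and a populated knowledge base:

```python
def build_rag_request(question: str, kb_id: str, model_arn: str) -> dict:
    """Request shape for Bedrock Knowledge Bases RetrieveAndGenerate."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,    # hypothetical placeholder
                "modelArn": model_arn,       # hypothetical placeholder
            },
        },
    }

# To invoke (requires credentials and an ingested knowledge base):
# import boto3
# client = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
# resp = client.retrieve_and_generate(**build_rag_request(
#     "What is our refund policy?", "KB123EXAMPLE", "arn:aws:bedrock:..."))
# print(resp["output"]["text"])
```

Contrast this with fine-tuning: updating the knowledge base means re-syncing documents, not retraining a model — which is exactly why RAG wins when the goal is accurate answers over proprietary data.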
Reflection Question: Your company wants the FM to respond in your brand's writing style consistently across all outputs. Should you use RAG or fine-tuning? What signals in a question stem indicate fine-tuning versus RAG?