2.1.3. FM Customization: Fine-Tuning and Continued Pre-Training
💡 First Principle: Fine-tuning shifts the model's probability distribution toward outputs that match your labeled examples — it changes how the model responds, not what it knows. If your goal is accurate recall of domain facts, fine-tuning is the wrong tool; use RAG. Fine-tuning is the right tool when you need consistent output style, format, or behavior.
When to use each customization approach:
| Technique | What It Changes | Use When | Cost Level |
|---|---|---|---|
| Prompt engineering | Nothing in the model | Behavior can be guided by instructions | $ |
| RAG | Nothing in the model; adds a retrieval layer | Need factual accuracy about proprietary/recent data | $$ |
| Fine-tuning | Model's output distribution | Need consistent style, format, domain-specific tone | $$$ |
| Continued pre-training | Model's knowledge base | Need the model to deeply internalize a domain corpus | $$$$ |
Fine-tuning on Bedrock: Amazon Bedrock supports fine-tuning for select models (primarily Amazon Titan and some Meta Llama models). The process:
- Prepare a labeled training dataset in JSONL format (prompt/completion pairs) in S3
- Launch a fine-tuning job via Bedrock console or API
- Bedrock produces a custom model; to run inference against it, you must purchase Provisioned Throughput
- Register the model in SageMaker Model Registry for versioning and lifecycle management
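The steps above can be sketched in code. This is a minimal illustration of the JSONL training format and the parameters for a `boto3` `create_model_customization_job` call; the job name, role ARN, S3 URIs, and hyperparameter values are placeholders, not values from the source.

```python
import json

# Hypothetical prompt/completion pairs in Bedrock's fine-tuning JSONL format:
# one JSON object per line, with "prompt" and "completion" keys.
examples = [
    {"prompt": "Summarize the termination clause:",
     "completion": "Either party may terminate with 30 days' written notice."},
    {"prompt": "Classify this clause's risk level:",
     "completion": "high"},
]
jsonl_payload = "\n".join(json.dumps(ex) for ex in examples)
print(jsonl_payload)  # upload this file to S3 before launching the job

# Parameters for the customization job; ARNs and S3 URIs are placeholders.
job_params = {
    "jobName": "titan-finetune-demo",
    "customModelName": "titan-contract-style",
    "roleArn": "arn:aws:iam::123456789012:role/BedrockFineTuneRole",
    "baseModelIdentifier": "amazon.titan-text-express-v1",
    "customizationType": "FINE_TUNING",
    "trainingDataConfig": {"s3Uri": "s3://my-bucket/train.jsonl"},
    "outputDataConfig": {"s3Uri": "s3://my-bucket/output/"},
    "hyperParameters": {"epochCount": "2", "learningRate": "0.00001"},
}

# In a real account you would then launch the job:
# import boto3
# bedrock = boto3.client("bedrock")
# bedrock.create_model_customization_job(**job_params)
```

Note that `customizationType` also accepts `CONTINUED_PRE_TRAINING`, which reuses the same job API with an unlabeled corpus instead of prompt/completion pairs.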
LoRA (Low-Rank Adaptation): For models deployed on SageMaker, LoRA enables parameter-efficient fine-tuning by adding small, trainable rank-decomposition matrices to the frozen model weights. This dramatically reduces compute cost and storage — you train only millions of parameters rather than billions.
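To see why LoRA is so much cheaper, consider the parameter count for a single weight matrix. The sketch below uses illustrative dimensions (not from any specific model): full fine-tuning updates the entire d_out × d_in matrix, while LoRA trains only the two rank-r factors B (d_out × r) and A (r × d_in).

```python
# Back-of-envelope comparison of full fine-tuning vs. LoRA for one
# projection matrix; dimensions are illustrative.
d_out, d_in = 4096, 4096   # a typical transformer projection matrix
rank = 8                   # LoRA rank r

full_params = d_out * d_in            # update all of W
lora_params = rank * (d_out + d_in)   # train only B (d_out x r) and A (r x d_in)

print(f"full:  {full_params:,}")      # 16,777,216
print(f"lora:  {lora_params:,}")      # 65,536
print(f"{full_params // lora_params}x fewer trainable parameters")  # 256x
```

At inference time the learned update BA is added to the frozen W, so the adapted model runs with no extra latency once the matrices are merged.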
⚠️ Exam Trap: Fine-tuning does not prevent hallucination. A fine-tuned model still generates statistically plausible outputs and will confabulate facts not in its training data. Combining fine-tuning with RAG is a common production pattern: fine-tune for style/format, add RAG for factual grounding.
Reflection Question: A legal firm wants to build a contract review tool. They have 10,000 labeled examples of contracts with expert annotations marking problematic clauses. They also have 200,000 historical contracts they want the model to "know." What combination of customization techniques would you recommend, and why?