3.2.2. Embedding Model Selection and Management
💡 First Principle: The embedding model defines the geometry of your vector space — two chunks are "similar" according to whatever relationships the embedding model learned during its training. An embedding model trained on general web text may not capture domain-specific similarity; a medical embedding model may group "myocardial infarction" and "heart attack" together while a general model may not.
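To make the geometry idea concrete, here is a minimal sketch of cosine similarity, a common measure of closeness between embedding vectors. The two "models" below are hypothetical hand-picked 3-dimensional toy vectors, not output from any real embedding model; they only illustrate how the same pair of phrases can sit close together in one vector space and far apart in another.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors (assumption): real models emit hundreds or
# thousands of dimensions, and these values are invented for illustration.
domain_model = {
    "heart attack": [0.90, 0.10, 0.00],
    "myocardial infarction": [0.85, 0.15, 0.05],
}
general_model = {
    "heart attack": [0.90, 0.10, 0.00],
    "myocardial infarction": [0.10, 0.20, 0.95],
}

for name, space in (("domain", domain_model), ("general", general_model)):
    score = cosine_similarity(space["heart attack"], space["myocardial infarction"])
    print(f"{name} model similarity: {score:.2f}")
```

In this toy setup the domain-specific space scores the two phrases as near neighbors while the general space does not, which is the behavior the principle describes.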
Embedding model selection criteria:
| Criterion | Consideration |
|---|---|
| Dimensionality | Higher dimensionality (e.g., 1536) can be more expressive but costs more storage and compute. 1024 dimensions (the Titan v2 default) is adequate for most use cases |
| Domain fit | Did the model train on content similar to yours? General models underperform on specialized domains |
| Multilingual support | If your corpus contains non-English content, use a multilingual embedding model |
| Context window | Maximum number of tokens the embedding model accepts per input; each document chunk must fit within this limit |
| Latency | Embedding latency affects both ingestion throughput and query latency |
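The dimensionality trade-off in the table can be estimated with simple arithmetic. The sketch below assumes vectors are stored as 32-bit floats (4 bytes per value) and counts only the raw vectors; real vector stores add index structures and metadata on top, so treat the result as a lower bound. The function name and the one-million-chunk corpus size are illustrative assumptions.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw storage for the vectors alone (assumes float32 by default).

    Excludes index overhead such as graph links or metadata, which
    varies by vector store.
    """
    if num_vectors < 0 or dimensions <= 0 or bytes_per_value <= 0:
        raise ValueError("arguments must be positive")
    return num_vectors * dimensions * bytes_per_value


# Hypothetical corpus of one million chunks (assumption for illustration).
chunks = 1_000_000
for dims in (1024, 1536):
    gib = raw_vector_bytes(chunks, dims) / 2**30
    print(f"{dims} dimensions: {gib:.2f} GiB of raw vectors")
```

Moving from 1024 to 1536 dimensions increases raw vector storage by 50% under these assumptions, before any index overhead is counted.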
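The embedding model lock-in problem: Once you've indexed your corpus with one embedding model (for example, Titan Embeddings v2), switching to a different embedding model requires re-embedding every document. Vectors produced by different models live in different vector spaces, so a query vector from the new model cannot be meaningfully compared against document vectors from the old one.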
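One way to keep a model mismatch from failing silently is to record which model built the index and check it at query time. The following is a minimal sketch under that assumption: `IndexMetadata`, `EmbeddingModelMismatch`, and `assert_query_compatible` are hypothetical names, and the model ID strings in the usage lines are placeholders for whatever identifiers your pipeline records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexMetadata:
    """Facts recorded when the index was built (hypothetical schema)."""
    embedding_model_id: str
    dimensions: int


class EmbeddingModelMismatch(Exception):
    """Raised when a query vector does not belong to the index's vector space."""


def assert_query_compatible(index_meta, query_model_id, query_vector):
    # A matching dimension count alone is not enough: two different models
    # can emit vectors of the same length in unrelated vector spaces.
    if query_model_id != index_meta.embedding_model_id:
        raise EmbeddingModelMismatch(
            f"index built with {index_meta.embedding_model_id!r}, "
            f"query embedded with {query_model_id!r}"
        )
    if len(query_vector) != index_meta.dimensions:
        raise EmbeddingModelMismatch(
            f"expected {index_meta.dimensions} dimensions, got {len(query_vector)}"
        )


# Placeholder identifiers (assumption), standing in for real model IDs.
meta = IndexMetadata(embedding_model_id="embed-model-v2", dimensions=4)
assert_query_compatible(meta, "embed-model-v2", [0.1, 0.2, 0.3, 0.4])
```

A check like this turns a quiet relevance regression into an explicit error that can be alerted on.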
Blue/green index approach for zero-downtime embedding model migration:
- Build the new index in parallel (index-v2 with new embedding model)
- Dual-write incoming updates to both indexes
- Run a retrieval quality comparison between the two indexes on a golden query set
- Switch traffic to index-v2 via configuration change
- Decommission index-v1
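The migration steps above can be sketched in a store-agnostic way. This is an illustration under stated assumptions rather than a production implementation: `BlueGreenRouter` and `recall_at_k` are hypothetical names, each index is assumed to be any object exposing `upsert(doc_id, text)` and `search(query, k)`, and recall@k stands in for whatever retrieval quality metric you use on the golden query set.

```python
def recall_at_k(retrieve, golden, k=5):
    """Mean fraction of expected doc IDs found in the top-k results.

    `golden` maps each query to the non-empty set of doc IDs that
    should be retrieved for it.
    """
    if not golden:
        raise ValueError("golden query set is empty")
    scores = []
    for query, expected in golden.items():
        if not expected:
            raise ValueError(f"no expected documents for query {query!r}")
        hits = set(retrieve(query, k)) & set(expected)
        scores.append(len(hits) / len(expected))
    return sum(scores) / len(scores)


class BlueGreenRouter:
    """Reads go to one active index; writes go to every registered index."""

    def __init__(self, indexes, active):
        if active not in indexes:
            raise KeyError(active)
        self.indexes = dict(indexes)
        self.active = active

    def upsert(self, doc_id, text):
        # Dual-write so both indexes stay current during the migration.
        for index in self.indexes.values():
            index.upsert(doc_id, text)

    def search(self, query, k=5):
        return self.indexes[self.active].search(query, k)

    def promote(self, candidate, golden, k=5):
        """Switch traffic only if the candidate does not regress on the golden set."""
        current = recall_at_k(self.indexes[self.active].search, golden, k)
        new = recall_at_k(self.indexes[candidate].search, golden, k)
        if new >= current:
            self.active = candidate  # the "configuration change"
            return True
        return False

    def decommission(self, name):
        if name == self.active:
            raise ValueError("cannot decommission the active index")
        del self.indexes[name]
```

Keeping the switch as a single attribute change mirrors the configuration-only cutover in the steps: rollback is the same operation in reverse, as long as index-v1 has not yet been decommissioned.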
⚠️ Exam Trap: "You only need to generate embeddings once, during document ingestion" is false. Embeddings must be regenerated when: (1) the embedding model changes, (2) the document content changes, (3) the chunking strategy changes (new chunk boundaries mean new texts to embed). Treat your vector index as derived data that must be rebuilt whenever the source documents or the embedding pipeline change.
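Treating the index as derived data suggests storing a fingerprint of every input that went into each vector. A minimal sketch, assuming you keep one fingerprint per chunk alongside its vector: `embedding_fingerprint` and `needs_reembedding` are hypothetical helper names, and the model ID and chunking settings shown are placeholder values.

```python
import hashlib
import json


def embedding_fingerprint(chunk_text, model_id, chunking_config):
    """Hash of everything the stored vector depends on."""
    payload = json.dumps(
        {"text": chunk_text, "model": model_id, "chunking": chunking_config},
        sort_keys=True,  # stable key order so equal inputs hash equally
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_reembedding(stored_fingerprint, chunk_text, model_id, chunking_config):
    """True if the text, model, or chunking settings changed since indexing."""
    current = embedding_fingerprint(chunk_text, model_id, chunking_config)
    return stored_fingerprint != current


# Placeholder values (assumption) for illustration.
config = {"strategy": "fixed", "max_tokens": 300, "overlap": 30}
stored = embedding_fingerprint("Aspirin reduces clotting.", "embed-model-v1", config)

print(needs_reembedding(stored, "Aspirin reduces clotting.", "embed-model-v1", config))
print(needs_reembedding(stored, "Aspirin reduces clotting.", "embed-model-v2", config))
```

Comparing fingerprints during ingestion lets the pipeline re-embed only the chunks whose inputs changed, while a model or chunking change invalidates every chunk at once.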
Reflection Question: Your semantic search quality degrades after a Bedrock service update that upgraded the Titan Embeddings model version. No error is returned — queries simply return less relevant results. What happened architecturally, and what is the remediation plan?