3.2.1. Chunking Strategies for Optimal Retrieval
💡 First Principle: Chunking is a precision/recall trade-off — small chunks maximize retrieval precision (each chunk is tightly focused) but lose surrounding context; large chunks preserve context but decrease precision (more irrelevant content per chunk). The right strategy depends on your document structure and query pattern.
Chunking strategies compared:
| Strategy | How It Works | Best For | Typical Size | Overlap |
|---|---|---|---|---|
| Fixed-size | Split every N tokens regardless of content boundaries | Simple implementation; consistent chunk sizes | 256–512 tokens | 20–50 tokens |
| Sentence-based | Split at sentence boundaries | Q&A over prose documents | Variable | 1–2 sentences |
| Hierarchical | Create parent (full section) + child (paragraph) chunks | Long documents where context + precision both matter | Parent: 1500 tokens; Child: 300 tokens | Configurable between child chunks (e.g., 60 tokens) |
| Semantic | Split when topic/meaning shifts significantly | Documents with distinct topic sections | Variable | None |
| Custom | Document-structure-aware (split on headers, page breaks) | PDFs with clear section structure | Varies | Configurable |
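The fixed-size row above can be sketched in a few lines. This is a simplified illustration that treats whitespace-separated words as "tokens" (a real pipeline would count model tokens with a tokenizer); the chunk size and overlap values mirror the table:

```python
def chunk_fixed(tokens, chunk_size=512, overlap=50):
    """Split a token list into fixed-size chunks.

    Each chunk starts (chunk_size - overlap) tokens after the previous
    one, so consecutive chunks share `overlap` tokens at the boundary.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]

# Demo: a 1200-"token" document yields 3 chunks (512, 512, 276 tokens),
# with the last 50 tokens of one chunk repeated at the start of the next.
tokens = [f"t{i}" for i in range(1200)]
chunks = chunk_fixed(tokens)
```

Note the trade-off encoded in `step`: larger overlap means better boundary coverage but more duplicated tokens stored and embedded.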
Bedrock Knowledge Bases chunking configuration:

```python
chunking_config = {
    "chunkingStrategy": "HIERARCHICAL",  # or FIXED_SIZE, SEMANTIC, NONE
    "hierarchicalChunkingConfiguration": {
        "levelConfigurations": [
            {"maxTokens": 1500},  # parent chunks
            {"maxTokens": 300}    # child chunks
        ],
        "overlapTokens": 60       # overlap between child chunks
    }
}
```
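To make the parent/child idea concrete, here is a minimal, hypothetical sketch of hierarchical chunking, again treating list items as tokens. The intent (not Bedrock's actual implementation) is that retrieval matches against the small, precise child chunks, while the enclosing parent chunk supplies fuller context to the model:

```python
def hierarchical_chunks(tokens, parent_size=1500, child_size=300, overlap=60):
    """Split tokens into parent chunks, then split each parent into
    overlapping child chunks. Children are what get embedded and
    matched; each child keeps a pointer back to its parent."""
    step = child_size - overlap
    out = []
    for pid, i in enumerate(range(0, len(tokens), parent_size)):
        parent = tokens[i:i + parent_size]
        children = [parent[j:j + child_size] for j in range(0, len(parent), step)]
        out.append({"parent_id": pid, "parent": parent, "children": children})
    return out

# Demo: a 3300-"token" document -> 3 parents; each full parent
# splits into 7 children that overlap by 60 tokens.
docs = hierarchical_chunks([f"t{i}" for i in range(3300)])
```

At query time, a system built this way would search the children, then swap in `parent` before prompting, which is how hierarchical chunking gets both precision and context.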
Chunk overlap — why it matters: Without overlap, a concept split across two consecutive chunks may never be fully retrieved. A user query about "the handoff between authentication and authorization" fails if the sentence describing that handoff is split between chunk 47 and chunk 48 with no overlap.
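A toy demonstration of that failure mode, using synthetic word tokens: a six-word phrase straddling a chunk boundary is never fully contained in any chunk when chunks do not overlap, but survives intact once the overlap is wider than the split:

```python
tokens = [f"w{i}" for i in range(100)]
phrase = tokens[47:53]  # a 6-token "sentence" straddling the boundary at token 50

def chunks(size, overlap):
    step = size - overlap
    return [tokens[i:i + size] for i in range(0, len(tokens), step)]

def fully_contained(phrase, chunk_list):
    """True if some single chunk contains the whole phrase contiguously."""
    needle = " ".join(phrase)
    return any(needle in " ".join(c) for c in chunk_list)

no_overlap   = chunks(size=50, overlap=0)   # chunks: [0:50], [50:100]
with_overlap = chunks(size=50, overlap=10)  # chunks: [0:50], [40:90], [80:100]
# The phrase is lost without overlap but recovered with it.
```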
⚠️ Exam Trap: Semantic chunking in Bedrock Knowledge Bases uses an FM to detect topic shifts — it costs tokens (FM invocation per document during ingestion) and takes longer than fixed-size chunking. Exam scenarios that require "fastest ingestion with acceptable quality" should use fixed-size chunking, not semantic.
Reflection Question: You're building a RAG system over a legal code document where each section (numbered 1.1, 1.2, etc.) is independently testable. Users query about specific sections. What chunking strategy would produce optimal precision, and how would you configure the chunk boundaries?