Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

3.2.1. Chunking Strategies for Optimal Retrieval

💡 First Principle: Chunking is a precision/recall trade-off — small chunks maximize retrieval precision (each chunk is tightly focused) but lose surrounding context; large chunks preserve context but decrease precision (more irrelevant content per chunk). The right strategy depends on your document structure and query pattern.

Chunking strategies compared:
| Strategy | How It Works | Best For | Typical Size | Overlap |
|---|---|---|---|---|
| Fixed-size | Split every N tokens regardless of content boundaries | Simple implementation; consistent chunk sizes | 256–512 tokens | 20–50 tokens |
| Sentence-based | Split at sentence boundaries | Q&A over prose documents | Variable | 1–2 sentences |
| Hierarchical | Create parent (full section) + child (paragraph) chunks | Long documents where context and precision both matter | Parent: 1500 tokens; Child: 300 tokens | None |
| Semantic | Split when topic/meaning shifts significantly | Documents with distinct topic sections | Variable | None |
| Custom | Document-structure-aware (split on headers, page breaks) | PDFs with clear section structure | Varies | Configurable |
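As an illustration of the sentence-based strategy, here is a minimal sketch in plain Python. The regex splitter and the `sentences_per_chunk`/`overlap` parameters are illustrative choices for this example, not part of any Bedrock API:

```python
import re

def sentence_chunks(text, sentences_per_chunk=2, overlap=1):
    """Split text at sentence boundaries, then group sentences into
    chunks that overlap by a configurable number of sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    step = max(1, sentences_per_chunk - overlap)
    chunks = []
    for i in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[i:i + sentences_per_chunk]))
        if i + sentences_per_chunk >= len(sentences):
            break  # last window reached the end of the document
    return chunks

# Each chunk shares one sentence with its neighbor:
# ["A one. B two.", "B two. C three.", "C three. D four."]
print(sentence_chunks("A one. B two. C three. D four."))
```

A production splitter would handle abbreviations and quotes (e.g. via an NLP library), but the windowing logic is the same.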
Bedrock Knowledge Bases chunking configuration:

```python
chunking_config = {
    "chunkingStrategy": "HIERARCHICAL",  # or FIXED_SIZE, SEMANTIC, NONE
    "hierarchicalChunkingConfiguration": {
        "levelConfigurations": [
            {"maxTokens": 1500},  # Parent chunk
            {"maxTokens": 300}    # Child chunk
        ],
        "overlapTokens": 60
    }
}
```
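Outside of Bedrock, the parent/child relationship behind hierarchical chunking can be sketched locally. `build_hierarchy` is a hypothetical helper, not an AWS API; it pairs each paragraph-level child with the index of its full-section parent:

```python
def build_hierarchy(sections):
    """sections: list of sections, each a list of paragraph strings.
    Returns (parents, children): parents are full-section texts,
    children are paragraph chunks tagged with their parent's index."""
    parents, children = [], []
    for p_idx, paragraphs in enumerate(sections):
        parents.append(" ".join(paragraphs))       # parent chunk: whole section
        for para in paragraphs:                    # child chunk: one paragraph
            children.append({"parent": p_idx, "text": para})
    return parents, children
```

At query time you embed and search the children (high precision, tightly focused) but hand the matched child's parent to the model (full context). With the hierarchical strategy enabled, Bedrock Knowledge Bases performs this child-to-parent swap for you at retrieval time.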

Chunk overlap — why it matters: Without overlap, a concept split across two consecutive chunks may never be fully retrieved. A user query about "the handoff between authentication and authorization" fails if the sentence describing that handoff is split between chunk 47 and chunk 48 with no overlap.
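This failure mode is easy to reproduce with a toy fixed-size chunker (`chunk_words` is a hypothetical helper; tokens are simplified here to whitespace-separated words):

```python
def chunk_words(words, size, overlap):
    """Fixed-size chunking over a word list with configurable overlap."""
    step = size - overlap
    chunks = []
    for i in range(0, len(words), step):
        chunks.append(" ".join(words[i:i + size]))
        if i + size >= len(words):
            break
    return chunks

words = [f"w{i}" for i in range(10)]
# With no overlap, the phrase "w4 w5" straddles the boundary between
# chunk 0 (w0..w4) and chunk 1 (w5..w9) and appears in neither chunk.
no_overlap = chunk_words(words, size=5, overlap=0)
# With a 2-word overlap, the middle chunk (w3..w7) contains it intact.
with_overlap = chunk_words(words, size=5, overlap=2)
```

Overlap trades a modest amount of index storage and ingestion cost for the guarantee that boundary-spanning content is retrievable from at least one chunk.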

⚠️ Exam Trap: Semantic chunking in Bedrock Knowledge Bases uses an FM to detect topic shifts — it costs tokens (FM invocation per document during ingestion) and takes longer than fixed-size chunking. Exam scenarios that require "fastest ingestion with acceptable quality" should use fixed-size chunking, not semantic.

Reflection Question: You're building a RAG system over a legal code document where each section (numbered 1.1, 1.2, etc.) is independently testable. Users query about specific sections. What chunking strategy would produce optimal precision, and how would you configure the chunk boundaries?

Written by Alvin Varughese, Founder, 15 professional certifications