Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

3.2.3. Search Architecture: Semantic, Keyword, and Hybrid

💡 First Principle: Semantic search finds meaning-similar content; keyword search finds exact term matches. Neither is universally superior — semantic search dominates for conceptual queries; keyword search dominates for entity lookups (specific product codes, proper names, regulatory citation numbers). Hybrid search combines both to handle mixed query patterns.

Search type performance by query category:
| Query Type | Example | Best Search | Why |
|---|---|---|---|
| Conceptual | "How does authentication work?" | Semantic | No specific terms to match |
| Entity lookup | "Find policy CVE-2024-1234" | Keyword | Exact string match critical |
| Mixed | "What are the risks of OAuth 2.0?" | Hybrid | Concept + specific term |
| Negation | "Policies that don't apply to contractors" | Hybrid | Semantic + structured filter |
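The routing decisions in the table above can be sketched as a small query classifier. The regex patterns and thresholds below are illustrative assumptions, not a prescribed implementation; in practice you would tune them to your own identifier formats and query logs:

```python
import re

# Heuristic: identifiers like "CVE-2024-1234" or "HR-2024-Q3" signal an entity
# lookup. This pattern is an illustrative assumption; adapt it to your ID formats.
ENTITY_ID = re.compile(r"\b[A-Z]{2,}-\d{4}-[A-Z0-9]+\b")
NEGATION = re.compile(r"\b(don't|doesn't|not|except|excluding)\b", re.IGNORECASE)

def route_query(query: str) -> str:
    """Pick a search mode following the query-type table above."""
    if NEGATION.search(query):
        return "hybrid"        # semantic similarity plus a structured filter
    if ENTITY_ID.search(query):
        # A bare identifier -> keyword; an identifier inside a longer
        # conceptual question -> hybrid (concept + specific term)
        return "hybrid" if len(query.split()) > 4 else "keyword"
    return "semantic"          # purely conceptual query
```

A router like this is also how you avoid the hybrid-everywhere trap: purely conceptual queries skip the BM25 clause entirely.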
Hybrid search implementation in OpenSearch:
# Hybrid query: combine BM25 keyword score + k-NN vector score.
# Note: OpenSearch normalizes and combines hybrid sub-query scores via a
# search pipeline with a normalization-processor; per-clause weights are
# usually configured there. boost is shown here to illustrate the 0.3/0.7 split.
hybrid_query = {
    "query": {
        "hybrid": {
            "queries": [
                {
                    "match": {
                        "content": {
                            "query": user_query,  # BM25 keyword scoring
                            "boost": 0.3          # Weight for keyword component
                        }
                    }
                },
                {
                    "knn": {
                        "embedding": {
                            "vector": query_embedding,  # Dense vector
                            "k": 10,
                            "boost": 0.7               # Weight for semantic component
                        }
                    }
                }
            ]
        }
    }
}
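Under the hood, OpenSearch's normalization-processor min-max normalizes each sub-query's scores before taking a weighted combination. The plain-Python sketch below reproduces that arithmetic to show what the 0.3/0.7 weights actually do; the function names are illustrative, not OpenSearch APIs:

```python
def min_max_normalize(scores):
    """Min-max normalize raw scores into [0, 1], as the
    normalization-processor does before combining sub-query scores."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]  # all candidates tied
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_scores(bm25_scores, knn_scores, w_keyword=0.3, w_semantic=0.7):
    """Weighted arithmetic combination of normalized BM25 and k-NN scores."""
    bm25_n = min_max_normalize(bm25_scores)
    knn_n = min_max_normalize(knn_scores)
    return [w_keyword * b + w_semantic * k for b, k in zip(bm25_n, knn_n)]

# Three candidates: doc 0 wins on keywords, doc 2 wins on semantics
combined = hybrid_scores([12.4, 3.1, 0.5], [0.61, 0.72, 0.93])
```

Normalization matters because raw BM25 scores (unbounded) and cosine similarities (roughly 0 to 1) live on incompatible scales; combining them without normalization would let whichever scale is larger dominate regardless of the configured weights.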

Reranking — the retrieval quality multiplier: After initial retrieval returns the top-k candidates, a reranker model re-scores each candidate on its actual relevance to the specific query rather than its general embedding similarity. Bedrock exposes managed reranker models through the Rerank API:

# Reranking retrieved chunks using the Bedrock Rerank API
import boto3

bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

reranked = bedrock_agent_runtime.rerank(
    rerankingConfiguration={
        'type': 'BEDROCK_RERANKING_MODEL',
        'bedrockRerankingConfiguration': {
            'modelConfiguration': {
                'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/amazon.rerank-v1:0'
            },
            'numberOfResults': 3  # Return top 3 after reranking
        }
    },
    sources=[{'type': 'INLINE', 'inlineDocumentSource': 
              {'type': 'TEXT', 'textDocument': {'text': chunk}}} 
             for chunk in retrieved_chunks],
    queries=[{'type': 'TEXT', 'textQuery': {'text': user_query}}]
)
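Each rerank result refers back to a position in sources, so mapping scores onto the original chunks is a small post-processing step. The sketch below assumes the response shape returned by the Rerank API (a results list with index and relevanceScore fields); the mock response is illustrative only:

```python
def reorder_chunks(rerank_response, chunks):
    """Return (chunk, score) pairs in reranker order, highest relevance first.
    Each result's 'index' points back into the original sources list."""
    return [
        (chunks[r["index"]], r["relevanceScore"])
        for r in rerank_response["results"]
    ]

# Illustrative response shape; a real call to bedrock_agent_runtime.rerank
# returns results already sorted by relevance
mock_response = {"results": [
    {"index": 2, "relevanceScore": 0.91},
    {"index": 0, "relevanceScore": 0.47},
]}
top = reorder_chunks(mock_response, ["chunk A", "chunk B", "chunk C"])
```

Note that with numberOfResults set to 3, candidates beyond the top 3 are dropped entirely, which is the point: the generator sees only the reranked winners, not the full retrieval set.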

⚠️ Exam Trap: Hybrid search is not always better than pure vector search. For purely conceptual queries where no specific terms matter, hybrid search adds BM25 noise that can actually reduce precision. The exam tests whether you understand when to use hybrid (mixed entity + concept queries) versus when pure semantic search is sufficient.

Reflection Question: Your RAG system retrieves excellent results for general questions like "how does our leave policy work?" but fails for specific queries like "is policy HR-2024-Q3 still in effect?" What search architecture change would you implement, and how would you configure the weighting between the two search components?

Written by Alvin Varughese, Founder (15 professional certifications)