Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

7.2.1. Diagnostic Methodology and Common Failure Patterns

💡 First Principle: The diagnostic sequence for FM application failures mirrors the data flow: start at the input, verify each transformation step produces valid intermediate output, and identify the first step where the intermediate output diverges from expected. Never skip to the final output and work backward.

Five-step diagnostic sequence (following the data flow, per the principle above):
1. Verify the input: confirm the user's query reaches the pipeline intact and well-formed.
2. Inspect retrieval: check which chunks were returned and their relevance scores.
3. Examine the assembled context: confirm the relevant chunks made it into the prompt within the token budget.
4. Review the model invocation: compare the actual prompt sent and response received in the invocation logs.
5. Validate the output path: check guardrail traces and any post-processing before the response reaches the user.
Common failure patterns and their signatures:
| Failure Pattern | Symptom | Root Cause | Diagnostic Evidence |
| --- | --- | --- | --- |
| Retrieval miss | FM says "I don't have information about X" but the docs exist | Chunking too coarse; wrong embedding model; poor query | Context Precision < 0.5 in RAGAS |
| Context overflow | FM ignores key information in long contexts | Context window saturated; key info buried | Input token count near model limit |
| Hallucination | FM states facts not in retrieved context | Grounding too weak; model filling gaps | Faithfulness < 0.7; response makes claims not in chunks |
| Indirect injection | FM produces off-topic or adversarial responses | Malicious content in retrieved docs | Guardrails blocked at output; anomalous retrieved content |
| Guardrails over-triggering | Legitimate queries rejected | Topic denial too broad; word filter too aggressive | Guardrail trace shows a false-positive topic match |
| Stale knowledge | FM gives outdated information | Knowledge Base not synced; embedding model changed | KnowledgeBaseDaysSinceSync metric high |
| Conversation context loss | FM forgets earlier conversation turns | History too long; truncated by sliding window | Conversation history length at query time |
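The Faithfulness and Context Precision thresholds in the table come from RAGAS. As a rough stand-in when a full RAGAS evaluation isn't wired up, a lexical-overlap heuristic can flag likely hallucinations. This sketch is illustrative only; the function name and logic are not the RAGAS algorithm:

```python
def lexical_faithfulness(response: str, chunks: list[str]) -> float:
    """Rough lexical proxy for faithfulness: the share of substantive
    response tokens that appear somewhere in the retrieved chunks.
    Illustrative heuristic, not the RAGAS metric."""
    strip = ".,!?;:"
    context_tokens = {t.strip(strip) for t in " ".join(chunks).lower().split()}
    response_tokens = [t.strip(strip) for t in response.lower().split()
                       if len(t.strip(strip)) > 3]
    if not response_tokens:
        return 1.0
    supported = sum(1 for t in response_tokens if t in context_tokens)
    return supported / len(response_tokens)

# A grounded response scores high; an ungrounded one scores low
chunks = ["Refunds are issued within 30 days of purchase with a receipt."]
grounded = lexical_faithfulness("Refunds are issued within 30 days.", chunks)
ungrounded = lexical_faithfulness("Exchanges require manager approval forms.", chunks)
```

A score well below the 0.7 line is a cue to pull the full invocation log and compare the response against the retrieved chunks directly.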
Debugging retrieval failures with CloudWatch Logs Insights:

```python
# CloudWatch Logs Insights query to surface low-retrieval-score queries
# (assumes the application logs `query`, `top_retrieval_score`, and `kb_id`)
query = """
fields @timestamp, query, top_retrieval_score, kb_id
| filter top_retrieval_score < 0.5
| sort @timestamp desc
| limit 100
"""
# A high volume of low-score retrievals → retrieval pipeline problem
# Specific query types failing → those topics may not be in the Knowledge Base
```
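The query above can be run programmatically through the CloudWatch Logs API. A minimal sketch follows; the log group and field names (`top_retrieval_score`, `kb_id`) are assumptions about what your application emits, and the threshold is tunable:

```python
import time


def build_low_score_query(threshold: float = 0.5, limit: int = 100) -> str:
    """Build the Logs Insights query string for low-retrieval-score queries."""
    return (
        "fields @timestamp, query, top_retrieval_score, kb_id\n"
        f"| filter top_retrieval_score < {threshold}\n"
        "| sort @timestamp desc\n"
        f"| limit {limit}"
    )


def run_query(log_group: str, start: int, end: int) -> list:
    """Run the query via start_query and poll get_query_results.
    Assumes the application writes retrieval scores to this log group."""
    import boto3  # imported here so build_low_score_query stays AWS-free
    logs = boto3.client("logs")
    qid = logs.start_query(
        logGroupName=log_group, startTime=start, endTime=end,
        queryString=build_low_score_query(),
    )["queryId"]
    while True:
        resp = logs.get_query_results(queryId=qid)
        if resp["status"] in ("Complete", "Failed", "Cancelled"):
            return resp["results"]
        time.sleep(1)
```

Logs Insights queries are asynchronous, hence the polling loop on `get_query_results`.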
Debugging Guardrails over-triggering:

```python
import json
from collections import defaultdict

import boto3

cloudwatch_logs = boto3.client("logs")


def analyze_guardrail_triggers(start_time, end_time):
    """Query invocation logs to count which Guardrails policies triggered."""
    # Paginate: filter_log_events returns at most one page per call
    paginator = cloudwatch_logs.get_paginator("filter_log_events")
    pages = paginator.paginate(
        logGroupName='/aws/bedrock/model-invocations',
        startTime=start_time,
        endTime=end_time,
        filterPattern='{ $.stopReason = "guardrail_intervened" }',
    )

    trigger_reasons = defaultdict(int)
    for page in pages:
        for event in page['events']:
            log_entry = json.loads(event['message'])
            policy = log_entry.get('guardrailTrace', {}).get('triggeredPolicy', 'unknown')
            trigger_reasons[policy] += 1

    # Identify top triggering policies: if topic denial is firing on
    # legitimate queries, the topic description may be too broad
    return sorted(trigger_reasons.items(), key=lambda x: x[1], reverse=True)
```

⚠️ Exam Trap: Bedrock Model Invocation Logs show the actual prompt sent to the model and the actual response received — but only if logging was enabled before the problematic invocations occurred. Retroactive logging enablement cannot reconstruct past invocations. Enable invocation logging from day one, before the first production deployment.
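Since logging must be on before the first problematic invocation, it is worth scripting as part of initial setup. A minimal sketch using the Bedrock control-plane API; the log group name and IAM role ARN below are placeholders you would replace:

```python
def build_logging_config(log_group: str, role_arn: str) -> dict:
    """Assemble the loggingConfig payload for model invocation logging.
    log_group and role_arn are deployment-specific placeholders."""
    return {
        "cloudWatchConfig": {"logGroupName": log_group, "roleArn": role_arn},
        "textDataDeliveryEnabled": True,       # capture prompts and responses
        "embeddingDataDeliveryEnabled": True,  # capture embedding inputs
        "imageDataDeliveryEnabled": True,
    }


def enable_invocation_logging(log_group: str, role_arn: str) -> None:
    import boto3  # imported here so build_logging_config has no AWS dependency
    bedrock = boto3.client("bedrock")
    bedrock.put_model_invocation_logging_configuration(
        loggingConfig=build_logging_config(log_group, role_arn)
    )
```

The IAM role must grant Bedrock permission to write to the target log group, or the configuration call succeeds but no logs are delivered.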

Reflection Question: Users report that your RAG bot correctly answers "What is our refund policy?" but consistently fails on "Can I return a gift I received?" — giving a generic response despite a clear gift return policy in your knowledge base. Using the five-step diagnostic sequence, what would you check at each step, and which step is most likely the failure point?

Written by Alvin Varughese, Founder · 15 professional certifications