Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

5.1.1. Amazon Bedrock Guardrails

💡 First Principle: Bedrock Guardrails operates as a managed content safety layer that sits between your application and the FM — it evaluates every input before it reaches the model and every output before it reaches the user, enforcing your defined policies without requiring custom code.

Guardrails configuration — the six protection categories:

| Category | What It Does | Configuration |
| --- | --- | --- |
| Topic denial | Block the FM from discussing specific topics | Natural language topic descriptions |
| Content filters | Filter harmful content (hate, violence, sexual content, self-harm) | Severity thresholds per category (NONE/LOW/MEDIUM/HIGH) |
| Word filters | Block specific words, phrases, or regex patterns | Custom word lists + managed profanity filter |
| PII redaction | Detect and mask or block personal data | Choose PII entity types; action = BLOCK or ANONYMIZE |
| Grounding | Verify the FM response is factually supported by the retrieved context | Threshold score (0–1); block responses below the threshold |
| Sensitive information | Custom regex patterns for domain-specific sensitive data | Regex patterns + action |
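The six categories above map onto the policy blocks of the `create_guardrail` API on the Bedrock control-plane client. The sketch below builds the configuration as a plain dict so the shape is visible; field names follow the boto3 `create_guardrail` parameters, and all values (names, thresholds, the regex) are illustrative assumptions, not recommendations:

```python
# Illustrative guardrail definition; pass as bedrock.create_guardrail(**guardrail_config)
guardrail_config = {
    'name': 'demo-guardrail',
    'blockedInputMessaging': "I'm not able to help with that topic.",
    'blockedOutputsMessaging': "I'm not able to help with that topic.",
    # 1. Topic denial: topics are described in natural language
    'topicPolicyConfig': {'topicsConfig': [{
        'name': 'CompanyFinancials',
        'definition': 'Any discussion of company revenue, profit, or headcount.',
        'type': 'DENY'}]},
    # 2. Content filters: severity threshold per harm category
    'contentPolicyConfig': {'filtersConfig': [
        {'type': 'HATE', 'inputStrength': 'HIGH', 'outputStrength': 'HIGH'},
        {'type': 'VIOLENCE', 'inputStrength': 'MEDIUM', 'outputStrength': 'MEDIUM'}]},
    # 3. Word filters: custom word list plus the managed profanity list
    'wordPolicyConfig': {
        'wordsConfig': [{'text': 'internal-codename'}],
        'managedWordListsConfig': [{'type': 'PROFANITY'}]},
    # 4 + 6. PII entities and custom regexes; action is BLOCK or ANONYMIZE
    'sensitiveInformationPolicyConfig': {
        'piiEntitiesConfig': [{'type': 'EMAIL', 'action': 'ANONYMIZE'}],
        'regexesConfig': [{'name': 'employee-id', 'pattern': 'EMP-[0-9]{6}',
                           'action': 'BLOCK'}]},
    # 5. Grounding: block responses scoring below the threshold (0-1)
    'contextualGroundingPolicyConfig': {'filtersConfig': [
        {'type': 'GROUNDING', 'threshold': 0.75}]},
}
```

A real call requires the `bedrock` (not `bedrock-runtime`) client and returns a guardrail ID and ARN, which you then reference from `guardrailConfig` at invocation time.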
Applying Guardrails to a Bedrock invocation:

```python
import boto3

bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-east-1')

response = bedrock_runtime.converse(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    guardrailConfig={
        'guardrailIdentifier': 'arn:aws:bedrock:us-east-1:123456789012:guardrail/GUARDRAILID',
        'guardrailVersion': 'DRAFT',  # Or a specific published version number
        'trace': 'enabled'  # Returns a trace showing which policy triggered
    },
    messages=[{'role': 'user', 'content': [{'text': user_input}]}]
)

# Check whether Guardrails blocked the input or the response
if response['stopReason'] == 'guardrail_intervened':
    guardrail_trace = response['trace']['guardrail']
    # inputAssessment is a map keyed by guardrail ID
    assessment = guardrail_trace['inputAssessment']['GUARDRAILID']
    triggered_topic = assessment['topicPolicy']['topics'][0]['name']
    log_security_event(triggered_topic, user_input)
    # (inside your request-handler function)
    return "I'm not able to help with that topic."
```
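Guardrails can also be invoked standalone, without a model call, via the ApplyGuardrail API — useful for screening text that comes from (or goes to) systems outside Bedrock. A minimal sketch of the request shape, built as a dict so it can be inspected; parameter names follow the boto3 `apply_guardrail` call, and the ARN, version, and sample text are placeholders:

```python
def build_apply_guardrail_request(guardrail_arn: str, version: str,
                                  text: str, source: str = 'INPUT') -> dict:
    """Kwargs for bedrock_runtime.apply_guardrail(**request)."""
    return {
        'guardrailIdentifier': guardrail_arn,
        'guardrailVersion': version,
        'source': source,  # 'INPUT' screens user text, 'OUTPUT' screens model text
        'content': [{'text': {'text': text}}],
    }

request = build_apply_guardrail_request(
    'arn:aws:bedrock:us-east-1:123456789012:guardrail/GUARDRAILID', '1',
    'What is our CEO salary?')
# bedrock_runtime.apply_guardrail(**request) returns an 'action' field
# ('GUARDRAIL_INTERVENED' or 'NONE') plus per-policy assessments.
```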
Defense-in-depth with Guardrails + Comprehend + Lambda: a Lambda preprocessor can run Amazon Comprehend PII detection on user input and redact sensitive spans before the text ever reaches the FM, with Guardrails acting as the second layer of protection at the Bedrock invocation itself.
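The Comprehend layer of that pattern can be sketched as a pure redaction function. The entity format mirrors what Comprehend's `detect_pii_entities` returns (`Type`, `BeginOffset`, `EndOffset`); here a hard-coded sample stands in for the real API response, so the helper and sample values are illustrative:

```python
def redact_pii(text: str, entities: list) -> str:
    """Mask each detected PII span; work right-to-left so offsets stay valid."""
    for ent in sorted(entities, key=lambda e: e['BeginOffset'], reverse=True):
        text = (text[:ent['BeginOffset']]
                + '[' + ent['Type'] + ']'
                + text[ent['EndOffset']:])
    return text

# In a Lambda preprocessor, `entities` would come from:
#   comprehend.detect_pii_entities(Text=text, LanguageCode='en')['Entities']
sample = 'Contact Jane at jane@example.com'
entities = [{'Type': 'EMAIL', 'BeginOffset': 16, 'EndOffset': 32, 'Score': 0.99}]
print(redact_pii(sample, entities))  # Contact Jane at [EMAIL]
```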

⚠️ Exam Trap: With trace enabled, Guardrails returns detailed information about which policy triggered and why, but that trace data includes the blocked content itself. Logging trace data to CloudWatch Logs therefore creates a record of harmful content that users attempted to input. Your log retention and access control policies must account for this security-sensitive data in your logs.
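One mitigation is to strip the trace down to policy metadata before logging, keeping the policy names and actions while dropping any fields that could carry the blocked text. A minimal sketch, assuming the assessment dict follows the Guardrails trace shape (`topicPolicy.topics`, `contentPolicy.filters`); the sample data is illustrative:

```python
def sanitize_trace_for_logging(assessment: dict) -> dict:
    """Keep only policy names/types and actions; drop everything else,
    so blocked user content never reaches CloudWatch Logs."""
    safe = {}
    if 'topicPolicy' in assessment:
        safe['topicPolicy'] = [{'name': t['name'], 'action': t['action']}
                               for t in assessment['topicPolicy'].get('topics', [])]
    if 'contentPolicy' in assessment:
        safe['contentPolicy'] = [{'type': f['type'], 'action': f['action']}
                                 for f in assessment['contentPolicy'].get('filters', [])]
    return safe

assessment = {
    'topicPolicy': {'topics': [
        {'name': 'Financials', 'type': 'DENY', 'action': 'BLOCKED'}]},
    'contentPolicy': {'filters': [
        {'type': 'HATE', 'confidence': 'HIGH', 'action': 'BLOCKED'}]},
}
print(sanitize_trace_for_logging(assessment))
```

Log the sanitized dict instead of the raw trace; the raw trace (and the original user input) should go only to a store with appropriately restricted access and retention.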

Reflection Question: A competitor analysis chatbot should never discuss your company's revenue figures or employee headcount (confidential). Users have discovered they can extract this information by asking the FM to "roleplay as a financial analyst" or "pretend you're writing a fictional story about a company like ours." What Guardrails configuration addresses this, and why does topic denial handle indirect attacks better than word filters?

Written by Alvin Varughese, Founder (15 professional certifications)