5.1.1. Amazon Bedrock Guardrails
💡 First Principle: Bedrock Guardrails operates as a managed content safety layer that sits between your application and the FM — it evaluates every input before it reaches the model and every output before it reaches the user, enforcing your defined policies without requiring custom code.
Guardrails configuration — the six protection categories:
| Category | What It Does | Configuration |
|---|---|---|
| Topic denial | Block the FM from discussing specific topics | Natural language topic descriptions |
| Content filters | Filter harmful content (hate, violence, sexual content, self-harm) | Severity thresholds per category (LOW/MEDIUM/HIGH/NONE) |
| Word filters | Block specific words, phrases, or regex patterns | Custom word lists + profanity filter |
| PII redaction | Detect and mask/block personal data | Choose PII entity types; action = BLOCK or ANONYMIZE |
| Grounding | Verify FM response is factually supported by retrieved context | Threshold score (0–1); block responses below threshold |
| Sensitive information | Custom regex patterns for domain-specific sensitive data | Regex patterns + action |
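The six categories in the table map to policy-config blocks in the Bedrock `CreateGuardrail` API. Below is a minimal sketch of assembling such a request with boto3; the guardrail name, topic definition, and custom word are illustrative placeholders, not values from this section:

```python
def build_guardrail_request(name):
    """Assemble a CreateGuardrail request covering several policy categories."""
    return {
        'name': name,
        'blockedInputMessaging': "I'm not able to help with that topic.",
        'blockedOutputsMessaging': "I'm not able to help with that topic.",
        # Topic denial: a natural-language definition, not a keyword list
        'topicPolicyConfig': {'topicsConfig': [{
            'name': 'FinancialAdvice',
            'definition': 'Providing personalized investment or financial advice.',
            'type': 'DENY'}]},
        # Content filters: severity thresholds set per harm category
        'contentPolicyConfig': {'filtersConfig': [
            {'type': 'HATE', 'inputStrength': 'HIGH', 'outputStrength': 'HIGH'},
            {'type': 'VIOLENCE', 'inputStrength': 'MEDIUM', 'outputStrength': 'MEDIUM'}]},
        # Word filters: a custom list plus the managed profanity list
        'wordPolicyConfig': {
            'wordsConfig': [{'text': 'project-codename'}],
            'managedWordListsConfig': [{'type': 'PROFANITY'}]},
        # PII: mask email addresses rather than blocking the whole response
        'sensitiveInformationPolicyConfig': {'piiEntitiesConfig': [
            {'type': 'EMAIL', 'action': 'ANONYMIZE'}]},
    }

def create_guardrail(name):
    import boto3  # deferred import: only needed when actually calling AWS
    bedrock = boto3.client('bedrock')  # control plane, not bedrock-runtime
    return bedrock.create_guardrail(**build_guardrail_request(name))
```

Note that guardrails are created through the `bedrock` control-plane client, while they are enforced at invocation time through `bedrock-runtime`.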
Applying Guardrails to a Bedrock invocation:
```python
import boto3

bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-east-1')

def chat(user_input):
    response = bedrock_runtime.converse(
        modelId='anthropic.claude-3-sonnet-20240229-v1:0',
        guardrailConfig={
            'guardrailIdentifier': 'arn:aws:bedrock:us-east-1:123456789:guardrail/GUARDRAILID',
            'guardrailVersion': 'DRAFT',  # Or a specific published version number
            'trace': 'enabled'            # Return a trace showing which policy triggered
        },
        messages=[{'role': 'user', 'content': [{'text': user_input}]}]
    )

    # Check whether Guardrails blocked the request or response
    if response['stopReason'] == 'guardrail_intervened':
        guardrail_trace = response['trace']['guardrail']
        # inputAssessment is a dict keyed by guardrail identifier
        assessment = next(iter(guardrail_trace['inputAssessment'].values()))
        triggered_policy = assessment['topicPolicy']['topics'][0]
        log_security_event(triggered_policy, user_input)
        return "I'm not able to help with that topic."

    return response['output']['message']['content'][0]['text']
```
Defense-in-depth with Guardrails + Comprehend + Lambda:
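A sketch of this layered pattern, assuming a Lambda handler fronting Bedrock: Amazon Comprehend screens input for PII before the model is ever invoked, and Guardrails enforces policy around the FM call itself. The handler shape, guardrail ID, and score threshold are illustrative assumptions:

```python
import json

PII_SCORE_THRESHOLD = 0.8  # illustrative cutoff, tune for your workload

def contains_pii(text, comprehend):
    """Layer 1: Comprehend PII detection before the request reaches Bedrock."""
    result = comprehend.detect_pii_entities(Text=text, LanguageCode='en')
    return any(e['Score'] >= PII_SCORE_THRESHOLD for e in result['Entities'])

def lambda_handler(event, context):
    import boto3  # deferred so the module stays importable without AWS deps
    comprehend = boto3.client('comprehend')
    bedrock_runtime = boto3.client('bedrock-runtime')
    user_input = json.loads(event['body'])['message']

    # Layer 1: reject PII-bearing input before any model call (and any model cost)
    if contains_pii(user_input, comprehend):
        return {'statusCode': 400, 'body': json.dumps({'error': 'PII detected in input'})}

    # Layer 2: Guardrails evaluates both the input and the FM output
    response = bedrock_runtime.converse(
        modelId='anthropic.claude-3-sonnet-20240229-v1:0',
        guardrailConfig={'guardrailIdentifier': 'GUARDRAILID',
                         'guardrailVersion': 'DRAFT'},
        messages=[{'role': 'user', 'content': [{'text': user_input}]}],
    )
    if response['stopReason'] == 'guardrail_intervened':
        reply = "I'm not able to help with that topic."
    else:
        reply = response['output']['message']['content'][0]['text']
    return {'statusCode': 200, 'body': json.dumps({'reply': reply})}
```

The design benefit of the Comprehend pre-filter is that clearly unacceptable input is rejected cheaply, before a (billed) FM invocation, while Guardrails still covers everything the pre-filter misses, including the model's output.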
⚠️ Exam Trap: Guardrails with `trace` enabled returns detailed information about which policy triggered and why, but that trace data includes the blocked content itself. Logging the trace to CloudWatch Logs therefore creates a record of the harmful content users attempted to input, so your log retention and access-control policies must account for this security-sensitive data in your logs.
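One way to avoid that trap is to log which policy fired without persisting the blocked text. The sketch below (a hypothetical implementation of the `log_security_event` helper used earlier, not an AWS API) records the policy name and a one-way digest of the input, so repeat attempts can be correlated without storing the content:

```python
import hashlib
import json
import logging

logger = logging.getLogger('guardrail-audit')

def safe_security_event(triggered_policy, user_input):
    """Build a log record capturing *which* policy fired, never the blocked text."""
    return {
        'event': 'guardrail_intervened',
        'policy': triggered_policy.get('name', 'unknown'),
        'action': triggered_policy.get('action', 'unknown'),
        # One-way digest lets you correlate repeat attempts without keeping the text
        'input_sha256': hashlib.sha256(user_input.encode('utf-8')).hexdigest(),
        'input_length': len(user_input),
    }

def log_security_event(triggered_policy, user_input):
    logger.warning(json.dumps(safe_security_event(triggered_policy, user_input)))
```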
Reflection Question: A competitor analysis chatbot should never discuss your company's revenue figures or employee headcount (confidential). Users have discovered they can extract this information by asking the FM to "roleplay as a financial analyst" or "pretend you're writing a fictional story about a company like ours." What Guardrails configuration addresses this, and why does topic denial handle indirect attacks better than word filters?