Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

3.1.2. Amazon Bedrock Knowledge Bases Architecture

💡 First Principle: Bedrock Knowledge Bases operationalizes the full RAG pipeline as a managed service. It moves the undifferentiated heavy lifting of ingestion, chunking, embedding, indexing, and retrieval out of your application code and into AWS. You define what to index; AWS manages how.

The Bedrock Knowledge Bases architecture:

Sync configuration — the most frequently tested operational detail: Knowledge Bases does not automatically update when source documents change. You must trigger a sync:

  • Manual sync: Via the console or the StartIngestionJob API (start_ingestion_job() on the boto3 bedrock-agent client)
  • Scheduled sync: EventBridge Scheduler triggers a sync job on a cron schedule
  • Event-driven sync: S3 event notification → Lambda → start ingestion job on document change
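# Event-driven sync triggered by S3 object creation
import boto3
from urllib.parse import unquote_plus

# Create the client once so warm Lambda invocations reuse it
bedrock_agent = boto3.client('bedrock-agent')

def lambda_handler(event, context):
    # S3 event notifications URL-encode object keys
    key = unquote_plus(event['Records'][0]['s3']['object']['key'])
    response = bedrock_agent.start_ingestion_job(
        knowledgeBaseId='KBID123456',  # placeholder ID
        dataSourceId='DSID789012',     # placeholder ID
        # The API caps description at 200 characters
        description=f"Auto-sync triggered by {key}"[:200]
    )
    return response['ingestionJob']['ingestionJobId']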
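
Starting an ingestion job only queues the work; the documents are not searchable until the job finishes. The following is a minimal sketch of polling for completion, assuming the boto3 bedrock-agent client's get_ingestion_job call and its status values (STARTING, IN_PROGRESS, COMPLETE, FAILED, STOPPING, STOPPED). The helper name wait_for_ingestion, the poll interval, and the timeout are illustrative choices, not part of the service API.

```python
import time

# Terminal states assumed from the ingestion job status values;
# any other status is treated as still running.
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def wait_for_ingestion(client, kb_id, ds_id, job_id,
                       poll_seconds=15.0, timeout_seconds=3600.0,
                       sleep=time.sleep):
    """Poll an ingestion job until it reaches a terminal status.

    `client` is expected to be boto3.client("bedrock-agent"); it is passed
    in so the helper can be exercised with a stub. This is a hypothetical
    helper, not an AWS-provided function.
    """
    waited = 0.0
    while True:
        job = client.get_ingestion_job(
            knowledgeBaseId=kb_id,
            dataSourceId=ds_id,
            ingestionJobId=job_id,
        )["ingestionJob"]
        if job["status"] in TERMINAL_STATUSES:
            return job["status"]
        if waited >= timeout_seconds:
            raise TimeoutError(f"Ingestion job {job_id} still {job['status']}")
        sleep(poll_seconds)
        waited += poll_seconds
```

Inside the Lambda handler above, a long wait like this would usually be a poor fit, because Lambda invocations are limited to 15 minutes while large syncs can run longer. A separate poller or workflow step is one possible home for it.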

Metadata schema for filtered retrieval: Each document in S3 can have an accompanying metadata file, named after the document with a .metadata.json suffix (for example, policy.pdf.metadata.json), that defines structured attributes:

{
  "metadataAttributes": {
    "department": "legal",
    "document_type": "policy",
    "effective_date": "2024-01-01",
    "confidentiality": "internal"
  }
}

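This enables queries like "retrieve only documents from the legal department effective after 2024", combining semantic similarity with structured filtering. Note that range operators such as greaterThan compare numeric values, so an "effective after" filter needs the date stored as a number (for example, an effective_year attribute of 2024) rather than as a date string.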
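
A filtered query of that kind can be sketched as a request to the retrieve operation of the boto3 bedrock-agent-runtime client. The helper below only builds the request arguments, so it runs without AWS access. The function name build_retrieve_kwargs is hypothetical, and effective_year is an assumed numeric attribute that does not appear in the sample metadata file.

```python
def build_retrieve_kwargs(kb_id, query, equals=None, minimums=None, top_k=5):
    """Build keyword arguments for bedrock-agent-runtime `retrieve`.

    equals:   attribute name -> value that must match exactly
    minimums: attribute name -> inclusive numeric lower bound
    """
    conditions = [
        {"equals": {"key": k, "value": v}} for k, v in (equals or {}).items()
    ]
    conditions += [
        {"greaterThanOrEquals": {"key": k, "value": v}}
        for k, v in (minimums or {}).items()
    ]
    search = {"numberOfResults": top_k}
    if len(conditions) == 1:
        # andAll requires at least two conditions, so pass a single one bare
        search["filter"] = conditions[0]
    elif len(conditions) > 1:
        search["filter"] = {"andAll": conditions}
    return {
        "knowledgeBaseId": kb_id,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {"vectorSearchConfiguration": search},
    }


kwargs = build_retrieve_kwargs(
    "KBID123456",  # placeholder knowledge base ID
    "What is the data retention policy?",
    equals={"department": "legal"},
    minimums={"effective_year": 2024},  # hypothetical numeric attribute
)
# With AWS credentials configured, the request could be sent with:
#   import boto3
#   results = boto3.client("bedrock-agent-runtime").retrieve(**kwargs)
```

The filter narrows the candidate set by metadata, and the semantic search ranks whatever remains, which is why the two combine cleanly.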

⚠️ Exam Trap: Bedrock Knowledge Bases sync jobs are not instantaneous — they can take minutes to hours for large corpora. Architectures requiring real-time document availability (documents must be searchable within seconds of upload) cannot use Bedrock Knowledge Bases alone and need a custom OpenSearch solution with direct document ingestion.

Reflection Question: A compliance team uploads a new regulatory document to S3 and expects the chatbot to be able to answer questions about it "immediately." They're currently using Bedrock Knowledge Bases with a nightly sync job. How would you re-architect the pipeline to minimize the delay between document upload and query availability?

Written by Alvin Varughese, Founder, 15 professional certifications