Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

2.3.2. Multimodal Data Processing

💡 First Principle: Multimodal data cannot be passed directly to an FM — images must be base64-encoded, audio must be transcribed to text, PDFs must have text extracted, and all inputs must be assembled into the model-specific JSON structure before the API call. The FM never touches your raw files.

Processing pipelines by modality:
Input TypeExtraction ServiceFM-Ready FormatKey Consideration
PDF documentsAmazon Textract (for scanned) or direct text extractionText blocks in JSONOCR confidence score — low confidence → poor FM quality
ImagesPass directly if Claude 3 multimodalbase64-encoded in request bodyMax file size; cannot pass S3 URI to model directly
Audio/videoAmazon TranscribeTranscript text with speaker labelsSpeaker diarization for meeting notes use cases
Tabular data (CSV/Excel)Lambda/Pandas transformationMarkdown table or structured JSONFMs understand markdown tables better than raw CSV
HTML/web contentLambda HTML parserClean text (strip tags, scripts)Boilerplate navigation HTML degrades context quality

Bedrock Data Automation — the managed service for multimodal document processing at scale. It handles PDFs, images, audio, and video with standardized output formats, eliminating the need to build and maintain custom extraction pipelines:

# Bedrock Data Automation for PDF batch processing
bedrock_data_auto = boto3.client('bedrock-data-automation')

response = bedrock_data_auto.invoke_data_automation_async(
    inputConfiguration={
        's3Uri': 's3://my-bucket/invoices/',
        'documentConfiguration': {'parsingStrategy': 'AUTO'}
    },
    outputConfiguration={'s3Uri': 's3://my-bucket/extracted/'},
    dataAutomationConfiguration={
        'dataAutomationProjectArn': 'arn:aws:bedrock:...:data-automation-project/invoice-extraction'
    }
)

⚠️ Exam Trap: Images are encoded as base64 and embedded directly in the Bedrock API request body — not referenced by S3 URL. When exam scenarios describe "sending an image to Bedrock for analysis," the correct architecture includes a Lambda function that reads the image from S3, base64-encodes it, and constructs the multimodal API payload. The FM cannot read from S3 independently.

Reflection Question: You need to build a system that processes 10,000 scanned invoices per day, extracts line items, and generates a structured JSON summary using an FM. What is the complete processing pipeline, naming each AWS service involved?

Alvin Varughese
Written byAlvin Varughese
Founder15 professional certifications