2.3.3. Input Formatting and Conversation Management
💡 First Principle: Every foundation model has a required input format — the structure of your API request body determines whether the model interprets your content as a system instruction, a user turn, or assistant context. Formatting errors don't always raise exceptions; they silently degrade output quality by confusing the model's role understanding.
Bedrock Converse API vs. InvokeModel: The Bedrock Converse API is the recommended interface for conversational applications — it provides a consistent message structure across all supported models, whereas InvokeModel requires you to construct each provider's native request body yourself:
```python
import boto3

bedrock_runtime = boto3.client('bedrock-runtime')

# Bedrock Converse API — consistent format across all models
response = bedrock_runtime.converse(
    modelId='anthropic.claude-3-sonnet-20240229-v1:0',
    system=[{
        'text': 'You are a financial analyst. Respond only based on provided data. Never speculate.'
    }],
    messages=[
        {'role': 'user', 'content': [{'text': 'What was Q3 revenue?'}]},
        {'role': 'assistant', 'content': [{'text': 'Based on the provided documents, Q3 revenue was $4.2M.'}]},
        {'role': 'user', 'content': [{'text': 'How does that compare to Q2?'}]}
    ],
    inferenceConfig={'maxTokens': 512, 'temperature': 0.1}
)
```
Conversation history management — the context window problem: Multi-turn conversations accumulate tokens. After ~10 turns in a typical chatbot, the conversation history alone may consume 30–50% of the context window. Strategies to manage this:
| Strategy | When to Use | Trade-off |
|---|---|---|
| Sliding window | Most chatbots | Drops oldest messages; simple, fast |
| Summarization | Long conversations | Preserves key points; costs extra FM call |
| DynamoDB session store | Multi-session memory | Persistent but adds latency |
| Bedrock Memory (Knowledge Bases session) | Agentic applications | Managed; limited to Bedrock Agents |
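The sliding-window strategy from the table can be sketched as a pure function over Converse-format messages. This is a minimal illustration, assuming one turn equals a user/assistant pair; the `sliding_window` name is our own:

```python
def sliding_window(messages, max_turns=5):
    """Keep only the most recent turns; a turn is a user + assistant pair.

    Drops the oldest messages, then trims the front so the history still
    begins with a 'user' message, since the Converse API requires
    alternating roles starting with the user.
    """
    trimmed = messages[-(max_turns * 2):]
    while trimmed and trimmed[0]['role'] != 'user':
        trimmed = trimmed[1:]
    return trimmed
```

Counting messages is a crude proxy for tokens — a stricter variant would drop turns until the serialized history fits a token budget, at the cost of a tokenizer call per check.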
Storing conversation history in DynamoDB:
```python
import boto3

dynamodb = boto3.client('dynamodb')

# Retrieve conversation history from DynamoDB in Converse message format
def get_conversation_history(session_id, max_turns=10):
    response = dynamodb.query(
        TableName='chat-history',
        KeyConditionExpression='session_id = :sid',
        ExpressionAttributeValues={':sid': {'S': session_id}},
        ScanIndexForward=False,  # Newest first
        Limit=max_turns * 2      # Each turn = user + assistant message
    )
    # Reverse back to chronological order and reshape into Converse format
    return [{'role': item['role']['S'],
             'content': [{'text': item['content']['S']}]}
            for item in reversed(response['Items'])]
```
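The write path is symmetric. A sketch of building the item each turn, assuming a table with partition key `session_id` (string) and sort key `timestamp` (number) — the `build_chat_item` helper is illustrative:

```python
import time

def build_chat_item(session_id, role, text, ts_ms=None):
    # Low-level DynamoDB item format; assumed schema:
    # partition key session_id (S), sort key timestamp (N, epoch millis).
    if ts_ms is None:
        ts_ms = int(time.time() * 1000)
    return {
        'session_id': {'S': session_id},
        'timestamp': {'N': str(ts_ms)},
        'role': {'S': role},
        'content': {'S': text},
    }

# Called once per message after each model response:
# dynamodb.put_item(TableName='chat-history',
#                   Item=build_chat_item(session_id, 'user', user_text))
```

Using epoch-millisecond timestamps as the sort key keeps messages in arrival order and lets the query above page newest-first with `ScanIndexForward=False`.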
⚠️ Exam Trap: The Converse API formats messages consistently across models, but system-prompt handling varies by model family. For Anthropic models, system is a top-level array; for Amazon Titan models, system instructions are embedded in the first user message. When exam scenarios describe cross-model portability, the Converse API is the correct answer — but you must still test system prompt behavior per model family.
Reflection Question: Your customer service chatbot starts giving irrelevant answers after about 15 conversational turns. Context window debugging shows the conversation history is consuming 90% of available tokens by turn 15, leaving almost no room for retrieved context from Knowledge Bases. What are two architectural solutions, and what are the trade-offs of each?