4.2.3. GenAI Gateway Architecture
💡 First Principle: Every enterprise's GenAI workloads eventually converge on the same set of cross-cutting concerns — authentication, rate limiting, cost tracking, model routing, safety enforcement, and observability. A GenAI gateway centralizes these concerns into a single managed layer so individual application teams don't each implement them independently (and inconsistently).
GenAI gateway components:
Cost attribution with resource tagging:

# Attribute spend per team by invoking a team-specific application
# inference profile. Note: invoke_model has no per-request tags
# parameter, so cost-allocation tags live on the profile itself;
# the gateway selects the profile from the caller's identity.
team = event['requestContext']['authorizer']['team']
profile_arn = PROFILE_ARN_BY_TEAM[team]  # tagged application inference profile

bedrock_runtime.invoke_model(
    # A profile ARN is accepted anywhere a model ID is; the profile wraps
    # e.g. anthropic.claude-3-sonnet-20240229-v1:0 and carries the tags.
    modelId=profile_arn,
    body=json.dumps(payload)
)
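Because per-call tagging is not available, the durable Bedrock pattern is to create one tagged application inference profile per team at deploy time and have the gateway invoke its ARN. A hedged sketch of the control-plane call (the profile name, tag keys, and `build_profile_request` helper are assumptions, not part of the Bedrock API):

```python
# One-time (deploy-time) setup: build a CreateInferenceProfile request
# whose cost-allocation tags will flow into Cost Explorer.
def build_profile_request(team, model_arn):
    """Assemble the bedrock:CreateInferenceProfile request for one team."""
    return {
        "inferenceProfileName": f"genai-gateway-{team}",   # naming convention assumed
        "modelSource": {"copyFrom": model_arn},            # foundation model to wrap
        "tags": [
            {"key": "team", "value": team},
            {"key": "environment", "value": "production"},
        ],
    }

# At deploy time (control plane client, not bedrock-runtime):
#   bedrock = boto3.client("bedrock")
#   resp = bedrock.create_inference_profile(**build_profile_request(
#       "search", "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0"))
#   resp["inferenceProfileArn"] then serves as the modelId at invoke time.
```

Creating the profile once per team keeps the hot invoke path free of tagging logic and makes the team-to-cost mapping visible in billing without any per-request work.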
Rate limiting per team — Lambda authorizer pattern:
def lambda_authorizer(event, context):
    api_key = event['headers'].get('x-api-key')
    team = lookup_team_from_key(api_key)
    # Check rate limit in ElastiCache (token bucket per team)
    remaining_tokens = check_and_decrement_rate_limit(team)
    if remaining_tokens <= 0:
        # Raising the literal string "Unauthorized" makes API Gateway return
        # 401; returning a Deny policy would surface 403 instead. Authorizers
        # cannot emit 429, so strict rate-limit semantics need a backend check.
        raise Exception("Unauthorized")
    return generate_allow_policy(team, event['methodArn'])
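The `check_and_decrement_rate_limit` helper above encapsulates a per-team token bucket. A minimal in-memory sketch of the refill-then-take logic (capacity and refill rate are assumed values; in production the bucket state lives in ElastiCache and the whole read-modify-write must be atomic, e.g. a Redis Lua script):

```python
import time

# In-memory stand-in for the per-team bucket state held in ElastiCache.
_buckets = {}
CAPACITY = 100          # max burst per team (assumed quota)
REFILL_PER_SEC = 10.0   # sustained requests/sec per team (assumed quota)

def check_and_decrement_rate_limit(team, now=None):
    """Refill the team's bucket by elapsed time, then try to take one token.

    Returns the tokens remaining after the take; 0 means the call is rejected.
    """
    now = time.monotonic() if now is None else now
    b = _buckets.setdefault(team, {"tokens": CAPACITY, "ts": now})
    # Refill proportionally to time elapsed since last call, capped at capacity.
    b["tokens"] = min(CAPACITY, b["tokens"] + (now - b["ts"]) * REFILL_PER_SEC)
    b["ts"] = now
    if b["tokens"] < 1:
        return 0
    b["tokens"] -= 1
    return int(b["tokens"])
```

The token bucket (rather than a fixed window) lets teams burst up to `CAPACITY` while holding them to the sustained rate, which matches how most teams actually call an FM: idle stretches punctuated by batches of requests.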
⚠️ Exam Trap: Building a GenAI gateway on API Gateway + Lambda has a 29-second timeout ceiling. For streaming FM responses that exceed this, the gateway layer must use API Gateway WebSocket APIs or a streaming-capable proxy (Lambda Function URLs with response streaming) rather than the standard REST API integration.
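On the streaming path itself, the gateway consumes Bedrock's streaming API and forwards chunks as they arrive instead of buffering the full response. A hedged sketch (the chunk layout parsed here is Anthropic's Messages streaming event shape and may differ for other model families):

```python
import json

def extract_text(chunk):
    """Pull the incremental text out of one parsed Bedrock stream chunk."""
    if chunk.get("type") == "content_block_delta":
        return chunk.get("delta", {}).get("text", "")
    return ""  # message_start, message_stop, etc. carry no text

def stream_completion(model_id, payload, region="us-east-1"):
    """Yield text deltas from InvokeModelWithResponseStream as they arrive."""
    import boto3  # imported lazily so extract_text stays testable offline
    bedrock_runtime = boto3.client("bedrock-runtime", region_name=region)
    resp = bedrock_runtime.invoke_model_with_response_stream(
        modelId=model_id,
        body=json.dumps(payload),
    )
    for event in resp["body"]:                      # botocore event stream
        text = extract_text(json.loads(event["chunk"]["bytes"]))
        if text:
            yield text                              # forward delta to client
```

Pairing this generator with a streaming-capable front door (WebSocket push per delta, or a Function URL in response-streaming mode) keeps time-to-first-token low and sidesteps the REST integration timeout entirely.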
Reflection Question: Your organization has 8 application teams each making direct Bedrock API calls. In a quarterly review, you discover three teams have no content filtering, two teams are using the most expensive model for simple classification tasks, and cost attribution is impossible because all calls share one IAM role. Design the minimal GenAI gateway that fixes all three problems.