6.2.3. Cost-Performance Trade-off Analysis
💡 First Principle: Every optimization exists on a cost-performance trade-off curve — the question is never "which optimization is best?" but "which optimization delivers the most value per dollar of engineering effort, given this application's specific SLA and budget constraints?"
The optimization decision matrix:
| Scenario | Primary Problem | Best Optimization | Cost Impact |
|---|---|---|---|
| High cost, adequate latency | Token spend too high | Prompt caching + model right-sizing | -40 to -80% cost |
| High latency, acceptable cost | Slow responses | Streaming + generation optimization | No cost change |
| Both high cost and high latency | Token spend too high and slow responses | Model downgrade + caching + streaming | -50% cost, lower latency |
| Inconsistent latency (high variance) | Large gap between P99 and P50 | Provisioned throughput | +20 to +30% cost |
| Peak load failures | Insufficient capacity | Provisioned throughput + SQS buffer | Higher cost, improved reliability |
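As a rough illustration, the matrix can be read as a lookup from observed metrics to a recommended optimization. The sketch below is one possible encoding, not a prescribed method: the `Observed` fields, the `recommend` function, the 3x P99/P50 variance threshold, and the order in which rows are checked are all assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Observed:
    """Hypothetical snapshot of an application's cost and latency metrics."""
    monthly_cost_usd: float
    budget_usd: float
    p50_latency_s: float
    p99_latency_s: float
    latency_sla_s: float
    failures_at_peak: bool = False


def recommend(o: Observed, variance_ratio: float = 3.0) -> str:
    """Map observed metrics to a row of the decision matrix.

    variance_ratio is an assumed threshold: P99 more than this many
    times P50 is treated as "inconsistent latency".
    """
    if o.failures_at_peak:
        return "Provisioned throughput + SQS buffer"
    high_cost = o.monthly_cost_usd > o.budget_usd
    high_latency = o.p50_latency_s > o.latency_sla_s
    if high_cost and high_latency:
        return "Model downgrade + caching + streaming"
    if high_cost:
        return "Prompt caching + model right-sizing"
    if high_latency:
        return "Streaming + generation optimization"
    if o.p99_latency_s > variance_ratio * o.p50_latency_s:
        return "Provisioned throughput"
    return "No change needed"


# Over budget, but median latency is within the SLA.
print(recommend(Observed(60_000, 50_000, 1.0, 2.0, 2.0)))
```

Checking capacity failures first reflects a judgment that reliability problems outrank cost tuning; a team with different priorities would reorder the checks.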
CloudWatch cost dashboard queries:
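# CloudWatch Metrics Insights queries for cost trending.
# Metrics Insights aggregates one metric per query and has no pipe syntax
# (pipes are CloudWatch Logs Insights syntax), so input and output tokens
# are queried separately and priced afterwards.
input_tokens_query = """
SELECT SUM(InputTokens)
FROM SCHEMA("GenAI/Cost", ModelId)
GROUP BY ModelId
ORDER BY SUM() DESC
LIMIT 10
"""
output_tokens_query = input_tokens_query.replace("InputTokens", "OutputTokens")
# Estimated cost per model = input tokens * 0.000003 + output tokens * 0.000015 USD,
# computed with metric math or in application code. These are example rates;
# substitute the per-token price of each model.
# Identifies which model is driving the most cost; attributing cost to an
# application path requires publishing that path as an additional dimension.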
⚠️ Exam Trap: CloudWatch estimated cost metrics based on token counts are approximations — actual Bedrock billing uses the exact pricing tier for your account, which may include volume discounts, enterprise agreements, or promotional credits not reflected in token-based calculations. Use AWS Cost Explorer with Bedrock service tags as the authoritative cost source.
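To make the comparison concrete, the following sketch pulls billed cost from Cost Explorer's `GetCostAndUsage` API and reports how far a token-based estimate drifts from it. The helper names `billed_bedrock_cost` and `estimate_gap_pct` are hypothetical, and the `SERVICE` value `"Amazon Bedrock"` is an assumption: confirm the exact service names on your bill, since some third-party models may be invoiced under a separate line item. The client is passed in so the function can be exercised without AWS credentials.

```python
from datetime import date


def billed_bedrock_cost(ce_client, start: date, end: date,
                        service_name: str = "Amazon Bedrock") -> float:
    """Sum UnblendedCost for one service over [start, end).

    ce_client is expected to behave like boto3.client("ce").
    service_name is an assumed SERVICE dimension value; verify it
    against your own Cost Explorer data.
    """
    resp = ce_client.get_cost_and_usage(
        TimePeriod={"Start": start.isoformat(), "End": end.isoformat()},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        Filter={"Dimensions": {"Key": "SERVICE", "Values": [service_name]}},
    )
    # Without GroupBy, each period carries its total under "Total".
    return sum(
        float(period["Total"]["UnblendedCost"]["Amount"])
        for period in resp["ResultsByTime"]
    )


def estimate_gap_pct(estimated_usd: float, billed_usd: float) -> float:
    """Percentage by which the token-based estimate exceeds the bill."""
    return (estimated_usd - billed_usd) / billed_usd * 100.0
```

In a real account this would be called as `billed_bedrock_cost(boto3.client("ce"), date(2024, 1, 1), date(2024, 2, 1))`, with the dates chosen to match the window of the CloudWatch estimate. A persistent positive gap suggests discounts or credits that the token-based figure does not capture.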
Reflection Question: Your application team is debating three optimizations: (A) add semantic caching, estimated to reduce FM calls by 35% at $8K engineering cost; (B) rewrite prompts to reduce average input tokens by 20%, estimated $5K engineering cost; (C) switch from Claude Sonnet to Haiku for simple queries (40% of traffic), estimated $12K engineering cost. Current FM costs are $50K/month. Rank these by ROI (payback period in months).
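One way to structure the calculation is payback period = engineering cost / monthly savings. The question leaves two inputs unstated, so the sketch below fills them with clearly hypothetical values: `input_cost_share` (the fraction of spend attributable to input tokens, assumed 0.6) and `haiku_price_ratio` (Haiku's per-token price relative to Sonnet, assumed 0.1). It also assumes cost scales linearly with FM calls, that the simple queries account for 40% of spend as well as 40% of traffic, and that caching infrastructure itself is free.

```python
MONTHLY_FM_COST = 50_000.0  # from the question

ENGINEERING_COST = {"A": 8_000.0, "B": 5_000.0, "C": 12_000.0}


def monthly_savings(input_cost_share: float = 0.6,
                    haiku_price_ratio: float = 0.1) -> dict:
    """Estimated monthly savings per option under the stated assumptions."""
    return {
        # A: 35% fewer FM calls, assumed to mean 35% lower cost.
        "A": MONTHLY_FM_COST * 0.35,
        # B: 20% fewer input tokens only affects the input share of spend.
        "B": MONTHLY_FM_COST * input_cost_share * 0.20,
        # C: 40% of spend moves to a model priced at haiku_price_ratio.
        "C": MONTHLY_FM_COST * 0.40 * (1.0 - haiku_price_ratio),
    }


def rank_by_payback(**assumptions) -> list:
    """Return (option, payback_months) pairs, shortest payback first."""
    savings = monthly_savings(**assumptions)
    payback = {k: ENGINEERING_COST[k] / savings[k] for k in savings}
    return sorted(payback.items(), key=lambda kv: kv[1])


for option, months in rank_by_payback():
    print(f"{option}: {months:.2f} months")
```

With these assumed values the order is A, then C, then B, and all three pay back within a month. The ranking is sensitive to the assumptions: if input tokens made up all of the spend (`input_cost_share=1.0`), B would move ahead of C.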