4.3.5. Eventstream and Eventhouse Optimization
💡 First Principle: Real-time system optimization focuses on latency reduction and throughput improvement while maintaining data integrity. The trade-offs differ from batch: you're optimizing for time, not just resource efficiency. A 5-second delay in batch analytics is invisible; in fraud detection, it's the difference between blocking a transaction and explaining it to the customer afterward.
Eventstream Optimization
| Lever | Effect | Trade-Off |
|---|---|---|
| Batch size | Larger batches improve throughput | Increases end-to-end latency |
| Parallelism (partitions) | More partitions enable parallel processing | Consumes more resources |
| Filter early | Removing unwanted events before routing reduces downstream load | Dropped events cannot be recovered downstream |
| Schema projection | Removing unused columns at ingestion means less data stored and processed | Removed columns are unavailable to later consumers |
| Derived streams | Routing different event types separately enables targeted optimization per stream | More streams to configure and monitor |
Optimization Pattern:
```
Source → [Filter unwanted] → [Project needed columns] → [Route to destinations]
```
Filtering and projecting at the eventstream level is almost always cheaper than filtering at the destination: events dropped at the stream are never transmitted, stored, or processed downstream.
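As a sketch, the filter-then-project step of this pattern corresponds to a KQL-style transformation like the following (the stream, column names, and event type are hypothetical):

```kql
// Hypothetical schema: keep only purchase events, then keep
// only the columns downstream consumers actually need.
SourceStream
| where EventType == "purchase"        // filter early: drop unwanted events
| project Timestamp, UserId, Amount    // schema projection: drop unused columns
```

In the eventstream editor these two steps map to the filter and manage-fields operations applied before routing to a destination.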
Eventhouse (KQL Database) Optimization
| Policy | Purpose | Configuration | Impact |
|---|---|---|---|
| Retention policy | Auto-delete data after a period | .alter table T policy retention '{"SoftDeletePeriod": "365.00:00:00"}' | Reduces storage, keeps KQL fast |
| Caching policy | Keep hot data in memory/SSD | .alter table T policy caching hot = 30d | Fast queries on recent data |
| Partitioning policy | Organize data by column | .alter table T policy partitioning '{"PartitionKeys": [...]}' | Improves filter performance |
| Ingestion batching | Control how events are batched | .alter table T policy ingestionbatching '{"MaximumBatchingTimeSpan": "00:00:30"}' | Balance latency vs. efficiency |
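Taken together, the policies above might be applied to a hypothetical `Events` table as follows (table name and values are illustrative; the partitioning policy is omitted because partition keys depend on the table's schema):

```kql
// Keep data for 1 year, then soft-delete it.
.alter table Events policy retention '{"SoftDeletePeriod": "365.00:00:00", "Recoverability": "Enabled"}'

// Keep the most recent 30 days on hot cache (SSD/memory) for fast queries.
.alter table Events policy caching hot = 30d

// Flush an ingestion batch after 30 seconds, 500 items, or 1 GB of raw data,
// whichever limit is reached first.
.alter table Events policy ingestionbatching '{"MaximumBatchingTimeSpan": "00:00:30", "MaximumNumberOfItems": 500, "MaximumRawDataSizeMB": 1024}'
```

Lowering `MaximumBatchingTimeSpan` reduces ingestion latency at the cost of more, smaller extents; raising it does the opposite.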
Caching Tiers:
```
Hot cache (SSD/memory)  ← fastest queries, most expensive
└── Frequently queried recent data (e.g., last 30 days)
Cold storage            ← slower queries, cheapest
└── Historical data (e.g., older than 30 days)
```
Key Decision: Retention vs. Caching
- Retention controls how long data exists (deleted after period)
- Caching controls how fast data is queried (hot vs. cold)
- Data can be retained for 1 year but only cached (hot) for 30 days
⚠️ Exam Trap: Setting the caching policy longer than the retention policy is wasteful—data is deleted by retention before it ages out of cache. Always ensure caching period ≤ retention period.
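To confirm the two policies are consistent on an existing table, inspect them with the corresponding `.show` commands (table name hypothetical):

```kql
// Retention: how long data exists (should be the longer period, e.g., 365 days).
.show table Events policy retention

// Caching: how long data stays hot (should be ≤ retention, e.g., 30 days).
.show table Events policy caching
```

Comparing the two outputs is a quick way to catch a caching period that exceeds retention.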
⚠️ Common Pitfall: Not setting retention policies on high-volume streaming tables. Without retention, storage grows indefinitely and query performance degrades as KQL scans more data.