Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.
3.3.1. Lakehouse Table Optimization (OPTIMIZE, V-Order)
đź’ˇ First Principle: Delta Lake accumulates small files during streaming and incremental writes. File consolidation and V-Order optimization dramatically improve query performance.
Scenario: A Delta table received millions of small writes from streaming ingestion. Queries that once took 10 seconds now take 5 minutes because Spark must open thousands of small files.
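The scenario can be made concrete with a back-of-the-envelope cost model: treat query time as a fixed data-scan cost plus a per-file open cost. The numbers below are illustrative assumptions, not measured Fabric or Spark figures, but they show why file count alone can turn a 10-second query into minutes.

```python
# Rough cost model: query latency = fixed data-scan cost + per-file open overhead.
# per_file_open_ms and data_scan_seconds are illustrative assumptions.

def estimated_scan_seconds(num_files: int,
                           per_file_open_ms: float = 25.0,
                           data_scan_seconds: float = 8.0) -> float:
    """Estimate scan time as a fixed data cost plus per-file open overhead."""
    return data_scan_seconds + num_files * per_file_open_ms / 1000.0

# Before OPTIMIZE: ~10,000 small files accumulated from streaming ingestion.
before = estimated_scan_seconds(10_000)   # minutes, dominated by file opens
# After OPTIMIZE: the same data consolidated into ~40 large files.
after = estimated_scan_seconds(40)        # seconds, dominated by actual data scan
```

The model explains the asymmetry in the scenario: the data volume is unchanged, but the per-file overhead term shrinks by orders of magnitude once files are consolidated.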
OPTIMIZE Command
- Purpose: Consolidate small files into larger ones
- Benefit: Reduce file count, improve query performance
- Frequency: Schedule regularly for streaming tables
-- Consolidate small files
OPTIMIZE lakehouse.sales;
-- Optimize with Z-ordering on frequently filtered columns
OPTIMIZE lakehouse.sales ZORDER BY (region, date);
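What OPTIMIZE's bin-packing does can be sketched in a few lines of Python: it rewrites many small files into files near a target size (Delta's default target is roughly 1 GB; the exact threshold is configurable). The file counts below are illustrative assumptions.

```python
import math

def files_after_optimize(total_bytes: int, target_file_bytes: int = 1024**3) -> int:
    """Number of output files if OPTIMIZE bin-packs data into ~target-size files."""
    return max(1, math.ceil(total_bytes / target_file_bytes))

# Illustration: 10,000 streaming commits of ~4 MB each (~39 GB of data)
# collapse into a few dozen large files after OPTIMIZE.
small_files = 10_000
total = small_files * 4 * 1024**2
consolidated = files_after_optimize(total)   # 40 files instead of 10,000
```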
V-Order
- Concept: Microsoft's write-time optimization for Parquet files, applying special sorting, row-group distribution, dictionary encoding, and compression
- Benefit: Better compression and faster reads, especially for Power BI Direct Lake and SQL analytics endpoint queries
- Implementation: Enabled by default in Fabric; can be explicitly applied
-- Apply V-Order during optimization
OPTIMIZE lakehouse.sales VORDER;
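V-Order's internals are proprietary, but one intuition behind it—sorting data before encoding so that similar values sit together—can be illustrated with a plain zlib comparison. This is only an analogy, not the actual V-Order algorithm.

```python
import random
import zlib

random.seed(42)
# A low-cardinality "region" column, like the ZORDER example above.
values = [random.choice(["east", "west", "north", "south"]) for _ in range(50_000)]

shuffled_bytes = ",".join(values).encode()
sorted_bytes = ",".join(sorted(values)).encode()

unsorted_size = len(zlib.compress(shuffled_bytes))
sorted_size = len(zlib.compress(sorted_bytes))
# Sorted data forms long runs of identical values and compresses far better.
assert sorted_size < unsorted_size
```

The same principle is why columnar formats benefit from clustering: ordered, co-located values give encoders long runs to exploit.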
Visual: File Optimization Impact
⚠️ Common Pitfall: Never running OPTIMIZE on streaming tables. Small file accumulation is inevitable with streaming—schedule OPTIMIZE regularly (hourly or daily).
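One way to act on this pitfall, sketched in Python: a helper that triggers compaction once too many small files accumulate. The 128 MB small-file cutoff, the threshold, and the `run_optimize` callback are illustrative assumptions; in Fabric you would typically schedule the OPTIMIZE statement itself in a notebook or pipeline.

```python
def maybe_optimize(file_sizes_bytes: list[int],
                   run_optimize,
                   small_file_bytes: int = 128 * 1024**2,
                   max_small_files: int = 100) -> bool:
    """Call run_optimize() when too many sub-128 MB files have accumulated."""
    small = sum(1 for size in file_sizes_bytes if size < small_file_bytes)
    if small > max_small_files:
        run_optimize()
        return True
    return False

# Example: 500 tiny 4 MB files from streaming crosses the threshold.
triggered = maybe_optimize([4 * 1024**2] * 500,
                           lambda: print("running OPTIMIZE lakehouse.sales"))
```

A threshold-based trigger like this avoids both extremes: never compacting (the pitfall above) and compacting so often that OPTIMIZE itself wastes compute.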