3.2.1. Choosing the Right Data Store
💡 First Principle: Fabric offers multiple data stores optimized for different workloads, like specialized containers for different goods. Choosing correctly impacts performance, cost, and query capabilities. A warehouse optimizes for T-SQL; a lakehouse optimizes for Spark; a KQL database optimizes for time-series.
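Scenario: You need to store (1) raw CSV files from external systems, (2) transformed relational data for SQL analysts, and (3) real-time sensor data for operational dashboards. Each calls for a different store: OneLake Files for the raw CSVs, a Data Warehouse for the relational data, and a KQL Database for the sensor stream.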
Data Store Selection Guide
| Data Store | Best For | Query Language | Storage Format |
|---|---|---|---|
| Lakehouse | Big data, data science, flexible schema | Spark SQL, PySpark | Delta + Files |
| Data Warehouse | Structured analytics, SQL analysts | T-SQL | Delta |
| KQL Database | Real-time analytics, time-series | KQL | Optimized columnar |
| OneLake Files | Raw file storage, staging | N/A (file access) | Any |
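The guide above can be read as a lookup from workload traits to a store. The following is a minimal sketch of that lookup in plain Python. The `Workload` fields and the `choose_store` function are hypothetical names invented for illustration and are not part of any Fabric API. The order of the checks is one reasonable reading of the table, not an official decision procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a workload; field names are illustrative."""
    raw_files: bool = False        # unprocessed files or staging data
    streaming: bool = False        # real-time or time-series events
    primary_language: str = "sql"  # "sql" for T-SQL analysts, "spark" for Spark/PySpark


def choose_store(w: Workload) -> str:
    """Map a workload to a store, following the selection table above."""
    if w.raw_files:
        return "OneLake Files"
    if w.streaming:
        return "KQL Database"
    if w.primary_language == "spark":
        return "Lakehouse"
    return "Data Warehouse"


# The three needs from the scenario, expressed as hypothetical workloads.
scenario = {
    "raw CSV files from external systems": Workload(raw_files=True),
    "transformed relational data for SQL analysts": Workload(primary_language="sql"),
    "real-time sensor data for operational dashboards": Workload(streaming=True),
}

for need, workload in scenario.items():
    print(f"{need} -> {choose_store(workload)}")
```

Real workloads often mix these traits, so treat the function as a memory aid for the table rather than a complete rule set.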
Visual: Data Store Selection
⚠️ Exam Trap: Storing structured data as raw files when a lakehouse table is more appropriate creates unnecessary work. Files require manual schema management; Delta tables provide schema enforcement, ACID transactions, and SQL access automatically.
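To make "manual schema management" concrete, here is a small sketch of the checking that every reader of a raw CSV file has to repeat. It uses only the Python standard library and no Fabric APIs. The sample data, the `SCHEMA` mapping, and the `load_rows` helper are all hypothetical.

```python
import csv
import io

# Hypothetical raw CSV content; the second data row has a malformed reading.
RAW = "sensor_id,reading\ns1,21.5\ns2,not-a-number\n"

# Schema the reader has to maintain by hand: column name -> type converter.
SCHEMA = {"sensor_id": str, "reading": float}


def load_rows(text, schema):
    """Parse CSV text, applying the schema manually and separating bad rows."""
    good, bad = [], []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            good.append({col: cast(row[col]) for col, cast in schema.items()})
        except (KeyError, TypeError, ValueError):
            # Missing column, missing value, or a value that fails conversion.
            bad.append(row)
    return good, bad


good, bad = load_rows(RAW, SCHEMA)
print(len(good), "valid row(s);", len(bad), "rejected")
```

With a Delta table, the schema is stored with the table and enforced on write, so this validation does not have to be reimplemented by each consumer.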