2.1.1. OneLake: The Unified Data Lake
💡 First Principle: OneLake works like a company-wide shared drive where every department's files automatically appear in the same place—no copying, no syncing, no "which version is correct?" debates. All Fabric workloads store data here by default, which means a lakehouse, a data warehouse, and a KQL database can all access the same underlying data without moving it.
Scenario: A data engineer creates a lakehouse, a data warehouse, and a KQL database. In traditional architectures, each would have isolated storage, requiring complex ETL to share data. In Fabric, all three store data in OneLake, enabling cross-workload analytics with a single security model.
Key Characteristics of OneLake
- Single Namespace: One logical data lake for the entire organization—no more data silos
- Delta Lake Foundation: Tabular data is stored in Delta format (Parquet data files plus a transaction log), which provides ACID transactions
- Automatic Data Storage: Every Fabric item stores data directly in OneLake without explicit configuration
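- Hierarchical Organization: Tenant → Workspace → Item (then folders and files inside each item); every workspace is assigned to a capacity, which supplies its compute and determines the region where its data is stored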
- Built-in Governance: Security, lineage, and compliance inherited automatically
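To make the hierarchy concrete, here is a minimal sketch of how a workspace, an item, and a path inside that item combine into a single OneLake address. It assumes the ABFS-style URI form used for OneLake (abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<item>.<itemtype>/<path>); the workspace and lakehouse names are hypothetical, and the helper function is illustrative rather than part of any SDK.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_uri(workspace: str, item: str, item_type: str, relative_path: str) -> str:
    """Build an ABFS-style OneLake URI: workspace -> item -> path inside the item.

    Assumption: names are used as-is; real workspaces may need GUIDs instead
    of display names in some tools.
    """
    path = relative_path.strip("/")  # tolerate leading/trailing slashes
    return f"abfss://{workspace}@{ONELAKE_HOST}/{item}.{item_type}/{path}"


# Hypothetical names, for illustration only.
orders_uri = onelake_uri("SalesWorkspace", "SalesLakehouse", "Lakehouse", "Tables/orders")
print(orders_uri)
```

Because the tenant is the implicit root, only the workspace and item appear in the address. The capacity does not appear at all, which is a useful reminder that capacity is an assignment on the workspace rather than a level in the namespace.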
Visual: OneLake Architecture
Shortcuts: Virtual Data Access
- Concept: Pointers to data stored elsewhere, either in another OneLake location or in external storage, that appear as native OneLake data
- Types:
  - Internal Shortcuts: Point to other locations within OneLake
  - External Shortcuts: Point to Azure Data Lake Storage (ADLS) Gen2, Amazon S3, or Google Cloud Storage
- Benefit: Access data where it already lives without copying it, maintaining a single source of truth
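The difference between the two shortcut types shows up clearly in the request body used to create one. The sketch below only builds the JSON payloads and sends nothing. The field names (path, name, target, oneLake, amazonS3, connectionId, and so on) reflect my understanding of the Fabric "Create Shortcut" REST API and should be treated as assumptions to verify against the current API reference; every ID, bucket URL, and name is a placeholder.

```python
import json


def internal_shortcut(name: str, path: str, workspace_id: str, item_id: str, target_path: str) -> dict:
    """Payload for a shortcut that points at another OneLake location (assumed schema)."""
    return {
        "name": name,
        "path": path,  # where the shortcut appears inside the item, e.g. "Tables"
        "target": {
            "oneLake": {"workspaceId": workspace_id, "itemId": item_id, "path": target_path}
        },
    }


def s3_shortcut(name: str, path: str, location: str, subpath: str, connection_id: str) -> dict:
    """Payload for a shortcut that points at an Amazon S3 bucket (assumed schema)."""
    return {
        "name": name,
        "path": path,
        "target": {
            "amazonS3": {"location": location, "subpath": subpath, "connectionId": connection_id}
        },
    }


# Placeholder values only; none of these identify real resources.
internal = internal_shortcut("orders", "Tables", "<workspace-guid>", "<lakehouse-guid>", "Tables/orders")
external = s3_shortcut(
    "raw_events", "Files", "https://example-bucket.s3.us-east-1.amazonaws.com", "/events", "<connection-guid>"
)
print(json.dumps(external, indent=2))
```

Note what the payload does not contain: no data, and no copy instruction. Each shortcut is a name, a place to show up, and a pointer to a target.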
⚠️ Exam Trap: Shortcuts are metadata pointers; the data stays in its original location. This means the source system's own governance and access controls still apply to that data. If a question asks about copying data, shortcuts are NOT the answer.
Key Trade-Offs:
- Centralization vs. Flexibility: OneLake centralizes governance but requires all workloads to conform to its security model
- Shortcuts vs. Copies: Shortcuts avoid duplication but depend on source system availability. If Amazon S3 is unavailable, the data behind your shortcut cannot be read
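The shortcuts-versus-copies trade-off can be sketched with a toy in-memory model. This is not how OneLake is implemented, and it ignores features such as caching; the classes ExternalStore and Lake are hypothetical and exist only to show that a shortcut resolves to the source at read time while a copy is a snapshot.

```python
class SourceUnavailable(Exception):
    """Raised when the external system behind a shortcut cannot be reached."""


class ExternalStore:
    """Stand-in for an external system such as an S3 bucket."""

    def __init__(self, objects: dict):
        self.objects = dict(objects)
        self.online = True

    def read(self, key: str) -> bytes:
        if not self.online:
            raise SourceUnavailable(key)
        return self.objects[key]


class Lake:
    """Toy lake that holds copied files and shortcut pointers."""

    def __init__(self):
        self.files = {}      # path -> bytes physically stored in the lake
        self.shortcuts = {}  # path -> (store, key); metadata only

    def copy_in(self, path: str, store: ExternalStore, key: str) -> None:
        self.files[path] = store.read(key)  # snapshot taken at copy time

    def add_shortcut(self, path: str, store: ExternalStore, key: str) -> None:
        self.shortcuts[path] = (store, key)  # no data is moved

    def read(self, path: str) -> bytes:
        if path in self.shortcuts:
            store, key = self.shortcuts[path]
            return store.read(key)  # resolved against the source on every read
        return self.files[path]


s3 = ExternalStore({"orders.parquet": b"v1"})
lake = Lake()
lake.copy_in("Files/orders_copy", s3, "orders.parquet")
lake.add_shortcut("Files/orders_link", s3, "orders.parquet")

s3.objects["orders.parquet"] = b"v2"      # the source changes
link_after_update = lake.read("Files/orders_link")
copy_after_update = lake.read("Files/orders_copy")

s3.online = False                          # the source goes down
try:
    lake.read("Files/orders_link")
    link_readable_offline = True
except SourceUnavailable:
    link_readable_offline = False
copy_offline = lake.read("Files/orders_copy")
```

In this model the shortcut always reflects the current source but fails when the source is offline, while the copy keeps working but silently goes stale. That is the same tension the exam expects you to recognize.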
Reflection Question: If your organization requires data to physically reside within a specific geographic region for compliance, how would OneLake shortcuts to an external S3 bucket in a different region affect your compliance posture?