1.1.1. OneLake: The Unified Data Lake
💡 First Principle: OneLake is Fabric's single, unified, logical data lake. All Fabric workloads automatically store their data in OneLake, eliminating data silos and enabling cross-workload analytics without data movement.
Scenario: A data engineer creates a lakehouse, a data warehouse, and a KQL database. In traditional architectures, each would have its own storage. In Fabric, all three store data in OneLake, allowing a single security model and enabling queries that span workloads.
Key Characteristics of OneLake
- Single Namespace: One logical data lake for the entire organization
- Delta Lake Foundation: All tabular data stored in Delta format (Parquet + transaction log)
- Automatic Data Storage: Every Fabric item stores data directly in OneLake
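- Hierarchical Organization: Tenant → Workspace → Item → Folders/Files; each workspace is assigned to a capacity, which supplies compute and determines the region where its data is stored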
- Built-in Governance: Inherited security, lineage, and compliance
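To make the single namespace concrete, here is a minimal Python sketch of how a workspace, an item, and a relative path combine into a OneLake address. The workspace and lakehouse names are hypothetical, and the `OneLakeItem` helper is illustrative only, not part of any Fabric SDK. The URI layout follows the commonly documented OneLake pattern and should be checked against current Microsoft documentation.

```python
from dataclasses import dataclass

# Global OneLake endpoint (assumed from the documented URI pattern).
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


@dataclass(frozen=True)
class OneLakeItem:
    """Hypothetical helper: identifies one Fabric item inside a workspace."""
    workspace: str
    item: str
    item_type: str  # for example "Lakehouse"

    def abfss_path(self, relative_path: str) -> str:
        # ABFS-style path, as typically used from Spark.
        rel = relative_path.strip("/")
        return (
            f"abfss://{self.workspace}@{ONELAKE_HOST}/"
            f"{self.item}.{self.item_type}/{rel}"
        )

    def https_url(self, relative_path: str) -> str:
        # HTTPS form of the same location.
        rel = relative_path.strip("/")
        return (
            f"https://{ONELAKE_HOST}/{self.workspace}/"
            f"{self.item}.{self.item_type}/{rel}"
        )


# Hypothetical names, for illustration only.
sales = OneLakeItem("SalesWorkspace", "SalesLakehouse", "Lakehouse")
print(sales.abfss_path("Tables/orders"))
print(sales.https_url("Files/raw/2024"))
```

Both forms address the same location: the workspace and item appear directly in the path, which is what lets different workloads refer to the same data without copying it.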
Visual: OneLake Architecture
Shortcuts: Virtual Data Access
- Concept: Pointers to data stored elsewhere, either in another OneLake location or in external storage, that appear as native OneLake folders
- Types:
  - Internal Shortcuts: Point to other locations within OneLake
  - External Shortcuts: Point to storage outside OneLake, such as Azure Data Lake Storage Gen2, Amazon S3, or Google Cloud Storage
- Benefit: Access external data without copying it, maintaining a single source of truth
⚠️ Common Pitfall: Thinking shortcuts copy data. Shortcuts are metadata pointers; the data stays in its original location. For external shortcuts, this means the source system's governance and access controls still apply to that data.
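The "pointer, not copy" idea can be illustrated with a small conceptual model. This is a sketch under stated assumptions, not how Fabric implements shortcuts: the `Shortcut` class, the `resolve` function, and the bucket and folder names are all hypothetical. It only shows that a path under a shortcut maps to a location in the source system and that no bytes are duplicated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shortcut:
    """Hypothetical model of a shortcut: metadata only, no data."""
    mount_path: str  # where the shortcut appears inside OneLake
    target: str      # where the bytes actually live


def resolve(path: str, shortcuts: list[Shortcut]) -> str:
    """Map a OneLake path to its physical location.

    Paths not covered by any shortcut are returned unchanged.
    If several shortcuts match, the longest mount path wins.
    """
    best = None
    for sc in shortcuts:
        mount = sc.mount_path.rstrip("/")
        if path == mount or path.startswith(mount + "/"):
            if best is None or len(mount) > len(best.mount_path.rstrip("/")):
                best = sc
    if best is None:
        return path
    mount = best.mount_path.rstrip("/")
    return best.target.rstrip("/") + path[len(mount):]


# Hypothetical shortcut to an external bucket.
shortcuts = [Shortcut("Files/partner_data", "s3://partner-bucket/exports")]

print(resolve("Files/partner_data/2024/orders.parquet", shortcuts))
print(resolve("Files/local/customers.csv", shortcuts))
```

In this model, reading through the shortcut always goes to the target, so if the source becomes unavailable or its permissions change, the shortcut path is affected as well.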
Key Trade-Offs:
- Centralization vs. Flexibility: OneLake centralizes governance but requires all workloads to conform to its security model
- Shortcuts vs. Copies: Shortcuts avoid duplication but depend on source system availability
Reflection Question: If your organization requires data to physically reside within a specific geographic region for compliance, how would OneLake shortcuts to an external S3 bucket in a different region affect your compliance posture?
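One way to reason about this question is to audit where each shortcut's data physically lives. The sketch below assumes a hand-maintained inventory of shortcuts; the `ShortcutRecord` fields, the region labels, and the shortcut names are hypothetical, and Fabric is not assumed to expose this information in this shape. It only illustrates the check itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShortcutRecord:
    """Hypothetical inventory entry for one shortcut."""
    name: str
    target_kind: str    # illustrative labels: "onelake", "adls", "s3", "gcs"
    target_region: str  # region where the source data physically resides


def residency_violations(records, allowed_regions):
    """Return names of shortcuts whose data sits outside the allowed regions."""
    return [r.name for r in records if r.target_region not in allowed_regions]


# Hypothetical inventory, for illustration only.
inventory = [
    ShortcutRecord("finance_internal", "onelake", "westeurope"),
    ShortcutRecord("partner_exports", "s3", "us-east-1"),
]

print(residency_violations(inventory, allowed_regions={"westeurope"}))
```

Because a shortcut leaves the data in place, the residency of the source, not of the OneLake workspace, is what such a check has to examine.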