Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.

4.3.2. Data Ingestion Questions

Question 4

Your company uses Microsoft Fabric to integrate data from Azure Databricks and Azure Cosmos DB for unified analytics.

Which types of mirroring should you use?

  • A. Database mirroring for both
  • B. Metadata mirroring for both
  • C. Metadata mirroring for Databricks, database mirroring for Cosmos DB
  • D. Database mirroring for Databricks, metadata mirroring for Cosmos DB
Answer: C. Metadata mirroring for Databricks, database mirroring for Cosmos DB

Explanation: Azure Databricks uses metadata mirroring: the data stays in place in Databricks-managed storage, and only the Unity Catalog metadata is synchronized into Fabric. Cosmos DB uses database mirroring, which replicates the actual data into OneLake for analytics. Using database mirroring for Databricks would unnecessarily move data that Fabric can already query in place.


Question 5

You are implementing a data warehouse solution. You need to capture changes in dimension tables over time while preserving historical data.

Which type of Slowly Changing Dimension (SCD) should you use?

  • A. Type 0
  • B. Type 1
  • C. Type 2
  • D. Type 3
Answer: C. Type 2

Explanation: Type 2 SCD inserts a new row for each change and expires the previous one, maintaining a complete historical record of attribute values. Type 1 overwrites existing data, keeping no history. Type 0 never changes after the initial load. Type 3 adds a column that holds only the previous value, so history is limited to one prior version.
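The Type 2 pattern in the explanation above can be sketched in plain Python. This is a minimal illustration, not Fabric-specific code; the table name, column names (`valid_from`, `valid_to`, `is_current`), and sample values are assumptions for the example.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional

# Hypothetical dimension row; column names are illustrative only.
@dataclass
class DimCustomer:
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date]  # None marks the current version
    is_current: bool

def apply_scd2(rows: list[DimCustomer], customer_id: int,
               new_city: str, change_date: date) -> list[DimCustomer]:
    """Type 2 update: expire the current row and append a new one,
    so every historical attribute value is preserved."""
    updated = []
    for row in rows:
        if row.customer_id == customer_id and row.is_current and row.city != new_city:
            # Close out the old version instead of overwriting it
            # (a Type 1 dimension would overwrite here and lose history).
            updated.append(replace(row, valid_to=change_date, is_current=False))
            updated.append(DimCustomer(customer_id, new_city,
                                       valid_from=change_date, valid_to=None,
                                       is_current=True))
        else:
            updated.append(row)
    return updated

dim = [DimCustomer(1, "Seattle", date(2024, 1, 1), None, True)]
dim = apply_scd2(dim, 1, "Denver", date(2025, 6, 1))
# The dimension now holds two rows for customer 1: the expired Seattle row
# and the current Denver row.
```

In a warehouse this same logic is usually expressed as a `MERGE` (update the matched current row, insert the new version), but the row-level effect is identical to the sketch above.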


Question 6

Your company uses a lakehouse architecture with Microsoft Fabric for analytics. Data is ingested using Data Factory pipelines and stored in Delta tables. You need to enhance transformation efficiency and reduce loading time.

What should you do?

  • A. Configure Spark pool with more worker nodes
  • B. Use session tags to reuse Spark sessions
  • C. Switch to a different Spark runtime version
  • D. Enable Delta Lake optimization
Answer: B. Use session tags to reuse Spark sessions

Explanation: Session tags enable Spark session reuse across pipeline notebook activities, eliminating the cold-start latency (typically 30-60 seconds) that each new session would otherwise incur. This directly improves transformation efficiency and reduces total pipeline loading time. Adding worker nodes increases cost without addressing the session startup overhead.
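A back-of-envelope calculation makes the explanation above concrete. The numbers here are assumptions chosen for illustration (6 notebook activities, a 45-second cold start, 10 seconds of actual work per activity), not measurements from Fabric.

```python
# Assumed workload: 6 pipeline notebook activities, each doing ~10 s of
# transformation work, with a ~45 s Spark session cold start.
activities = 6
cold_start_s = 45
work_s = 10

# Without session reuse, every activity pays its own cold start.
no_reuse = activities * (cold_start_s + work_s)

# With a shared session tag, only the first activity pays the cold start;
# the rest attach to the already-running session.
with_reuse = cold_start_s + activities * work_s

print(no_reuse, with_reuse)  # 330 vs 105 seconds
```

Even with these modest assumptions, startup overhead dominates the pipeline, which is why reusing sessions outperforms simply adding worker nodes here.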