Copyright (c) 2026 MindMesh Academy. All rights reserved. This content is proprietary and may not be reproduced or distributed without permission.
3.2.2. Transformation Tools: Dataflows, Notebooks, KQL, T-SQL
💡 First Principle: Each transformation tool has a sweet spot—match the tool to the data volume, transformation complexity, and team skills. Using Dataflow Gen2 for petabyte-scale transformations will fail; using notebooks for simple column renames is overkill.
Comparative Table: Transformation Tools
| Tool | Engine | Best For | Limitations |
|---|---|---|---|
| Dataflow Gen2 | Power Query (M) | Low-code ETL, small-medium data | Not for massive datasets |
| Notebook | Apache Spark | Complex transformations, ML, big data | Requires coding skills |
| T-SQL | SQL Engine | Warehouse transformations, set-based logic | Limited to relational data |
| KQL | Kusto | Time-series analysis, real-time queries | Not a general-purpose ETL language |
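Dataflow Gen2 (Power Query M) Example
Behind the low-code Dataflow Gen2 editor, transformations are expressed in Power Query M. A minimal sketch is shown below; the inline `#table` source is purely illustrative and stands in for a real connector:
```
let
    // Illustrative inline table standing in for a real data source
    Source = #table(
        type table [Quantity = number, UnitPrice = number],
        {{2, 5.0}, {1, 3.5}}
    ),
    // Add a calculated SalesAmount column
    AddSales = Table.AddColumn(Source, "SalesAmount", each [Quantity] * [UnitPrice], type number)
in
    AddSales
```
In a real dataflow, the generated M is usually produced by the visual editor; viewing it in the Advanced Editor is mainly useful for auditing and reuse.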
T-SQL Transformation Example
```sql
-- Create transformed table from staging (CTAS syntax, supported in Fabric Warehouse)
CREATE TABLE dbo.FactSales AS
SELECT
    d.DateKey,
    p.ProductKey,
    c.CustomerKey,
    s.Quantity,
    s.UnitPrice,
    s.Quantity * s.UnitPrice AS SalesAmount
FROM staging.Sales s
JOIN dbo.DimDate d ON s.OrderDate = d.Date
JOIN dbo.DimProduct p ON s.ProductID = p.ProductID
JOIN dbo.DimCustomer c ON s.CustomerID = c.CustomerID;
```
Spark Transformation Example
```python
from pyspark.sql.functions import col

# Read source data from the lakehouse
df = spark.read.format("delta").load("Tables/staging_sales")

# Transform: add a calculated column, filter out rows with null CustomerID
df_transformed = (
    df.withColumn("SalesAmount", col("Quantity") * col("UnitPrice"))
      .filter(col("CustomerID").isNotNull())
)

# Write to the lakehouse as a Delta table
df_transformed.write.format("delta").mode("overwrite").save("Tables/fact_sales")
```
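KQL Transformation Example
The table above also lists KQL, which targets time-series and real-time workloads. The sketch below mirrors the same sales calculation as an aggregation over time bins; the `SalesEvents` table and its column names are illustrative assumptions, not part of the original example:
```
// Aggregate sales events into 1-hour bins (table and column names are illustrative)
SalesEvents
| where isnotnull(CustomerID)
| extend SalesAmount = Quantity * UnitPrice
| summarize TotalSales = sum(SalesAmount) by bin(Timestamp, 1h), ProductID
| order by Timestamp asc
```
Note the contrast with T-SQL: KQL pipes data through operators (`where`, `extend`, `summarize`), which suits streaming and telemetry queries more naturally than set-based warehouse transformations.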
Written by Alvin Varughese
Founder • 15 professional certifications