2.5.1. Choosing: Dataflow Gen2 vs. Pipeline vs. Notebook
💡 First Principle: Each orchestration tool has distinct strengths, like specialized vehicles: Dataflows are the family sedan (easy, reliable, limited cargo), Pipelines are the logistics truck (coordinates deliveries), and Notebooks are the custom-built race car (powerful but requires a skilled driver). The right choice depends on your team's skills and requirements.
Scenario: Your team needs to: (1) Extract customer data from Salesforce, (2) Transform it by merging with product data, (3) Load into a lakehouse, (4) Run a quality check notebook, (5) Trigger a Power BI refresh. This requires multiple tools working together.
Decision Framework
| Requirement | Best Tool |
|---|---|
| No-code/low-code ETL | Dataflow Gen2 |
| Orchestrate multiple activities | Data Pipeline |
| Complex transformations (code) | Notebook |
| Real-time event routing | Eventstream |
| Simple data copy | Copy Data activity (in pipeline) |
Visual: Tool Selection Decision Tree
Dataflow Gen2
- Engine: Power Query (M)
- Interface: Visual, no-code
- Best For: Business users, simple transformations
- Limitations: Not suitable for massive datasets or complex logic
Data Pipeline
- Engine: Azure Data Factory core
- Interface: Visual orchestration canvas
- Best For: Orchestrating multiple activities, scheduling, error handling
- Activities: Copy Data, Dataflow, Notebook, Stored Procedure, etc.
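The orchestration job a pipeline does (run activities in order, stop on the first failure, surface the error) can be sketched in plain Python. This is a conceptual illustration only: the activity functions below are hypothetical stubs, not Fabric's actual pipeline API.

```python
# Conceptual sketch of pipeline orchestration: execute activities in
# sequence and fail fast. The activity functions are hypothetical stubs
# standing in for real pipeline activities.

def run_dataflow():            # stand-in for a Dataflow Gen2 activity
    return "dataflow ok"

def run_notebook():            # stand-in for a Notebook activity
    return "notebook ok"

def refresh_semantic_model():  # stand-in for a semantic model refresh
    return "refresh ok"

def run_pipeline(activities):
    """Run (name, callable) pairs in order; raise on the first failure,
    mirroring a pipeline's default on-failure behavior."""
    results = []
    for name, activity in activities:
        try:
            results.append((name, activity()))
        except Exception as exc:
            raise RuntimeError(f"Pipeline failed at activity '{name}'") from exc
    return results

results = run_pipeline([
    ("dataflow", run_dataflow),
    ("notebook", run_notebook),
    ("refresh", refresh_semantic_model),
])
```

In the real service you would wire the same sequence on the pipeline canvas, using on-success/on-failure connectors instead of exceptions.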
Notebook
- Engine: Apache Spark
- Interface: Code (PySpark, Spark SQL)
- Best For: Complex transformations, ML, large-scale processing
- Integration: Can be called from pipelines
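The notebook step from the scenario, merging customer data with product data and then running a quality check, might look like the following. In a Fabric notebook this would be PySpark reading lakehouse tables; here the same logic is sketched with pandas and in-memory sample data, and every table and column name is hypothetical.

```python
import pandas as pd

# Hypothetical sample data standing in for lakehouse tables.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "product_id": [10, 20, 99],   # 99 has no matching product
})
products = pd.DataFrame({
    "product_id": [10, 20],
    "product_name": ["Widget", "Gadget"],
})

# Transform: merge customer and product data (left join keeps all customers).
merged = customers.merge(products, on="product_id", how="left")

# Quality check: flag customers that failed to match a product.
unmatched = merged[merged["product_name"].isna()]
print(f"{len(unmatched)} customer row(s) with no matching product")
# In a real notebook you would fail the run or raise an alert here,
# so the pipeline's on-failure path can react before the Power BI refresh.
```

In PySpark the equivalent merge would be `customers_df.join(products_df, "product_id", "left")`, written back with `saveAsTable` to the lakehouse.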
⚠️ Exam Trap: The Copy Data activity moves data from source to destination with basic mapping but does not transform data. For transformations, use Dataflow Gen2 or Notebook. Questions that mention needing transformations with Copy Data are testing whether you recognize this limitation.