2.3.2. Scheduled Ingestion with MWAA and Glue Triggers
💡 First Principle: Scheduling becomes complex when pipelines have dependencies: Job B can't run until Job A finishes, and Job C needs both. Simple cron scheduling can't express these dependencies, which is why MWAA (Apache Airflow) and Glue workflows exist. They model pipelines as graphs of dependent tasks, not just individual scheduled jobs.
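The dependency idea above can be sketched in plain Python. This is only an illustration using the standard library's `graphlib` module, not Airflow or Glue code, and the job names `job_a`, `job_b`, and `job_c` are placeholders for the Job A, B, and C in the text.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Map each job to the set of jobs that must finish before it can start.
# The names are illustrative stand-ins for Job A, Job B, and Job C.
dependencies = {
    "job_a": set(),               # no prerequisites
    "job_b": {"job_a"},           # B waits for A
    "job_c": {"job_a", "job_b"},  # C needs both A and B
}

# A topological sort yields an order in which every job runs
# only after all of its prerequisites.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)
```

A cron expression can only say when each job starts. The graph says what each job is waiting for, which is the information an orchestrator needs.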
Amazon MWAA (Managed Workflows for Apache Airflow) runs Apache Airflow, a widely used open-source workflow orchestrator, as a managed service. You write DAGs (Directed Acyclic Graphs) in Python that define tasks and their dependencies. MWAA manages the Airflow web server, scheduler, workers, and metadata database.
Airflow excels when: pipelines have complex dependencies (fan-out, fan-in, conditional branching), you need visibility into task-level status and retries, your team already knows Airflow, or you're orchestrating non-AWS services alongside AWS services. The exam signals MWAA with phrases like "complex workflow dependencies," "task-level monitoring," or "existing Airflow DAGs."
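To make fan-out and fan-in concrete, here is a rough standard-library sketch, not Airflow's API. It groups tasks into "waves" that an orchestrator could run in parallel. The function `execution_waves` and the task names are hypothetical, chosen only for illustration.

```python
from graphlib import CycleError, TopologicalSorter

def execution_waves(dependencies):
    """Group tasks into waves; tasks in one wave have all prerequisites met."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the graph contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished to unlock successors
    return waves

# Fan-out: two cleaning tasks depend on one extract.
# Fan-in: the join waits for both cleaning tasks.
pipeline = {
    "extract": set(),
    "clean_orders": {"extract"},
    "clean_customers": {"extract"},
    "join": {"clean_orders", "clean_customers"},
}
waves = execution_waves(pipeline)

# A cyclic "pipeline" is rejected, which is what "acyclic" in DAG means.
try:
    execution_waves({"a": {"b"}, "b": {"a"}})
    cycle_rejected = False
except CycleError:
    cycle_rejected = True
```

Here `waves` has three entries: the extract alone, the two cleaning tasks together, then the join. Airflow additionally tracks the status and retries of each task, which this sketch leaves out.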
Glue workflows provide simpler orchestration specifically for Glue jobs and crawlers. A Glue workflow connects triggers (scheduled, on-demand, event-based, or conditional), crawlers, and jobs in a visual graph. It's less flexible than Airflow, but it can be built in the console without writing orchestration code and integrates natively with Glue.
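Workflows can also be defined programmatically. The sketch below builds the keyword arguments that the Glue `CreateTrigger` API expects, as plain dictionaries. The helper functions and the workflow, trigger, and job names are hypothetical, and the workflow itself is assumed to exist already.

```python
def scheduled_trigger(workflow, name, cron, first_job):
    """Arguments for a trigger that starts the workflow on a cron schedule."""
    return {
        "Name": name,
        "WorkflowName": workflow,
        "Type": "SCHEDULED",
        "Schedule": f"cron({cron})",  # Glue uses six-field cron expressions
        "Actions": [{"JobName": first_job}],
        "StartOnCreation": True,
    }

def conditional_trigger(workflow, name, upstream_jobs, next_job):
    """Arguments for a trigger that fires once all upstream jobs succeed."""
    return {
        "Name": name,
        "WorkflowName": workflow,
        "Type": "CONDITIONAL",
        "Predicate": {
            "Logical": "AND",  # every condition below must hold
            "Conditions": [
                {"LogicalOperator": "EQUALS", "JobName": job, "State": "SUCCEEDED"}
                for job in upstream_jobs
            ],
        },
        "Actions": [{"JobName": next_job}],
        "StartOnCreation": True,
    }

# Hypothetical names: run "ingest-raw" daily at 02:00 UTC, then "transform".
start = scheduled_trigger("nightly-etl", "nightly-start", "0 2 * * ? *", "ingest-raw")
then = conditional_trigger("nightly-etl", "after-ingest", ["ingest-raw"], "transform")
```

With AWS credentials configured, each dictionary could be passed to boto3 as `boto3.client("glue").create_trigger(**start)`. The dependency lives in the conditional trigger's `Predicate`, which is how a Glue workflow says "run this after that".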
| Feature | EventBridge Scheduler | Glue Workflows | Amazon MWAA |
|---|---|---|---|
| Complexity | Simple schedules, single triggers | Linear/parallel Glue pipelines | Complex DAGs with any dependency pattern |
| Targets | Any AWS service | Glue jobs and crawlers only | Any system (AWS, external, custom) |
| Code required | Minimal (schedule config) | None (visual designer) | Python DAGs |
| Dependencies | None (individual triggers) | Sequential, parallel | Full DAG dependencies: fan-out, fan-in, branching (no cycles) |
| Best for | Simple scheduled triggers | Glue-only ETL pipelines | Complex multi-service orchestration |
⚠️ Exam Trap: MWAA is the most powerful orchestration option but also the most expensive and operationally complex. If a question describes a simple pipeline with 2–3 Glue jobs running sequentially, MWAA is overkill; Glue workflows or Step Functions are simpler. The exam rewards matching complexity to need.
Reflection Question: Your pipeline runs three Glue jobs sequentially, then a crawler, then loads data into Redshift. No external services are involved. Is MWAA the right choice?