2.6.2. Amazon MWAA (Apache Airflow) and Glue Workflows
💡 First Principle: MWAA is for teams that need maximum orchestration flexibility: complex dependency graphs, custom operators, integration with non-AWS systems, and task-level visibility. Glue Workflows is for teams that only need to orchestrate Glue jobs and crawlers. The decision is a spectrum of complexity: don't bring an orchestra conductor when a metronome will do.
Amazon MWAA runs Apache Airflow as a managed service. You write Directed Acyclic Graphs (DAGs) in Python that define tasks, dependencies, schedules, and retry behavior. MWAA manages the Airflow web server, scheduler, workers, and metadata database, so you can focus on DAG development.
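To make the DAG idea concrete, here is a minimal standard-library sketch of what a DAG encodes: tasks, upstream dependencies, and per-task retries. It is a toy model, not Airflow's API. The `Task` class, `run_dag` function, and task names are hypothetical and exist only for illustration. In a real Airflow DAG, dependencies are usually declared with `>>` and retries with the `retries` task argument.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable


@dataclass
class Task:
    """Hypothetical stand-in for an Airflow task (not the real Airflow class)."""
    name: str
    run: Callable[[], None]
    upstream: list[str] = field(default_factory=list)  # tasks that must finish first
    retries: int = 0                                    # extra attempts after a failure


def run_dag(tasks: list[Task]) -> list[str]:
    """Run tasks in dependency order with retries; return the completion order."""
    by_name = {t.name: t for t in tasks}
    graph = {t.name: set(t.upstream) for t in tasks}
    completed = []
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(graph).static_order():
        task = by_name[name]
        for attempt in range(task.retries + 1):
            try:
                task.run()
                break
            except Exception:
                if attempt == task.retries:
                    raise  # retries exhausted: fail the whole run
        completed.append(name)
    return completed


# Usage: the ETL task fails once, is retried, then downstream tasks run.
attempts = {"count": 0}

def flaky_etl() -> None:
    attempts["count"] += 1
    if attempts["count"] < 2:
        raise RuntimeError("transient failure")

order = run_dag([
    Task("notify", lambda: None, upstream=["crawl"]),
    Task("crawl", lambda: None, upstream=["etl"]),
    Task("etl", flaky_etl, retries=2),
])
print(order)
```

The tasks are listed out of order on purpose: the execution order comes from the declared dependencies, not from the order in which tasks appear in the file.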
Airflow's strength is its operator ecosystem: pre-built integrations for AWS services (GlueJobOperator, RedshiftSQLOperator, S3KeySensor), databases (PostgresOperator, MySqlOperator), and external systems (SlackWebhookOperator, HttpOperator). Complex pipelines that coordinate AWS services, check external APIs, and send notifications to Slack are natural Airflow territory.
Glue Workflows provide native orchestration for Glue resources. You define triggers (scheduled, on-demand, or event-based), connect them to crawlers and ETL jobs, and Glue manages execution ordering. Workflows support conditional triggers ("start Job B only if Job A succeeds") and parallel execution. The visual editor in the Glue console shows workflow status in real time.
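As a rough sketch of how a schedule trigger and a conditional trigger fit together, the function below builds the keyword arguments for two Glue triggers: one that starts an ETL job nightly, and one that starts a crawler only if that job succeeds. The function name, workflow, job, and crawler names, and the 02:00 UTC schedule are all assumptions for illustration. The dictionary keys follow the shape of the Glue `CreateTrigger` request, but nothing here calls AWS.

```python
def nightly_workflow_triggers(workflow: str, job: str, crawler: str) -> list[dict]:
    """Build CreateTrigger-style arguments: schedule -> job -> (on success) crawler."""
    start = {
        "Name": f"{workflow}-nightly-start",
        "WorkflowName": workflow,
        "Type": "SCHEDULED",
        "Schedule": "cron(0 2 * * ? *)",  # assumed schedule: every day at 02:00 UTC
        "Actions": [{"JobName": job}],
        "StartOnCreation": True,
    }
    on_success = {
        "Name": f"{workflow}-after-{job}",
        "WorkflowName": workflow,
        "Type": "CONDITIONAL",
        # "Start the crawler only if the job succeeds."
        "Predicate": {
            "Conditions": [
                {"LogicalOperator": "EQUALS", "JobName": job, "State": "SUCCEEDED"}
            ]
        },
        "Actions": [{"CrawlerName": crawler}],
        "StartOnCreation": True,
    }
    return [start, on_success]


# Hypothetical resource names, for illustration only.
triggers = nightly_workflow_triggers("nightly-sales", "sales-etl", "sales-crawler")

# With boto3 installed and credentials configured, these could be applied as:
#   glue = boto3.client("glue")
#   glue.create_workflow(Name="nightly-sales")
#   for kwargs in triggers:
#       glue.create_trigger(**kwargs)
```

Keeping the trigger definitions as plain data makes them easy to review and test before anything is created in an account.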
| Aspect | Step Functions | MWAA (Airflow) | Glue Workflows |
|---|---|---|---|
| Pricing | Per state transition | Per environment-hour (always-on) | Free (pay for jobs/crawlers) |
| Minimum cost | None (serverless) | ~$350/month (smallest environment) | None |
| Scope | Any AWS service | Any system (AWS, external, custom) | Glue jobs and crawlers only |
| Language | JSON (Amazon States Language) | Python (DAGs) | Visual / API |
| Complexity | Medium | High (but most flexible) | Low |
| Best for | Multi-service AWS workflows | Complex, cross-system orchestration | Simple Glue-only pipelines |
⚠️ Exam Trap: MWAA has a minimum cost (~$350/month for the smallest environment) because the environment, including the Airflow web server and scheduler, runs 24/7 whether or not any DAG is executing. If a question describes a cost-sensitive workload with a simple Glue pipeline, MWAA is the wrong answer; both Glue Workflows and Step Functions are cheaper. The exam punishes overly complex solutions.
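A quick back-of-the-envelope calculation shows how large the gap is for small workloads. The sketch below compares the always-on MWAA figure quoted above with per-transition Step Functions pricing. The Step Functions rate ($0.025 per 1,000 state transitions for Standard Workflows) is an assumption that ignores the free tier and regional differences, so check current pricing before relying on it. The function names and the 10-transition pipeline are hypothetical.

```python
MWAA_MONTHLY_FLOOR_USD = 350.0  # approximate minimum quoted in this section
STEP_FUNCTIONS_USD_PER_TRANSITION = 0.025 / 1000  # assumed Standard Workflows rate


def step_functions_monthly_cost(runs_per_day: int, transitions_per_run: int,
                                days: int = 30) -> float:
    """Orchestration cost only; the Glue jobs and crawlers are billed separately."""
    transitions = runs_per_day * transitions_per_run * days
    return transitions * STEP_FUNCTIONS_USD_PER_TRANSITION


def break_even_transitions() -> float:
    """Monthly state transitions at which Step Functions matches the MWAA floor."""
    return MWAA_MONTHLY_FLOOR_USD / STEP_FUNCTIONS_USD_PER_TRANSITION


# A nightly pipeline with roughly 10 state transitions per run (hypothetical).
nightly = step_functions_monthly_cost(runs_per_day=1, transitions_per_run=10)
print(f"Step Functions: ${nightly:.4f}/month vs MWAA: ${MWAA_MONTHLY_FLOOR_USD:.0f}/month")
print(f"Break-even: about {break_even_transitions():,.0f} transitions/month")
```

Under these assumptions, a nightly pipeline costs well under a cent per month to orchestrate with Step Functions, and Glue Workflows adds no orchestration charge at all. MWAA's fixed cost only pays off when the workload needs Airflow's flexibility.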
Reflection Question: A small startup has a single Glue ETL job that runs nightly, followed by a Glue crawler. They want to add monitoring but minimize costs. Which orchestration service is the best fit?