Federating execution between Airflow instances with Dagster
This tutorial demonstrates how to use dagster-airlift to observe DAGs from multiple Airflow instances and to federate execution between them, with Dagster acting as a centralized control plane.
Using dagster-airlift, we can:
- Observe Airflow DAGs and their execution history
- Directly trigger Airflow DAGs from Dagster
- Set up federated execution across Airflow instances
All of this can be done with no changes to Airflow code.
Overview
This tutorial follows an imaginary data platform team in the following scenario:
- An Airflow instance named warehouse, run by another team, which loads data into a data warehouse.
- An Airflow instance named metrics, run by the data platform team, which deploys all the metrics that data scientists build on top of the data warehouse.
Two DAGs have been causing the team a lot of pain lately: warehouse.load_customers and metrics.customer_metrics. The warehouse.load_customers DAG loads customer data into the data warehouse, and the metrics.customer_metrics DAG computes metrics on top of that customer data. There's a cross-instance dependency between these two DAGs, but it is neither observable nor controllable. Ideally, the data platform team would rebuild the metrics.customer_metrics DAG only when the warehouse.load_customers DAG has new data.
In this guide, we'll use dagster-airlift to observe the warehouse and metrics Airflow instances and set up federated execution, controlled by Dagster, that triggers the metrics.customer_metrics DAG only when the warehouse.load_customers DAG has new data. This process won't require any changes to the Airflow code.
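To make the scheduling rule concrete, here is a minimal, library-free sketch of the decision the team wants automated: rebuild customer_metrics only if load_customers has succeeded more recently than customer_metrics last did. This is not the dagster-airlift API (which observes runs and triggers DAGs for you); DagRun, latest_success, and should_trigger_downstream are hypothetical names used only for illustration, and the run records are assumed to come from each instance's run history.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class DagRun:
    """Hypothetical, minimal record of a finished Airflow DAG run."""
    dag_id: str
    end_time: datetime
    succeeded: bool


def latest_success(runs: List[DagRun], dag_id: str) -> Optional[datetime]:
    """Return the end time of the most recent successful run of dag_id, or None."""
    times = [r.end_time for r in runs if r.dag_id == dag_id and r.succeeded]
    return max(times, default=None)


def should_trigger_downstream(
    upstream_runs: List[DagRun],
    downstream_runs: List[DagRun],
    upstream_id: str = "load_customers",
    downstream_id: str = "customer_metrics",
) -> bool:
    """Trigger downstream only if upstream produced new data since downstream last succeeded."""
    upstream_done = latest_success(upstream_runs, upstream_id)
    if upstream_done is None:
        return False  # no successful upstream load yet: nothing new to build on
    downstream_done = latest_success(downstream_runs, downstream_id)
    # Rebuild if downstream has never succeeded or is older than the latest upstream load.
    return downstream_done is None or upstream_done > downstream_done


# Usage: warehouse loaded new data after metrics last ran, so metrics should rebuild.
warehouse_runs = [DagRun("load_customers", datetime(2024, 1, 2), True)]
metrics_runs = [DagRun("customer_metrics", datetime(2024, 1, 1), True)]
print(should_trigger_downstream(warehouse_runs, metrics_runs))  # True
```

In the tutorial itself, Dagster plays this role: it tracks both instances' run histories centrally, so neither Airflow instance needs to know about the other.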