Migrating to Dagster
If you have existing data pipelines—whether standalone Python scripts run by cron, Airflow DAGs, or Prefect flows—you can migrate them to Dagster incrementally.
Incremental migration strategy
You don't need to migrate everything at once. A common approach:
- Run both systems in parallel. Keep your existing scheduler running while you bring pipelines into Dagster one at a time. Dagster won't interfere with your existing cron jobs or Airflow instance.
- Migrate leaf pipelines first. Start with pipelines that don't depend on upstream Airflow tasks—these have the fewest cross-system dependencies to manage.
- Use sensors for cross-system handoffs. While migrating, a Dagster @sensor can watch for files, database rows, or other signals produced by your old system and trigger Dagster runs. This lets the two systems cooperate without tight coupling.
- Cut over the schedule last. Once the Dagster version is confirmed working, disable the old cron entry or Airflow DAG and activate the Dagster schedule.
Migration guides
- Airflow to Dagster — Rewrite Airflow DAGs as native Dagster assets
- Prefect to Dagster — Convert Prefect flows and tasks to Dagster assets
- Python to Dagster — Wrap cron-scheduled scripts as Dagster assets
Concept mapping reference
| Dagster | Airflow | Prefect |
|---|---|---|
| Asset graph | DAG | Flow |
| @asset | @task | @task |
| @schedule with define_asset_job | schedule on @dag | Deployment schedule on @flow |
| deps for ordering; I/O manager for data handoff | XCom | Task return values |
| retry_policy on asset | retries, retry_delay in default_args | retries on @task |
| group_name, tags on asset | DAG tags | Flow tags |
| Resources | Airflow Variables / Connections | Prefect Blocks |
| @sensor | ExternalTaskSensor, FileSensor | Prefect automations |
Common pitfalls
deps is ordering, not data passing. The most common mistake when migrating from Airflow or Prefect: expecting downstream assets to receive the upstream return value automatically. They don't. Either read inputs from your storage layer inside each asset, or use an I/O manager to handle the handoff.
Connections and secrets need to become resources. Airflow Connections and Prefect Blocks store credentials in their respective metadata backends. In Dagster, move these to Resources. Resources are injected at runtime via function parameters and can be configured per-environment.
Sensors replace polling operators. ExternalTaskSensor, FileSensor, HttpSensor, and similar Airflow operators don't have direct op-level equivalents. Replace them with a Dagster @sensor that polls for the condition and yields a RunRequest.
Schedules don't backfill on deploy by default. Airflow DAGs historically default to catchup=True, running every missed interval when a DAG is enabled; Dagster has no direct equivalent. If you need historical runs, launch a backfill explicitly rather than relying on deployment behavior.
Scripts need explicit asset boundaries. When migrating from cron-scheduled scripts, you'll need to decide what each asset represents. A single script may produce multiple logical outputs that benefit from being modeled as separate assets with their own materialization history and lineage.