kunalpjain/dag-runner

dag-runner

A lightweight CLI that runs individual Airflow tasks or full DAGs locally, in a single process, with no Docker, no scheduler, and no metadata database required.

It reads your DAG file, auto-detects operator types, replaces external I/O (HTTP, S3, SQL, GCS, BigQuery) with configurable stubs, and reports pass/fail in seconds. Designed to drop in as a pre-commit hook.


Why

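| Tool       | Needs DB?  | Needs Docker? | Pre-commit ready? | Speed  |
|------------|------------|---------------|-------------------|--------|
| dag.test() | Yes (real) | Sometimes     | No                | Slow   |
| Astro CLI  | Yes        | Yes           | No                | ~30 s  |
| dag-runner | No         | No            | Yes               | < 10 s |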

Installation

pip install dag-runner
# or, from source:
pip install -e /path/to/dag-runner

apache-airflow must already be installed in the same environment (it is treated as a peer dependency so you control the version).


Quick Start

# Run all tasks in a DAG file
dag-runner run dags/my_pipeline.py

# Run a single task
dag-runner run dags/my_pipeline.py --task extract

# Preview the execution plan without running
dag-runner run dags/my_pipeline.py --dry-run

# Show operator types and stub requirements
dag-runner inspect dags/my_pipeline.py

# Generate a starter config
dag-runner init-config

Example output

Loading dags/my_pipeline.py...

dag-runner  ›  my_pipeline
Running 4 task(s)...

  ✓  PASS  extract                                   0.03s
  ✓  PASS  transform                                 0.12s
  ✗  FAIL  load_to_postgres                          0.01s
       Error: OperationalError: no such table: events
       ...traceback...
  ~  SKIP  notify                                    0.00s

────────────────────────────────────────────────────────────
Results

  ✓ Passed:  2
  ✗ Failed:  1
  ~ Skipped: 1

  Wall time: 0.17s  ·  DAG: my_pipeline

FAILED — 1 task(s) did not pass

CLI Reference

dag-runner run DAG_FILE

| Flag            | Description                                           |
|-----------------|-------------------------------------------------------|
| --task TASK_ID  | Run only this task (no dependency resolution)         |
| --dag-id DAG_ID | Select a DAG when the file defines more than one      |
| --config PATH   | Path to dag-runner.yaml (default: ./dag-runner.yaml)  |
| --dry-run       | Print execution plan without running                  |
| --verbose / -v  | Show stdout/stderr for every task                     |
| --json-output   | Emit JSON result to stdout (useful for CI parsing)    |
| --no-color      | Disable ANSI color codes                              |
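
For CI scripts written in Python, the flags above can be assembled programmatically and the result judged from the exit code (0 when every task passes, 1 otherwise, as described under How It Works). The following is a minimal sketch, not part of dag-runner itself: `build_command` and `passed` are hypothetical helper names, and the sketch deliberately avoids parsing `--json-output`, whose schema is not documented here.

```python
import subprocess
import sys
from typing import List, Optional, Sequence


def build_command(dag_file: str,
                  task: Optional[str] = None,
                  config: Optional[str] = None) -> List[str]:
    """Assemble a `dag-runner run` command line from the documented flags."""
    cmd = ["dag-runner", "run", dag_file]
    if task is not None:
        cmd += ["--task", task]
    if config is not None:
        cmd += ["--config", config]
    return cmd


def passed(cmd: Sequence[str]) -> bool:
    """Run the command and report success based on the exit code only."""
    completed = subprocess.run(cmd, capture_output=True, text=True)
    if completed.returncode != 0:
        # Surface whatever the tool printed so the failure is visible in CI logs.
        sys.stderr.write(completed.stdout + completed.stderr)
    return completed.returncode == 0
```

A CI step could then call `passed(build_command("dags/my_pipeline.py", task="extract"))` and fail the job when it returns `False`.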

dag-runner inspect DAG_FILE

Displays each task, its operator class, inferred category, and whether a stub will be applied.

dag-runner init-config

Writes a starter dag-runner.yaml to the current directory.


Configuration — dag-runner.yaml

fixture_dir: fixtures          # directory for fixture files
timeout_seconds: 300

default_stub:
  http_status_code: 200
  # http_fixture: responses/api.json   # relative to fixture_dir
  s3_fixture_content: '{"data": []}'
  sql_return_empty: true
  bash_mock: false
  python_mock: false

# Per-operator overrides (keyed by exact class name)
stubs:
  SimpleHttpOperator:
    http_fixture: responses/my_api.json
  PostgresOperator:
    sql_fixture: results/query.json

# Env vars injected before DAG import
env:
  MY_API_KEY: fake_key_for_testing
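
One plausible reading of the config above is that each operator's settings are the `default_stub` values with that operator's `stubs` entry layered on top. The sketch below illustrates that merge rule under this assumption; `effective_stub` is a hypothetical helper, and dag-runner's actual resolution logic may differ. The config is written as a plain dict so no YAML parser is needed.

```python
from typing import Any, Dict

# Mirrors the sample dag-runner.yaml above (values copied from it).
CONFIG: Dict[str, Any] = {
    "default_stub": {
        "http_status_code": 200,
        "s3_fixture_content": '{"data": []}',
        "sql_return_empty": True,
        "bash_mock": False,
        "python_mock": False,
    },
    "stubs": {
        "SimpleHttpOperator": {"http_fixture": "responses/my_api.json"},
        "PostgresOperator": {"sql_fixture": "results/query.json"},
    },
}


def effective_stub(config: Dict[str, Any], operator_class_name: str) -> Dict[str, Any]:
    """Return the defaults overlaid with the override for one exact class name."""
    merged = dict(config.get("default_stub", {}))  # copy, so defaults stay untouched
    merged.update(config.get("stubs", {}).get(operator_class_name, {}))
    return merged
```

Under this assumed rule, `effective_stub(CONFIG, "SimpleHttpOperator")` keeps `http_status_code: 200` from the defaults and adds the `http_fixture` path, while an operator with no entry under `stubs` simply receives the defaults.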

Fixture files

Place fixture files in the fixture_dir directory:

fixtures/
  responses/
    my_api.json          # HTTP stub response body
  results/
    query.json           # SQL stub result (JSON array of rows)
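
To make the layout concrete, here is a small sketch of how a fixture path such as `results/query.json` could be resolved against `fixture_dir` and parsed. `load_fixture` is a hypothetical helper written for illustration, the path-escape check is an added safety assumption, and the sample rows are invented.

```python
import json
import tempfile
from pathlib import Path


def load_fixture(fixture_dir, relative_path):
    """Read a JSON fixture addressed relative to fixture_dir."""
    root = Path(fixture_dir).resolve()
    path = (root / relative_path).resolve()
    # Refuse paths that escape the fixture directory (e.g. "../secrets.json").
    if root not in path.parents:
        raise ValueError(f"{relative_path!r} is outside {fixture_dir!r}")
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


# Usage: build a throwaway fixtures/ tree and read a SQL result fixture from it.
with tempfile.TemporaryDirectory() as tmp:
    fixture = Path(tmp) / "results" / "query.json"
    fixture.parent.mkdir(parents=True)
    fixture.write_text('[{"id": 1, "event": "signup"}]', encoding="utf-8")
    rows = load_fixture(tmp, "results/query.json")
    print(rows)  # [{'id': 1, 'event': 'signup'}]
```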

Stub Behavior

| Operator / Hook                           | Default stub                                    |
|-------------------------------------------|-------------------------------------------------|
| SimpleHttpOperator, HttpSensor            | Returns {"status": "ok"} (or fixture file)      |
| S3Hook methods (read_key, load_string, …) | Returns {"data": []} (or fixture content)       |
| PostgresHook, MySqlHook, DbApiHook        | Returns empty DataFrame / empty list            |
| GCSHook                                   | download writes empty file; upload is a no-op   |
| BigQueryHook                              | Returns empty results; job state = DONE         |
| EmailOperator                             | Logs a message, no email sent                   |
| ExternalTaskSensor                        | poke() always returns True                      |
| BashOperator                              | Runs as-is (local shell command)                |
| PythonOperator                            | Runs as-is (calls your Python callable)         |

Set bash_mock: true or python_mock: true in the config to no-op those operators too.
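
The general technique behind such stubs can be sketched with the standard library's `unittest.mock`: a hook method is swapped for a fake only while the task runs, then restored. This is an illustration of the pattern, not dag-runner's code. `DemoSqlHook`, `count_events`, and `run_with_sql_stub` are hypothetical names, and `DemoSqlHook` is a stand-in class rather than a real Airflow hook.

```python
from unittest import mock


class DemoSqlHook:
    """Hypothetical stand-in for a database hook (not an Airflow class)."""

    def get_records(self, sql):
        raise RuntimeError("would open a real database connection")


def count_events(hook):
    """Task logic that would normally hit the database."""
    return len(hook.get_records("SELECT * FROM events"))


def run_with_sql_stub(task_callable, hook):
    """Run the task with get_records stubbed to return an empty list."""
    # patch.object replaces the method on the class for the duration of the
    # with-block and restores the original afterwards, even on error.
    with mock.patch.object(type(hook), "get_records", return_value=[]):
        return task_callable(hook)
```

Calling `run_with_sql_stub(count_events, DemoSqlHook())` returns `0` without touching a database, while calling `count_events` directly afterwards raises again because the original method is back in place.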


Pre-commit Hook

Add to your .pre-commit-config.yaml:

repos:
  - repo: https://github.com/kunalpjain/dag-runner
    rev: v0.1.0
    hooks:
      - id: dag-runner
        # Only match files in the dags/ directory:
        files: "^dags/.*\\.py$"
        # Optional: point to your config file
        args: [--config, dag-runner.yaml]

Or, if you have dag-runner installed in your project's virtualenv, use the local repo form:

repos:
  - repo: local
    hooks:
      - id: dag-runner
        name: dag-runner smoke test
        language: system
        entry: dag-runner run
        files: "^dags/.*\\.py$"
        types: [python]
        args: [--config, dag-runner.yaml]

Each staged DAG file is passed as the argument to dag-runner run. If any task fails, the commit is blocked and the error is printed.

Running manually

pre-commit run dag-runner --all-files

How It Works

  1. Env setup — Before importing Airflow, dag-runner sets AIRFLOW__DATABASE__SQL_ALCHEMY_CONN to a temporary SQLite file and enables AIRFLOW__CORE__UNIT_TEST_MODE.

  2. Connection patching — BaseHook.get_connection and Variable.get are monkeypatched to return mock objects, so operators never reach a real metastore.

  3. DAG loading — The DAG file is imported via importlib and all DAG instances found in the module's global namespace are collected.

  4. Operator registry — Each task's operator class is matched against a lookup table to determine its category and whether a stub is needed.

  5. Stub injection — Hook classes (HttpHook, S3Hook, PostgresHook, …) are replaced with in-process fakes for the duration of each task.execute() call.

  6. Execution — Tasks run in topological order in the same process. stdout/stderr are captured per-task.

  7. Reporting — Results are printed with pass/fail icons, per-task timing, and error tracebacks. The process exits with code 0 (all pass) or 1 (any failure).
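
Steps 6 and 7 can be illustrated with a small standard-library sketch: tasks are ordered with `graphlib.TopologicalSorter` (Python 3.9+), stdout is captured per task, and the exit code is derived from the results. This is not dag-runner's implementation. `run_tasks` and `exit_code` are hypothetical names, only stdout is captured here, and skipping a task whenever an upstream task did not pass is an assumption inferred from the example output.

```python
import contextlib
import io
import time
from graphlib import TopologicalSorter


def run_tasks(callables, upstream):
    """Run callables in dependency order.

    callables: task_id -> zero-argument function
    upstream:  task_id -> set of task_ids that must run first
    Returns task_id -> (status, seconds, captured_output).
    """
    results = {}
    order = TopologicalSorter({t: upstream.get(t, set()) for t in callables})
    for task_id in order.static_order():
        # Assumption: skip a task when any upstream task did not pass.
        if any(results[u][0] != "PASS" for u in upstream.get(task_id, ())):
            results[task_id] = ("SKIP", 0.0, "")
            continue
        buffer = io.StringIO()
        started = time.perf_counter()
        try:
            with contextlib.redirect_stdout(buffer):
                callables[task_id]()
            status = "PASS"
        except Exception as exc:
            status = "FAIL"
            buffer.write(f"{type(exc).__name__}: {exc}")
        results[task_id] = (status, time.perf_counter() - started, buffer.getvalue())
    return results


def exit_code(results):
    """Return 0 when no task failed, 1 otherwise."""
    return 1 if any(status == "FAIL" for status, _, _ in results.values()) else 0
```

Fed a four-task chain in which the third task raises, this sketch yields PASS, PASS, FAIL, SKIP and an exit code of 1, matching the shape of the example output shown earlier.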


Limitations

  • Requires apache-airflow to be installed in the same environment.
  • Custom hooks or operators that use non-standard connection patterns may need additional patches via dag-runner.yaml.
  • Long-running tasks still run to completion: timeout_seconds is accepted in the config but not enforced yet; enforcement is planned for a future version.
  • BashOperator commands execute for real; if they have side effects or require external tools, mock them with bash_mock: true.
