Airflow vs Luigi vs Snakemake vs ...

The chosen tool should provide:

  • Modular, standardised processing functions (e.g. cleaning, data retrieval, ML, ETL, ...)
  • A clean way to compose these modules into pipelines
  • Pipelines that can be called via an API
  • Tracking of parametrised artefacts (e.g. intermediate results between steps, or final outputs)
  • Asynchronous execution of tasks
  • Ideally, scheduling of tasks across parallel workers
  • Ideally, the option to run tasks on external infrastructure (such as the PIK cluster)
  • Ideally, modular functions that can be imported into local projects like a Python package
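
To make these requirements concrete, here is a minimal sketch in plain Python (standard library only) of what "modular functions", "parametrised artefacts" and "asynchronous execution" could look like together. It does not show how Airflow, Luigi or Snakemake work internally. All names (`retrieve`, `clean`, `run_step`, `artefact_path`, `pipeline`, `run_many`) are hypothetical and chosen only for illustration, and the artefact store is assumed to be a local directory of JSON files.

```python
import hashlib
import json
import os
import uuid
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def artefact_path(root, step_name, params):
    """Derive a deterministic file name from the step name and its parameters."""
    key = json.dumps(params, sort_keys=True)
    digest = hashlib.sha256(key.encode()).hexdigest()[:12]
    return Path(root) / f"{step_name}-{digest}.json"


def run_step(root, func, params):
    """Run one step, or reuse its artefact if these parameters were seen before."""
    path = artefact_path(root, func.__name__, params)
    if path.exists():
        return json.loads(path.read_text())
    result = func(**params)
    # Write to a temporary file first, then rename, so that a concurrent
    # reader never sees a half-written artefact.
    tmp = path.with_name(f"{path.name}.{uuid.uuid4().hex}.tmp")
    tmp.write_text(json.dumps(result))
    os.replace(tmp, path)
    return result


# Hypothetical modular processing functions (placeholders for real steps).
def retrieve(n):
    return list(range(n))


def clean(values, threshold):
    return [v for v in values if v >= threshold]


def pipeline(root, n, threshold):
    """A pipeline is a plain function that chains tracked steps."""
    values = run_step(root, retrieve, {"n": n})
    return run_step(root, clean, {"values": values, "threshold": threshold})


def run_many(root, configs, max_workers=4):
    """Submit several pipeline runs to a pool of workers and collect the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(pipeline, root, **cfg) for cfg in configs]
        return [f.result() for f in futures]
```

Because the steps are ordinary functions, they can be imported into a local project like any other Python module, and `pipeline` could be wrapped by an API endpoint. Scheduling on remote workers or a cluster is the part this sketch leaves out, and it is where a dedicated workflow tool would be expected to help.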