Airflow vs Luigi vs Snakemake vs ...
The chosen tool should provide:
- Modular, standardised processing functions (e.g. cleaning, data retrieval, ML, ETL, ...)
- A clean way to compose these modules into pipelines
- Pipelines should be callable via API
- Tracking of parametrised artefacts (e.g. intermediate results between steps or outputs)
- Asynchronous execution of tasks
- Ideally scheduling to parallel workers
- Ideally, tasks can then run on external infrastructure (e.g. the PIK cluster)
- Ideally, the modular functions can be imported like a Python package in local projects
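
To make the requirements above more concrete, here is a minimal, framework-agnostic sketch using only the Python standard library. It is not the API of Airflow, Luigi, or Snakemake. All names (`ARTEFACTS`, `artefact_key`, `retrieve`, `clean`, `total`, `run_pipeline`) are hypothetical and exist only for illustration. It shows plain importable functions as modules, a pipeline as a list of (function, parameters) steps, artefacts tracked by a key derived from step name, parameters, and upstream artefact, and asynchronous execution of two pipelines.

```python
import asyncio
import hashlib
import json
from typing import Any

# Hypothetical in-memory artefact store; a real tool would persist these.
ARTEFACTS: dict[str, Any] = {}


def artefact_key(step: str, params: dict, upstream: str) -> str:
    """Derive a stable key from the step name, its parameters, and its input."""
    payload = json.dumps(
        {"step": step, "params": params, "upstream": upstream}, sort_keys=True
    )
    return hashlib.sha256(payload.encode()).hexdigest()[:12]


# Modular processing functions: plain functions that any project can import.
def retrieve(_, n: int) -> list:
    return list(range(n))


def clean(data: list, minimum: int) -> list:
    return [x for x in data if x >= minimum]


def total(data: list) -> int:
    return sum(data)


async def run_pipeline(steps: list) -> Any:
    """Run steps in order, reusing an artefact when its key is already known."""
    data, upstream = None, ""
    for func, params in steps:
        key = artefact_key(func.__name__, params, upstream)
        if key not in ARTEFACTS:
            # Run the blocking step in a worker thread (Python 3.9+).
            ARTEFACTS[key] = await asyncio.to_thread(func, data, **params)
        data, upstream = ARTEFACTS[key], key
    return data


async def main() -> list:
    # Two parametrisations of the same pipeline, executed concurrently.
    p1 = [(retrieve, {"n": 10}), (clean, {"minimum": 5}), (total, {})]
    p2 = [(retrieve, {"n": 10}), (clean, {"minimum": 8}), (total, {})]
    return await asyncio.gather(run_pipeline(p1), run_pipeline(p2))


results = asyncio.run(main())
print(results)
```

In this sketch both pipelines share the `retrieve` artefact because its key is identical, while the `clean` and `total` artefacts differ per parametrisation. Exposing `run_pipeline` behind an API endpoint and dispatching steps to remote workers or a cluster are exactly the parts a real workflow tool would need to supply.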