Idea
For future workflows consisting exclusively of Python code:
Rather than defining a Docker container per workflow, there could be a single “dummy” Docker container that installs an independently developed Python package (e.g. pip install portal-containers). This package could expose a CLI parameterized by a workflow name (e.g. anndata-to-ui) and the input/output directory paths. Based on the workflow name, the CLI would call the corresponding data processing function defined in the package.
A Python package for data processing that is developed and tested independently would substantially lower the barrier to developing and improving these workflows.
Docker would still wrap this package before it is used in the ingest-pipeline, but only one Dockerfile and one Docker image would be needed per version of the Python package.
The existing workflows could be incrementally migrated into such a package, and the standalone Dockerfile option could continue to be used in cases of dependency conflicts or non-Python processing code.
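To make the proposed CLI concrete, here is a minimal sketch of how a single entry point could dispatch on the workflow name. Everything in it is an assumption for illustration: the registry `WORKFLOWS`, the `workflow` decorator, the `--input_dir`/`--output_dir` flag names, and the placeholder body of `anndata_to_ui` are hypothetical and do not describe any existing package.

```python
import argparse
from pathlib import Path
from typing import Callable, Dict, List, Optional

# Hypothetical registry: workflow name -> processing function.
WORKFLOWS: Dict[str, Callable[[Path, Path], None]] = {}


def workflow(name: str):
    """Register a processing function under a workflow name."""
    def register(func: Callable[[Path, Path], None]):
        WORKFLOWS[name] = func
        return func
    return register


@workflow("anndata-to-ui")
def anndata_to_ui(input_dir: Path, output_dir: Path) -> None:
    # Placeholder only: a real workflow would convert the input files.
    # Here we just record which input files were seen.
    output_dir.mkdir(parents=True, exist_ok=True)
    names = [p.name for p in sorted(input_dir.iterdir())]
    (output_dir / "manifest.txt").write_text("\n".join(names) + "\n")


def main(argv: Optional[List[str]] = None) -> int:
    parser = argparse.ArgumentParser(description="Run a named workflow.")
    # Only registered workflow names are accepted.
    parser.add_argument("workflow", choices=sorted(WORKFLOWS))
    parser.add_argument("--input_dir", type=Path, required=True)
    parser.add_argument("--output_dir", type=Path, required=True)
    args = parser.parse_args(argv)
    # Dispatch to the function registered for the chosen workflow.
    WORKFLOWS[args.workflow](args.input_dir, args.output_dir)
    return 0
```

With this shape, adding a workflow means adding one decorated function, and each function can be unit-tested without Docker by calling it (or `main([...])`) directly. The package would typically expose `main` through a console-script entry point so that the single Docker image can invoke it with the workflow name and directory paths.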