This repository contains a Docker image with all the scripts and programs needed to download data from a source and upload it into a LakeFS instance, and a Helm chart that sets up a CronJob to run this Docker image on a regular schedule on Sterling. For simplicity's sake, there is only a single Dockerfile (containing all the scripts and programs) and a single Helm chart (with a different values file for each ingest you might want to set up), but this may need to be reorganized in the future.
- Copy `values-secret.yaml.txt` to `charts/dug-data-ingest/values-secret.yaml` and add the authentication details for the LakeFS server.
- Helm install the chart in `charts/dug-data-ingest` with three values files:
  - `values.yaml`, with the default settings.
  - `values-secret.yaml`, with the authentication details.
  - The `values/*.yaml` file corresponding to the Dug Data Ingest you want to set up, such as `values/bdc-ingest.yaml` for the BDC ingest.
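The install step above can be wrapped in a small shell function that you source and run from the repository root. This is a sketch: the function name `install_ingest` and the `dug-<name>-ingest` release name are hypothetical, and it assumes each ingest's values file is named `values/<name>-ingest.yaml`, as `values/bdc-ingest.yaml` is.

```shell
# Sketch: install the dug-data-ingest chart for one ingest, from the repository root.
# HYPOTHETICAL: the function name and the "dug-NAME-ingest" release name are
# illustrative; values files are assumed to be named values/NAME-ingest.yaml.
CHART_DIR="charts/dug-data-ingest"

# install_ingest NAME: layer the default, secret and ingest-specific values files.
# Helm gives precedence to the right-most -f file, so the ingest settings win.
install_ingest() {
  helm install "dug-$1-ingest" "$CHART_DIR" \
    -f "$CHART_DIR/values.yaml" \
    -f "$CHART_DIR/values-secret.yaml" \
    -f "$CHART_DIR/values/$1-ingest.yaml"
}
```

For example, `install_ingest bdc` installs the chart with `values/bdc-ingest.yaml` as the last, highest-priority values file.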
- Create a new directory for your ingest in the `scripts` directory, e.g. `scripts/babel`.
- Add the scripts needed to run an ingest in this directory. We usually have an `ingest.sh` script to run an ingest, but you can call it whatever you like.
- Add any Python requirements to `requirements.txt`.
- Add any Alpine requirements to the `Dockerfile`.
- Add a values file in the `charts/dug-data-ingest/values` directory, e.g. `babel-ingest.yaml`. At a minimum, this should provide a name for the ingest job and the name of the script to be executed by `bash`.
- You may need to modify the files in `charts/dug-data-ingest/templates`; if so, please make sure you don't break the other ingests!
- To test your new CronJob, add an `on: push` trigger to `.github/workflows/release-docker-to-renci-containers.yaml` so that it builds a container tagged with your branch name, use that tag in your CronJob, and remove these testing changes once you're done.
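The scaffolding steps above can be sketched as a shell function that creates the script directory, a skeleton `ingest.sh`, and a minimal values file for a new ingest. The function name `scaffold_ingest` and the values keys `jobName` and `script` are hypothetical placeholders, not necessarily the keys the chart reads; copy the real key names from an existing file such as `values/bdc-ingest.yaml` or from `charts/dug-data-ingest/templates`. The `requirements.txt`, `Dockerfile`, and workflow changes are still left for you to make by hand.

```shell
# Sketch: scaffold a new ingest from the repository root.
# HYPOTHETICAL: the function name and the values keys (jobName, script) are
# placeholders; copy the real keys from an existing file in charts/dug-data-ingest/values.

# scaffold_ingest NAME: create scripts/NAME/ingest.sh and
# charts/dug-data-ingest/values/NAME-ingest.yaml, refusing to overwrite either.
scaffold_ingest() {
  script_dir="scripts/$1"
  values_file="charts/dug-data-ingest/values/$1-ingest.yaml"

  if [ -e "$script_dir" ] || [ -e "$values_file" ]; then
    echo "scaffold_ingest: $script_dir or $values_file already exists" >&2
    return 1
  fi
  mkdir -p "$script_dir" "$(dirname "$values_file")"

  # Skeleton ingest script; the download and LakeFS upload steps are left as TODOs.
  printf '%s\n' \
    '#!/usr/bin/env bash' \
    'set -euo pipefail' \
    '# TODO: download the data from the source.' \
    '# TODO: upload the data into LakeFS.' \
    > "$script_dir/ingest.sh"
  chmod +x "$script_dir/ingest.sh"

  # Minimal values file: a job name and the script for bash to run (hypothetical keys).
  printf '%s\n' \
    "jobName: $1-ingest" \
    "script: $script_dir/ingest.sh" \
    > "$values_file"
}
```

For example, `scaffold_ingest babel` creates `scripts/babel/ingest.sh` and `charts/dug-data-ingest/values/babel-ingest.yaml`, and fails without touching anything if either already exists.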