Cornelius maintains a list of variants, together with the explicit mutations that go with each variant, in this repo: https://github.com/corneliusroemer/pango-sequences/
raw file: https://raw.githubusercontent.com/corneliusroemer/pango-sequences/refs/heads/main/data/pango-consensus-sequences_summary.json
Currently, we 'calculate' the mutation signature of a variant statistically.
For various reasons, this is not ideal.
We want users to be able to pick the curated lists of mutations instead. The 'old' way of statistically computing the variant signature would remain available.
Implementation
Authentication
This depends on #1200 (API key auth). A dedicated bot user will be created; its API key will be stored as a GitHub Secret and passed to the container as the API_KEY environment variable.
Approach: containerized ingest scripts
Ingest scripts live in this repository (under ingest/) and are packaged as one or more containers. The starting point is the existing example-data/seed-hello-world.mjs, which already calls the collections API; it only needs to swap session-cookie auth for an Authorization: Bearer $API_KEY header.
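A minimal sketch of the auth swap described above, assuming Node 18+ (global fetch). The helper names buildAuthHeaders and apiRequest are hypothetical and are not taken from seed-hello-world.mjs; only the Authorization: Bearer header and the API_KEY value come from this issue.

```javascript
// Build request headers for API-key auth (this replaces the session-cookie header).
function buildAuthHeaders(apiKey) {
  if (typeof apiKey !== "string" || apiKey.trim() === "") {
    throw new Error("API_KEY is not set");
  }
  return {
    Authorization: `Bearer ${apiKey.trim()}`,
    "Content-Type": "application/json",
  };
}

// Hypothetical request wrapper. `fetchImpl` is injectable so it can be stubbed in tests;
// it defaults to the global fetch available in Node 18+.
async function apiRequest(url, { method = "GET", body, apiKey, fetchImpl = fetch } = {}) {
  const response = await fetchImpl(url, {
    method,
    headers: buildAuthHeaders(apiKey),
    body: body === undefined ? undefined : JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`${method} ${url} failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

Failing early on an empty key keeps a misconfigured secret from showing up later as an opaque 401.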
A single container with subcommands (e.g. node ingest.mjs pango-sequences) is the default, but separate containers per source are also fine if different runtimes are needed (e.g. Python for one source, JS for another).
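The subcommand layout could look roughly like the sketch below. SOURCES and resolveSource are hypothetical names, and the handler body is a placeholder; only the pango-sequences subcommand name comes from this issue.

```javascript
// Registry of ingest sources, keyed by subcommand name.
// The handler is a placeholder; a real one would fetch, transform, and upload.
const SOURCES = {
  "pango-sequences": async (config) => {
    return { source: "pango-sequences", config };
  },
};

// Pick the handler for `node ingest.mjs <source>`; argv[2] is the first user argument.
function resolveSource(argv, sources = SOURCES) {
  const name = argv[2];
  // hasOwnProperty guards against inherited keys such as "toString".
  if (!name || !Object.prototype.hasOwnProperty.call(sources, name)) {
    const known = Object.keys(sources).join(", ");
    throw new Error(`Unknown source "${name}". Known sources: ${known}`);
  }
  return { name, handler: sources[name] };
}
```

An entry point would then call resolveSource(process.argv) and await the returned handler. Adding a second source in the same runtime is one more entry in the registry.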
Overwrite strategy: on each run, the script overwrites the target collection in full. No diffing needed.
Pango-sequences script
Reads Cornelius' variant definitions from the summary JSON linked above and upserts them as a collection via the API.
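As a rough sketch of the transform step: the code below assumes the summary JSON is an object keyed by lineage name whose entries carry nucSubstitutions and aaSubstitutions arrays. Those field names should be checked against the actual file. The output shape (name, variants, nucleotideMutations, aminoAcidMutations) and the collection name are hypothetical, since the collections API payload is not specified in this issue.

```javascript
// Convert the (assumed) summary JSON into a hypothetical collection payload.
function toCollectionPayload(summary, collectionName = "Pango lineages (curated)") {
  const asArray = (value) => (Array.isArray(value) ? value : []);
  const variants = Object.entries(summary)
    .map(([lineage, entry]) => ({
      name: lineage,
      // Assumed source fields; missing fields become empty lists.
      nucleotideMutations: asArray(entry.nucSubstitutions),
      aminoAcidMutations: asArray(entry.aaSubstitutions),
    }))
    // Stable ordering keeps repeated full overwrites reproducible.
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  return { name: collectionName, variants };
}
```

Because each run overwrites the target collection in full, the function rebuilds the complete payload from the source file and keeps no state between runs.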
Running the container
Staging (on DB reset): added as a short-lived service in docker-compose (e.g. restart: "no", quoted so that YAML does not read it as a boolean), which runs automatically when staging is reset.
Production (regular updates): GitHub Actions runs the container on a schedule, passing API_KEY from GitHub Secrets and the prod API URL.
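Both environments pass configuration through environment variables, so the script could validate them once at startup. In this sketch, API_KEY is the variable named in this issue, while API_URL and the helper name readConfig are assumptions.

```javascript
// Read and validate container configuration from an env-like object (e.g. process.env).
function readConfig(env) {
  const required = ["API_KEY", "API_URL"]; // API_URL is an assumed variable name
  const missing = required.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return {
    apiKey: env.API_KEY,
    // Strip trailing slashes so paths can be appended uniformly.
    apiUrl: env.API_URL.replace(/\/+$/, ""),
  };
}
```

Calling readConfig(process.env) at the top of the entry point makes a missing secret fail the GitHub Actions run, or the staging compose service, immediately and with a clear message.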