
wastewater: allow using Cornelius' definitions of variants #1199

@fhennig

Cornelius maintains a list of variants, together with the explicit mutations that go with each variant, in this repo: https://github.com/corneliusroemer/pango-sequences/

raw file: https://raw.githubusercontent.com/corneliusroemer/pango-sequences/refs/heads/main/data/pango-consensus-sequences_summary.json
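To make the data source concrete, here is a minimal sketch of reading that file in Node. The JSON shape is an assumption, not something this issue confirms: the sketch assumes a top-level object keyed by lineage name, with each entry carrying a `nucSubstitutions` array. `extractVariants` and `fetchVariants` are hypothetical helper names.

```javascript
// URL taken from the issue text above.
const SUMMARY_URL =
  "https://raw.githubusercontent.com/corneliusroemer/pango-sequences/refs/heads/main/data/pango-consensus-sequences_summary.json";

// ASSUMPTION: the summary is an object keyed by lineage name, and each entry
// has an array of mutation strings under `field`. Entries without such an
// array are skipped rather than treated as errors.
function extractVariants(summary, field = "nucSubstitutions") {
  return Object.entries(summary)
    .filter(([, entry]) => Array.isArray(entry?.[field]))
    .map(([name, entry]) => ({ name, mutations: [...entry[field]] }));
}

// Network wrapper around the pure helper; relies on the global fetch
// available in Node 18 and later.
async function fetchVariants(url = SUMMARY_URL) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Fetching ${url} failed with HTTP ${response.status}`);
  }
  return extractVariants(await response.json());
}
```

Keeping the parsing step separate from the network call means the mapping can be checked against a small fixture before the real file format is pinned down.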

Currently, we 'calculate' the mutation signature of a variant:

[Image in the original issue]

For various reasons, this is not ideal.

We want users to be able to pick the curated list of mutations for a variant instead. The 'old' way of statistically computing the variant signature would remain available as an option.

Implementation

Authentication

This depends on #1200 (API key auth). A dedicated bot user will be created; its API key will be stored as a GitHub Secret and passed to the container as the API_KEY env var.

Approach: containerized ingest scripts

Ingest scripts live in this repository (under ingest/) as one or more containers. The starting point is the existing example-data/seed-hello-world.mjs, which already calls the collections API; it only needs to swap session-cookie auth for an Authorization: Bearer $API_KEY header.
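The auth swap described above could look roughly like this. It is a sketch only: `buildAuthHeaders` is a hypothetical helper, and the `Content-Type` header assumes the collections API takes JSON bodies.

```javascript
// Build request headers using API-key auth instead of a session cookie.
// Reads API_KEY from the environment unless a key is passed explicitly.
function buildAuthHeaders(apiKey = process.env.API_KEY) {
  if (!apiKey) {
    // Fail early so a missing secret does not surface as a confusing 401.
    throw new Error("API_KEY is not set");
  }
  return {
    Authorization: `Bearer ${apiKey}`,
    // ASSUMPTION: the collections API accepts JSON request bodies.
    "Content-Type": "application/json",
  };
}
```

The returned object can be passed as the `headers` option of `fetch` wherever the seed script currently sends its session cookie.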

A single container with subcommands (e.g. node ingest.mjs pango-sequences) is the default, but separate containers per source are also fine if different runtimes are needed (e.g. Python for one source, JS for another).
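One possible shape for the subcommand dispatch in a single ingest.mjs is sketched below. The registry contents and the `resolveCommand` name are hypothetical; the only thing taken from this issue is the `pango-sequences` subcommand name.

```javascript
// Hypothetical registry mapping subcommand names to async ingest functions.
// The placeholder body stands in for the real pango-sequences ingest.
const commands = {
  "pango-sequences": async () => "pango-sequences ingest would run here",
};

// Pick the ingest function named by the first CLI argument
// (argv[0] is the node binary, argv[1] the script path).
function resolveCommand(argv, registry = commands) {
  const name = argv[2];
  if (name === undefined || !Object.hasOwn(registry, name)) {
    const known = Object.keys(registry).join(", ");
    throw new Error(`Unknown subcommand "${name}". Known: ${known}`);
  }
  return registry[name];
}

// Usage at the bottom of the entry script:
//   resolveCommand(process.argv)().catch((error) => {
//     console.error(error);
//     process.exit(1);
//   });
```

Adding a new source then means adding one entry to the registry, while a source that needs a different runtime can still live in its own container.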

Overwrite strategy: on each run, the script overwrites the target collection in full. No diffing needed.

Pango-sequences script

Reads Cornelius' variant definitions and upserts a collection via the API.
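A sketch of the full-overwrite step follows. Everything about the API surface here is an assumption made for illustration: the `/collections/{id}` path, the use of `PUT`, and the payload fields are hypothetical and would need to match the real collections API.

```javascript
// ASSUMPTION: hypothetical payload shape; the real collection schema is not
// specified in this issue. Only name and mutations are forwarded per variant.
function buildCollectionPayload(variants) {
  return {
    name: "pango-sequences",
    variants: variants.map(({ name, mutations }) => ({ name, mutations })),
  };
}

// Replace the target collection in one request, with no diffing.
// `fetchImpl` is injectable so the function can be exercised without a server.
async function overwriteCollection({
  apiUrl,
  collectionId,
  apiKey,
  variants,
  fetchImpl = fetch,
}) {
  // ASSUMPTION: hypothetical endpoint and method for a full replace.
  const response = await fetchImpl(`${apiUrl}/collections/${collectionId}`, {
    method: "PUT",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(buildCollectionPayload(variants)),
  });
  if (!response.ok) {
    throw new Error(`Overwrite failed with HTTP ${response.status}`);
  }
  return response.status;
}
```

Because each run sends the complete collection, a failed run can simply be retried without leaving partial state to reconcile.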

Running the container

Staging (on DB reset): added as a short-lived service in docker-compose (e.g. restart: "no", quoted so that YAML does not parse it as a boolean), which runs automatically when staging is reset.

Production (regular updates): GitHub Actions runs the container on a schedule, passing API_KEY from GitHub Secrets and the prod API URL.
