feat(website): python seeder with pango lineages and test suite #1203
Open
fhennig wants to merge 13 commits into
Adds example-data/lineages/seed.py, a Python script that fetches pango lineage definitions from the upstream summary JSON and creates one backend collection per lineage (nucleotide substitutions as variants). Mirrors the patterns of seed.mjs: idempotent, supports --wait, --url, --user-id, and --limit (default 10 for testing, 0 for all).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
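To illustrate the "one collection per lineage" mapping described in this commit, here is a minimal sketch. The `nucSubstitutions` field name, the `Collection` and `Variant` shapes, and the `build_collections` helper are assumptions made for illustration; the actual code in the PR may be structured differently.

```python
from typing import TypedDict


class Variant(TypedDict):
    # Hypothetical shape: one variant per nucleotide substitution.
    name: str
    mutation: str


class Collection(TypedDict):
    # Hypothetical shape; the real backend schema may differ.
    name: str
    variants: list[Variant]


def build_collections(summary: dict[str, dict], limit: int = 10) -> list[Collection]:
    """Build one collection per lineage; limit=0 means "all lineages"."""
    lineages = sorted(summary)
    if limit > 0:
        lineages = lineages[:limit]
    collections: list[Collection] = []
    for lineage in lineages:
        # Assumed field name in the upstream summary JSON.
        substitutions = summary[lineage].get("nucSubstitutions", [])
        variants: list[Variant] = [{"name": s, "mutation": s} for s in substitutions]
        collections.append({"name": lineage, "variants": variants})
    return collections


# Tiny hand-written stand-in for the upstream summary (not real data).
example = {
    "B.1.1.7": {"nucSubstitutions": ["C241T", "A23063T"]},
    "BA.2": {"nucSubstitutions": ["C241T"]},
}
first_only = build_collections(example, limit=1)
everything = build_collections(example, limit=0)
```

Treating 0 as "no limit" keeps the default small for local testing while still allowing a full run with a single flag.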
…seeder

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replaces the split JS/Python approach with a single Python codebase:
- seed.py: main entry point with argparse subcommands (covid-resistance-mutations, covid-pango-lineages)
- backend.py: shared BackendClient (wait, fetch, create)
- sources/resistance_mutations.py: port of seed.mjs resistance data
- sources/pango_lineages.py: pango lineage fetcher
- Dockerfile updated to run python3 seed.py

Running without a subcommand seeds all sources. --limit only applies to the covid-pango-lineages subcommand (default: 10, 0 = all).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
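The subcommand behaviour described in this commit could be wired up roughly as follows. This is a sketch only: the default URL, the `build_parser` and `selected_sources` helper names, and the assumption that `--url` and `--wait` carry over from the earlier script are illustrative, not taken from the PR.

```python
import argparse

ALL_SOURCES = ["covid-resistance-mutations", "covid-pango-lineages"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="seed.py")
    # Shared options; the default URL is an assumption.
    parser.add_argument("--url", default="http://localhost:8080")
    parser.add_argument("--wait", action="store_true")
    # Subcommands are optional, so running without one leaves source=None.
    subparsers = parser.add_subparsers(dest="source")
    subparsers.add_parser("covid-resistance-mutations")
    lineages = subparsers.add_parser("covid-pango-lineages")
    # --limit exists only on the lineage subcommand (default 10, 0 = all).
    lineages.add_argument("--limit", type=int, default=10)
    return parser


def selected_sources(args: argparse.Namespace) -> list[str]:
    """Return the sources to seed: all of them when no subcommand is given."""
    if args.source is None:
        return list(ALL_SOURCES)
    return [args.source]


parser = build_parser()
no_subcommand = parser.parse_args([])
all_lineages = parser.parse_args(["covid-pango-lineages", "--limit", "0"])
```

Because `--limit` is registered on the subparser, passing it with `covid-resistance-mutations` or without a subcommand is rejected by argparse rather than silently ignored.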
- pixi.toml with [workspace] config, python 3.13, requests via PyPI
- pixi.lock committed for reproducibility
- Dockerfile updated to multi-stage: pixi builder copies site-packages into python:3.13-slim final image
- Defines tasks: seed, seed-lineages, seed-all-lineages, seed-resistance

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
BackendClient now calls POST /users/sync (githubId=9999999999, name="GenSpectrum Team") to obtain the internal user id before any collection API calls. wait_for_backend() uses this call for polling. Removes the --user-id CLI flag.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
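The polling idea (reuse the user-sync call as the readiness probe) might look something like the sketch below. The function signature, retry counts, and the injected `sync_user` callable are assumptions; in the PR, the call would be the POST /users/sync request made by BackendClient. Catching `OSError` covers the built-in `ConnectionError` and also `requests` exceptions, which derive from `IOError`.

```python
import time
from typing import Callable


def wait_for_backend(
    sync_user: Callable[[], int],
    attempts: int = 30,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry sync_user until it succeeds and return the internal user id."""
    last_error: Exception | None = None
    for _ in range(attempts):
        try:
            return sync_user()
        except OSError as error:  # e.g. connection refused while the backend starts
            last_error = error
            sleep(delay)
    raise TimeoutError(f"backend not reachable after {attempts} attempts") from last_error


# Stand-in for the real HTTP call: fails twice, then returns a made-up user id.
calls = []


def fake_sync() -> int:
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError("backend not up yet")
    return 42


user_id = wait_for_backend(fake_sync, sleep=lambda _seconds: None)
```

Using the sync call as the probe means a successful wait already yields the user id needed for later collection requests, so no separate health endpoint is required.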
Collections are now always created or updated (matched by name).
Adds BackendClient.update_collection() using PUT /collections/{id}.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
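The create-or-update decision, matched by name, can be sketched as a pure function. The `plan_upsert` name and the `id`/`name` keys on existing collections are assumptions for illustration; the PR's seed_source logic may differ.

```python
def plan_upsert(desired: list[dict], existing: list[dict]) -> tuple[list[dict], list[tuple[int, dict]]]:
    """Split desired collections into ones to create and ones to update.

    A desired collection is an update when an existing collection has the
    same name; otherwise it is a create.
    """
    ids_by_name = {collection["name"]: collection["id"] for collection in existing}
    to_create: list[dict] = []
    to_update: list[tuple[int, dict]] = []
    for collection in desired:
        existing_id = ids_by_name.get(collection["name"])
        if existing_id is None:
            to_create.append(collection)  # would go to POST /collections
        else:
            to_update.append((existing_id, collection))  # would go to PUT /collections/{id}
    return to_create, to_update


# Made-up example data.
existing = [{"id": 7, "name": "BA.2"}]
desired = [{"name": "BA.2", "variants": []}, {"name": "XBB.1.5", "variants": []}]
to_create, to_update = plan_upsert(desired, existing)
```

Keeping the decision separate from the HTTP calls makes the create, update, and mixed cases easy to unit test without a backend.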
…ackend

38 tests across 4 files:
- test_backend.py: BackendClient (responses library for HTTP mocking)
- test_resistance_mutations.py: mature_name offset math, collection structure
- test_pango_lineages.py: collection building, variant filtering, HTTP fetch
- test_seed.py: seed_source create/update/mixed upsert logic

Run with: pixi run -e test test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t, FilterObject, ExistingCollection)
Dependencies
... Also, I (Felix) haven't reviewed this myself yet; it's just vibe-coded so far.
Summary
- Replaces seed.mjs with a unified Python seeder in collection-seeding/ (renamed from example-data/)
- Data sources under sources/:
  - covid-resistance-mutations: port of the original JS resistance mutation data (3CLpro, RdRp, Spike mAb)
  - covid-pango-lineages: fetches ~4,976 pango lineage definitions from corneliusroemer/pango-sequences, one collection per lineage with nucleotide substitutions as variants
- Subcommands (covid-pango-lineages, covid-resistance-mutations); no subcommand runs all sources
- Upserts collections via POST /collections and PUT /collections/{id}
- Calls POST /users/sync before any backend interaction to obtain the internal user ID (using the genspectrum-bot account, GitHub ID 218605180)
- Multi-stage Docker build (pixi builder, final image python:3.13-slim)
- Types defined with TypedDict (Collection, Variant, FilterObject, ExistingCollection)

Test plan
- pixi run -e test test: all 38 tests pass
- pixi run seed: seeds resistance mutations + first 10 lineages against a local backend
- pixi run seed again: all collections updated (upsert)
- pixi run seed-all-lineages: seeds all ~4,976 lineages
- docker build -t collection-seeder . && docker run --rm -e BACKEND_URL=http://host.docker.internal:8080 collection-seeder

🤖 Generated with Claude Code