Releases · precimed/genomatch · GitHub

19 May 21:09

v0.4.3 Latest

Added

Added assign_vmap_ids.py for replacing IDs in an existing .vmap after preparation. IDs are copied from a prepared .vmap or .vtable lookup by exact chrom:pos:a1:a2 match; row order and provenance are preserved. Rows that receive no non-missing ID are dropped by default; --unmatched-id-policy can instead retain them with generated variant-key IDs or with missing (.) IDs. Duplicate non-missing IDs among retained rows fail by default; use --duplicate-id-policy to allow them or to drop all rows with duplicated IDs to QC.
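The matching and policy rules above can be pictured with a small sketch. This is an illustration only, not the assign_vmap_ids.py implementation: the function name, the row layout (plain dicts), and the policy values ("drop", "variant-key", "missing", "fail", "allow") are all assumptions made for the example.

```python
from collections import Counter


def variant_key(row):
    # Exact-match key in chrom:pos:a1:a2 form.
    return f"{row['chrom']}:{row['pos']}:{row['a1']}:{row['a2']}"


def assign_ids(rows, lookup, unmatched="drop", duplicates="fail"):
    """Copy IDs from `lookup` (key -> ID) onto `rows`, keeping row order.

    Policy names are hypothetical; they only mirror the behaviors
    described in the release note.
    """
    out = []
    for row in rows:
        key = variant_key(row)
        new_id = lookup.get(key)
        if new_id is None or new_id == ".":
            if unmatched == "drop":
                continue  # default: drop rows without a non-missing ID
            new_id = key if unmatched == "variant-key" else "."
        out.append({**row, "id": new_id})

    # Duplicate check covers retained, non-missing IDs only.
    counts = Counter(r["id"] for r in out if r["id"] != ".")
    dup_ids = {i for i, n in counts.items() if n > 1}
    if dup_ids:
        if duplicates == "fail":
            raise ValueError(f"duplicate IDs: {sorted(dup_ids)}")
        if duplicates == "drop":
            out = [r for r in out if r["id"] not in dup_ids]
    return out
```

With the default policies, a row whose key is absent from the lookup simply disappears from the result, and any repeated assigned ID raises an error.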

Full Changelog: v0.4.2...v0.4.3

19 May 13:29

v0.4.2

Added

Added --src-build to prepare_variants.py and prepare_variants_sharded.py to declare a known source genome build and skip build guessing during preparation.

Full Changelog: v0.4.1...v0.4.2

16 May 14:53

v0.4.1

Added

Added summary-statistics documentation covering metadata, SNP-only imports, projection, and clean mode.
Added --id-lookup for SNP-only summary-statistics preparation. It can fill missing chromosome and position values by matching summary-statistic IDs against a compatible .vmap or .vtable, then continue through the normal preparation workflow.
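As a rough illustration of the --id-lookup idea (not the tool's code), the sketch below fills missing chromosome and position values from an ID-keyed table. The function name, the dict-based row layout, and the choice to leave unmatched rows unchanged are assumptions for the example.

```python
def fill_positions(sumstats_rows, id_lookup):
    """Fill missing chrom/pos by ID.

    `id_lookup` maps an ID to a (chrom, pos) tuple, standing in for a
    compatible .vmap or .vtable. Rows whose ID is not found are passed
    through unchanged (an assumption of this sketch).
    """
    filled = []
    for row in sumstats_rows:
        if row.get("chrom") is None or row.get("pos") is None:
            hit = id_lookup.get(row["id"])
            if hit is not None:
                chrom, pos = hit
                row = {**row, "chrom": chrom, "pos": pos}
        filled.append(row)
    return filled
```

After this step every row that found a match carries coordinates, so it can go through the same preparation path as a file that had them from the start.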

Fixed

Fixed summary-statistics parsing for files whose header contains tabs but whose data rows use whitespace-separated trailing fields, such as ILAE-style summary-statistics tables.
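One way to picture the mixed-delimiter case: split the header on tabs, but split data rows on any run of whitespace. The sketch below is a minimal illustration under the assumption that no field contains embedded spaces; the function name is hypothetical and this is not the parser used by the tool.

```python
def parse_mixed_table(lines):
    """Parse a table with a tab-delimited header and whitespace-delimited rows."""
    header = [name.strip() for name in lines[0].rstrip("\n").split("\t")]
    records = []
    for line_no, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue  # skip blank lines
        # str.split() with no argument splits on tabs and spaces alike.
        fields = line.split()
        if len(fields) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} fields, got {len(fields)}"
            )
        records.append(dict(zip(header, fields)))
    return records
```

A row such as `rs1<TAB>1<TAB>100 0.05` then yields four fields even though the last two are separated by a space rather than a tab.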

Full Changelog: v0.4.0...v0.4.1

14 May 21:58

v0.4.0

Added

Added expanded user documentation under docs/, including Tutorial 1, a worked reference-universe projection example with downloadable release fixtures; the top-level README is now a concise entry point.
Added scripts/normalize_bed_contigs_ncbi.sh for normalizing BED contig labels to canonical NCBI-style primary contigs and reporting dropped noncanonical rows.

Changed

restrict_vmap.py replaces match_vmap_to_target.py as the way to trim a prepared .vmap to a target set.
project_payload.py now projects exactly the variants in --vmap. To project a subset or shared set, build that .vmap first with restrict_vmap.py; use intersect_variants.py or union_variants.py first when you need to define the shared set.
intersect_variants.py and union_variants.py now write source-independent .vtable outputs with chrom:pos:a1:a2 IDs in declared coordinate order; intersect_variants.py inputs are now positional.
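The source-independent output described above can be sketched as plain set operations over (chrom, pos, a1, a2) tuples, followed by a coordinate sort. This is only an illustration: the contig order list, the function names, and the reading of "declared coordinate order" as contig rank then position are assumptions, not the tools' actual logic.

```python
# Assumed contig order for the example; the real declared order may differ.
CONTIG_ORDER = [str(i) for i in range(1, 23)] + ["X", "Y", "MT"]


def coordinate_key(variant):
    chrom, pos, a1, a2 = variant
    # Raises ValueError for contigs outside the assumed order.
    return (CONTIG_ORDER.index(chrom), pos, a1, a2)


def to_ids(variants):
    # Emit chrom:pos:a1:a2 IDs in coordinate order.
    ordered = sorted(variants, key=coordinate_key)
    return [f"{c}:{p}:{a1}:{a2}" for c, p, a1, a2 in ordered]


def intersect_variants(*inputs):
    shared = set(inputs[0]).intersection(*inputs[1:])
    return to_ids(shared)


def union_variants(*inputs):
    return to_ids(set().union(*inputs))
```

Because the result depends only on the variant keys, the same IDs come out regardless of which input is listed first.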

Removed

Removed --only-mapped-target from all apply_vmap_* tools. The behavior it selected is now the only apply behavior: applying a .vmap can no longer add genotype or summary-statistics rows that have no corresponding source row.
Removed missing-target sentinel rows from .vmap objects. Unmatched variants are now left out of the .vmap instead of written with source_index=-1 and allele_op=missing.

Fixed

Fixed summary-statistics metadata handling for padded column headers by matching trimmed/case-insensitive names only when the result is unambiguous; also recognizes a wider set of missing-value tokens and allows .gz in path_sumStats.
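The "only when unambiguous" rule for padded headers can be illustrated as follows. This is a sketch under assumptions, not the project's code: the function name is hypothetical, and raising on ambiguity is one possible way to refuse a fuzzy match.

```python
def resolve_column(requested, header):
    """Return the header entry that `requested` refers to, or None.

    An exact match wins. Otherwise names are compared after trimming
    whitespace and folding case, and the match is accepted only if
    exactly one header entry qualifies.
    """
    if requested in header:
        return requested
    wanted = requested.strip().casefold()
    candidates = [name for name in header if name.strip().casefold() == wanted]
    if len(candidates) == 1:
        return candidates[0]
    if len(candidates) > 1:
        raise ValueError(f"ambiguous column {requested!r}: {candidates}")
    return None
```

For example, a request for `P` resolves to a header entry written as `" p "`, but is rejected if the header holds both `"p"` and `"P "`.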

Performance

Reduced peak memory use in apply_vmap_to_sumstats.py by reading payload columns in chunks, scattering retained rows into .vmap order, and writing output in row blocks. Sumstats projection now emits explicit canonical variant columns, writes tab-delimited output with empty missing fields, emits <output>.meta.yaml, and supports repeated source provenance.
Exact set operations can now choose smaller driver inputs using variants_count metadata while preserving documented output order and ID semantics.
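The chunk-and-scatter pattern mentioned for apply_vmap_to_sumstats.py can be sketched with the standard library. This is a simplified picture, not the actual implementation: it assumes the .vmap supplies a `source_index` list giving the payload row for each output row, and the function name and chunk size are illustrative.

```python
from itertools import islice


def scatter_in_chunks(payload_rows, source_index, chunk_size=1000):
    """Place payload values into output order while reading in chunks.

    source_index[i] is the payload row that feeds output row i.
    A payload row may feed several output rows (repeated provenance).
    """
    # Invert the mapping once: payload row -> output positions.
    targets = {}
    for out_pos, src in enumerate(source_index):
        targets.setdefault(src, []).append(out_pos)

    output = [None] * len(source_index)
    rows = iter(payload_rows)
    offset = 0
    while True:
        chunk = list(islice(rows, chunk_size))  # only one chunk in memory
        if not chunk:
            break
        for i, value in enumerate(chunk):
            for out_pos in targets.get(offset + i, ()):
                output[out_pos] = value
        offset += len(chunk)
    return output
```

Only the retained values and one chunk of the payload are held at a time, which is the general reason this pattern lowers peak memory compared with loading the full payload first.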

Full Changelog: v0.3.2...v0.4.0
