Releases: precimed/genomatch
v0.4.3
Added
- Added `assign_vmap_ids.py` for replacing IDs in an existing `.vmap` after preparation. IDs are copied from a prepared `.vmap` or `.vtable` lookup by exact `chrom:pos:a1:a2` match; row order and provenance are preserved. Rows without a non-missing assigned ID are dropped by default, with `--unmatched-id-policy` available to retain unmatched rows as generated variant-key IDs or `.` IDs. Duplicate retained non-missing output IDs fail by default; use `--duplicate-id-policy` to allow them or drop all duplicated-ID rows to QC.
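The matching and policy behavior described above can be illustrated with a minimal sketch. This is not the code of `assign_vmap_ids.py`: the function name, the in-memory row format, and the policy values (`"drop"`, `"variant_key"`, `"dot"`, `"fail"`, `"allow"`) are hypothetical placeholders chosen for illustration, and the real option values may differ.

```python
from collections import Counter


def assign_ids(rows, lookup, unmatched="drop", duplicates="fail"):
    """Assign IDs to (chrom, pos, a1, a2) rows by exact variant-key match.

    `lookup` maps "chrom:pos:a1:a2" keys to IDs. The policy names are
    hypothetical and only mirror the behavior described in the release note.
    """
    out = []
    for chrom, pos, a1, a2 in rows:
        key = f"{chrom}:{pos}:{a1}:{a2}"
        new_id = lookup.get(key)
        if new_id in (None, "."):  # no non-missing ID was found
            if unmatched == "drop":
                continue
            new_id = key if unmatched == "variant_key" else "."
        out.append((chrom, pos, a1, a2, new_id))  # input row order is kept

    # Only non-missing IDs take part in the duplicate check.
    counts = Counter(row[4] for row in out if row[4] != ".")
    dup_ids = {i for i, n in counts.items() if n > 1}
    if dup_ids:
        if duplicates == "fail":
            raise ValueError(f"duplicate output IDs: {sorted(dup_ids)}")
        if duplicates == "drop":
            out = [row for row in out if row[4] not in dup_ids]
    return out
```

With the defaults, a row whose key is absent from the lookup is dropped, and two retained rows that receive the same ID raise an error.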
Full Changelog: v0.4.2...v0.4.3
v0.4.2
Added
- Added `prepare_variants.py` and `prepare_variants_sharded.py` `--src-build` to declare a known source genome build and skip build guessing during preparation.
Full Changelog: v0.4.1...v0.4.2
v0.4.1
Added
- Added summary-statistics documentation covering metadata, SNP-only imports, projection, and clean mode.
- Added `--id-lookup` for SNP-only summary-statistics preparation. It can fill missing chromosome and position values by matching summary-statistic IDs against a compatible `.vmap` or `.vtable`, then continue through the normal preparation workflow.
Fixed
- Fixed summary-statistics parsing for files whose header contains tabs but whose data rows use whitespace-separated trailing fields, such as ILAE-style summary-statistics tables.
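The mixed-delimiter case in the fix above can be sketched as follows. This is one plausible approach under stated assumptions, not the parser the project uses: the function name is hypothetical, and the fallback assumes that no field value contains embedded whitespace.

```python
import re


def parse_mixed_delimiter_table(text):
    """Parse a table whose header is tab-delimited but whose data rows
    may separate some fields with spaces instead of tabs."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = [name.strip() for name in lines[0].split("\t")]
    records = []
    for lineno, line in enumerate(lines[1:], start=2):
        fields = line.split("\t")
        if len(fields) != len(header):
            # Fall back to splitting on any run of whitespace.
            fields = re.split(r"\s+", line.strip())
        if len(fields) != len(header):
            raise ValueError(
                f"line {lineno}: expected {len(header)} fields, got {len(fields)}"
            )
        records.append(dict(zip(header, fields)))
    return records
```

Rows that already split cleanly on tabs are left alone, so the fallback only affects rows whose field count would otherwise disagree with the header.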
Full Changelog: v0.4.0...v0.4.1
v0.4.0
Added
- Added expanded user documentation under `docs/`, including Tutorial 1, a worked reference-universe projection example with downloadable release fixtures; the top-level README is now a concise entry point.
- `scripts/normalize_bed_contigs_ncbi.sh` for normalizing BED contig labels to canonical NCBI-style primary contigs and reporting dropped noncanonical rows.
Changed
- `restrict_vmap.py` replaces `match_vmap_to_target.py` as the way to trim a prepared `.vmap` to a target set.
- `project_payload.py` now projects exactly the variants in `--vmap`. To project a subset or shared set, build that `.vmap` first with `restrict_vmap.py`; use `intersect_variants.py` or `union_variants.py` first when you need to define the shared set.
- `intersect_variants.py` and `union_variants.py` now write source-independent `.vtable` outputs with `chrom:pos:a1:a2` IDs in declared coordinate order; `intersect_variants.py` inputs are now positional.
Removed
- Removed `--only-mapped-target` from all `apply_vmap_*` tools. Its behavior is now the only apply behavior: applying a `.vmap` can no longer add genotype or summary-statistics rows that have no corresponding source row.
- Removed missing-target sentinel rows from `.vmap` objects. Unmatched variants are now left out of the `.vmap` instead of being written with `source_index=-1` and `allele_op=missing`.
Fixed
- Fixed summary-statistics metadata handling for padded column headers by matching trimmed/case-insensitive names only when the result is unambiguous; also recognizes a wider set of missing-value tokens and allows `.gz` in `path_sumStats`.
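The "only when the result is unambiguous" rule above can be sketched as follows. The function name is hypothetical, and preferring an exact match before trying the relaxed comparison is an assumption, not something the release note states.

```python
def resolve_column(requested, header):
    """Return the index of `requested` in `header`.

    An exact match is preferred (assumption). Otherwise names are compared
    after trimming whitespace and ignoring case, and the match is accepted
    only if exactly one header qualifies.
    """
    if requested in header:
        return header.index(requested)
    target = requested.strip().casefold()
    hits = [i for i, name in enumerate(header) if name.strip().casefold() == target]
    if len(hits) == 1:
        return hits[0]
    if not hits:
        raise KeyError(f"column {requested!r} not found")
    raise ValueError(
        f"column {requested!r} is ambiguous: matches {[header[i] for i in hits]}"
    )
```

For a header such as `["SNP", " P ", "beta", "Beta"]`, a request for `p` resolves to the padded ` P ` column, while a request for `BETA` is rejected because two columns match after normalization.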
Performance
- Reduced peak memory use in `apply_vmap_to_sumstats.py` by reading payload columns in chunks, scattering retained rows into `.vmap` order, and writing output in row blocks. Sumstats projection now emits explicit canonical variant columns, writes tab-delimited output with empty missing fields, emits `<output>.meta.yaml`, and supports repeated source provenance.
- Exact set operations can now choose smaller driver inputs using `variants_count` metadata while preserving documented output order and ID semantics.
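The chunked read-and-scatter idea from the first item can be sketched as follows. This is a simplified illustration, not the code of `apply_vmap_to_sumstats.py`: the function name and the plain-list representation are hypothetical, and "repeated source provenance" is assumed here to mean that one source row may feed several output rows.

```python
from collections import defaultdict
from itertools import islice


def scatter_in_chunks(source_values, source_index, chunk_size=2):
    """Place source values into output order while reading in chunks.

    `source_index[k]` is the source row that output row k comes from.
    Only one chunk of source values is held in memory at a time.
    """
    # Map each source row to every output slot that needs it.
    slots = defaultdict(list)
    for out_pos, src in enumerate(source_index):
        slots[src].append(out_pos)

    output = [None] * len(source_index)
    it = iter(source_values)
    offset = 0
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            break
        for i, value in enumerate(chunk, start=offset):
            for out_pos in slots.get(i, ()):  # rows with no slot are skipped
                output[out_pos] = value
        offset += len(chunk)
    return output
```

Source rows that no output row refers to are read and discarded, so peak memory depends on the chunk size and the output size, not on the full source table.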
Full Changelog: v0.3.2...v0.4.0