- Add VariantTransformSettings / StartCodonConvention for post-mapping
protein notation control: Variant.transform(settings) converts
p.(Met1Val) -> p.Met1? under HgvsQuestion convention while keeping
specific predictions as the internal default (option b architecture)
- Fix p.Met1? ≡ p.Met1Xaa equivalence: AaEdit::Special{"?"} now
projects to ResidueToken::Any in analogous_edit.rs, and normalize_format
maps '?' -> 'X' (same as Xaa) so both notations compare equal
- Add mapper.p_to_c for back-converting protein substitutions to cDNA
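The p.Met1? / p.Met1Xaa equivalence above comes down to collapsing both "unknown residue" spellings onto one token before comparing. A minimal sketch of that idea follows; `normalize_alt` and `equivalent` are hypothetical names used for illustration, and plain strings stand in for the crate's AaEdit and ResidueToken types.

```rust
/// Hypothetical helper: collapse the "unknown residue" spellings
/// ("?", "Xaa", "X") onto the single token "X". Other residues pass through.
fn normalize_alt(alt: &str) -> String {
    match alt {
        "?" | "Xaa" | "X" => "X".to_string(),
        other => other.to_string(),
    }
}

/// Two alternate residues are treated as equivalent when their
/// normalized forms match.
fn equivalent(a: &str, b: &str) -> bool {
    normalize_alt(a) == normalize_alt(b)
}
```

Under this sketch `equivalent("?", "Xaa")` holds, while a specific prediction such as "Val" still compares unequal to "?".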
Code quality improvements:
Remove the broken Variant.to_spdi entrypoint (wrong position for c./n.
variants due to CDS-relative vs transcript-relative mismatch; no strand
complementation). VariantMapper.to_spdi is the sole correct entrypoint.
Fix unsafe i32-as-usize casts in get_c_indices and p_to_c that silently
wrapped on negative TranscriptPos/ProteinPos values, producing garbage
sequence indices.
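A minimal sketch of the failure mode and the checked alternative. The `checked_usize` signature shown here is an assumption for illustration; the helper in the mapper may take different arguments or return a different error type.

```rust
use std::convert::TryFrom;

/// Hypothetical signature: convert a signed position into a sequence index,
/// returning an error for negative values instead of letting `as usize` wrap.
fn checked_usize(pos: i32, what: &str) -> Result<usize, String> {
    usize::try_from(pos).map_err(|_| format!("{} is negative: {}", what, pos))
}

/// The pattern being replaced: a negative value wraps around to a huge
/// index (usize::MAX for -1) rather than failing.
fn unchecked_index(pos: i32) -> usize {
    pos as usize
}
```

With the checked form, a negative TranscriptPos or ProteinPos surfaces as an error at the conversion site rather than as an out-of-range slice index later on.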
Fix normalize_variant Coding arm: direct arithmetic on HgvsTranscriptPos
produced the invalid position c.0 when a 3'-shift crossed the 5'UTR/CDS
boundary. Positions are now re-derived via TranscriptMapper::n_to_c.
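The c.0 problem arises because HGVS c. numbering has no zero: c.-1 is immediately followed by c.1. A rough sketch of re-deriving the position through a 0-based transcript index, rather than adding to the c. value directly, is shown below. The function names and the `cds_start` parameter are hypothetical, and 3'UTR (c.*N) handling is left out.

```rust
/// Map a 0-based transcript index to a c. coordinate.
/// `cds_start` is the 0-based index of the first CDS base (c.1).
fn n_index_to_c(n_index: i64, cds_start: i64) -> i64 {
    let rel = n_index - cds_start;
    // Indices at or after the CDS start are 1-based; upstream ones are
    // negative, so the value 0 is never produced.
    if rel >= 0 { rel + 1 } else { rel }
}

/// Inverse mapping; c.0 is not a valid position.
fn c_to_n_index(c: i64, cds_start: i64) -> Option<i64> {
    match c {
        0 => None,
        c if c > 0 => Some(cds_start + c - 1),
        c => Some(cds_start + c),
    }
}

/// Shift a c. position by `shift` bases by going through the n. index.
fn shift_c(c: i64, shift: i64, cds_start: i64) -> Option<i64> {
    let n = c_to_n_index(c, cds_start)?;
    Some(n_index_to_c(n + shift, cds_start))
}
```

Shifting c.-1 by one base this way yields c.1, where direct arithmetic (-1 + 1) would have yielded the invalid c.0.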
Refactor: extract n_to_c_position() helper to deduplicate the repeated
pattern of converting a 0-based n. index to a BaseOffsetPosition. Remove
get_n_indices() (identical to get_c_indices()). Replace redundant
get_transcript() calls in the shift and Ins->Dup blocks with dyn_clone.
- transcript_mapper: sort exons in transcript order per strand;
build CigarMapper per exon and consult it in g_to_n/n_to_g;
separate best_dist/best_offset in intronic search
- mapper: add checked_usize helper and apply throughout;
implement cyclic-rotation comparison for multi-base ins shift;
g_to_c_all now propagates errors when all mappings fail
- altseq: use max (not min) for Repeat alt-seq generation
- altseq_to_hgvsp: guard N-terminal insertion (start_idx=0)
- cigar: fix CigarMapper ref/tgt advancement ops (D advances ref,
I advances tgt); update test expectations
- data: clarify c_to_g doc comment
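The CIGAR fix in the list above (D advances ref, I advances tgt) can be sketched as a walk with two cursors. `CigarOp` and `ref_to_tgt` are hypothetical names for illustration, not the crate's CigarMapper API, and which sequence counts as ref versus tgt in hgvs-weaver is assumed here.

```rust
#[derive(Clone, Copy)]
enum CigarOp {
    Match(usize), // consumes both ref and tgt
    Ins(usize),   // bases present only in tgt: advances tgt
    Del(usize),   // bases present only in ref: advances ref
}

/// Map a 0-based ref offset to a 0-based tgt offset.
/// Returns None when the offset falls inside a deletion or past the end.
fn ref_to_tgt(ops: &[CigarOp], ref_pos: usize) -> Option<usize> {
    let (mut r, mut t) = (0usize, 0usize);
    for op in ops {
        match *op {
            CigarOp::Match(n) => {
                if ref_pos < r + n {
                    return Some(t + (ref_pos - r));
                }
                r += n;
                t += n;
            }
            CigarOp::Del(n) => {
                if ref_pos < r + n {
                    return None; // no tgt base aligns to a deleted ref base
                }
                r += n;
            }
            CigarOp::Ins(n) => t += n,
        }
    }
    None
}
```

Swapping the two single-cursor arms, which is the kind of error the fix addresses, would offset every position downstream of the first indel.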
refactor: eliminate 10 DRY violations across hgvs-weaver
Extract repeated logic into focused helpers:
- utils: translate_cds delegates to translate_codon; aa3_chunk_to_residue
extracted from decompose_aa
- analogous_edit: push_dna_tokens / push_aa_tokens replace 7 inline
token-push patterns
- equivalence: strand_aware_edit free fn (4 sites); normalize_ins_to_dup_boi
collapses identical Coding/NonCoding arms
- mapper: extract_edit_sequences (shift_3_prime + shift_5_prime);
apply_strand_complement (6 sites); make_base_offset_position +
make_simple_position; ins_anchor_and_end (2 sites);
update_del_dup_ref method (3 sites)
refactor: split normalize_variant into per-variant-type methods
Extract normalize_coding_variant, normalize_genomic_variant, and
normalize_noncoding_variant from the monolithic normalize_variant, which
is now a clean 10-line dispatch. Each arm can now be read and tested in
isolation without constructing the full SequenceVariant enum.
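The shape of that split can be sketched as follows. Only the method names come from the commit message; the `Normalizer` struct, the string payloads, and the placeholder bodies are hypothetical stand-ins for the crate's structured types and real normalization logic.

```rust
#[derive(Debug, Clone, PartialEq)]
enum SequenceVariant {
    Coding(String),
    Genomic(String),
    NonCoding(String),
}

struct Normalizer;

impl Normalizer {
    /// Thin dispatch: unwrap the arm, delegate, re-wrap the result.
    fn normalize_variant(&self, v: &SequenceVariant) -> Result<SequenceVariant, String> {
        match v {
            SequenceVariant::Coding(c) => {
                self.normalize_coding_variant(c).map(SequenceVariant::Coding)
            }
            SequenceVariant::Genomic(g) => {
                self.normalize_genomic_variant(g).map(SequenceVariant::Genomic)
            }
            SequenceVariant::NonCoding(n) => {
                self.normalize_noncoding_variant(n).map(SequenceVariant::NonCoding)
            }
        }
    }

    // Placeholder bodies: each just trims whitespace in this sketch.
    fn normalize_coding_variant(&self, v: &str) -> Result<String, String> {
        Ok(v.trim().to_string())
    }
    fn normalize_genomic_variant(&self, v: &str) -> Result<String, String> {
        Ok(v.trim().to_string())
    }
    fn normalize_noncoding_variant(&self, v: &str) -> Result<String, String> {
        Ok(v.trim().to_string())
    }
}
```

A test can then call `normalize_coding_variant` directly on its payload, without wrapping it in the enum first.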
Code Review
This pull request introduces a comprehensive set of improvements to the hgvs-weaver library, significantly enhancing its functionality, correctness, and maintainability. The changes include the introduction of configurable protein variant transformations, robust handling of start-codon and intronic variants, and a new protein-to-cDNA back-conversion feature.
Key bug fixes have been implemented, such as correcting the CIGAR mapping logic for reference/target advancement and ensuring type safety by replacing raw integer strand indicators with a Strand enum. The normalization logic has been substantially refactored for clarity and correctness, particularly in handling 3' shifts across transcript boundaries.
The codebase has been improved through extensive refactoring to reduce duplication and improve type safety. The Python API has also been enhanced with a better exception hierarchy and cleaner type stubs.
The addition of new integration tests and a large-scale validation script demonstrates a strong commitment to quality and correctness.
Overall, this is an excellent and well-executed pull request. The changes are thorough, well-tested, and significantly improve the quality of the codebase. I have reviewed the changes in detail and have not found any issues of medium or higher severity.
This PR introduces comprehensive improvements to variant handling (especially protein variants), refactors core mapping logic for better type safety, and addresses
several critical bugs in normalization. It also adds infrastructure for large-scale validation and architectural tracking.
Key Features
Bug Fixes
Refactoring and Quality
Tools and Infrastructure