Refactor/rename remove improve the docs #93

Merged: ypriverol merged 4 commits into dev from refactor/rename on Mar 3, 2026
Conversation

@ypriverol
Member

No description provided.

coderabbitai Bot commented Mar 3, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

@qodo-code-review

Review Summary by Qodo

Refactor mutation parsing, improve VCF handling, and expand documentation

✨ Enhancement 🐞 Bug fix 📝 Documentation

Walkthroughs

Description
• Improve HGVS amino acid mutation parsing with 3-letter to 1-letter conversion
• Fix VCF annotation parsing and CDS extraction with robust string handling
• Enhance VCF-to-protein translation with better variant key matching
• Add comprehensive documentation and use case examples for all tools
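
As an illustration of the first point, converting 3-letter HGVS amino acid codes to 1-letter codes can be sketched as below. This is a minimal sketch, not the code in `pgatk/cgenomes/cgenomes_proteindb.py`: the names `THREE_TO_ONE` and `to_one_letter` are hypothetical, and only the 20 standard residues plus `Ter` (stop) are assumed to occur.

```python
import re

# Standard 3-letter to 1-letter amino acid codes; "Ter" denotes a stop codon.
# Assumption: only these codes appear in the input.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Ter": "*",
}

# Case-sensitive alternation over the known 3-letter codes.
_AA3 = re.compile("|".join(THREE_TO_ONE))


def to_one_letter(hgvs_p):
    """Replace every 3-letter residue code in an HGVS protein string
    (hypothetical helper; other characters are left untouched)."""
    return _AA3.sub(lambda m: THREE_TO_ONE[m.group(0)], hgvs_p)


print(to_one_letter("p.Val600Glu"))  # p.V600E
print(to_one_letter("p.V600E"))      # already 1-letter, unchanged
```

Because the match is case-sensitive and limited to known codes, strings that already use 1-letter notation pass through unchanged.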
Diagram
flowchart LR
  A["HGVS Parsing"] -->|"3-letter to 1-letter conversion"| B["Improved Mutation Handling"]
  C["VCF Processing"] -->|"Robust CDS extraction"| D["Better Variant Matching"]
  E["Documentation"] -->|"13 use case workflows"| F["User Guides"]
  B --> G["Enhanced Protein DB Generation"]
  D --> G

File Changes

1. pgatk/cgenomes/cgenomes_proteindb.py ✨ Enhancement +12/-4
   Improve HGVS amino acid mutation parsing logic

2. pgatk/clinvar/clinvar_service.py 🐞 Bug fix +2/-2
   Fix CDS string extraction with bracket stripping

3. pgatk/ensembl/ensembl.py ✨ Enhancement +34/-16
   Refactor VCF annotation parsing and variant key matching

4. pgatk/toolbox/vcf_utils.py 🐞 Bug fix +7/-1
   Fix multi-base variant handling across exon boundaries

5. docs/index.md 📝 Documentation +1/-0
   Add use cases documentation link to index

6. docs/pgatk-cli.md 📝 Documentation +163/-0
   Add documentation for new CLI commands and workflows

7. docs/use-cases.md 📝 Documentation +1078/-0
   Create comprehensive 13-workflow proteogenomics guide

8. mkdocs.yml 📝 Documentation +1/-0
   Add use cases page to documentation navigation

9. pgatk/testdata/proteindb_from_gnomad_VCF.fa Miscellaneous +0/-2
   Remove invalid test data sequence entry

qodo-code-review Bot commented Mar 3, 2026

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0) 📎 Requirement gaps (0)

Remediation recommended

1. Exon-boundary MNP risk 🐞 Bug ✓ Correctness
Description
In get_altseq(), the REF-side clipping for variants that extend beyond an exon only limits how many
bases are removed from the transcript; the full ALT allele is still inserted. For boundary-spanning
multi-nucleotide polymorphisms (MNPs), where len(REF) == len(ALT) and both exceed the exonic portion,
this can introduce non-exonic bases into the mutated transcript and yield incorrectly translated proteins.
Code

pgatk/toolbox/vcf_utils.py[R112-120]

+            # Clip the ref allele length to the portion that falls within
+            # this feature.  When a multi-base variant extends from an exon
+            # into an intron, the intronic bases are absent from the
+            # transcript and must not be counted.
+            var_end_genomic = var_pos + len(ref_allele) - 1
+            exonic_ref_len = min(var_end_genomic, feature[1]) - var_pos + 1
+            c = max(exonic_ref_len, 0)
            alt_seq = ref_seq[0:var_index_in_cds] + var_allele + ref_seq[var_index_in_cds + c::]
            if strand == '-':
Evidence
The pipeline explicitly accepts partially overlapping variants (start/end computed from len(REF))
and then calls get_altseq() to apply them. get_altseq() now reduces the number of reference bases
removed (c) to the exonic portion, but still inserts the full var_allele. When a variant spans
beyond the exon end, c < len(ref_allele). For MNPs where len(var_allele)==len(ref_allele), the
inserted ALT will be longer than the removed exonic portion, implying extra (non-feature) bases get
introduced into the transcript sequence.

pgatk/toolbox/vcf_utils.py[16-43]
pgatk/clinvar/clinvar_service.py[519-533]
pgatk/ensembl/ensembl.py[639-654]
pgatk/toolbox/vcf_utils.py[109-123]
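
To make the boundary case concrete, here is a minimal sketch of the behaviour described above. It is not the actual `get_altseq()` code: `apply_variant` and its parameters are hypothetical, it handles the forward strand only, and clipping ALT to the exonic length is just one possible remedy that is only well defined for MNPs.

```python
def apply_variant(ref_seq, var_index, ref_allele, alt_allele,
                  var_pos, feature_end, clip_alt):
    """Apply a forward-strand variant to one exon's sequence.

    Hypothetical, simplified stand-in for the logic under review.
    var_pos and feature_end are 1-based genomic coordinates;
    var_index is the 0-based offset of var_pos within ref_seq.
    """
    # Same clipping as in the diff: count only REF bases inside the exon.
    var_end_genomic = var_pos + len(ref_allele) - 1
    exonic_ref_len = max(min(var_end_genomic, feature_end) - var_pos + 1, 0)

    alt = alt_allele
    # Assumed remedy: for MNPs, keep only the ALT bases aligned to the exon.
    if clip_alt and len(alt_allele) == len(ref_allele):
        alt = alt_allele[:exonic_ref_len]

    return ref_seq[:var_index] + alt + ref_seq[var_index + exonic_ref_len:]


# Exon spans genomic 100..105; the MNP covers 104..107 (two intronic bases).
exon = "ACGTAC"
unclipped = apply_variant(exon, 4, "ACGG", "TTTT", 104, 105, clip_alt=False)
clipped = apply_variant(exon, 4, "ACGG", "TTTT", 104, 105, clip_alt=True)
print(unclipped, len(unclipped))  # ACGTTTTT 8: two extra bases enter the transcript
print(clipped, len(clipped))      # ACGTTT 6: exon length preserved
```

For indels that span the boundary there is no one-to-one alignment between REF and ALT bases, so a real fix would need a separate policy for them.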

Agent prompt
The issue below was found during a code review. Follow the context and guidance provided below and implement a solution.

## Issue description
`pgatk.toolbox.vcf_utils.get_altseq()` clips how many reference bases are removed when a multi-base variant extends beyond an exon boundary, but it still inserts the full ALT allele. For MNPs (len(REF)==len(ALT)) spanning an exon boundary, this can inject non-exonic bases into the transcript sequence and lead to incorrect protein translations.

## Issue Context
This function is used by both Ensembl and ClinVar pipelines after they accept partially overlapping variants via `check_overlap(var_start, var_end)` where `var_end` is computed using `len(REF)`.

## Fix Focus Areas
- pgatk/toolbox/vcf_utils.py[109-123]
- pgatk/tests/test_vcf_utils.py[66-113]

@ypriverol merged commit b4374ad into dev on Mar 3, 2026
5 checks passed