Open
24 commits
b670242
docs: rename --var_prefix to --protein_prefix in CLI examples
ypriverol May 11, 2026
bb4edb6
Stop tracking docs/plans/ (internal working docs)
ypriverol May 11, 2026
14ee5e7
Switch COSMIC downloader to v103 scripted-download API
ypriverol May 11, 2026
e485df4
Update default include_biotypes to current Ensembl annotations
ypriverol May 11, 2026
03de0b6
Speed up vcf-to-proteindb (~475x on the dominant SQL op)
ypriverol May 11, 2026
9eba32c
Address Codacy findings: tarfile path-traversal + minor style
ypriverol May 11, 2026
2b467a5
Configure pydocstyle to suppress D211/D213 (conflict with codebase st…
ypriverol May 11, 2026
f781b00
Silence remaining Codacy noise (tarfile + pydocstyle config)
ypriverol May 11, 2026
dd86c39
Fix remaining real Codacy findings; defer style-conflict ones to UI
ypriverol May 11, 2026
e6889e4
Bump version to 0.0.27
ypriverol May 11, 2026
ab10aa6
Clean up unnecessary comments added in this PR
ypriverol May 11, 2026
3ede3ba
Remove unused comments across the codebase
ypriverol May 11, 2026
237f890
Address Copilot review feedback on PR #102
ypriverol May 11, 2026
c1ea9cc
Update use cases and configuration for Ensembl biotypes and file formats
husensofteng May 12, 2026
a462790
Updated use cases documentation - Renamed use case headings for bette…
husensofteng May 12, 2026
2178ca0
Enhance mutation handling in CancerGenomesService based on COSMIC upd…
husensofteng May 13, 2026
311fafb
minor command fixes in use-cases file
husensofteng May 13, 2026
dd286f8
Update COSMIC integration for v103 schema changes
husensofteng May 13, 2026
0b9c179
minor fix to cosmic gzip file reading
husensofteng May 13, 2026
7c08c50
Add cProfile support to vcf-to-proteindb benchmark script
ypriverol May 13, 2026
9235f7d
Parallelise vcf-to-proteindb per chromosome via multiprocessing
ypriverol May 13, 2026
254e35c
Tier-1 perf follow-ups: SeqIO.index_db, streamed VCF split, lazy id-m…
ypriverol May 13, 2026
d50f8a0
Revert non-deterministic test output; gitignore SeqIO.index_db files
ypriverol May 13, 2026
6c02214
Merge pull request #104 from bigbio/perf/vcf-to-proteindb-parallel
ypriverol May 14, 2026
15 changes: 15 additions & 0 deletions .codacy.yaml
@@ -0,0 +1,15 @@
---
# Codacy configuration. https://docs.codacy.com/repositories-configure/codacy-configuration-file/
#
# pydocstyle has two pairs of mutually-exclusive rules:
# D203 vs D211 (blank line before class docstring; codebase follows D203)
# D212 vs D213 (multi-line summary line; codebase follows D212)
# Both rules in each pair are enabled in Codacy's default profile, which produces
# unavoidable noise — pick one and silence the other. Settings here also apply to
# any local pydocstyle invocation that reads `.pydocstyle`.

engines:
  pydocstyle:
    enabled: true
    settings:
      add_ignore: ["D211", "D213"]
8 changes: 8 additions & 0 deletions .gitignore
@@ -135,3 +135,11 @@ pgatk/testdata/Meleagris_gallopavo*
.DS_Store
.codacy/
.cursor/

# Internal working docs (implementation plans, scratch notes)
docs/plans/

# BioPython SeqIO.index_db SQLite indexes — built lazily on first use,
# rebuilt automatically when the source FASTA changes (mtime check).
*.fa.idx
*.fasta.idx
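
A minimal sketch of the "build lazily, rebuild on mtime change" behaviour the comment above describes. This is not pgatk's actual code: the helper names (`index_path_for`, `index_is_stale`, `ensure_index`) and the choice to store the index beside the FASTA with an `.idx` suffix are assumptions inferred from the ignore patterns. The index builder is passed in as a callable so the sketch stays stdlib-only; with Biopython it could plausibly be `lambda idx, fa: SeqIO.index_db(str(idx), str(fa), "fasta")`.

```python
import os
from pathlib import Path
from typing import Callable


def index_path_for(fasta: Path) -> Path:
    """Return the sibling index path, e.g. genome.fa -> genome.fa.idx (assumed naming)."""
    return fasta.with_name(fasta.name + ".idx")


def index_is_stale(fasta: Path, index: Path) -> bool:
    """An index is stale if it is missing or older than its source FASTA."""
    if not index.exists():
        return True
    return index.stat().st_mtime < fasta.stat().st_mtime


def ensure_index(fasta: Path, build: Callable[[Path, Path], None]) -> Path:
    """Build the index on first use and rebuild it only when the FASTA is newer."""
    index = index_path_for(fasta)
    if index_is_stale(fasta, index):
        # Remove any outdated index first so the builder starts from scratch.
        index.unlink(missing_ok=True)
        build(index, fasta)
    return index
```

Because the generated `*.fa.idx` and `*.fasta.idx` files can be recreated from the FASTA at any time, ignoring them in git loses nothing.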
6 changes: 6 additions & 0 deletions .pydocstyle
@@ -0,0 +1,6 @@
[pydocstyle]
# D211 conflicts with D203 (codebase follows D203: one blank line before class docstring).
# D213 conflicts with D212 (codebase follows D212: multi-line summary on the first line).
# Disabling the rules we don't follow stops the mutually-exclusive-pair noise from
# static analysers (pydocstyle, Codacy).
add-ignore = D211,D213
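
For illustration, a small class written in the style this config keeps (D203 and D212) rather than the ignored alternatives (D211 and D213). The class and helper names are hypothetical and do not come from the pgatk codebase.

```python
class VariantRecord:

    """Hold one variant parsed from a VCF line.

    The blank line above this docstring satisfies D203; starting the
    summary on the same line as the opening quotes satisfies D212.
    """

    def __init__(self, chrom: str, pos: int) -> None:
        """Store the chromosome name and 1-based position."""
        self.chrom = chrom
        self.pos = pos


def summary_line(obj) -> str:
    """Return the first line of an object's docstring."""
    return (obj.__doc__ or "").strip().splitlines()[0]
```

With D211 and D213 enabled as well, the same class would be flagged twice: once for the blank line before the docstring and once for the summary not starting on the second line.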
1 change: 1 addition & 0 deletions docs/index.md
@@ -23,6 +23,7 @@ See the [Installation](installation.md) page for more options (Bioconda, Docker,
| [Introduction](introduction.md) | Overview of the proteogenomics field |
| [Installation](installation.md) | How to install pgatk (pip, Bioconda, Docker, source) |
| [pgatk CLI](pgatk-cli.md) | Full command-line reference for all tools |
| [Validations](validations.md) | Tests and validations to ensure correctness of the modules|
| [Use Cases](use-cases.md) | End-to-end workflows and recipes for common scenarios |
| [File Formats](formats.md) | BED, GTF, GCT format specifications |
| [Changelog](changelog.md) | Version history and release notes |
10 changes: 5 additions & 5 deletions docs/pgatk-cli.md
@@ -318,7 +318,7 @@ Usage: pgatk vcf-to-proteindb [OPTIONS]
Options:
--translation_table INTEGER Translation table (Default 1)
--mito_translation_table INTEGER Mito_trans_table (default 2)
- --var_prefix TEXT String to add as prefix for the variant peptides
+ --protein_prefix TEXT String to add as prefix for the variant peptides
--report_ref_seq Also report the reference peptide from overlapping transcripts
--annotation_field_name TEXT Annotation field name in INFO column (default: CSQ)
--af_field TEXT Field name for variant allele frequency (default: none)
@@ -467,7 +467,7 @@ Usage: pgatk dnaseq-to-proteindb [OPTIONS]
--biotype_str TEXT String used to identify gene/transcript biotype (default: transcript_biotype)
--expression_str TEXT String for extracting expression value (default: None)
--expression_thresh FLOAT Threshold for expression value filtering (default: 5)
- --var_prefix TEXT Prefix to be added to fasta headers (default: none)
+ --protein_prefix TEXT Prefix to be added to fasta headers (default: none)
-h, --help Show this message and exit.
```

@@ -489,7 +489,7 @@ Usage: pgatk dnaseq-to-proteindb [OPTIONS]
--config_file config/ensembl_config.yaml \
--input_fasta transcript_sequences.fa \
--output_proteindb proteindb_from_lincRNA_canonical_sequences.fa \
- --var_prefix lincRNA_ \
+ --protein_prefix lincRNA_ \
--include_biotypes lincRNA
```

@@ -500,7 +500,7 @@ Usage: pgatk dnaseq-to-proteindb [OPTIONS]
--config_file config/ensembl_config.yaml \
--input_fasta transcript_sequences.fa \
--output_proteindb proteindb_from_processed_pseudogene.fa \
- --var_prefix pseudogene_ \
+ --protein_prefix pseudogene_ \
--include_biotypes processed_pseudogene,transcribed_processed_pseudogene,translated_processed_pseudogene \
--skip_including_all_cds
```
@@ -512,7 +512,7 @@ Usage: pgatk dnaseq-to-proteindb [OPTIONS]
--config_file config/ensembl_config.yaml \
--input_fasta transcript_sequences.fa \
--output_proteindb proteindb_from_altORFs.fa \
- --var_prefix altorf_ \
+ --protein_prefix altorf_ \
--include_biotypes altORFs \
--skip_including_all_cds
```
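
To make the effect of the renamed option concrete, here is a rough sketch of what "prefix to be added to fasta headers" could look like on the output side. The function `add_protein_prefix` is hypothetical, and whether pgatk prepends the prefix to the whole header or only to the accession is an assumption made for this illustration.

```python
def add_protein_prefix(fasta_text: str, prefix: str) -> str:
    """Prepend `prefix` to every FASTA header; sequence lines pass through unchanged."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Keep the '>' marker and insert the prefix right after it.
            out.append(">" + prefix + line[1:])
        else:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    print(add_protein_prefix(">ENST00000000001\nMKVLA", "lincRNA_"))
```

Under this assumption, passing `--protein_prefix lincRNA_` would make entries from the lincRNA database easy to tell apart from canonical proteins when several databases are concatenated.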
270 changes: 0 additions & 270 deletions docs/plans/2026-03-01-pgatk-graph-engine-design.md

This file was deleted.
