Design for issue #18: unified pgvar|transcript-index|gene format for variant proteins, compatible with all major search engines. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Expands the minimal README with key features, installation methods, quick start example, full command reference table, supported variant sources, use case index, and project structure overview. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Review Summary by Qodo
Expand README with comprehensive documentation and protein accession design
Walkthroughs
Description
• Comprehensive README expansion with features, installation, quick start guide
• Added detailed command reference tables for all CLI tools and workflows
• Documented supported variant sources and 10 end-to-end use cases
• Added protein accession design document for unified FASTA header format
• Included project structure overview and contribution guidelines
Diagram
flowchart LR
A["README.md<br/>Minimal content"] -->|"Add features,<br/>installation,<br/>quick start"| B["README.md<br/>Comprehensive guide"]
B -->|"Include command<br/>reference tables"| C["README.md<br/>Full documentation"]
C -->|"Add use cases<br/>and structure"| D["Complete<br/>documentation"]
E["Design doc<br/>Issue #18"] -->|"Protein accession<br/>and FASTA header<br/>format"| F["protein-accession-design.md<br/>Unified variant format"]
File Changes
1. README.md
Code Review by Qodo
1. Wrong decoy CLI flags
Caution: Review failed. The pull request is closed.
Walkthrough
Updated README to redefine pgatk as a Python toolkit for building proteogenomics protein sequence databases with features, installation options, and quick start workflow. Added new design documentation specifying protein accession formats and FASTA header conventions with variant prefix strategies and metadata requirements.
Estimated code review effort: 2 (Simple) | ~15 minutes
pgatk generate-decoy \
  --input variant_proteins.fa \
  --output target_decoy.fa \
1. Wrong decoy CLI flags (Bug, Correctness)
README Quick Start uses `--input`/`--output` for `pgatk generate-decoy`, but the CLI only defines `--input_database`/`--output_database` (and `-in`/`-out`). Following the README fails with a Click “No such option” error, which blocks the Quick Start.
Agent Prompt
## Issue description
README Quick Start documents `pgatk generate-decoy` with `--input` and `--output`, but the CLI only supports `--input_database` / `--output_database` (and `-in` / `-out`). Users will hit a Click error and cannot complete the Quick Start.
## Issue Context
The Click command is defined in `pgatk/commands/proteindb_decoy.py` and does not include `--input`/`--output` aliases.
## Fix Focus Areas
- README.md[63-67]
- pgatk/commands/proteindb_decoy.py[12-17] (optional: if adding aliases)
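A mismatch like this between a documented command and the CLI can be caught mechanically. The following is a minimal sketch, not part of pgatk: the helper `unsupported_flags` is hypothetical, and `SUPPORTED_FLAGS` lists only the options named in this review, so it is illustrative rather than a complete description of the command.

```python
import shlex

# Assumption: only the options named in the review above; the real command
# may define more, so this set is illustrative and not exhaustive.
SUPPORTED_FLAGS = {"--input_database", "--output_database", "-in", "-out"}


def unsupported_flags(command, supported=SUPPORTED_FLAGS):
    """Return the flags in a documented command that are not in `supported`."""
    # Join backslash-continued lines before tokenising.
    tokens = shlex.split(command.replace("\\\n", " "))
    flags = [t.split("=", 1)[0] for t in tokens if t.startswith("-")]
    return [f for f in flags if f not in supported]


readme_command = (
    "pgatk generate-decoy \\\n"
    "  --input variant_proteins.fa \\\n"
    "  --output target_decoy.fa"
)
corrected_command = (
    "pgatk generate-decoy \\\n"
    "  --input_database variant_proteins.fa \\\n"
    "  --output_database target_decoy.fa"
)

print(unsupported_flags(readme_command))     # ['--input', '--output']
print(unsupported_flags(corrected_command))  # []
```

The corrected command uses the long option names the review reports as defined; any further options the Quick Start passes would need to be added to the set before relying on such a check.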
gffread -F -w ensembl_human/transcripts.fa \
  -g ensembl_human/genome.fa \
  ensembl_human/Homo_sapiens.GRCh38.*.gtf.gz
2. `genome.fa` not produced (Bug, Reliability)
README Quick Start tells users to run gffread with -g ensembl_human/genome.fa, but ensembl-downloader downloads the genome as a versioned *.dna_sm.toplevel.fa.gz file and the codebase does not create a genome.fa convenience file. The Quick Start will fail unless the user manually renames/decompresses, which is not documented.
Agent Prompt
## Issue description
README Quick Start uses `ensembl_human/genome.fa`, but `ensembl-downloader` saves the genome as a versioned `*.dna_sm.toplevel.fa.gz`. Users following the README will not find `genome.fa`.
## Issue Context
Downloader code constructs the genome filename as `{Species}.{Assembly}.dna_sm.toplevel.fa.gz` and downloads it directly.
## Fix Focus Areas
- README.md[51-55]
- pgatk/ensembl/data_downloader.py[474-483] (reference for actual filename pattern)
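If the README keeps referring to `genome.fa`, the undocumented manual step could look roughly like the sketch below. It assumes the download directory contains exactly one file matching the `*.dna_sm.toplevel.fa.gz` pattern described above and that the downstream tool needs an uncompressed FASTA; `prepare_genome_fasta` is a hypothetical helper, not a pgatk function.

```python
import gzip
import shutil
from pathlib import Path


def prepare_genome_fasta(directory, target_name="genome.fa"):
    """Decompress the downloaded genome into a stable, unversioned filename."""
    directory = Path(directory)
    # Filename pattern taken from the review; the helper itself is hypothetical.
    matches = sorted(directory.glob("*.dna_sm.toplevel.fa.gz"))
    if len(matches) != 1:
        raise FileNotFoundError(
            f"expected exactly one *.dna_sm.toplevel.fa.gz in {directory}, "
            f"found {len(matches)}"
        )
    target = directory / target_name
    # Stream the decompression so a large genome is never held in memory.
    with gzip.open(matches[0], "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

Calling `prepare_genome_fasta("ensembl_human")` after the download would then produce `ensembl_human/genome.fa`, the path the Quick Start passes to `gffread -g`. The alternative fix is to change the README to reference the versioned filename directly.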
Summary by CodeRabbit