easypqp-rs

easypqp-rs is a Rust library for in-silico peptide library generation, with Python bindings for integration with the Python EasyPQP library.

Features

Fast in-silico library generation using Rust
Includes a command-line tool for batch library generation
Python bindings for integration within the easypqp Python package
Configurable via JSON for fine-tuning predictions, fragmentation settings, and NCE/instrument profiles

Rust Binary CLI Example

easypqp-rs has an optional standalone command-line interface (CLI) binary for generating in-silico libraries. This can be used independently of the EasyPQP Python package if you prefer.

easypqp-insilico ./config.json

Enabling CUDA Support

easypqp-rs, its CLI, and its Python bindings can optionally use CUDA for GPU acceleration when the underlying redeem-properties dependency is built with the cuda feature. This is controlled via Cargo features and is disabled by default.

To enable CUDA support, build with the cuda feature at the top level. This will propagate the feature through all crates:

Rust CLI

cargo build --release --features cuda

Python (via maturin or pip)

If building the Python package from source, pass the feature to maturin:

maturin build --features cuda
# or for develop mode
maturin develop --features cuda

If using pip, you can pass features with the --config-settings flag:

pip install . --config-settings=--features=cuda

This will enable CUDA support in all relevant dependencies, including redeem-properties.

Docker / Running the CUDA-enabled container

If you built the CUDA-enabled Docker image (the repository Dockerfile builds the binary with the cuda feature), run it on a host with NVIDIA drivers and the NVIDIA Container Toolkit installed. The container must be started with GPU access (for example using Docker's --gpus option).

Example: build the image locally (from repo root):

docker build -t easypqp-insilico:cuda -f Dockerfile .

Run the container and give it access to all GPUs (mount your working directory so easypqp-insilico can read/write data):

docker run --rm --gpus all -v "$(pwd):/data" easypqp-insilico:cuda easypqp-insilico /data/config.json

Notes:

The host must have the NVIDIA Container Toolkit (nvidia-docker) configured so the container can access GPUs. On modern Docker releases you can use the built-in --gpus flag.
You can restrict GPUs with --gpus 'device=0' or use environment variables to select devices if your application honors them.
If you publish the image to a registry, make sure users know that they need NVIDIA drivers and the runtime configured on their host to use the CUDA-enabled image.

Configuration Reference

The tool is configured via a JSON file. Below is a comprehensive guide to all available parameters.

Complete Example Configuration

Full example config.json:

{
  "database": {
    "fasta": "path/to/proteins.fasta",
    "enzyme": {
      "name": "Trypsin/P",
      "cleave_at": "KR",
      "restrict": "P",
      "c_terminal": null,
      "min_len": 7,
      "max_len": 50,
      "missed_cleavages": 2
    },
    "peptide_min_mass": 500.0,
    "peptide_max_mass": 5000.0,
    "generate_decoys": true,
    "decoy_tag": "rev_",
    "static_mods": {
      "C": 57.0215
    },
    "variable_mods": {
      "M": [15.9949],
      "[": [42.0106]
    },
    "max_variable_mods": 2
  },
  "insilico_settings": {
    "precursor_charge": [2, 3, 4],
    "max_fragment_charge": 2,
    "min_transitions": 6,
    "max_transitions": 6,
    "fragmentation_model": "HCD",
    "allowed_fragment_types": ["b", "y"],
    "rt_scale": 100.0
  },
  "dl_feature_generators": {
    "retention_time": {
      "model_path": "path/to/rt_model.safetensors",
      "constants_path": "path/to/rt_model_const.yaml",
      "architecture": "rt_cnn_tf"
    },
    "ion_mobility": {
      "model_path": "path/to/ccs_model.safetensors",
      "constants_path": "path/to/ccs_model_const.yaml",
      "architecture": "ccs_cnn_tf"
    },
    "ms2_intensity": {
      "model_path": "path/to/ms2_model.pth",
      "constants_path": "path/to/ms2_model_const.yaml",
      "architecture": "ms2_bert"
    },
    "device": "cpu",
    "instrument": "timsTOF",
    "nce": 20.0,
    "batch_size": 64,
    "fine_tune_config": {
      "fine_tune": false,
      "train_data_path": "",
      "batch_size": 256,
      "epochs": 3,
      "learning_rate": 0.001,
      "save_model": false
    }
  },
  "peptide_chunking": 0,
  "output_file": "insilico_library.tsv",
  "write_report": true,
  "parquet_output": false
}

Configuration Sections

1. Database Settings (REQUIRED)

database - FASTA file, enzyme, modifications, and decoy generation

Parameter	Type	Default	Description
`fasta`	string	REQUIRED	Path to FASTA protein database file
`generate_decoys`	boolean	`true`	Auto-generate decoy sequences by reversing protein sequences
`decoy_tag`	string	`"rev_"`	Prefix added to decoy protein names
`peptide_min_mass`	number	`500.0`	Minimum peptide mass in Daltons
`peptide_max_mass`	number	`5000.0`	Maximum peptide mass in Daltons
`max_variable_mods`	integer	`2`	Maximum number of variable modifications per peptide

Enzyme Configuration:

"enzyme": {
  "name": "Trypsin/P",          // Enzyme name (for reference)
  "cleave_at": "KR",             // Amino acids where enzyme cleaves
  "restrict": "P",               // Amino acid that prevents cleavage if following cleavage site
  "c_terminal": true,            // Cleavage occurs C-terminal to the cleavage site
  "min_len": 7,                  // Minimum peptide length
  "max_len": 50,                 // Maximum peptide length
  "missed_cleavages": 2          // Number of allowed missed cleavages
}
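
To make the enzyme fields concrete, here is a minimal Python sketch of how such a rule could be applied to a protein sequence. It is an illustration only, not the digestion code used by easypqp-rs: the function name `digest` and the toy sequence are hypothetical, and the sketch assumes cleavage C-terminal to each `cleave_at` residue.

```python
def digest(protein, cleave_at="KR", restrict="P",
           min_len=7, max_len=50, missed_cleavages=2):
    """Illustrative digestion: cleave C-terminal to any residue in `cleave_at`,
    unless the following residue is in `restrict`."""
    # Boundaries between peptides, starting at the protein N-terminus.
    sites = [0]
    for i in range(len(protein) - 1):
        if protein[i] in cleave_at and protein[i + 1] not in restrict:
            sites.append(i + 1)
    sites.append(len(protein))

    peptides = []
    for start in range(len(sites) - 1):
        # Join up to `missed_cleavages` neighbouring fragments onto this one.
        for missed in range(missed_cleavages + 1):
            end = start + 1 + missed
            if end >= len(sites):
                break
            peptide = protein[sites[start]:sites[end]]
            if min_len <= len(peptide) <= max_len:
                peptides.append(peptide)
    return peptides


# Hypothetical toy sequence; the R followed by P is not cleaved.
print(digest("AAAKGGGRPCCCKDDD", min_len=3, missed_cleavages=1))
```

Raising `missed_cleavages` grows the peptide list quickly, which is why `min_len` and `max_len` are applied as each candidate is built.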

Static Modifications:

"static_mods": {
  "C": 57.0215    // Carbamidomethylation of Cysteine
}

Variable Modifications:

"variable_mods": {
  "M": [15.9949],    // Oxidation of Methionine
  "[": [42.0106]     // N-terminal Acetylation
}
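
The interaction between static_mods, variable_mods, and max_variable_mods can be sketched as follows. This is a hedged illustration, not the library's implementation: `modified_forms` and `render` are hypothetical helpers, and the `[+42.0106]-` prefix used for the N-terminal modification is an assumed notation.

```python
from itertools import combinations


def render(peptide, static_mods, chosen):
    """Write a peptide with mass-bracket annotations.
    `chosen` maps position -> delta mass; position -1 is the N-terminus."""
    parts = []
    if -1 in chosen:
        parts.append(f"[{chosen[-1]:+.4f}]-")  # assumed N-terminal notation
    for i, aa in enumerate(peptide):
        parts.append(aa)
        if aa in static_mods:
            parts.append(f"[{static_mods[aa]:+.4f}]")
        if i in chosen:
            parts.append(f"[{chosen[i]:+.4f}]")
    return "".join(parts)


def modified_forms(peptide, static_mods, variable_mods, max_variable_mods=2):
    """Enumerate forms carrying 0..max_variable_mods variable modifications."""
    # Candidate (position, delta) pairs; "[" targets the peptide N-terminus.
    candidates = [(-1, d) for d in variable_mods.get("[", [])]
    for i, aa in enumerate(peptide):
        candidates += [(i, d) for d in variable_mods.get(aa, [])]

    forms = []
    for k in range(max_variable_mods + 1):
        for combo in combinations(candidates, k):
            positions = [p for p, _ in combo]
            if len(set(positions)) < len(positions):
                continue  # at most one variable modification per site
            forms.append(render(peptide, static_mods, dict(combo)))
    return forms


for form in modified_forms("MGCAAR", {"C": 57.0215},
                           {"M": [15.9949], "[": [42.0106]}):
    print(form)
```

Static modifications appear on every form, while each variable modification doubles the candidates at its site, so max_variable_mods acts as the cap on combinatorial growth.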

Common modification masses:

Carbamidomethyl (C): 57.0215
Oxidation (M): 15.9949
Phosphorylation (STY): 79.9663
N-terminal Acetylation: 42.0106
Deamidation (NQ): 0.9840
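
These delta masses are added to the monoisotopic residue masses of the peptide. The sketch below shows the arithmetic behind a mass window such as peptide_min_mass / peptide_max_mass and behind a precursor m/z for a given charge. The helper names are hypothetical, and whether easypqp-rs applies its mass filter before or after modifications is not stated here, so treat this as an illustration of the calculation only.

```python
# Monoisotopic residue masses in Daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added for the free termini
PROTON = 1.007276   # mass of one charge carrier


def peptide_mass(sequence, mods=()):
    """Neutral monoisotopic mass: residues + water + modification deltas."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + sum(mods)


def precursor_mz(mass, charge):
    """m/z of the peptide carrying `charge` protons."""
    return (mass + charge * PROTON) / charge


def in_mass_window(mass, peptide_min_mass=500.0, peptide_max_mass=5000.0):
    """True if the mass falls inside the configured window."""
    return peptide_min_mass <= mass <= peptide_max_mass


# MGCAAR with carbamidomethylated cysteine (+57.0215).
mass = peptide_mass("MGCAAR", mods=[57.0215])
print(round(mass, 4), in_mass_window(mass), round(precursor_mz(mass, 2), 4))
```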

2. In-Silico Library Settings (REQUIRED)

insilico_settings - Precursor/fragment charges, transitions, fragmentation model

Parameter	Type	Default	Description
`precursor_charge`	array[int]	`[2, 3, 4]`	Precursor charge states to generate
`max_fragment_charge`	integer	`2`	Maximum fragment ion charge
`min_transitions`	integer	`6`	Minimum number of transitions per precursor
`max_transitions`	integer	`6`	Maximum number of transitions per precursor
`fragmentation_model`	string	`"HCD"`	Fragmentation type: `"HCD"`, `"CID"`, or `"ETD"`
`allowed_fragment_types`	array[string]	`["b", "y"]`	Allowed fragment ion types: `"b"`, `"y"`
`rt_scale`	number	`100.0`	Retention time scaling factor (multiplies predicted RT)
`unimod_annotation`	boolean	`true`	Reannotate mass-bracket modifications to UniMod accessions (e.g., `[+57.0215]` → `(UniMod:4)`)
`max_delta_unimod`	number	`0.02`	Maximum delta mass (Da) tolerance for matching to UniMod entries
`enable_unannotated`	boolean	`true`	Keep original mass bracket when no UniMod match is found; if `false`, an error is raised
`unimod_xml_path`	string	`null`	Path to a custom `unimod.xml` file. If omitted, the embedded UniMod database is used

[!NOTE] The current MS2 intensity prediction models only support "b" and "y" fragment ions.

Example:

"insilico_settings": {
  "precursor_charge": [2, 3],
  "max_fragment_charge": 1,
  "min_transitions": 6,
  "max_transitions": 12,
  "fragmentation_model": "HCD",
  "allowed_fragment_types": ["b", "y"],
  "rt_scale": 1.0,
  "unimod_annotation": true,
  "max_delta_unimod": 0.02,
  "enable_unannotated": true
}

[!NOTE] UniMod Reannotation: By default, mass-bracket modification annotations (e.g., [+57.0215]) are converted to UniMod accession notation (e.g., (UniMod:4)). This uses an embedded copy of the UniMod database. To use a custom unimod.xml, set unimod_xml_path to the file path. To disable reannotation entirely, set unimod_annotation to false.
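
The roles of max_delta_unimod and enable_unannotated can be illustrated with a small Python sketch. It uses a five-entry lookup table in place of the full UniMod database and matches on mass alone, so it is a simplification under stated assumptions rather than the tool's actual matching logic; `reannotate` and `UNIMOD_MASSES` are hypothetical names.

```python
import re

# Tiny stand-in for the UniMod database: accession -> monoisotopic delta mass.
UNIMOD_MASSES = {
    1: 42.010565,   # Acetyl
    4: 57.021464,   # Carbamidomethyl
    7: 0.984016,    # Deamidated
    21: 79.966331,  # Phospho
    35: 15.994915,  # Oxidation
}

MASS_BRACKET = re.compile(r"\[([+-]\d+(?:\.\d+)?)\]")


def reannotate(sequence, max_delta_unimod=0.02, enable_unannotated=True):
    """Replace [+mass] brackets with (UniMod:N) when a match is within tolerance."""
    def replace(match):
        delta = float(match.group(1))
        accession, mass = min(UNIMOD_MASSES.items(),
                              key=lambda item: abs(item[1] - delta))
        if abs(mass - delta) <= max_delta_unimod:
            return f"(UniMod:{accession})"
        if enable_unannotated:
            return match.group(0)  # keep the original mass bracket
        raise ValueError(f"no UniMod match for {match.group(0)}")

    return MASS_BRACKET.sub(replace, sequence)


print(reannotate("M[+15.9949]GC[+57.0215]AAR"))
```

A tighter max_delta_unimod reduces the chance of matching the wrong entry when two modifications have similar masses, at the cost of leaving more brackets unannotated.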

3. Deep Learning Models (OPTIONAL)

[!NOTE] If no retention_time, ion_mobility, or ms2_intensity fields are provided under dl_feature_generators, pretrained models are downloaded and used automatically. The current default pretrained models are:

RT: rt_cnn_tf - A CNN-Transformer model trained on the ProteomicsML repository RT dataset. This model is based on AlphaPeptDeep's CNN-LSTM implementation, with the LSTM replaced by a Transformer encoder.
CCS: ccs_cnn_tf - A CNN-Transformer model trained on the ProteomicsML repository CCS dataset. This model is also based on AlphaPeptDeep's CNN-LSTM implementation, with the LSTM replaced by a Transformer encoder.
MS2: ms2_bert - A BERT-based model retrieved from AlphaPeptDeep's pretrained models.

dl_feature_generators - Custom or pretrained RT/IM/MS2 prediction models

If this section is omitted or empty, the default pretrained models are downloaded and used automatically.

Model Configuration:

Each model (RT, IM, MS2) requires three fields: a weights file, a constants file, and an architecture identifier:

{
  "model_path": "path/to/model.safetensors",     // Model weights (.pth or .safetensors)
  "constants_path": "path/to/model_const.yaml",  // Model configuration constants
  "architecture": "model_architecture_name"       // Architecture identifier
}

Parameter	Type	Default	Description
`retention_time`	object	pretrained	Custom RT prediction model
`ion_mobility`	object	pretrained	Custom IM/CCS prediction model (timsTOF only)
`ms2_intensity`	object	pretrained	Custom MS2 intensity prediction model
`device`	string	`"cpu"`	Compute device: `"cpu"`, `"cuda"`, or `"mps"` (Apple Silicon)
`instrument`	string	`"timsTOF"`	Instrument type: `"QE"` or `"timsTOF"`
`nce`	number	`20.0`	Normalized collision energy for fragmentation
`batch_size`	integer	`64`	Batch size for model inference
`fine_tune_config`	object	see below	Optional fine-tuning configuration

Supported Architectures:

RT: "rt_cnn_tf", "rt_cnn_lstm"
IM/CCS: "ccs_cnn_tf", "ccs_cnn_lstm"
MS2: "ms2_bert"
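
Before launching a long run it can help to check a dl_feature_generators block against the field names and architecture identifiers listed above. The following is a minimal sketch of such a check; `check_models` is a hypothetical helper and not part of easypqp-rs, which may validate its configuration differently.

```python
# Architecture identifiers taken from the list above.
SUPPORTED = {
    "retention_time": {"rt_cnn_tf", "rt_cnn_lstm"},
    "ion_mobility": {"ccs_cnn_tf", "ccs_cnn_lstm"},
    "ms2_intensity": {"ms2_bert"},
}
REQUIRED_KEYS = ("model_path", "constants_path", "architecture")


def check_models(dl_config):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name, architectures in SUPPORTED.items():
        entry = dl_config.get(name)
        if entry is None:
            continue  # omitted entries fall back to the pretrained default
        for key in REQUIRED_KEYS:
            if key not in entry:
                problems.append(f"{name}: missing '{key}'")
        arch = entry.get("architecture")
        if arch is not None and arch not in architectures:
            problems.append(f"{name}: unsupported architecture '{arch}'")
    return problems


print(check_models({
    "retention_time": {
        "model_path": "path/to/rt_model.safetensors",
        "constants_path": "path/to/rt_model_const.yaml",
        "architecture": "rt_cnn_tf",
    }
}))
```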

4. Fine-Tuning (OPTIONAL)

fine_tune_config - Transfer learning on experimental data

Fine-tune pretrained models on your own experimental data for improved accuracy.

Parameter	Type	Default	Description
`fine_tune`	boolean	`false`	Enable fine-tuning
`train_data_path`	string	`""`	Path to training data TSV file
`batch_size`	integer	`256`	Training batch size
`epochs`	integer	`3`	Number of training epochs
`learning_rate`	number	`0.001`	Learning rate for optimizer
`save_model`	boolean	`false`	Save fine-tuned model weights to disk

Training Data Format (TSV):

Required columns:

sequence: Modified sequence with square bracket notation (e.g., MGC[+57.0215]AAR)
precursor_charge: Precursor charge state
retention_time: Experimental retention time
ion_mobility: CCS value (only if using timsTOF)
fragment_type: Fragment ion type (b, y, etc.)
fragment_series_number: Fragment position
product_charge: Fragment charge
intensity: Normalized fragment intensity
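
A training file with these columns can be produced with Python's standard csv module. This is a sketch only: the row values are placeholders rather than real measurements, `write_training_tsv` is a hypothetical helper, and the column order shown is an assumption (the list above names the columns but does not state a required order).

```python
import csv
import io

COLUMNS = [
    "sequence", "precursor_charge", "retention_time", "ion_mobility",
    "fragment_type", "fragment_series_number", "product_charge", "intensity",
]


def write_training_tsv(rows, handle):
    """Write one fragment per row as tab-separated values with a header."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in rows:
        missing = [c for c in COLUMNS if c not in row]
        if missing:
            raise ValueError(f"row is missing columns: {missing}")
        writer.writerow(row)


# Placeholder values for illustration only.
example_row = {
    "sequence": "MGC[+57.0215]AAR", "precursor_charge": 2,
    "retention_time": 35.2, "ion_mobility": 350.0,
    "fragment_type": "y", "fragment_series_number": 3,
    "product_charge": 1, "intensity": 0.8,
}
buffer = io.StringIO()
write_training_tsv([example_row], buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open("experimental_data.tsv", "w", newline="")` as the handle.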

Example:

"fine_tune_config": {
  "fine_tune": true,
  "train_data_path": "experimental_data.tsv",
  "batch_size": 256,
  "epochs": 5,
  "learning_rate": 0.0001,
  "save_model": true
}

5. Output Settings (OPTIONAL)

Output file format, reporting, and memory management

Parameter	Type	Default	Description
`output_file`	string	`"insilico_library.tsv"`	Path for output library file
`write_report`	boolean	`true`	Generate HTML quality control report
`parquet_output`	boolean	`false`	Output in Parquet format instead of TSV
`peptide_chunking`	integer	`0`	Peptides per chunk (0 = auto-calculate based on memory)

Peptide Chunking:

0 (default): Automatically calculate chunk size based on available memory (recommended)
> 0: Manual chunk size for processing large FASTA files with limited RAM
Larger chunks = faster processing but more memory usage
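
The memory trade-off of a manual chunk size can be pictured with a short sketch: only one chunk of peptides is held for prediction at a time. `chunked` is a hypothetical helper, and the automatic sizing used when peptide_chunking is 0 is not reproduced here.

```python
def chunked(peptides, peptide_chunking):
    """Yield successive slices of at most `peptide_chunking` peptides."""
    if peptide_chunking <= 0:
        # 0 means "auto" in the tool; this sketch needs an explicit size.
        raise ValueError("peptide_chunking must be a positive integer here")
    for start in range(0, len(peptides), peptide_chunking):
        yield peptides[start:start + peptide_chunking]


peptides = ["PEPTIDEA", "PEPTIDEB", "PEPTIDEC", "PEPTIDED", "PEPTIDEE"]
for chunk in chunked(peptides, 2):
    print(len(chunk), chunk)
```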

Minimal Configuration

A minimal configuration needs only a FASTA file:

{
  "database": {
    "fasta": "proteins.fasta"
  }
}

All other parameters will use sensible defaults and pretrained models will be auto-downloaded.
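
One way to picture how a minimal file expands is a recursive merge of the user's values over the documented defaults. The sketch below is illustrative: `with_defaults` is a hypothetical helper, the `DEFAULTS` dictionary copies only a subset of the defaults from the tables above, and easypqp-rs may resolve defaults differently internally.

```python
import copy

# Subset of the defaults documented in the tables above.
DEFAULTS = {
    "database": {
        "generate_decoys": True,
        "decoy_tag": "rev_",
        "peptide_min_mass": 500.0,
        "peptide_max_mass": 5000.0,
        "max_variable_mods": 2,
    },
    "insilico_settings": {
        "precursor_charge": [2, 3, 4],
        "max_fragment_charge": 2,
        "min_transitions": 6,
        "max_transitions": 6,
        "fragmentation_model": "HCD",
        "allowed_fragment_types": ["b", "y"],
        "rt_scale": 100.0,
    },
    "peptide_chunking": 0,
    "output_file": "insilico_library.tsv",
    "write_report": True,
    "parquet_output": False,
}


def with_defaults(user, defaults=DEFAULTS):
    """Return a new config where user values override defaults, recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = with_defaults(value, merged[key])
        else:
            merged[key] = copy.deepcopy(value)
    return merged


config = with_defaults({"database": {"fasta": "proteins.fasta"}})
print(config["database"])
```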

Command-Line Overrides

You can override JSON configuration values via command-line arguments:

easypqp-insilico config.json \
  --fasta my_proteins.fasta \
  --output_file my_library.tsv \
  --no-write-report \
  --parquet

You can also run without a JSON config file by providing only --fasta:

easypqp-insilico --fasta my_proteins.fasta

All other parameters will use sensible defaults.

Available flags:

--fasta <PATH>: Override database FASTA file
--output_file <PATH>: Override output file path
--no-write-report: Disable HTML report generation
--parquet: Output in Parquet format instead of TSV
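
When the CLI is driven from a script, the argument list can be assembled programmatically. This sketch uses only the flags listed above and assumes the easypqp-insilico binary is on PATH; `build_command` is a hypothetical helper, and the commented-out `subprocess.run` call shows where the command would actually be executed.

```python
import subprocess  # used only by the commented-out call below


def build_command(config=None, fasta=None, output_file=None,
                  write_report=True, parquet=False):
    """Assemble an easypqp-insilico argument list from optional overrides."""
    if config is None and fasta is None:
        raise ValueError("provide a config file, --fasta, or both")
    command = ["easypqp-insilico"]
    if config is not None:
        command.append(str(config))
    if fasta is not None:
        command += ["--fasta", str(fasta)]
    if output_file is not None:
        command += ["--output_file", str(output_file)]
    if not write_report:
        command.append("--no-write-report")
    if parquet:
        command.append("--parquet")
    return command


command = build_command("config.json", fasta="my_proteins.fasta",
                        output_file="my_library.tsv",
                        write_report=False, parquet=True)
print(" ".join(command))
# subprocess.run(command, check=True)  # uncomment to execute
```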

Decoy Handling

When generate_decoys is enabled, reversed decoy peptides are generated automatically. The decoy_tag (default "rev_") is prefixed to each ProteinId, UniprotId, and GeneName for decoy entries, making them easy to distinguish during downstream analysis.
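
The idea can be sketched in a few lines of Python, using the `rev_` tag from the example configuration. This is a simplified illustration with hypothetical names (`make_decoy`, the example identifier and sequence); it reverses the whole sequence, and the exact reversal rules used by easypqp-rs (for example, whether terminal residues are kept in place) are not specified here.

```python
def make_decoy(protein_id, sequence, decoy_tag="rev_"):
    """Return (tagged identifier, reversed sequence) for a decoy entry."""
    return decoy_tag + protein_id, sequence[::-1]


def is_decoy(protein_id, decoy_tag="rev_"):
    """Downstream check: decoys are recognised by their identifier prefix."""
    return protein_id.startswith(decoy_tag)


decoy_id, decoy_sequence = make_decoy("P12345", "MKTAYR")
print(decoy_id, decoy_sequence, is_decoy(decoy_id))
```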
