easypqp-rs is a Rust library for in-silico peptide library generation, with Python bindings for integration with the EasyPQP Python package.

- Fast in-silico library generation using Rust
- Includes a command-line tool for batch library generation
- Python bindings for integration within the easypqp Python package
- Configurable via JSON for fine-tuning predictions, fragmentation settings, and NCE/instrument profiles
easypqp-rs has an optional standalone command-line interface (CLI) binary for generating in-silico libraries. This can be used independently of the EasyPQP Python package if you prefer.
```shell
easypqp-insilico ./config.json
```

easypqp-rs and its CLI and Python bindings can optionally use CUDA for GPU acceleration if the underlying redeem-properties dependency is built with the `cuda` feature. This is controlled via Cargo features and is disabled by default.
To enable CUDA support, build with the cuda feature at the top level. This will propagate the feature through all crates:
```shell
cargo build --release --features cuda
```

If building the Python package from source, pass the feature to maturin:

```shell
maturin build --features cuda
# or for develop mode
maturin develop --features cuda
```

If using pip, you can pass features with the `--config-settings` flag:

```shell
pip install . --config-settings=--features=cuda
```

This will enable CUDA support in all relevant dependencies, including redeem-properties.
If you built the CUDA-enabled Docker image (the repository Dockerfile builds the binary with the cuda feature), run it on a host with NVIDIA drivers and the NVIDIA Container Toolkit installed. The container must be started with GPU access (for example using Docker's --gpus option).
Example: build the image locally (from repo root):
```shell
docker build -t easypqp-insilico:cuda -f Dockerfile .
```

Run the container and give it access to all GPUs (mount your working directory so easypqp-insilico can read/write data):

```shell
docker run --rm --gpus all -v "$(pwd):/data" easypqp-insilico:cuda easypqp-insilico /data/config.json
```

Notes:

- The host must have the NVIDIA Container Toolkit (nvidia-docker) configured so the container can access GPUs. On modern Docker releases you can use the built-in `--gpus` flag.
- You can restrict GPUs with `--gpus 'device=0'`, or use environment variables to select devices if your application honors them.
- If you publish the image to a registry, make sure users know they need NVIDIA drivers and the runtime configured on their host to use the CUDA-enabled image.
The tool is configured via a JSON file. Below is a comprehensive guide to all available parameters.
Full example `config.json`:
```json
{
  "database": {
    "fasta": "path/to/proteins.fasta",
    "enzyme": {
      "name": "Trypsin/P",
      "cleave_at": "KR",
      "restrict": "P",
      "c_terminal": null,
      "min_len": 7,
      "max_len": 50,
      "missed_cleavages": 2
    },
    "peptide_min_mass": 500.0,
    "peptide_max_mass": 5000.0,
    "generate_decoys": true,
    "decoy_tag": "rev_",
    "static_mods": {
      "C": 57.0215
    },
    "variable_mods": {
      "M": [15.9949],
      "[": [42.0106]
    },
    "max_variable_mods": 2
  },
  "insilico_settings": {
    "precursor_charge": [2, 3, 4],
    "max_fragment_charge": 2,
    "min_transitions": 6,
    "max_transitions": 6,
    "fragmentation_model": "HCD",
    "allowed_fragment_types": ["b", "y"],
    "rt_scale": 100.0
  },
  "dl_feature_generators": {
    "retention_time": {
      "model_path": "path/to/rt_model.safetensors",
      "constants_path": "path/to/rt_model_const.yaml",
      "architecture": "rt_cnn_tf"
    },
    "ion_mobility": {
      "model_path": "path/to/ccs_model.safetensors",
      "constants_path": "path/to/ccs_model_const.yaml",
      "architecture": "ccs_cnn_tf"
    },
    "ms2_intensity": {
      "model_path": "path/to/ms2_model.pth",
      "constants_path": "path/to/ms2_model_const.yaml",
      "architecture": "ms2_bert"
    },
    "device": "cpu",
    "instrument": "timsTOF",
    "nce": 20.0,
    "batch_size": 64,
    "fine_tune_config": {
      "fine_tune": false,
      "train_data_path": "",
      "batch_size": 256,
      "epochs": 3,
      "learning_rate": 0.001,
      "save_model": false
    }
  },
  "peptide_chunking": 0,
  "output_file": "insilico_library.tsv",
  "write_report": true,
  "parquet_output": false
}
```

**database** - FASTA file, enzyme, modifications, and decoy generation
| Parameter | Type | Default | Description |
|---|---|---|---|
| `fasta` | string | REQUIRED | Path to FASTA protein database file |
| `generate_decoys` | boolean | `true` | Auto-generate decoy sequences by reversing protein sequences |
| `decoy_tag` | string | `"rev_"` | Prefix added to decoy protein names |
| `peptide_min_mass` | number | `500.0` | Minimum peptide mass in Daltons |
| `peptide_max_mass` | number | `5000.0` | Maximum peptide mass in Daltons |
| `max_variable_mods` | integer | `2` | Maximum number of variable modifications per peptide |
**Enzyme Configuration:**

```json
"enzyme": {
  "name": "Trypsin/P",    // Enzyme name (for reference)
  "cleave_at": "KR",      // Amino acids where enzyme cleaves
  "restrict": "P",        // Amino acid that prevents cleavage if following cleavage site
  "c_terminal": true,     // Cleavage occurs C-terminal to the cleavage site
  "min_len": 7,           // Minimum peptide length
  "max_len": 50,          // Maximum peptide length
  "missed_cleavages": 2   // Number of allowed missed cleavages
}
```

**Static Modifications:**

```json
"static_mods": {
  "C": 57.0215  // Carbamidomethylation of Cysteine
}
```

**Variable Modifications:**

```json
"variable_mods": {
  "M": [15.9949],  // Oxidation of Methionine
  "[": [42.0106]   // N-terminal Acetylation
}
```

Common modification masses:

- Carbamidomethyl (C): `57.0215`
- Oxidation (M): `15.9949`
- Phosphorylation (STY): `79.9663`
- N-terminal Acetylation: `42.0106`
- Deamidation (NQ): `0.9840`
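To illustrate how `cleave_at`, `restrict`, `missed_cleavages`, and the length bounds interact, here is a minimal Python sketch of a rule-based digest. This is a simplified illustration of the general idea, not the library's actual implementation:

```python
def digest(sequence, cleave_at="KR", restrict="P",
           min_len=7, max_len=50, missed_cleavages=2):
    """Illustrative in-silico digest: cut C-terminal to cleave_at residues,
    skipping sites followed by a restrict residue (e.g. trypsin skips K/R before P)."""
    # Collect cleavage sites (positions *after* the residue we cut at).
    sites = [0]
    for i, aa in enumerate(sequence):
        if aa in cleave_at and (i + 1 == len(sequence) or sequence[i + 1] not in restrict):
            sites.append(i + 1)
    if sites[-1] != len(sequence):
        sites.append(len(sequence))
    peptides = set()
    for i in range(len(sites) - 1):
        # A peptide spanning sites i..j contains j - i - 1 missed cleavages.
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(sites))):
            pep = sequence[sites[i]:sites[j]]
            if min_len <= len(pep) <= max_len:
                peptides.add(pep)
    return peptides
```

For example, with `missed_cleavages=0` the sequence `AAAKPBBBKCCCR` yields only two peptides, because the first K is followed by P and is therefore not cleaved.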
**insilico_settings** - Precursor/fragment charges, transitions, fragmentation model
| Parameter | Type | Default | Description |
|---|---|---|---|
| `precursor_charge` | array[int] | `[2, 3, 4]` | Precursor charge states to generate |
| `max_fragment_charge` | integer | `2` | Maximum fragment ion charge |
| `min_transitions` | integer | `6` | Minimum number of transitions per precursor |
| `max_transitions` | integer | `6` | Maximum number of transitions per precursor |
| `fragmentation_model` | string | `"HCD"` | Fragmentation type: `"HCD"`, `"CID"`, or `"ETD"` |
| `allowed_fragment_types` | array[string] | `["b", "y"]` | Allowed fragment ion types: `"b"`, `"y"` |
| `rt_scale` | number | `100.0` | Retention time scaling factor (multiplies predicted RT) |
| `unimod_annotation` | boolean | `true` | Reannotate mass-bracket modifications to UniMod accessions (e.g., `[+57.0215]` → `(UniMod:4)`) |
| `max_delta_unimod` | number | `0.02` | Maximum delta mass (Da) tolerance for matching to UniMod entries |
| `enable_unannotated` | boolean | `true` | Keep original mass bracket when no UniMod match is found; if `false`, an error is raised |
| `unimod_xml_path` | string | `null` | Path to a custom unimod.xml file. If omitted, the embedded UniMod database is used |
> [!NOTE]
> The current MS2 intensity prediction models only support `"b"` and `"y"` fragment ions.
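For reference, b ions are N-terminal prefix fragments and y ions are C-terminal suffix fragments. The sketch below computes their m/z values from monoisotopic residue masses using the standard formulas (b: prefix mass + proton; y: suffix mass + water + proton); it is a generic illustration, not code from this library, and the residue table covers only a few amino acids:

```python
# Monoisotopic residue masses (Da) for a handful of amino acids; extend as needed.
RESIDUE = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
           "V": 99.06841, "L": 113.08406, "K": 128.09496, "R": 156.10111}
PROTON, WATER = 1.007276, 18.010565

def by_ions(peptide, charge=1):
    """m/z values of all b/y fragments of an unmodified peptide at one charge."""
    masses = [RESIDUE[aa] for aa in peptide]
    ions = {}
    for i in range(1, len(peptide)):
        b = sum(masses[:i]) + charge * PROTON           # b_i: N-terminal prefix
        y = sum(masses[i:]) + WATER + charge * PROTON   # y_(n-i): C-terminal suffix
        ions[f"b{i}"] = b / charge
        ions[f"y{len(peptide) - i}"] = y / charge
    return ions
```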
Example:
```json
"insilico_settings": {
  "precursor_charge": [2, 3],
  "max_fragment_charge": 1,
  "min_transitions": 6,
  "max_transitions": 12,
  "fragmentation_model": "HCD",
  "allowed_fragment_types": ["b", "y"],
  "rt_scale": 1.0,
  "unimod_annotation": true,
  "max_delta_unimod": 0.02,
  "enable_unannotated": true
}
```

> [!NOTE]
> **UniMod Reannotation:** By default, mass-bracket modification annotations (e.g., `[+57.0215]`) are converted to UniMod accession notation (e.g., `(UniMod:4)`). This uses an embedded copy of the UniMod database. To use a custom `unimod.xml`, set `unimod_xml_path` to the file path. To disable reannotation entirely, set `unimod_annotation` to `false`.
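Conceptually, the `max_delta_unimod` tolerance behaves like a nearest-mass lookup. The sketch below illustrates this with a small hand-made table of three modifications; the real tool matches against the full unimod.xml database, not this toy dict:

```python
# Hypothetical mini-table: monoisotopic delta mass -> UniMod accession.
UNIMOD = {57.021464: "UniMod:4",   # Carbamidomethyl
          15.994915: "UniMod:35",  # Oxidation
          42.010565: "UniMod:1"}   # Acetyl

def reannotate(delta, max_delta=0.02):
    """Return the accession of the closest known delta mass within tolerance, else None."""
    best = min(UNIMOD, key=lambda m: abs(m - delta))
    return UNIMOD[best] if abs(best - delta) <= max_delta else None
```

With `enable_unannotated` set to `true`, a `None` result here would correspond to keeping the original mass bracket in the output.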
> [!NOTE]
> If no `retention_time`, `ion_mobility`, or `ms2_intensity` fields are provided under `dl_feature_generators`, pretrained models will be automatically downloaded and used. The current default pretrained models are:
>
> - RT: `rt_cnn_tf` - A CNN-Transformer model trained on the ProteomicsML repository RT dataset. This model is based on AlphaPeptDeep's CNN-LSTM implementation, with the LSTM replaced by a Transformer encoder.
> - CCS: `ccs_cnn_tf` - A CNN-Transformer model trained on the ProteomicsML repository CCS dataset. This model is also based on AlphaPeptDeep's CNN-LSTM implementation, with the LSTM replaced by a Transformer encoder.
> - MS2: `ms2_bert` - A BERT-based model retrieved from AlphaPeptDeep's pretrained models.
**dl_feature_generators** - Custom or pretrained RT/IM/MS2 prediction models
If this section is omitted or empty, pretrained AlphaPeptDeep models will be automatically downloaded and used.
**Model Configuration:**

Each model (RT, IM, MS2) requires three files:

```json
{
  "model_path": "path/to/model.safetensors",    // Model weights (.pth or .safetensors)
  "constants_path": "path/to/model_const.yaml", // Model configuration constants
  "architecture": "model_architecture_name"     // Architecture identifier
}
```

| Parameter | Type | Default | Description |
|---|---|---|---|
| `retention_time` | object | pretrained | Custom RT prediction model |
| `ion_mobility` | object | pretrained | Custom IM/CCS prediction model (timsTOF only) |
| `ms2_intensity` | object | pretrained | Custom MS2 intensity prediction model |
| `device` | string | `"cpu"` | Compute device: `"cpu"`, `"cuda"`, or `"mps"` (Apple Silicon) |
| `instrument` | string | `"timsTOF"` | Instrument type: `"QE"` or `"timsTOF"` |
| `nce` | number | `20.0` | Normalized collision energy for fragmentation |
| `batch_size` | integer | `64` | Batch size for model inference |
| `fine_tune_config` | object | see below | Optional fine-tuning configuration |
**Supported Architectures:**

- RT: `"rt_cnn_tf"`, `"rt_cnn_lstm"`
- IM/CCS: `"ccs_cnn_tf"`, `"ccs_cnn_lstm"`
- MS2: `"ms2_bert"`
**fine_tune_config** - Transfer learning on experimental data
Fine-tune pretrained models on your own experimental data for improved accuracy.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `fine_tune` | boolean | `false` | Enable fine-tuning |
| `train_data_path` | string | `""` | Path to training data TSV file |
| `batch_size` | integer | `256` | Training batch size |
| `epochs` | integer | `3` | Number of training epochs |
| `learning_rate` | number | `0.001` | Learning rate for optimizer |
| `save_model` | boolean | `false` | Save fine-tuned model weights to disk |
**Training Data Format (TSV):**

Required columns:

- `sequence`: Modified sequence with square bracket notation (e.g., `MGC[+57.0215]AAR`)
- `precursor_charge`: Precursor charge state
- `retention_time`: Experimental retention time
- `ion_mobility`: CCS value (only if using timsTOF)
- `fragment_type`: Fragment ion type (`b`, `y`, etc.)
- `fragment_series_number`: Fragment position
- `product_charge`: Fragment charge
- `intensity`: Normalized fragment intensity
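As a sketch of the expected layout, the following writes a minimal training file containing exactly these columns. The row values are purely illustrative placeholders, not real measurements:

```python
import csv

# Columns required by fine_tune_config's training TSV.
COLUMNS = ["sequence", "precursor_charge", "retention_time", "ion_mobility",
           "fragment_type", "fragment_series_number", "product_charge", "intensity"]

with open("experimental_data.tsv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=COLUMNS, delimiter="\t")
    writer.writeheader()
    # One illustrative fragment row (hypothetical values).
    writer.writerow({"sequence": "MGC[+57.0215]AAR", "precursor_charge": 2,
                     "retention_time": 42.7, "ion_mobility": 0.98,
                     "fragment_type": "y", "fragment_series_number": 3,
                     "product_charge": 1, "intensity": 0.85})
```

In practice there would be one row per observed fragment ion, with `intensity` normalized per spectrum.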
Example:
```json
"fine_tune_config": {
  "fine_tune": true,
  "train_data_path": "experimental_data.tsv",
  "batch_size": 256,
  "epochs": 5,
  "learning_rate": 0.0001,
  "save_model": true
}
```

**Output file format, reporting, and memory management**
| Parameter | Type | Default | Description |
|---|---|---|---|
| `output_file` | string | `"insilico_library.tsv"` | Path for output library file |
| `write_report` | boolean | `true` | Generate HTML quality control report |
| `parquet_output` | boolean | `false` | Output in Parquet format instead of TSV |
| `peptide_chunking` | integer | `0` | Peptides per chunk (0 = auto-calculate based on memory) |
**Peptide Chunking:**

- `0` (default): Automatically calculate chunk size based on available memory (recommended)
- `> 0`: Manual chunk size for processing large FASTA files with limited RAM
- Larger chunks = faster processing but more memory usage
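The chunking behaviour can be pictured as follows. The memory heuristic here (`mem_budget_mb`, `bytes_per_peptide`) is invented for illustration only; it is not the tool's actual auto-calculation:

```python
def chunked(peptides, chunk_size=0, mem_budget_mb=1024, bytes_per_peptide=512):
    """Yield successive peptide chunks; chunk_size=0 derives a size from a
    (hypothetical) memory budget, mirroring the peptide_chunking=0 behaviour."""
    if chunk_size <= 0:
        # Rough auto-sizing: how many peptides fit in the budget.
        chunk_size = max(1, (mem_budget_mb * 1024 * 1024) // bytes_per_peptide)
    for i in range(0, len(peptides), chunk_size):
        yield peptides[i:i + chunk_size]
```

Each chunk is digested, predicted, and written before the next one starts, which bounds peak memory at the cost of some throughput.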
The minimum required configuration only needs a FASTA file:
```json
{
  "database": {
    "fasta": "proteins.fasta"
  }
}
```

All other parameters will use sensible defaults and pretrained models will be auto-downloaded.
You can override JSON configuration values via command-line arguments:
```shell
easypqp-insilico config.json \
  --fasta my_proteins.fasta \
  --output_file my_library.tsv \
  --no-write-report \
  --parquet
```

You can also run without a JSON config file by providing only `--fasta`:

```shell
easypqp-insilico --fasta my_proteins.fasta
```

All other parameters will use sensible defaults.
Available flags:
- `--fasta <PATH>`: Override database FASTA file
- `--output_file <PATH>`: Override output file path
- `--no-write-report`: Disable HTML report generation
- `--parquet`: Output in Parquet format instead of TSV
When `generate_decoys` is enabled, reversed decoy peptides are generated automatically. The `decoy_tag` (default `"rev_"`) is prefixed to each ProteinId, UniprotId, and GeneName for decoy entries, making them easy to distinguish during downstream analysis.
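A minimal sketch of reversal-based decoy generation as described above (illustrative only, not the library's implementation):

```python
def make_decoy(protein_id, sequence, decoy_tag="rev_"):
    """Build a decoy entry: reverse the protein sequence and tag the identifier."""
    return decoy_tag + protein_id, sequence[::-1]
```

Decoy peptides digested from these reversed sequences carry the tagged identifiers, so downstream tools can separate target and decoy hits by the prefix alone.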