05 Supporting Modules
The auto3d function generates optimized 3D conformers from SMILES strings using the AIMNET engine. It's based on https://github.com/isayevlab/Auto3D_pkg.
```python
from rush_py2.auto3d import auto3d, Auto3DOptions

smiles = ["CCO", "c1ccccc1C(=O)O"]

# Configure options
options = Auto3DOptions(k=3, enumerate_tautomer=True)

# Run and collect results
results = auto3d(smiles, opts=options, collect=True)
```

| Parameter | Default | Description |
|---|---|---|
| k | 1 | Number of lowest energy conformers to keep per molecule. |
| enumerate_isomer | True | Whether to enumerate stereoisomers. |
| enumerate_tautomer | False | Whether to enumerate tautomers. |
| opt_steps | 5000 | Maximum optimization steps for the AIMNET engine. |
| threshold | 0.3 | RMSD threshold (Å) for pruning similar conformers. |
| convergence_threshold | 0.003 | Convergence threshold for geometry optimization. |
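
To illustrate how k and threshold interact, here is a minimal sketch of energy-ranked RMSD pruning. It is not Auto3D's implementation: the `prune_conformers` and `rmsd` helpers are hypothetical, and the RMSD is computed without aligning the structures first, which a real workflow would need to do.

```python
import math


def rmsd(a, b):
    """Plain RMSD between two equally sized coordinate lists (no alignment)."""
    if len(a) != len(b):
        raise ValueError("conformers must have the same number of atoms")
    squared = sum((x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q))
    return math.sqrt(squared / len(a))


def prune_conformers(conformers, k=1, threshold=0.3):
    """Keep up to k lowest-energy conformers that differ by at least `threshold` Å.

    `conformers` is a list of (energy, coords) tuples. Hypothetical helper,
    written only to illustrate the roles of `k` and `threshold`.
    """
    kept = []
    for energy, coords in sorted(conformers, key=lambda c: c[0]):
        # Discard a conformer that is too similar to one already kept.
        if all(rmsd(coords, other) >= threshold for _, other in kept):
            kept.append((energy, coords))
        if len(kept) == k:
            break
    return kept


conformers = [
    (-10.0, [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
    (-9.9, [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0)]),  # near-duplicate of the first
    (-9.5, [(0.0, 0.0, 0.0), (0.0, 1.5, 0.0)]),  # structurally distinct
]
selected = prune_conformers(conformers, k=3, threshold=0.3)
print([energy for energy, _ in selected])
```

Even with k=3, only two conformers survive here, because the second one falls within the RMSD threshold of the first.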
The boltz module provides an interface for biomolecular structure prediction, supporting proteins, ligands, and complex assemblies. It uses Boltz2: https://github.com/jwohlwend/boltz.
To run a prediction, define your protein sequences (with MSAs) and ligands, then pass them to the boltz function.
```python
from pathlib import Path
from rush_py2.boltz import boltz, ProteinSequence, LigandSequence

# 1. Define inputs
protein = ProteinSequence(
    id=["A"],
    sequence="MAAHK...",
    msa=Path("msa.a3m"),  # Path to MSA file or VirtualObject dict
)
ligand = LigandSequence(
    id=["B"],
    smiles="CC(=O)OC1=CC=CC=C1C(=O)O",
)

# 2. Run prediction
# Returns run_id (async) or full results (sync)
results = boltz([protein, ligand], collect=True)
```

| Class | Required Fields | Optional Fields |
|---|---|---|
| ProteinSequence | id, sequence, msa | modifications, cyclic |
| LigandSequence | id, smiles | — |

| Parameter | Type | Default | Description | Usage Level |
|---|---|---|---|---|
| recycling_steps | int | None | Number of recycling iterations. | Core |
| sampling_steps | int | None | Number of diffusion sampling steps. | Core |
| diffusion_samples | int | None | Total number of diffusion samples to generate. | Core |
| template_path | Path \| str | None | Path to a PDB or JSON template for structure guidance. | Core |
| seed | int | None | Random seed for reproducibility. | Core |
| use_potentials | bool | None | Whether to use physical potentials during sampling. | Core |
| step_scale | float | None | Scaling factor for diffusion step size. | Advanced |
| template_threshold_angstroms | float | None | Maximum allowed deviation (Å) for template guidance. | Advanced |
| template_chain_mapping | dict[str, str] | None | Maps target chain IDs to template chain IDs for multi-chain templating. | Advanced |
| max_msa_seqs | int | None | Maximum number of MSA sequences to use. | Advanced |
| subsample_msa | bool | None | Whether to subsample the MSA for efficiency. | Advanced |
| num_subsampled_msa | int | None | Number of MSA sequences to retain when subsampling. | Advanced |
| affinity_binder_chain_id | str | None | Chain ID treated as the affinity binder. | Advanced |
| affinity_mw_correction | bool | None | Apply molecular-weight correction during affinity estimation. | Advanced |
| sampling_steps_affinity | int | None | Sampling steps used during affinity estimation. | Advanced |
| diffusion_samples_affinity | bool | None | Enable diffusion sampling for affinity estimation. | Advanced |
| run_spec | RunSpec | RunSpec(gpus=1) | Hardware specifications for execution. | Execution |
| run_opts | RunOpts | RunOpts() | Execution and scheduling options. | Execution |
| collect | bool | False | If True, waits for completion and returns results. | Execution |

Note:
- Core parameters are sufficient for most structure prediction workflows.
- Advanced parameters give fine-grained control for multi-chain or performance-sensitive use cases and should be used with care.
- Execution parameters control hardware allocation and job execution behavior.
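
The three MSA-related Advanced parameters can be pictured with a small sketch. The semantics below are an assumption made for illustration (cap the MSA at max_msa_seqs, then optionally keep the query row plus a random subset), and `subsample_msa_rows` is a hypothetical helper, not code from Boltz or rush_py2.

```python
import random


def subsample_msa_rows(rows, max_msa_seqs=None, subsample_msa=False,
                       num_subsampled_msa=None, seed=None):
    """Illustrative MSA reduction; the exact Boltz behavior may differ.

    `rows` is a list of aligned sequences with the query as the first row.
    """
    if not rows:
        return []
    if max_msa_seqs is not None:
        rows = rows[:max_msa_seqs]  # hard cap on the number of rows
    if not subsample_msa or num_subsampled_msa is None:
        return list(rows)
    if num_subsampled_msa < 1:
        raise ValueError("num_subsampled_msa must be at least 1")
    if len(rows) <= num_subsampled_msa:
        return list(rows)
    rng = random.Random(seed)  # seeded for reproducibility
    query, rest = rows[0], rows[1:]
    # Always keep the query; sample the remaining rows and preserve their order.
    picked = sorted(rng.sample(range(len(rest)), num_subsampled_msa - 1))
    return [query] + [rest[i] for i in picked]


msa = ["QUERY", "HIT_A", "HIT_B", "HIT_C", "HIT_D"]
reduced = subsample_msa_rows(msa, subsample_msa=True, num_subsampled_msa=3, seed=0)
print(reduced)
```

Fixing the seed makes the subset reproducible, which mirrors the role of the seed parameter in the table above.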
The mmseqs2 function provides a Python interface for searching and clustering protein sequences using https://github.com/soedinglab/MMseqs2.
```python
from rush_py2.mmseqs import mmseqs2
from rush_py2.client import RunSpec

sequences = [
    "MPRLLMRLLLLLLLL",
    "MKTIIALSYIFCLVFA",
]

# Run the search and collect results immediately
results = mmseqs2(
    sequences,
    sensitivity=7.5,
    collect=True,
    run_spec=RunSpec(gpus=1),
)
```

| Parameter | Default | Description |
|---|---|---|
| sequences | Required | A list of protein sequences (strings) to process. |
| prefilter_mode | None | The prefilter algorithm to use: "KMer", "Ungapped", or "Exhaustive". |
| sensitivity | None | Sensitivity parameter (higher is more sensitive but slower). |
| expand_eval | None | E-value threshold for expanding the search results. |
| align_eval | None | E-value threshold for the alignment stage. |
| diff | None | Minimum score difference for sequence clustering/filtering. |
| qsc | None | Minimum query sequence coverage. |
| max_accept | None | Maximum number of accepted alignments per query. |
| run_spec | RunSpec(gpus=1) | Hardware specifications for the remote execution (e.g., GPU count). |
| collect | False | If True, waits for the job to finish and returns the results. If False, returns the run_id. |

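
Because sequences is passed as plain strings, it can help to normalize and check them before submitting a remote job. The following is a minimal sketch: `clean_sequences` is a hypothetical helper, and restricting input to the 20 standard amino-acid letters is an assumption, since the source does not say whether ambiguity codes such as X are accepted.

```python
# The 20 standard amino-acid one-letter codes (assumed to be the accepted alphabet).
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequences(sequences):
    """Strip whitespace, upper-case, and validate each protein sequence."""
    cleaned = []
    for index, raw in enumerate(sequences):
        seq = "".join(raw.split()).upper()
        if not seq:
            raise ValueError(f"sequence {index} is empty")
        unexpected = sorted(set(seq) - VALID_RESIDUES)
        if unexpected:
            raise ValueError(
                f"sequence {index} contains unexpected characters: {unexpected}"
            )
        cleaned.append(seq)
    return cleaned


print(clean_sequences(["mprllmrllllllll", "MKTIIA LSYIFCLVFA"]))
```

The cleaned list could then be passed as the first argument to mmseqs2.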
The pbsa function performs Poisson-Boltzmann Surface Area (PBSA) calculations on a molecular system. It computes the total, polar, and nonpolar solvation energies based on a topology file.
```python
from pathlib import Path
from rush_py2.pbsa import pbsa

# Path to a topology JSON file (or TRC topology component)
topology = "system_topology.json"

# Run calculation and collect results
results = pbsa(
    topology_path=topology,
    solute_dielectric=1.0,
    solvent_dielectric=80.0,
    solvent_radius=1.4,
    ion_concentration=0.15,
    temperature=298.15,
    spacing=0.5,
    sasa_gamma=0.0054,
    sasa_beta=0.92,
    sasa_n_samples=100,
    convergence=1e-6,
    box_size_factor=1.2,
    collect=True,
)
print(f"Total Solvation Energy: {results.solvation_energy} Hartrees")
```

| Parameter | Default | Description |
|---|---|---|
| topology_path | Required | Path to the topology file. |
| solute_dielectric | Required | Dielectric constant of the solute (e.g., 1.0–4.0). |
| solvent_dielectric | Required | Dielectric constant of the solvent (e.g., 80.0 for water). |
| solvent_radius | Required | Radius of the solvent probe in Å (e.g., 1.4 for water). |
| ion_concentration | Required | Molar concentration of mobile ions in the solvent. |
| temperature | Required | System temperature in Kelvin. |
| spacing | Required | Grid spacing for the PB solver in Å. |
| sasa_gamma | Required | Surface tension coefficient for nonpolar energy. |
| sasa_beta | Required | Regression offset for nonpolar energy. |
| sasa_n_samples | Required | Number of samples for SASA calculation. |
| convergence | Required | Convergence threshold for the solver. |
| box_size_factor | Required | Padding factor for the calculation box. |
| run_spec | RunSpec(gpus=1) | Hardware specifications for remote execution. |
| collect | False | If True, returns PBSAResults; if False, returns run_id. |

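
The descriptions of sasa_gamma (a surface tension coefficient) and sasa_beta (a regression offset) suggest the usual linear SASA model for the nonpolar term, gamma × SASA + beta. The sketch below works through that arithmetic and adds a Hartree to kcal/mol conversion for the reported energies. The helper names are hypothetical, and the units of gamma (kcal/mol/Å²) and beta (kcal/mol) are assumptions, since the source does not state them.

```python
# 1 Hartree expressed in kcal/mol.
HARTREE_TO_KCAL_PER_MOL = 627.509474


def nonpolar_energy(sasa, gamma=0.0054, beta=0.92):
    """Linear SASA model: gamma * SASA + beta.

    `sasa` is the solvent-accessible surface area in Å^2. The units of
    `gamma` and `beta` are assumed here, not taken from the pbsa docs.
    """
    return gamma * sasa + beta


def hartree_to_kcal_per_mol(energy_hartree):
    """Convert an energy from Hartree to kcal/mol."""
    return energy_hartree * HARTREE_TO_KCAL_PER_MOL


# A solute with 1200 Å^2 of accessible surface under the assumed units.
print(f"nonpolar term: {nonpolar_energy(1200.0):.2f}")
print(f"0.01 Hartree = {hartree_to_kcal_per_mol(0.01):.3f} kcal/mol")
```

This is only the regression form implied by the parameter descriptions; the Poisson-Boltzmann (polar) part of the calculation is not sketched here.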
The complex_prep module provides utilities for extracting ligands from PDB files and preparing protein-ligand complexes for simulation. Its primary utility, the prepare_complex function, generates a unified protein-ligand system: it splits the input structure, prepares the receptor and the ligand separately, and merges them back into a single, simulation-ready TRC (Topology, Residues, Chains) object.
```python
from rush_py2.complex_prep import prepare_complex

# Prepare a complex from a PDB file
# This extracts 'LIG', adds hydrogens, titrates the protein at pH 7,
# and merges them back together.
complex_trc = prepare_complex(
    input_path="receptor_ligand.pdb",
    ligand_names=["LIG"],
    ph=7.0,
    naming_scheme="AMBER",
    collect=True,
)
```

| Parameter | Default | Description |
|---|---|---|
| input_path | Required | Path to input .pdb or .json (TRC) file containing the complex. |
| ligand_names | Required | A list of residue names to be treated as ligands (e.g., ["LIG"]). |
| ph | None | The pH used to determine protein protonation states. |
| naming_scheme | None | Atom naming convention for the protein: "AMBER" or "CHARMM". |
| capping_style | None | How to handle protein termini: "never", "truncated", or "always". |
| run_spec | RunSpec() | Hardware allocation for the remote protein preparation job. |
| collect | False | If True, returns the merged TRC object. If False, returns the run ID. |

When prepare_complex is called, it runs a two-pronged preparation pipeline, one branch for the ligand and one for the protein, and then merges the results:
- Identifies atoms matching ligand_names.
- Uses RDKit to add hydrogens and generate local coordinates.
- Preserves original PDB metadata (chain IDs, residue numbers) for the new atoms.
- Submits the remaining structure to the prepare_protein pipeline, which handles heavy-duty tasks such as:
  - Terminal capping
  - Protonation/titration
  - Structural optimization
- Waits for protein preparation to complete.
- Merges the prepared ligand and protein into a final consolidated TRC structure.

Note
This function is format-agnostic. If a .json (TRC) file is provided as input, it is temporarily converted to PDB format to ensure accurate ligand extraction and hydrogen addition.
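
The first step of the pipeline, separating ligand atoms from the rest of the structure, can be sketched with the standard library alone. This is an illustration rather than the code prepare_complex runs: `split_pdb_lines` and `make_atom_line` are hypothetical helpers, and the split relies only on the residue name stored in columns 18-20 of PDB ATOM/HETATM records.

```python
def make_atom_line(record, serial, atom, resname, chain, resseq):
    """Build the leading fixed-width fields of a PDB ATOM/HETATM record."""
    return f"{record:<6}{serial:>5} {atom:<4} {resname:>3} {chain}{resseq:>4}"


def split_pdb_lines(pdb_text, ligand_names):
    """Split ATOM/HETATM records into (ligand_lines, remaining_lines)."""
    names = {name.strip().upper() for name in ligand_names}
    ligand, remaining = [], []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip headers, TER, CONECT, and other records
        resname = line[17:20].strip().upper()  # PDB columns 18-20
        (ligand if resname in names else remaining).append(line)
    return ligand, remaining


sample = "\n".join([
    "REMARK illustrative input, coordinates omitted",
    make_atom_line("ATOM", 1, "N", "ALA", "A", 1),
    make_atom_line("ATOM", 2, "CA", "ALA", "A", 1),
    make_atom_line("HETATM", 3, "C1", "LIG", "B", 101),
])
ligand_lines, protein_lines = split_pdb_lines(sample, ["LIG"])
print(len(ligand_lines), len(protein_lines))
```

Keeping each record line intact preserves the chain IDs and residue numbers, which matches the metadata-preservation step described above.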
The prepare_protein function processes protein structures to make them simulation-ready. It converts PDB files into the TRC (Topology, Residues, Chains) format while performing essential structural cleanup, including pH-aware protonation (titration), residue capping, and atom-name standardization.
```python
from rush_py2.prepare_protein import prepare_protein, save_outputs

# Submit a protein for preparation at a specific pH
# This returns a run_id by default
run_id = prepare_protein(
    input_path="my_protein.pdb",
    ph=7.4,
    naming_scheme="AMBER",
    capping_style="truncated",
)

# Alternatively, run and wait for the results
results = prepare_protein("my_protein.pdb", ph=7.4, collect=True)
t_path, r_path, c_path = save_outputs(results)
```

| Parameter | Default | Description |
|---|---|---|
| input_path | Required | Path to the input .pdb or .json (TRC) file. |
| ph | None | The pH used to calculate protonation states for ionizable residues. |
| naming_scheme | None | Standardizes atom names: "AMBER" or "CHARMM". |
| capping_style | None | How to treat chain termini: "never", "truncated", or "always". |
| truncation_threshold | None | Max number of residues allowed before applying truncation logic. |
| run_spec | RunSpec() | Hardware specifications for the remote engine. |
| collect | False | If True, waits for completion and returns the result object. |


- Component-Based Output: Unlike a standard PDB file, the output is returned as three distinct paths corresponding to the Topology (T), Residues (R), and Chains (C) components of the system.
- Titration: When a ph value is provided, the engine automatically adjusts the protonation states of ionizable residues such as Histidine, Aspartate, and Glutamate.
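
To see why the ph value matters for these residues, the Henderson-Hasselbalch relation gives the protonated fraction of a titratable group at a given pH. This is a textbook sketch, not the engine's titration method, which the source does not describe. The `MODEL_PKA` values are approximate free-amino-acid side-chain pKa values used here as assumptions; real pKa values shift with the protein environment.

```python
# Approximate model side-chain pKa values (assumed, environment-independent).
MODEL_PKA = {"ASP": 3.9, "GLU": 4.3, "HIS": 6.0}


def fraction_protonated(ph, pka):
    """Henderson-Hasselbalch: fraction of a titratable group that is protonated."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


for residue, pka in MODEL_PKA.items():
    share = fraction_protonated(7.4, pka)
    print(f"{residue}: {share:.4f} protonated at pH 7.4")
```

Under these model values, histidine is the residue whose state changes most between pH 6 and pH 7.4, while aspartate and glutamate stay almost fully deprotonated across that range.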