05 Supporting Modules

Auto3D

The auto3d function generates optimized 3D conformers from SMILES strings using the AIMNET engine. It is based on Auto3D (https://github.com/isayevlab/Auto3D_pkg).

Quick Start

from rush_py2.auto3d import auto3d, Auto3DOptions

smiles = ["CCO", "c1ccccc1C(=O)O"]

# Configure options
options = Auto3DOptions(k=3, enumerate_tautomer=True)

# Run and collect results
results = auto3d(smiles, opts=options, collect=True)

Configuration Options

Parameter	Default	Description
k	1	Number of lowest energy conformers to keep per molecule.
enumerate_isomer	True	Whether to enumerate stereoisomers.
enumerate_tautomer	False	Whether to enumerate tautomers.
opt_steps	5000	Maximum optimization steps for the AIMNET engine.
threshold	0.3	RMSD threshold (Å) for pruning similar conformers.
convergence_threshold	0.003	Convergence threshold for geometry optimization.
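To illustrate how k and threshold interact, the sketch below greedily keeps the lowest-energy conformers while discarding any whose RMSD to an already-kept conformer falls below the threshold. It is a simplified illustration under stated assumptions, not Auto3D's actual implementation: the function name select_conformers and the precomputed pairwise RMSD matrix are hypothetical and exist only for this example.

```python
from typing import List, Sequence


def select_conformers(
    energies: Sequence[float],
    rmsd: Sequence[Sequence[float]],
    k: int = 1,
    threshold: float = 0.3,
) -> List[int]:
    """Return indices of up to k low-energy, mutually distinct conformers.

    Hypothetical helper: rmsd[i][j] is assumed to hold the pairwise RMSD
    (in angstroms) between conformers i and j.
    """
    kept: List[int] = []
    # Visit conformers from lowest to highest energy.
    for i in sorted(range(len(energies)), key=lambda idx: energies[idx]):
        # Skip conformers that are near-duplicates of one already kept.
        if any(rmsd[i][j] < threshold for j in kept):
            continue
        kept.append(i)
        if len(kept) == k:
            break
    return kept


# Toy data: conformers 0 and 1 are near-duplicates (RMSD 0.1).
energies = [-10.0, -9.9, -9.0, -8.0]
rmsd = [
    [0.0, 0.1, 1.2, 2.0],
    [0.1, 0.0, 1.1, 2.1],
    [1.2, 1.1, 0.0, 0.9],
    [2.0, 2.1, 0.9, 0.0],
]
selected = select_conformers(energies, rmsd, k=3, threshold=0.3)
print(selected)
```

With this toy data, conformer 1 is dropped as a duplicate of conformer 0, so a larger threshold yields fewer, more diverse conformers for the same k.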

Boltz2

The boltz module provides an interface for biomolecular structure prediction, supporting proteins, ligands, and complex assemblies. It uses Boltz2: https://github.com/jwohlwend/boltz.

Quick Start

To run a prediction, define your protein sequences (with MSAs) and ligands, then pass them to the boltz function.

from pathlib import Path
from rush_py2.boltz import boltz, ProteinSequence, LigandSequence

# 1. Define inputs
protein = ProteinSequence(
    id=["A"],
    sequence="MAAHK...",
    msa=Path("msa.a3m"), # Path to MSA file or VirtualObject dict
)

ligand = LigandSequence(
    id=["B"],
    smiles="CC(=O)OC1=CC=CC=C1C(=O)O"
)

# 2. Run prediction
# Returns run_id (async) or full results (sync)
results = boltz([protein, ligand], collect=True)

Input Data Classes

Class	Required Fields	Optional Fields
ProteinSequence	id, sequence, msa	modifications, cyclic
LigandSequence	id, smiles	—
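The required/optional split in the table can be pictured with plain dataclasses. The classes below are hypothetical stand-ins, not the rush_py2 classes: the field names come from the table, but the field types, the toy sequence, and the chain-ID uniqueness check are assumptions made for illustration.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional, Union


@dataclass
class ProteinSketch:
    """Hypothetical stand-in mirroring the documented ProteinSequence fields."""
    id: List[str]
    sequence: str
    msa: Path
    modifications: Optional[list] = None  # optional per the table; type assumed
    cyclic: Optional[bool] = None         # optional per the table; type assumed


@dataclass
class LigandSketch:
    """Hypothetical stand-in mirroring the documented LigandSequence fields."""
    id: List[str]
    smiles: str


def check_unique_chain_ids(
    entities: List[Union[ProteinSketch, LigandSketch]],
) -> List[str]:
    """Collect chain IDs in order, rejecting duplicates (assumed requirement)."""
    seen: List[str] = []
    for entity in entities:
        for chain_id in entity.id:
            if chain_id in seen:
                raise ValueError(f"duplicate chain ID: {chain_id!r}")
            seen.append(chain_id)
    return seen


# Toy inputs; the sequence is a placeholder, not a real protein.
protein = ProteinSketch(id=["A"], sequence="ACDEFGHIK", msa=Path("msa.a3m"))
ligand = LigandSketch(id=["B"], smiles="CC(=O)OC1=CC=CC=C1C(=O)O")
chain_ids = check_unique_chain_ids([protein, ligand])
print(chain_ids)
```

Omitting a required field such as msa raises a TypeError at construction time, which is the behavior the "Required Fields" column implies.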

Configuration Parameters

Parameter	Type	Default	Description	Usage Level
`recycling_steps`	`int`	`None`	Number of recycling iterations.	Core
`sampling_steps`	`int`	`None`	Number of diffusion sampling steps.	Core
`diffusion_samples`	`int`	`None`	Total number of diffusion samples to generate.	Core
`template_path`	`Path \| str`	`None`	Path to a PDB or JSON template for structure guidance.	Core
`seed`	`int`	`None`	Random seed for reproducibility.	Core
`use_potentials`	`bool`	`None`	Whether to use physical potentials during sampling.	Core
`step_scale`	`float`	`None`	Scaling factor for diffusion step size.	Advanced
`template_threshold_angstroms`	`float`	`None`	Maximum allowed deviation (Å) for template guidance.	Advanced
`template_chain_mapping`	`dict[str, str]`	`None`	Maps target chain IDs to template chain IDs for multi-chain templating.	Advanced
`max_msa_seqs`	`int`	`None`	Maximum number of MSA sequences to use.	Advanced
`subsample_msa`	`bool`	`None`	Whether to subsample the MSA for efficiency.	Advanced
`num_subsampled_msa`	`int`	`None`	Number of MSA sequences to retain when subsampling.	Advanced
`affinity_binder_chain_id`	`str`	`None`	Chain ID treated as the affinity binder.	Advanced
`affinity_mw_correction`	`bool`	`None`	Apply molecular-weight correction during affinity estimation.	Advanced
`sampling_steps_affinity`	`int`	`None`	Sampling steps used during affinity estimation.	Advanced
`diffusion_samples_affinity`	`int`	`None`	Number of diffusion samples used for affinity estimation.	Advanced
`run_spec`	`RunSpec`	`RunSpec(gpus=1)`	Hardware specifications for execution.	Execution
`run_opts`	`RunOpts`	`RunOpts()`	Execution and scheduling options.	Execution
`collect`	`bool`	`False`	If `True`, waits for completion and returns results.	Execution

Note:
Core parameters are sufficient for most structure prediction workflows.
Advanced parameters give fine-grained control for multi-chain or performance-sensitive use cases; change them with care.
Execution parameters control hardware allocation and job execution behavior.
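Because every Core and Advanced parameter defaults to None, one convenient pattern is to collect overrides in a dictionary and pass only those that were actually set. The helpers below are hypothetical, and the sketch assumes that leaving a parameter as None lets the engine fall back to its own default, which the table does not confirm.

```python
from typing import Any, Dict, List

# Names copied from the "Advanced" rows of the table above.
ADVANCED = {
    "step_scale",
    "template_threshold_angstroms",
    "template_chain_mapping",
    "max_msa_seqs",
    "subsample_msa",
    "num_subsampled_msa",
    "affinity_binder_chain_id",
    "affinity_mw_correction",
    "sampling_steps_affinity",
    "diffusion_samples_affinity",
}


def build_overrides(**params: Any) -> Dict[str, Any]:
    """Drop unset (None) parameters so only explicit overrides remain."""
    return {name: value for name, value in params.items() if value is not None}


def advanced_in_use(overrides: Dict[str, Any]) -> List[str]:
    """List the Advanced parameters present, as a prompt to double-check them."""
    return sorted(name for name in overrides if name in ADVANCED)


overrides = build_overrides(
    recycling_steps=3,
    sampling_steps=None,  # unset: omitted from the result
    seed=42,
    step_scale=1.5,
)
print(overrides)
print(advanced_in_use(overrides))

# Intended use (not executed here):
# results = boltz([protein, ligand], collect=True, **overrides)
```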

MMseqs2

The mmseqs2 function provides a Python interface for searching and clustering protein sequences with MMseqs2 (https://github.com/soedinglab/MMseqs2).

Quick Start

from rush_py2.mmseqs import mmseqs2
from rush_py2.client import RunSpec

sequences = [
    "MPRLLMRLLLLLLLL",
    "MKTIIALSYIFCLVFA"
]

# Run the search and collect results immediately
results = mmseqs2(
    sequences, 
    sensitivity=7.5, 
    collect=True,
    run_spec=RunSpec(gpus=1)
)

Configuration Options

Parameter	Default	Description
`sequences`	Required	A list of protein sequences (strings) to process.
`prefilter_mode`	None	The prefilter algorithm to use: `"KMer"`, `"Ungapped"`, or `"Exhaustive"`.
`sensitivity`	None	Sensitivity parameter (higher is more sensitive but slower).
`expand_eval`	None	E-value threshold for expanding the search results.
`align_eval`	None	E-value threshold for the alignment stage.
`diff`	None	Minimum score difference for sequence clustering/filtering.
`qsc`	None	Minimum query sequence coverage.
`max_accept`	None	Maximum number of accepted alignments per query.
`run_spec`	`RunSpec(gpus=1)`	Hardware specifications for the remote execution (e.g., GPU count).
`collect`	`False`	If `True`, waits for the job to finish and returns the results. If `False`, returns the `run_id`.
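Since sequences is the only required argument, it can be worth checking the input locally before submitting a remote job. The sketch below flags empty sequences and sequences containing characters outside the 20 standard amino-acid letters. The helper name is hypothetical, and the check is deliberately stricter than MMseqs2 itself may be (ambiguity codes such as X are rejected here), so treat it as an assumption rather than a requirement of the API.

```python
from typing import Iterable, List

# The 20 standard amino-acid one-letter codes.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def find_invalid_sequences(sequences: Iterable[str]) -> List[int]:
    """Return indices of sequences that are empty or contain non-standard letters."""
    bad: List[int] = []
    for index, seq in enumerate(sequences):
        if not seq or set(seq.upper()) - STANDARD_AA:
            bad.append(index)
    return bad


candidates = [
    "MPRLLMRLLLLLLLL",
    "MKTIIALSYIFCLVFA",
    "MKT1IAL",  # contains a digit, so it is flagged
    "",         # empty, so it is flagged
]
invalid = find_invalid_sequences(candidates)
print(invalid)
```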

PBSA

The pbsa function performs Poisson-Boltzmann Surface Area (PBSA) calculations on a molecular system. It computes the total, polar, and nonpolar solvation energies based on a topology file.

Quick Start

from pathlib import Path
from rush_py2.pbsa import pbsa

# Path to a topology JSON file (or TRC topology component)
topology = "system_topology.json"

# Run calculation and collect results
results = pbsa(
    topology_path=topology,
    solute_dielectric=1.0,
    solvent_dielectric=80.0,
    solvent_radius=1.4,
    ion_concentration=0.15,
    temperature=298.15,
    spacing=0.5,
    sasa_gamma=0.0054,
    sasa_beta=0.92,
    sasa_n_samples=100,
    convergence=1e-6,
    box_size_factor=1.2,
    collect=True
)

print(f"Total Solvation Energy: {results.solvation_energy} Hartrees")

Configuration Options

Parameter	Default	Description
`topology_path`	Required	Path to the topology file.
`solute_dielectric`	Required	Dielectric constant of the solute (e.g., 1.0–4.0).
`solvent_dielectric`	Required	Dielectric constant of the solvent (e.g., 80.0 for water).
`solvent_radius`	Required	Radius of the solvent probe in Å (e.g., 1.4 for water).
`ion_concentration`	Required	Molar concentration of mobile ions in the solvent.
`temperature`	Required	System temperature in Kelvin.
`spacing`	Required	Grid spacing for the PB solver in Å.
`sasa_gamma`	Required	Surface tension coefficient for nonpolar energy.
`sasa_beta`	Required	Regression offset for nonpolar energy.
`sasa_n_samples`	Required	Number of samples for SASA calculation.
`convergence`	Required	Convergence threshold for the solver.
`box_size_factor`	Required	Padding factor for the calculation box.
`run_spec`	`RunSpec(gpus=1)`	Hardware specifications for remote execution.
`collect`	`False`	If `True`, returns `PBSAResults`; if `False`, returns `run_id`.
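The sasa_gamma and sasa_beta parameters suggest the widely used linear nonpolar model, nonpolar energy = gamma × SASA + beta. The sketch below evaluates that model and converts an energy from Hartree to kcal/mol. It assumes gamma is given in kcal/(mol·Å²) and beta in kcal/mol, which the table does not state, and the function names are illustrative rather than part of the pbsa API.

```python
# Approximate conversion factor: 1 Hartree is about 627.5095 kcal/mol.
HARTREE_TO_KCAL_PER_MOL = 627.5095


def nonpolar_energy(sasa: float, gamma: float = 0.0054, beta: float = 0.92) -> float:
    """Linear SASA model: gamma * SASA + beta (units assumed, see lead-in)."""
    return gamma * sasa + beta


def hartree_to_kcal_per_mol(energy_hartree: float) -> float:
    """Convert an energy from Hartree to kcal/mol."""
    return energy_hartree * HARTREE_TO_KCAL_PER_MOL


# A solute with 1000 square angstroms of solvent-accessible surface area.
e_nonpolar = nonpolar_energy(1000.0)
print(f"nonpolar term: {e_nonpolar:.2f}")

# Converting a reported energy (toy value) for comparison with other tools.
print(f"{hartree_to_kcal_per_mol(-0.01):.3f} kcal/mol")
```

The defaults in the sketch simply reuse the values from the Quick Start call; they are not recommendations.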

Protein-Ligand Preparation

This module provides utilities for extracting ligands from PDB files and preparing protein-ligand complexes for simulation. Its primary entry point, prepare_complex, generates a unified protein-ligand system: it splits the input structure, prepares the receptor and the ligand separately, and merges them back into a single, simulation-ready TRC (Topology-Residues-Chains) object.

Quick Start

from rush_py2.complex_prep import prepare_complex

# Prepare a complex from a PDB file
# This extracts 'LIG', adds hydrogens, titrates the protein at pH 7, 
# and merges them back together.
complex_trc = prepare_complex(
    input_path="receptor_ligand.pdb",
    ligand_names=["LIG"],
    ph=7.0,
    naming_scheme="AMBER",
    collect=True
)

Configuration Options

Parameter	Default	Description
`input_path`	Required	Path to input `.pdb` or `.json` (TRC) file containing the complex.
`ligand_names`	Required	A list of residue names to be treated as ligands (e.g., `["LIG"]`).
`ph`	None	The pH used to determine protein protonation states.
`naming_scheme`	None	Atom naming convention for the protein: `"AMBER"` or `"CHARMM"`.
`capping_style`	None	How to handle protein termini: `"never"`, `"truncated"`, or `"always"`.
`run_spec`	`RunSpec()`	Hardware allocation for the remote protein preparation job.
`collect`	`False`	If `True`, returns the merged TRC object. If `False`, returns the run ID.

Internal Workflow

When prepare_complex is called, it runs two preparation branches, one for the ligand and one for the protein, and then integrates their results:

Ligand Branch

Identifies atoms matching ligand_names.
Uses RDKit to add hydrogens and generate local coordinates.
Preserves original PDB metadata (chain IDs, residue numbers) for the new atoms.

Protein Branch

Submits the remaining structure to the prepare_protein pipeline.
Handles heavy-duty tasks such as:
- Terminal capping
- Protonation/titration
- Structural optimization

Integration

Waits for protein preparation to complete.
Merges the prepared ligand and protein into a final consolidated TRC structure.
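The split-and-merge bookkeeping described above can be sketched with plain dictionaries. This shows only the data flow: the real pipeline uses RDKit for the ligand and a remote prepare_protein job for the receptor, neither of which is reproduced here, and the atom-record layout and function name are assumptions made for the example.

```python
from typing import Dict, List, Sequence, Tuple

Atom = Dict[str, object]


def split_by_ligand(
    atoms: Sequence[Atom], ligand_names: Sequence[str]
) -> Tuple[List[Atom], List[Atom]]:
    """Partition atoms into (ligand, protein) by residue name."""
    names = set(ligand_names)
    ligand = [atom for atom in atoms if atom["resname"] in names]
    protein = [atom for atom in atoms if atom["resname"] not in names]
    return ligand, protein


# Toy structure: two protein atoms and one ligand atom.
atoms: List[Atom] = [
    {"name": "CA", "resname": "ALA", "chain": "A", "resid": 1},
    {"name": "C1", "resname": "LIG", "chain": "B", "resid": 101},
    {"name": "N", "resname": "GLY", "chain": "A", "resid": 2},
]

ligand_atoms, protein_atoms = split_by_ligand(atoms, ["LIG"])

# Each branch would be prepared here (hydrogens, capping, titration, ...).
# The chain ID and residue number of every record are left untouched,
# mirroring the metadata preservation noted for the ligand branch.

# Integration: recombine both branches into one system.
merged = protein_atoms + ligand_atoms
print(len(ligand_atoms), len(protein_atoms), len(merged))
```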

Note
This function is format-agnostic. If a .json (TRC) file is provided as input, it is temporarily converted to PDB format to ensure accurate ligand extraction and hydrogen addition.

Protein Preparation

The prepare_protein function processes protein structures to make them simulation-ready. It converts PDB files into the TRC (Topology-Residues-Chains) format while performing essential structural cleanup, including pH-aware protonation (titration), residue capping, and atom-name standardization.

Quick Start

from rush_py2.prepare_protein import prepare_protein, save_outputs

# Submit a protein for preparation at a specific pH
# This returns a run_id by default
run_id = prepare_protein(
    input_path="my_protein.pdb",
    ph=7.4,
    naming_scheme="AMBER",
    capping_style="truncated"
)

# Alternatively, run and wait for the results
results = prepare_protein("my_protein.pdb", ph=7.4, collect=True)
t_path, r_path, c_path = save_outputs(results)

Configuration Options

Parameter	Default	Description
`input_path`	Required	Path to the input `.pdb` or `.json` (TRC) file.
`ph`	None	The pH used to calculate protonation states for ionizable residues.
`naming_scheme`	None	Standardizes atom names: `"AMBER"` or `"CHARMM"`.
`capping_style`	None	How to treat chain termini: `"never"`, `"truncated"`, or `"always"`.
`truncation_threshold`	None	Max number of residues allowed before applying truncation logic.
`run_spec`	`RunSpec()`	Hardware specifications for the remote engine.
`collect`	`False`	If `True`, waits for completion and returns the result object.
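Because a remote job can take a while to fail, a quick local check of the option values may save a round trip. The validator below is a hypothetical helper, not part of rush_py2: the allowed strings are taken from the table, while the 0 to 14 pH range is an assumption, and the library may well perform its own validation.

```python
from typing import Optional

# Allowed values as listed in the table above.
NAMING_SCHEMES = {"AMBER", "CHARMM"}
CAPPING_STYLES = {"never", "truncated", "always"}


def validate_prep_options(
    ph: Optional[float] = None,
    naming_scheme: Optional[str] = None,
    capping_style: Optional[str] = None,
) -> None:
    """Raise ValueError if an option is outside its documented choices."""
    # Assumed sanity range for pH; not stated in the documentation.
    if ph is not None and not 0.0 <= ph <= 14.0:
        raise ValueError(f"ph out of range: {ph}")
    if naming_scheme is not None and naming_scheme not in NAMING_SCHEMES:
        raise ValueError(f"unknown naming_scheme: {naming_scheme!r}")
    if capping_style is not None and capping_style not in CAPPING_STYLES:
        raise ValueError(f"unknown capping_style: {capping_style!r}")


# Passes silently for the values used in the Quick Start.
validate_prep_options(ph=7.4, naming_scheme="AMBER", capping_style="truncated")

# A misspelled option is caught before any job is submitted.
try:
    validate_prep_options(capping_style="truncate")
except ValueError as error:
    print(error)
```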

Technical Details

Component-Based Output

Unlike a standard PDB file, the output is returned as three distinct paths corresponding to the Topology (T), Residues (R), and Chains (C) components of the system.

Titration

When a ph value is provided, the engine automatically adjusts the protonation states of ionizable residues such as Histidine, Aspartate, and Glutamate.
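To build intuition for why the pH matters, the Henderson-Hasselbalch relation gives the fraction of a titratable group that is protonated at a given pH. The sketch below is not the engine's titration method, which is not described here; the pKa values are approximate textbook numbers for isolated side chains and shift with the local protein environment.

```python
# Approximate textbook side-chain pKa values (assumptions for illustration).
MODEL_PKA = {"ASP": 3.9, "GLU": 4.3, "HIS": 6.0}


def protonated_fraction(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: fraction of the group in its protonated form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


for residue, pka in MODEL_PKA.items():
    fraction = protonated_fraction(7.4, pka)
    print(f"{residue}: {fraction:.4f} protonated at pH 7.4")
```

Under these model values, Aspartate and Glutamate are almost fully deprotonated at pH 7.4, while Histidine sits close enough to its pKa that its state is sensitive to small pH changes.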
