
05 Supporting Modules

kayleigh222 edited this page Jan 1, 2026 · 1 revision

Auto3D

The auto3d function generates optimized 3D conformers from SMILES strings using the AIMNET engine. It's based on https://github.com/isayevlab/Auto3D_pkg.

Quick Start

```python
from rush_py2.auto3d import auto3d, Auto3DOptions

smiles = ["CCO", "c1ccccc1C(=O)O"]  # ethanol, benzoic acid

# Keep up to 3 conformers per molecule and also enumerate tautomers
options = Auto3DOptions(k=3, enumerate_tautomer=True)

# Run and wait for the results
results = auto3d(smiles, opts=options, collect=True)
```

Configuration Options

| Parameter | Default | Description |
|---|---|---|
| k | 1 | Number of lowest-energy conformers to keep per molecule. |
| enumerate_isomer | True | Whether to enumerate stereoisomers. |
| enumerate_tautomer | False | Whether to enumerate tautomers. |
| opt_steps | 5000 | Maximum number of optimization steps for the AIMNET engine. |
| threshold | 0.3 | RMSD threshold (Å) for pruning similar conformers. |
| convergence_threshold | 0.003 | Convergence threshold for geometry optimization. |
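
The interplay between k and threshold can be illustrated with a small greedy filter. This is a sketch under stated assumptions, not Auto3D's actual pruning code: it assumes per-conformer energies and a symmetric RMSD matrix (in Å, computed after optimal alignment) are already available, and select_conformers is a hypothetical helper that is not part of rush_py2.

```python
from typing import List, Sequence


def select_conformers(
    energies: Sequence[float],
    rmsd: Sequence[Sequence[float]],
    k: int = 1,
    threshold: float = 0.3,
) -> List[int]:
    """Greedily pick up to k low-energy, mutually distinct conformers.

    Assumes rmsd[i][j] is a symmetric RMSD matrix in Å (after alignment).
    Returns conformer indices ordered from lowest to highest energy.
    """
    # Visit conformers from lowest to highest energy
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    kept: List[int] = []
    for i in order:
        if len(kept) >= k:
            break
        # Keep i only if it differs from every kept conformer by more than threshold
        if all(rmsd[i][j] > threshold for j in kept):
            kept.append(i)
    return kept


# Toy example: conformers 1 and 2 are near-duplicates (0.1 Å apart)
energies = [-1.2, -1.5, -1.4, -1.0]
rmsd = [
    [0.0, 1.0, 1.1, 0.8],
    [1.0, 0.0, 0.1, 0.9],
    [1.1, 0.1, 0.0, 0.7],
    [0.8, 0.9, 0.7, 0.0],
]
print(select_conformers(energies, rmsd, k=2, threshold=0.3))  # [1, 0]
```

Because candidates are visited in order of increasing energy, whenever two conformers fall within threshold of each other, the lower-energy one is the one that survives.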

Boltz2

The boltz module provides an interface for biomolecular structure prediction, supporting proteins, ligands, and complex assemblies. It uses Boltz2: https://github.com/jwohlwend/boltz.

Quick Start

To run a prediction, define your protein sequences (with MSAs) and ligands, then pass them to the boltz function.

```python
from pathlib import Path
from rush_py2.boltz import boltz, ProteinSequence, LigandSequence

# 1. Define inputs
protein = ProteinSequence(
    id=["A"],
    sequence="MAAHK...",
    msa=Path("msa.a3m"),  # Path to an MSA file or a VirtualObject dict
)

ligand = LigandSequence(
    id=["B"],
    smiles="CC(=O)OC1=CC=CC=C1C(=O)O",  # aspirin
)

# 2. Run prediction
# collect=True waits for the job and returns the full results;
# with collect=False (the default) the call returns a run_id instead.
results = boltz([protein, ligand], collect=True)
```

Input Data Classes

| Class | Required fields | Optional fields |
|---|---|---|
| ProteinSequence | id, sequence, msa | modifications, cyclic |
| LigandSequence | id, smiles | none |
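
The msa field points to an MSA file such as msa.a3m. If you need to assemble one yourself, the sketch below builds minimal A3M text, assuming the common convention that the first record is the query and the remaining records are hits already aligned to it. format_a3m is a hypothetical helper, not part of rush_py2, and the sequences are toy placeholders.

```python
from typing import Sequence


def format_a3m(query: str, aligned_hits: Sequence[str] = ()) -> str:
    """Return minimal A3M text with the query as the first record.

    In A3M, uppercase letters and '-' fill match columns, while lowercase
    letters are insertions relative to the query, so each hit must contain
    exactly len(query) non-lowercase characters.
    """
    records = [(">query", query)]
    for n, hit in enumerate(aligned_hits, start=1):
        match_columns = sum(1 for c in hit if not c.islower())
        if match_columns != len(query):
            raise ValueError(
                f"hit {n} covers {match_columns} match columns, expected {len(query)}"
            )
        records.append((f">hit_{n}", hit))
    return "".join(f"{header}\n{seq}\n" for header, seq in records)


# Toy sequences (placeholders): "MaAAHK" carries one insertion ('a')
print(format_a3m("MAAHK", ["MA-HK", "MaAAHK"]), end="")
```

You could then write the text with Path("msa.a3m").write_text(...) and pass that path as msa.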

Configuration Parameters

| Parameter | Type | Default | Description | Usage Level |
|---|---|---|---|---|
| recycling_steps | int | None | Number of recycling iterations. | Core |
| sampling_steps | int | None | Number of diffusion sampling steps. | Core |
| diffusion_samples | int | None | Total number of diffusion samples to generate. | Core |
| template_path | Path or str | None | Path to a PDB or JSON template for structure guidance. | Core |
| seed | int | None | Random seed for reproducibility. | Core |
| use_potentials | bool | None | Whether to use physical potentials during sampling. | Core |
| step_scale | float | None | Scaling factor for the diffusion step size. | Advanced |
| template_threshold_angstroms | float | None | Maximum allowed deviation (Å) for template guidance. | Advanced |
| template_chain_mapping | dict[str, str] | None | Maps target chain IDs to template chain IDs for multi-chain templating. | Advanced |
| max_msa_seqs | int | None | Maximum number of MSA sequences to use. | Advanced |
| subsample_msa | bool | None | Whether to subsample the MSA for efficiency. | Advanced |
| num_subsampled_msa | int | None | Number of MSA sequences to retain when subsampling. | Advanced |
| affinity_binder_chain_id | str | None | Chain ID treated as the affinity binder. | Advanced |
| affinity_mw_correction | bool | None | Whether to apply a molecular-weight correction during affinity estimation. | Advanced |
| sampling_steps_affinity | int | None | Number of sampling steps used during affinity estimation. | Advanced |
| diffusion_samples_affinity | int | None | Number of diffusion samples used for affinity estimation. | Advanced |
| run_spec | RunSpec | RunSpec(gpus=1) | Hardware specifications for execution. | Execution |
| run_opts | RunOpts | RunOpts() | Execution and scheduling options. | Execution |
| collect | bool | False | If True, waits for completion and returns the results; if False, returns a run_id. | Execution |

Note:
Core parameters are sufficient for most structure prediction workflows.
Advanced parameters give fine-grained control over templating, MSA handling, and affinity estimation; change them with care, especially in multi-chain or performance-sensitive runs.
Execution parameters control hardware allocation and job execution behavior.

MMseqs2

The mmseqs2 function provides a Python interface for searching and clustering protein sequences using https://github.com/soedinglab/MMseqs2.

Quick Start

```python
from rush_py2.mmseqs import mmseqs2
from rush_py2.client import RunSpec

sequences = [
    "MPRLLMRLLLLLLLL",
    "MKTIIALSYIFCLVFA",
]

# Run the search on one GPU and wait for the results
results = mmseqs2(
    sequences,
    sensitivity=7.5,
    collect=True,
    run_spec=RunSpec(gpus=1),
)
```

Configuration Options

| Parameter | Default | Description |
|---|---|---|
| sequences | Required | A list of protein sequences (strings) to process. |
| prefilter_mode | None | The prefilter algorithm to use: "KMer", "Ungapped", or "Exhaustive". |
| sensitivity | None | Sensitivity parameter (higher is more sensitive but slower). |
| expand_eval | None | E-value threshold for expanding the search results. |
| align_eval | None | E-value threshold for the alignment stage. |
| diff | None | MSA diversity filter: keep at least this many of the most diverse sequences in each MSA block. |
| qsc | None | MSA diversity filter: minimum score per aligned residue with the query sequence. |
| max_accept | None | Maximum number of accepted alignments per query. |
| run_spec | RunSpec(gpus=1) | Hardware specifications for the remote execution (e.g., GPU count). |
| collect | False | If True, waits for the job to finish and returns the results; if False, returns the run_id. |
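
To make the E-value and per-query limits more concrete, the sketch below applies an align_eval-style cutoff and a max_accept-style cap to hits in MMseqs2's default tabular output (the 12-column, BLAST-m8-like layout written by convertalis). It is only an illustration under that assumption: the format returned by mmseqs2 is not specified here, MMseqs2 itself applies these thresholds during the search rather than afterwards, and filter_hits is a hypothetical helper, not part of rush_py2.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Column order of MMseqs2's default tabular (convertalis) output
M8_COLUMNS = [
    "query", "target", "fident", "alnlen", "mismatch", "gapopen",
    "qstart", "qend", "tstart", "tend", "evalue", "bits",
]


def filter_hits(
    m8_text: str, max_evalue: float, max_hits: int
) -> Dict[str, List[Tuple[str, float]]]:
    """Keep hits with E-value <= max_evalue, at most max_hits per query (best first)."""
    hits: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for line in m8_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = dict(zip(M8_COLUMNS, line.split("\t")))
        evalue = float(fields["evalue"])
        if evalue <= max_evalue:
            hits[fields["query"]].append((fields["target"], evalue))
    # Sort each query's hits by E-value (smallest = most significant) and cap them
    return {q: sorted(h, key=lambda t: t[1])[:max_hits] for q, h in hits.items()}
```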

PBSA

The pbsa function performs Poisson-Boltzmann Surface Area (PBSA) calculations on a molecular system. It computes the total, polar, and nonpolar solvation energies based on a topology file.

Quick Start

```python
from rush_py2.pbsa import pbsa

# Path to a topology JSON file (or TRC topology component)
topology = "system_topology.json"

# Run calculation and collect results
results = pbsa(
    topology_path=topology,
    solute_dielectric=1.0,
    solvent_dielectric=80.0,
    solvent_radius=1.4,
    ion_concentration=0.15,
    temperature=298.15,
    spacing=0.5,
    sasa_gamma=0.0054,
    sasa_beta=0.92,
    sasa_n_samples=100,
    convergence=1e-6,
    box_size_factor=1.2,
    collect=True,
)

print(f"Total Solvation Energy: {results.solvation_energy} Hartrees")
```

Configuration Options

| Parameter | Default | Description |
|---|---|---|
| topology_path | Required | Path to the topology file. |
| solute_dielectric | Required | Dielectric constant of the solute (e.g., 1.0–4.0). |
| solvent_dielectric | Required | Dielectric constant of the solvent (e.g., 80.0 for water). |
| solvent_radius | Required | Radius of the solvent probe in Å (e.g., 1.4 for water). |
| ion_concentration | Required | Molar concentration of mobile ions in the solvent. |
| temperature | Required | System temperature in Kelvin. |
| spacing | Required | Grid spacing for the PB solver in Å. |
| sasa_gamma | Required | Surface tension coefficient for the nonpolar energy term. |
| sasa_beta | Required | Regression offset for the nonpolar energy term. |
| sasa_n_samples | Required | Number of samples for the SASA calculation. |
| convergence | Required | Convergence threshold for the solver. |
| box_size_factor | Required | Padding factor for the calculation box. |
| run_spec | RunSpec(gpus=1) | Hardware specifications for remote execution. |
| collect | False | If True, returns PBSAResults; if False, returns a run_id. |
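
The two SASA parameters are easiest to read through the linear surface-area model commonly used for the nonpolar term, E_np = sasa_gamma × SASA + sasa_beta, with the total solvation energy taken as the sum of the polar (Poisson-Boltzmann) and nonpolar contributions. The sketch below assumes the pbsa module follows this usual decomposition; nonpolar_energy and total_solvation are hypothetical helpers, and the energy units are whatever units gamma and beta are expressed in.

```python
def nonpolar_energy(sasa: float, sasa_gamma: float, sasa_beta: float) -> float:
    """Linear surface-area model for the nonpolar term: gamma * SASA + beta."""
    return sasa_gamma * sasa + sasa_beta


def total_solvation(polar: float, sasa: float, sasa_gamma: float, sasa_beta: float) -> float:
    """Total solvation energy as polar (PB) plus nonpolar (SASA) contributions."""
    return polar + nonpolar_energy(sasa, sasa_gamma, sasa_beta)


# Quick Start coefficients with a hypothetical SASA of 1000 Å^2
print(round(nonpolar_energy(1000.0, sasa_gamma=0.0054, sasa_beta=0.92), 2))  # 6.32
```

Under this model the nonpolar term grows linearly with the solvent-accessible area, so sasa_n_samples and solvent_radius affect it only through the computed SASA.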

Protein-Ligand Preparation

This module provides utilities for extracting ligands from PDB files and preparing protein-ligand complexes for simulation. Its primary entry point, prepare_complex, generates a unified protein-ligand system: it splits the input structure, prepares the receptor and the ligand separately, and merges them back into a single, simulation-ready TRC (Topology-Residues-Chains) object.

Quick Start

```python
from rush_py2.complex_prep import prepare_complex

# Prepare a complex from a PDB file.
# This extracts 'LIG', adds hydrogens, titrates the protein at pH 7,
# and merges them back together.
complex_trc = prepare_complex(
    input_path="receptor_ligand.pdb",
    ligand_names=["LIG"],
    ph=7.0,
    naming_scheme="AMBER",
    collect=True,
)
```

Configuration Options

| Parameter | Default | Description |
|---|---|---|
| input_path | Required | Path to the input .pdb or .json (TRC) file containing the complex. |
| ligand_names | Required | A list of residue names to be treated as ligands (e.g., ["LIG"]). |
| ph | None | The pH used to determine protein protonation states. |
| naming_scheme | None | Atom naming convention for the protein: "AMBER" or "CHARMM". |
| capping_style | None | How to handle protein termini: "never", "truncated", or "always". |
| run_spec | RunSpec() | Hardware allocation for the remote protein preparation job. |
| collect | False | If True, returns the merged TRC object; if False, returns the run ID. |

Internal Workflow

When prepare_complex is called, it executes a two-pronged preparation pipeline:

Ligand Branch

  • Identifies atoms matching ligand_names.
  • Uses RDKit to add hydrogens and generate local coordinates.
  • Preserves original PDB metadata (chain IDs, residue numbers) for the new atoms.

Protein Branch

  • Submits the remaining structure to the prepare_protein pipeline.
  • Handles heavy-duty tasks such as:
    • Terminal capping
    • Protonation/titration
    • Structural optimization

Integration

  • Waits for protein preparation to complete.
  • Merges the prepared ligand and protein into a final consolidated TRC structure.

Note
This function is format-agnostic. If a .json (TRC) file is provided as input, it is temporarily converted to PDB format to ensure accurate ligand extraction and hydrogen addition.
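
The first step of the ligand branch, identifying atoms whose residue name matches ligand_names, can be sketched using the PDB format's fixed columns, where the residue name occupies columns 18-20. This illustrates the matching rule only and is not the code prepare_complex runs; split_ligands is a hypothetical helper.

```python
from typing import Iterable, List, Tuple


def split_ligands(
    pdb_lines: Iterable[str], ligand_names: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Split PDB lines into (ligand atom records, everything else) by residue name."""
    names = {name.strip() for name in ligand_names}
    ligand: List[str] = []
    rest: List[str] = []
    for line in pdb_lines:
        # Residue name sits in fixed columns 18-20, i.e. the 0-based slice [17:20]
        is_atom = line.startswith(("ATOM", "HETATM"))
        if is_atom and line[17:20].strip() in names:
            ligand.append(line)
        else:
            rest.append(line)
    return ligand, rest
```

In this sketch, waters (HOH) and any other heteroatoms whose names are not listed stay with the protein lines, so every residue name meant to be treated as a ligand has to appear in ligand_names.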

Protein Preparation

The prepare_protein function makes protein structures simulation-ready. It converts PDB files into the TRC (Topology-Residues-Chains) format while performing essential structural cleanup: pH-aware protonation (titration), residue capping, and standardization of atom names.

Quick Start

```python
from rush_py2.prepare_protein import prepare_protein, save_outputs

# Submit a protein for preparation at a specific pH.
# By default this returns a run_id without waiting.
run_id = prepare_protein(
    input_path="my_protein.pdb",
    ph=7.4,
    naming_scheme="AMBER",
    capping_style="truncated",
)

# Alternatively, run and wait for the results
results = prepare_protein("my_protein.pdb", ph=7.4, collect=True)
# Save the Topology, Residues, and Chains components; one path per component
t_path, r_path, c_path = save_outputs(results)
```

Configuration Options

| Parameter | Default | Description |
|---|---|---|
| input_path | Required | Path to the input .pdb or .json (TRC) file. |
| ph | None | The pH used to calculate protonation states for ionizable residues. |
| naming_scheme | None | Standardizes atom names: "AMBER" or "CHARMM". |
| capping_style | None | How to treat chain termini: "never", "truncated", or "always". |
| truncation_threshold | None | Maximum number of residues allowed before applying truncation logic. |
| run_spec | RunSpec() | Hardware specifications for the remote engine. |
| collect | False | If True, waits for completion and returns the result object. |

Technical Details

  • Component-Based Output
    Instead of a single PDB file, the prepared structure is split into its Topology (T), Residues (R), and Chains (C) components, and save_outputs returns one path per component.

  • Titration
    When a ph value is provided, the engine automatically adjusts the protonation states of ionizable residues such as histidine, aspartate, and glutamate.
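
The pH dependence behind titration follows the Henderson-Hasselbalch relation: the fraction of an ionizable group in its protonated form is 1 / (1 + 10^(pH - pKa)). The sketch below evaluates it with rough, commonly quoted side-chain pKa values. These numbers are illustrative assumptions (sources differ, and pKa values shift with the local protein environment), the engine may use a more detailed structure-based method, and protonated_fraction and MODEL_PKA are hypothetical names.

```python
# Rough, illustrative side-chain pKa values (assumption; sources differ and
# real values shift with the local protein environment)
MODEL_PKA = {"ASP": 3.9, "GLU": 4.2, "HIS": 6.0}


def protonated_fraction(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch: fraction of the group in its protonated form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


for residue, pka in MODEL_PKA.items():
    print(f"{residue}: {protonated_fraction(pka, 7.4):.3f} protonated at pH 7.4")
```

At pH 7.4, for example, a histidine with pKa 6.0 comes out about 4% protonated, while aspartate and glutamate are almost entirely deprotonated; residues whose pKa lies close to the chosen pH are the ones most sensitive to the ph setting.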
