AkhilleshVarathan/page-dt

PAGE-DT: Digital Twin Framework for SDS-PAGE Outcome Prediction

Predict gel outcomes before running the experiment.


Overview

PAGE-DT is a research-grade Digital Twin of SDS-PAGE electrophoresis. It combines:

  • Physics engine — Ferguson equation, diffusion, smearing physics
  • ML predictor — Random Forest + XGBoost + Neural Network ensemble
  • Synthetic gel image generator — OpenCV-based realistic gel rendering
  • Failure diagnosis — Rule-based expert system with confidence scoring
  • Parameter optimizer — Grid-search + hill-climbing for target separation
  • AI Assistant — Claude-powered expert chatbot

Quick Start

1. Install dependencies

pip install -r requirements.txt

2. Train all models (generates 12k experiments, trains 3 ML models)

python train_all.py

3. Launch dashboard

python src/dashboard/app.py

Open http://localhost:7860 in your browser.


Architecture

PAGE-DT/
├── src/
│   ├── simulator/
│   │   └── physics_engine.py        # Phase 1: Ferguson equation simulator
│   ├── data/
│   │   └── dataset_generator.py     # Phase 2: 12,000 virtual experiments
│   ├── models/
│   │   └── ml_predictor.py          # Phase 3: RF + XGBoost + MLP
│   ├── imaging/
│   │   └── gel_generator.py         # Phase 4: OpenCV gel image generator
│   ├── diagnosis/
│   │   └── failure_engine.py        # Phase 5: Failure diagnosis engine
│   ├── optimizer/
│   │   └── param_optimizer.py       # Phase 6: Parameter optimizer
│   └── dashboard/
│       └── app.py                   # Phase 7: Gradio dashboard
├── data/
│   ├── raw/                         # Generated CSV dataset
│   ├── processed/
│   └── models/                      # Saved ML models (.pkl)
├── outputs/
│   └── images/                      # Generated gel images
├── train_all.py                     # Master training script
├── requirements.txt
└── README.md

Modules

Phase 1: Physics Engine (physics_engine.py)

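Implements the Ferguson equation: log(Rf) = log(Y0) - KR × T, where Rf is the relative mobility, Y0 is the free mobility (the value extrapolated to T = 0), KR is the retardation coefficient, and T is the total gel concentration (%).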
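
As a rough illustration of the Ferguson relation, the sketch below computes Rf for a few gel percentages. The function name `ferguson_rf` and the values chosen for Y0 and KR are hypothetical placeholders, not taken from `physics_engine.py`, and a base-10 logarithm is assumed.

```python
import math

def ferguson_rf(gel_pct: float, y0: float, kr: float) -> float:
    """Relative mobility from log10(Rf) = log10(Y0) - KR * T."""
    if y0 <= 0:
        raise ValueError("y0 must be positive")
    return 10 ** (math.log10(y0) - kr * gel_pct)

# Hypothetical parameters, chosen only to show the trend:
# a denser gel (larger T) gives a smaller Rf.
for t in (8.0, 10.0, 12.0):
    print(f"T = {t:4.1f}%  Rf = {ferguson_rf(t, y0=1.0, kr=0.05):.3f}")
```

In practice KR grows with protein size, which is why larger proteins are retarded more strongly at the same gel percentage.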

Predicts:

  • Migration distance (cm)
  • Relative mobility (Rf)
  • Band width (mm) — from diffusion + overloading
  • Smearing score (0–1) — voltage, concentration, time effects
  • Band intensity (0–1) — Beer-Lambert approximation
  • Separation quality (0–1) — composite score

Phase 2: Dataset Generator (dataset_generator.py)

Generates 12,000 synthetic experiments via:

  • 70% uniform random sampling
  • 20% Latin Hypercube stratified sampling
  • 10% edge/stress cases
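
The sampling mix above can be sketched with the standard library alone. The parameter ranges, the function names, and the choice to model edge cases as corners of the parameter box are all assumptions made for illustration; `dataset_generator.py` may do this differently.

```python
import random

# Hypothetical input ranges; the real generator's bounds are not stated here.
RANGES = {
    "protein_mw": (10.0, 250.0),    # kDa
    "gel_percentage": (6.0, 18.0),  # %
    "voltage": (50.0, 200.0),       # V
    "run_time": (20.0, 120.0),      # min
    "concentration": (0.1, 5.0),    # µg/µL
}

def uniform_samples(n, rng):
    """Independent uniform draws for every parameter."""
    return [{k: rng.uniform(lo, hi) for k, (lo, hi) in RANGES.items()}
            for _ in range(n)]

def latin_hypercube_samples(n, rng):
    """One draw per equal-width stratum in each dimension, shuffled per dimension."""
    if n == 0:
        return []
    columns = {}
    for k, (lo, hi) in RANGES.items():
        width = (hi - lo) / n
        col = [lo + (i + rng.random()) * width for i in range(n)]
        rng.shuffle(col)
        columns[k] = col
    return [{k: columns[k][i] for k in RANGES} for i in range(n)]

def edge_samples(n, rng):
    """Stress cases: each parameter pinned to its minimum or maximum."""
    return [{k: rng.choice(bounds) for k, bounds in RANGES.items()}
            for _ in range(n)]

def generate(n_total=12_000, seed=0):
    rng = random.Random(seed)
    n_lhs = round(0.20 * n_total)
    n_edge = round(0.10 * n_total)
    n_uniform = n_total - n_lhs - n_edge  # remaining 70%
    return (uniform_samples(n_uniform, rng)
            + latin_hypercube_samples(n_lhs, rng)
            + edge_samples(n_edge, rng))

rows = generate()
print(len(rows))
```

With 12,000 rows this split gives 8,400 uniform, 2,400 Latin Hypercube, and 1,200 edge-case experiments.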

Each row: 5 inputs + 6 physics outputs + 4 derived ML features = 15 columns

Phase 3: ML Predictor (ml_predictor.py)

Trains 3 models per target (band_position, band_intensity, smearing_score):

| Model          | Architecture                           |
|----------------|----------------------------------------|
| Random Forest  | 200 trees, max_depth=12                |
| XGBoost        | 300 estimators, lr=0.05, subsample=0.8 |
| Neural Network | MLP [128→64→32], ReLU, early stopping  |

Saves best model per target. Reports MAE, RMSE, R².
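
The three reported metrics and the per-target model selection can be illustrated in plain Python. The helper names and the toy predictions are invented for this sketch, and picking the model with the lowest RMSE is an assumption, since the selection criterion is not stated here.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R² for paired true/predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan"),
    }

def pick_best(y_true, predictions_by_model):
    """Score every model and return the name with the lowest RMSE (assumed criterion)."""
    scores = {name: regression_metrics(y_true, pred)
              for name, pred in predictions_by_model.items()}
    best = min(scores, key=lambda name: scores[name]["rmse"])
    return best, scores

# Toy held-out values for a single target, made up for the example.
y_true = [1.0, 2.0, 3.0, 4.0]
preds = {
    "random_forest": [1.1, 1.9, 3.2, 3.8],
    "xgboost": [1.0, 2.1, 3.0, 4.1],
    "mlp": [1.5, 2.5, 2.5, 4.5],
}
best, scores = pick_best(y_true, preds)
print(best, scores[best])
```

Running the selection once per target (band_position, band_intensity, smearing_score) would allow a different model family to win for each.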

Phase 4: Gel Image Generator (gel_generator.py)

OpenCV-based rendering with:

  • Gaussian band profiles (vertical + horizontal)
  • Smearing trails (exponential decay)
  • Shot noise + background gradient
  • MW ladder with auto-computed positions
  • Lane labels and MW axis
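
To make the band model concrete, here is a one-dimensional sketch of a lane's intensity along the migration axis: a Gaussian band plus an exponentially decaying smear behind it. It uses plain Python rather than OpenCV, omits noise and the background gradient, and every name and default value is a hypothetical stand-in rather than the `gel_generator.py` implementation.

```python
import math

def lane_profile(length_px, band_center_px, band_sigma_px,
                 intensity, smear_score, smear_decay_px=15.0):
    """Intensity (0-1) per pixel row: Gaussian band plus a trailing smear."""
    profile = []
    for y in range(length_px):
        band = intensity * math.exp(-0.5 * ((y - band_center_px) / band_sigma_px) ** 2)
        # The smear trails behind the band, toward the well (smaller y),
        # and decays exponentially with distance from the band center.
        behind = band_center_px - y
        if behind > 0:
            trail = intensity * smear_score * math.exp(-behind / smear_decay_px)
        else:
            trail = 0.0
        profile.append(min(1.0, band + trail))
    return profile

clean = lane_profile(200, 120, 4.0, intensity=0.8, smear_score=0.0)
smeared = lane_profile(200, 120, 4.0, intensity=0.8, smear_score=0.5)
print(f"30 px behind the band: clean={clean[90]:.4f}, smeared={smeared[90]:.4f}")
```

A two-dimensional band would multiply this vertical profile by a second Gaussian across the lane width.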

Phase 5: Failure Diagnosis (failure_engine.py)

10 diagnostic outcomes with confidence scoring (nine failure modes plus OPTIMAL):

  • OVERLOADED_SAMPLE, VOLTAGE_TOO_HIGH/LOW
  • POOR_GEL_CONCENTRATION, SMEARING_OVERCONCENTRATION
  • INCOMPLETE_SEPARATION, RUN_TOO_SHORT/LONG
  • BAND_NEAR_STACKING, OPTIMAL
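
A rule-based diagnosis with confidence scores might look like the sketch below. Only five of the modes are covered, and every threshold, the `ramp` helper, and the `Diagnosis` class are made-up placeholders; the actual rules in `failure_engine.py` are not shown in this README.

```python
from dataclasses import dataclass

@dataclass
class Diagnosis:
    mode: str
    confidence: float  # 0-1

def ramp(value, zero_at, one_at):
    """Linear confidence: 0 at `zero_at`, 1 at `one_at`, clamped to [0, 1].
    Works in either direction (zero_at may be larger than one_at)."""
    return max(0.0, min(1.0, (value - zero_at) / (one_at - zero_at)))

def diagnose(voltage, run_time, concentration, rf):
    # All thresholds below are hypothetical, for illustration only.
    rules = [
        ("VOLTAGE_TOO_HIGH", ramp(voltage, 150.0, 250.0)),
        ("VOLTAGE_TOO_LOW", ramp(voltage, 60.0, 20.0)),
        ("OVERLOADED_SAMPLE", ramp(concentration, 2.0, 5.0)),
        ("RUN_TOO_SHORT", ramp(run_time, 30.0, 10.0)),
        ("BAND_NEAR_STACKING", ramp(rf, 0.10, 0.02)),
    ]
    findings = [Diagnosis(mode, conf) for mode, conf in rules if conf > 0.0]
    if not findings:
        return [Diagnosis("OPTIMAL", 1.0)]
    # Most confident finding first.
    return sorted(findings, key=lambda d: d.confidence, reverse=True)

for d in diagnose(voltage=200.0, run_time=60.0, concentration=1.0, rf=0.5):
    print(d.mode, round(d.confidence, 2))
```

Graded confidences let several overlapping problems be reported at once instead of a single hard verdict.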

Phase 6: Parameter Optimizer (param_optimizer.py)

A coarse N³ grid over gel percentage, voltage, and run time, followed by a fine local search around the best grid point.
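
The two-stage search can be sketched as follows. The toy objective, the bounds, and the step-halving schedule are assumptions for illustration and are not taken from `param_optimizer.py`.

```python
import itertools

def linspace(lo, hi, n):
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

def coarse_then_fine(objective, bounds, n=5, rounds=20):
    """Maximize `objective` over a box: N^d grid first, then hill climbing."""
    if n < 2:
        raise ValueError("n must be at least 2")
    # Stage 1: evaluate every point of the coarse grid.
    axes = [linspace(lo, hi, n) for lo, hi in bounds]
    best = max(itertools.product(*axes), key=objective)
    best_score = objective(best)
    # Stage 2: coordinate-wise hill climbing, starting at half a grid cell.
    steps = [(hi - lo) / (n - 1) / 2 for lo, hi in bounds]
    for _ in range(rounds):
        improved = False
        for i, (lo, hi) in enumerate(bounds):
            for delta in (-steps[i], steps[i]):
                cand = list(best)
                cand[i] = min(hi, max(lo, cand[i] + delta))  # stay inside the box
                cand = tuple(cand)
                score = objective(cand)
                if score > best_score:
                    best, best_score, improved = cand, score, True
        if not improved:
            steps = [s / 2 for s in steps]  # refine the step size
    return best, best_score

# Toy objective with a known maximum at (11.3 %, 118 V, 72 min).
def toy(p):
    gel, volt, minutes = p
    return -((gel - 11.3) ** 2 + ((volt - 118) / 10) ** 2 + ((minutes - 72) / 10) ** 2)

bounds = [(6.0, 18.0), (50.0, 200.0), (20.0, 120.0)]
best, score = coarse_then_fine(toy, bounds)
print([round(x, 2) for x in best], round(score, 4))
```

The grid keeps the search from starting in a poor region, while the local stage recovers resolution the coarse grid cannot provide.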

Objective function:

  • 50% minimum band gap
  • 20% mean separation quality
  • −15% smearing penalty
  • −15% runoff penalty
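
Read as a weighted sum, the objective can be written out as below. This sketch assumes all four terms are normalized to the 0-1 range; the function and argument names are hypothetical.

```python
def objective_score(min_band_gap, mean_quality, smearing, runoff):
    """Weighted objective; all inputs are assumed to lie in [0, 1]."""
    return (0.50 * min_band_gap
            + 0.20 * mean_quality
            - 0.15 * smearing
            - 0.15 * runoff)

# Well-separated, clean run versus a smeared run with bands leaving the gel.
print(objective_score(0.9, 0.8, 0.1, 0.0))
print(objective_score(0.9, 0.8, 0.7, 1.0))
```

Under that normalization the score ranges from −0.30 (no separation, full penalties) to 0.70 (perfect separation, no penalties).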

Phase 7: Dashboard (app.py)

6 Gradio tabs:

  1. Predict Outcome — single protein prediction
  2. Virtual Gel — multi-lane gel generation
  3. What-If Simulator — parameter sweep charts
  4. Failure Diagnosis — health gauge + confidence bars
  5. Parameter Optimizer — sensitivity analysis + optimal gel preview
  6. AI Assistant — Claude-powered expert chatbot

Example Usage (Python API)

from src.simulator.physics_engine import SDSPAGEPhysicsEngine

engine = SDSPAGEPhysicsEngine()
result = engine.simulate(
    protein_mw=50.0,      # kDa
    gel_percentage=10.0,  # %
    voltage=100.0,        # V
    run_time=60.0,        # min
    concentration=1.0     # µg/µL
)
print(f"Rf = {result.relative_mobility:.4f}")
print(f"Position = {result.migration_distance:.3f} cm")
print(f"Smearing = {result.smearing_score:.4f}")

# Second example: optimize run parameters to separate two proteins (50 and 60 kDa)
from src.optimizer.param_optimizer import ParameterOptimizer

opt = ParameterOptimizer()
result = opt.optimize(target_proteins=[50.0, 60.0])
print(f"Use {result.optimal_gel_pct}% gel at {result.optimal_voltage}V for {result.optimal_runtime} min")

Scientific Basis

| Effect          | Model                                   |
|-----------------|-----------------------------------------|
| Mobility vs MW  | Ferguson equation + Stokes radius       |
| Band position   | Rf × gel length × run completeness      |
| Band broadening | Einstein-Smoluchowski diffusion         |
| Smearing        | Concentration overload + voltage + time |
| Intensity       | Beer-Lambert approximation              |
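
Two of these relations are simple enough to work through numerically. The sketch below uses the band-position product from the table and the one-dimensional Einstein-Smoluchowski result sigma = sqrt(2 D t); the function names, the 8 cm gel length, and the diffusion coefficient are placeholder assumptions, not values from the physics engine.

```python
import math

def band_position_cm(rf, gel_length_cm, run_completeness):
    """Band position = Rf × gel length × run completeness (0-1)."""
    return rf * gel_length_cm * run_completeness

def diffusion_sigma_cm(diffusion_cm2_s, run_time_min):
    """Diffusive band spread, sigma = sqrt(2 * D * t), with t in seconds."""
    return math.sqrt(2.0 * diffusion_cm2_s * run_time_min * 60.0)

# Placeholder values: Rf = 0.5 on an 8 cm gel, run to completion,
# and an assumed in-gel diffusion coefficient of 1e-7 cm²/s.
print(f"position = {band_position_cm(0.5, 8.0, 1.0):.2f} cm")
print(f"sigma    = {diffusion_sigma_cm(1e-7, 60.0) * 10:.3f} mm")
```

Because sigma scales with the square root of time, quadrupling the run time only doubles the diffusive broadening.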

Requirements

  • Python 3.9+
  • 4 GB RAM (8 GB recommended for training)
  • No GPU required — runs on laptop CPU

License

MIT License — research use encouraged.
