Predict gel outcomes before running the experiment.
PAGE-DT is a research-grade Digital Twin of SDS-PAGE electrophoresis. It combines:
- Physics engine — Ferguson equation, diffusion, smearing physics
- ML predictor — Random Forest + XGBoost + Neural Network ensemble
- Synthetic gel image generator — OpenCV-based realistic gel rendering
- Failure diagnosis — Rule-based expert system with confidence scoring
- Parameter optimizer — Grid-search + hill-climbing for target separation
- AI Assistant — Claude-powered expert chatbot
pip install -r requirements.txt
python train_all.py
python src/dashboard/app.py

Open http://localhost:7860 in your browser.
PAGE-DT/
├── src/
│   ├── simulator/
│   │   └── physics_engine.py      # Phase 1: Ferguson equation simulator
│   ├── data/
│   │   └── dataset_generator.py   # Phase 2: 12,000 virtual experiments
│   ├── models/
│   │   └── ml_predictor.py        # Phase 3: RF + XGBoost + MLP
│   ├── imaging/
│   │   └── gel_generator.py       # Phase 4: OpenCV gel image generator
│   ├── diagnosis/
│   │   └── failure_engine.py      # Phase 5: Failure diagnosis engine
│   ├── optimizer/
│   │   └── param_optimizer.py     # Phase 6: Parameter optimizer
│   └── dashboard/
│       └── app.py                 # Phase 7: Gradio dashboard
├── data/
│   ├── raw/                       # Generated CSV dataset
│   ├── processed/
│   └── models/                    # Saved ML models (.pkl)
├── outputs/
│   └── images/                    # Generated gel images
├── train_all.py                   # Master training script
├── requirements.txt
└── README.md
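Implements the Ferguson equation, log(Rf) = log(Y0) - KR × T, where Rf is the relative mobility, Y0 the free mobility extrapolated to 0% gel, KR the retardation coefficient, and T the gel concentration (%).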
Predicts:
- Migration distance (cm)
- Relative mobility (Rf)
- Band width (mm) — from diffusion + overloading
- Smearing score (0–1) — voltage, concentration, time effects
- Band intensity (0–1) — Beer-Lambert approximation
- Separation quality (0–1) — composite score
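As a minimal sketch of the Ferguson relation used by the physics engine, the snippet below computes Rf for two gel concentrations. The function name `ferguson_rf`, the base-10 logarithm, and the values of Y0 and KR are illustrative assumptions, not taken from `physics_engine.py`.

```python
import math

def ferguson_rf(gel_pct: float, y0: float, kr: float) -> float:
    """Relative mobility from log10(Rf) = log10(Y0) - KR * T."""
    if y0 <= 0:
        raise ValueError("y0 must be positive")
    return 10 ** (math.log10(y0) - kr * gel_pct)

# Illustrative parameters only (assumed, not PAGE-DT defaults)
rf_10 = ferguson_rf(gel_pct=10.0, y0=1.0, kr=0.05)
rf_12 = ferguson_rf(gel_pct=12.0, y0=1.0, kr=0.05)
print(f"Rf at 10% gel: {rf_10:.4f}")
print(f"Rf at 12% gel: {rf_12:.4f}")
```

With a fixed KR, a denser gel always gives a lower Rf, which is why gel percentage is the main lever for separating proteins of similar size.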
Generates 12,000 synthetic experiments via:
- 70% uniform random sampling
- 20% Latin Hypercube stratified sampling
- 10% edge/stress cases
Each row: 5 inputs + 6 physics outputs + 4 derived ML features = 15 columns
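The 70/20/10 sampling mix could be sketched as follows, using only the standard library. The parameter bounds, the function names, and the interpretation of "edge cases" as parameters pinned to their bounds are assumptions made for illustration; `dataset_generator.py` may do this differently.

```python
import random

# Assumed ranges for: protein_mw (kDa), gel_percentage (%), voltage (V),
# run_time (min), concentration (µg/µL). Illustrative only.
BOUNDS = [(10.0, 250.0), (6.0, 18.0), (50.0, 200.0), (20.0, 120.0), (0.1, 5.0)]

def latin_hypercube(n, bounds, rng):
    """One sample per stratum in each dimension, strata shuffled independently."""
    columns = []
    for lo, hi in bounds:
        strata = list(range(n))
        rng.shuffle(strata)
        columns.append([lo + (s + rng.random()) / n * (hi - lo) for s in strata])
    return [list(row) for row in zip(*columns)]

def generate_inputs(total, bounds, seed=0):
    rng = random.Random(seed)
    n_lhs = round(0.20 * total)            # 20% Latin Hypercube
    n_edge = round(0.10 * total)           # 10% edge/stress cases
    n_uniform = total - n_lhs - n_edge     # remaining 70% uniform random
    uniform = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_uniform)]
    lhs = latin_hypercube(n_lhs, bounds, rng)
    # Edge cases (assumed definition): every parameter sits on one of its bounds
    edge = [[rng.choice((lo, hi)) for lo, hi in bounds] for _ in range(n_edge)]
    return uniform + lhs + edge

rows = generate_inputs(12000, BOUNDS)
print(len(rows), "input rows,", len(rows[0]), "inputs each")
```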
Trains 3 models per target (band_position, band_intensity, smearing_score):
| Model | Architecture |
|---|---|
| Random Forest | 200 trees, max_depth=12 |
| XGBoost | 300 estimators, lr=0.05, subsample=0.8 |
| Neural Network | MLP [128→64→32], ReLU, early stopping |
Saves best model per target. Reports MAE, RMSE, R².
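For reference, the three reported metrics and a per-target "best model" pick can be sketched in plain Python. Selecting by highest R² is an assumption here; the README does not state which metric `ml_predictor.py` uses to choose the saved model, and the prediction values below are made up for illustration.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R² for paired true/predicted values."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        "r2": 1.0 - ss_res / ss_tot,
    }

def pick_best(scores):
    """Name of the model with the highest R² (selection rule assumed)."""
    return max(scores, key=lambda name: scores[name]["r2"])

# Made-up predictions for a single target
y_true = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "random_forest": [1.1, 1.9, 3.2, 3.9],
    "xgboost": [1.0, 2.1, 3.0, 4.0],
    "mlp": [1.5, 2.5, 2.5, 4.5],
}
scores = {name: regression_metrics(y_true, pred) for name, pred in predictions.items()}
best = pick_best(scores)
print("best model:", best)
```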
OpenCV-based rendering with:
- Gaussian band profiles (vertical + horizontal)
- Smearing trails (exponential decay)
- Shot noise + background gradient
- MW ladder with auto-computed positions
- Lane labels and MW axis
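To illustrate the band model, here is a one-dimensional sketch of a lane's vertical intensity profile: a Gaussian band plus an exponentially decaying smear trail. It uses plain Python rather than OpenCV, and the function name, the trail direction (behind the band, toward the well), and all parameter values are assumptions, not the logic of `gel_generator.py`.

```python
import math

def lane_profile(length_px, center_px, sigma_px, intensity, smear=0.0, decay_px=15.0):
    """Intensity (0-1) along a lane; index 0 is the well end."""
    profile = []
    for y in range(length_px):
        # Gaussian band centred on the migration position
        band = intensity * math.exp(-0.5 * ((y - center_px) / sigma_px) ** 2)
        trail = 0.0
        if smear > 0 and y < center_px:
            # Exponential trail that fades with distance behind the band
            trail = smear * intensity * math.exp(-(center_px - y) / decay_px)
        profile.append(min(1.0, band + trail))
    return profile

clean = lane_profile(200, center_px=120, sigma_px=4.0, intensity=0.8)
smeared = lane_profile(200, center_px=120, sigma_px=4.0, intensity=0.8, smear=0.3)
print(f"30 px behind the band: clean={clean[90]:.4f}, smeared={smeared[90]:.4f}")
```

A 2-D lane image would repeat this profile across the lane width with a second, horizontal Gaussian, then add noise and a background gradient.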
10 diagnosis codes with confidence scoring (9 failure modes plus OPTIMAL):
- OVERLOADED_SAMPLE, VOLTAGE_TOO_HIGH/LOW
- POOR_GEL_CONCENTRATION, SMEARING_OVERCONCENTRATION
- INCOMPLETE_SEPARATION, RUN_TOO_SHORT/LONG
- BAND_NEAR_STACKING, OPTIMAL
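A rule-based diagnosis with confidence scores can be sketched like this. Only four of the modes are shown, and every threshold, the linear confidence ramp, and the `diagnose` signature are hypothetical; the actual rules live in `failure_engine.py`.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    mode: str
    confidence: float  # 0-1

def ramp(value, start, full):
    """0 at `start`, 1 at `full`, linear in between; works in either direction."""
    return max(0.0, min(1.0, (value - start) / (full - start)))

def diagnose(voltage, concentration, run_time):
    # All thresholds below are illustrative assumptions
    rules = [
        ("VOLTAGE_TOO_HIGH", ramp(voltage, 150.0, 250.0)),
        ("VOLTAGE_TOO_LOW", ramp(voltage, 60.0, 20.0)),
        ("OVERLOADED_SAMPLE", ramp(concentration, 3.0, 6.0)),
        ("RUN_TOO_SHORT", ramp(run_time, 30.0, 10.0)),
    ]
    findings = [Finding(mode, conf) for mode, conf in rules if conf > 0]
    findings.sort(key=lambda f: f.confidence, reverse=True)
    # No rule fired: report the run as OPTIMAL
    return findings or [Finding("OPTIMAL", 1.0)]

high_voltage = diagnose(voltage=200.0, concentration=1.0, run_time=60.0)
overloaded = diagnose(voltage=200.0, concentration=6.0, run_time=60.0)
healthy = diagnose(voltage=100.0, concentration=1.0, run_time=60.0)
for finding in overloaded:
    print(f"{finding.mode}: {finding.confidence:.2f}")
```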
Coarse grid (N³) + fine local search around best point.
Objective function:
- 50% minimum band gap
- 20% mean separation quality
- −15% smearing penalty
- −15% runoff penalty
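The weighted objective and the two-stage search can be sketched as below. The fine stage here is a denser local grid around the coarse optimum, which is a simplification and not necessarily the local search in `param_optimizer.py`. `toy_score` is a hypothetical stand-in for the physics simulator, and the bounds are assumed.

```python
import itertools

def objective(min_gap, mean_quality, smearing, runoff):
    """Weights from the list above; all inputs assumed normalised to 0-1."""
    return 0.50 * min_gap + 0.20 * mean_quality - 0.15 * smearing - 0.15 * runoff

def linspace(lo, hi, n):
    return [lo + i * (hi - lo) / (n - 1) for i in range(n)]

def grid_then_refine(score_fn, bounds, n=5, fine_n=5):
    # Stage 1: coarse grid with n points per axis (n**3 candidates for 3 axes)
    axes = [linspace(lo, hi, n) for lo, hi in bounds]
    coarse_best = max(itertools.product(*axes), key=score_fn)
    # Stage 2: finer grid spanning one coarse step either side of the best point
    fine_axes = []
    for x, (lo, hi) in zip(coarse_best, bounds):
        step = (hi - lo) / (n - 1)
        fine_axes.append(linspace(max(lo, x - step), min(hi, x + step), fine_n))
    fine_best = max(itertools.product(*fine_axes), key=score_fn)
    return max([coarse_best, fine_best], key=score_fn)

def toy_score(params):
    """Hypothetical response surface, not real gel physics."""
    gel_pct, voltage, run_time = params
    min_gap = max(0.0, 1.0 - ((gel_pct - 11.0) / 10.0) ** 2)
    mean_quality = max(0.0, 1.0 - ((run_time - 70.0) / 60.0) ** 2)
    smearing = min(1.0, ((voltage - 120.0) / 100.0) ** 2)
    runoff = max(0.0, min(1.0, (run_time - 90.0) / 60.0))
    return objective(min_gap, mean_quality, smearing, runoff)

# Assumed search ranges: gel %, voltage (V), run time (min)
BOUNDS = [(6.0, 18.0), (50.0, 200.0), (20.0, 120.0)]
best = grid_then_refine(toy_score, BOUNDS)
print("best (gel %, V, min):", best, "score:", round(toy_score(best), 4))
```

The coarse pass keeps the number of simulator calls bounded at N³, while the local pass recovers resolution only where it matters.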
6 Gradio tabs:
- Predict Outcome — single protein prediction
- Virtual Gel — multi-lane gel generation
- What-If Simulator — parameter sweep charts
- Failure Diagnosis — health gauge + confidence bars
- Parameter Optimizer — sensitivity analysis + optimal gel preview
- AI Assistant — Claude-powered expert chatbot
from src.simulator.physics_engine import SDSPAGEPhysicsEngine
engine = SDSPAGEPhysicsEngine()
result = engine.simulate(
    protein_mw=50.0,       # kDa
    gel_percentage=10.0,   # %
    voltage=100.0,         # V
    run_time=60.0,         # min
    concentration=1.0      # µg/µL
)
print(f"Rf = {result.relative_mobility:.4f}")
print(f"Position = {result.migration_distance:.3f} cm")
print(f"Smearing = {result.smearing_score:.4f}")

from src.optimizer.param_optimizer import ParameterOptimizer
opt = ParameterOptimizer()
result = opt.optimize(target_proteins=[50.0, 60.0])
print(f"Use {result.optimal_gel_pct}% gel at {result.optimal_voltage}V for {result.optimal_runtime} min")

| Effect | Model |
|---|---|
| Mobility vs MW | Ferguson equation + Stokes radius |
| Band position | Rf × gel length × run completeness |
| Band broadening | Einstein-Smoluchowski diffusion |
| Smearing | Concentration overload + voltage + time |
| Intensity | Beer-Lambert approximation |
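As a worked example of the band-broadening row, the one-dimensional Einstein-Smoluchowski relation gives a diffusive spread of sigma = sqrt(2 × D × t). The sketch below applies it with an assumed diffusion coefficient; the value of D and the function name are illustrative, and the engine's overloading contribution to band width is not modelled here.

```python
import math

def diffusion_sigma_mm(diff_coeff_cm2_s: float, run_time_min: float) -> float:
    """Diffusive band spread (standard deviation, mm) after a run."""
    seconds = run_time_min * 60.0
    sigma_cm = math.sqrt(2.0 * diff_coeff_cm2_s * seconds)
    return sigma_cm * 10.0  # cm -> mm

D_ASSUMED = 1e-7  # cm²/s, illustrative order of magnitude only
sigma_60 = diffusion_sigma_mm(D_ASSUMED, run_time_min=60.0)
sigma_240 = diffusion_sigma_mm(D_ASSUMED, run_time_min=240.0)
print(f"sigma after 60 min:  {sigma_60:.3f} mm")
print(f"sigma after 240 min: {sigma_240:.3f} mm")
```

Because the spread grows with the square root of time, quadrupling the run time only doubles the diffusive band width.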
- Python 3.9+
- 4 GB RAM (8 GB recommended for training)
- No GPU required — runs on laptop CPU
MIT License — research use encouraged.