gpredomics is a Rust rewrite of Predomics for learning interpretable BTR‑style models with discrete coefficients across binary, ternary, ratio, and pow2 languages on raw, prevalence, and log‑transformed data, with optional GPU acceleration via wgpu backends.
- Interpretable languages: binary (subset sum), ternary (−1/0/1 algebraic sum), ratio (sum positive over sum negative), pow2 (ternary with powers of two coefficients).
- Data encodings: raw values, prevalence via epsilon thresholding, and log transforms with epsilon flooring for numerical stability.
- Optimizers: Genetic Algorithm (ga2 Predomics style), Beam search (LimitedExhaustive and ParallelForward), and MCMC with Sequential Backward Selection (beta).
- Fitness targets: AUC, specificity, sensitivity, MCC, F1-score and G means with optional penalties on model size (k_penalty) and false‑rates (fr_penalty).
- Confidence interval around the classification threshold, which helps to discover divisive models and to avoid uncertain classifications.
- Cross‑validation: stratified folds, Family of Best Models extraction, and OOB permutation importance aggregation across folds.
- GPU acceleration: wgpu‑based scoring with configurable memory policy and safe CPU fallback when device limits are reached.
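
The four languages and three data encodings listed above can be illustrated with a minimal sketch. The type and function names (`Language`, `DataType`, `encode`, `is_valid_coef`, `score`) are hypothetical and do not come from the gpredomics source, and adding epsilon to the ratio denominator is an assumption made here to avoid division by zero.

```rust
/// Hypothetical names: an illustration of the model languages, not the gpredomics API.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Language {
    Binary,
    Ternary,
    Ratio,
    Pow2,
}

#[derive(Clone, Copy, Debug, PartialEq)]
pub enum DataType {
    Raw,
    Prevalence,
    Log,
}

/// Encode one raw value according to the chosen data type.
pub fn encode(value: f64, data_type: DataType, epsilon: f64) -> f64 {
    match data_type {
        DataType::Raw => value,
        // Presence/absence: 1 when the value exceeds epsilon, 0 otherwise.
        DataType::Prevalence => {
            if value > epsilon {
                1.0
            } else {
                0.0
            }
        }
        // Floor at epsilon before the log so that zeros stay finite.
        DataType::Log => value.max(epsilon).ln(),
    }
}

/// Check that a non-zero coefficient is allowed in the given language.
pub fn is_valid_coef(c: i32, language: Language) -> bool {
    match language {
        Language::Binary => c == 1,
        Language::Ternary | Language::Ratio => c == 1 || c == -1,
        Language::Pow2 => c != 0 && c.unsigned_abs().is_power_of_two(),
    }
}

/// Score one encoded sample; `coefs` pairs a feature index with its coefficient.
pub fn score(sample: &[f64], coefs: &[(usize, i32)], language: Language, epsilon: f64) -> f64 {
    match language {
        // Binary, ternary and pow2 are all weighted sums; only the allowed coefficients differ.
        Language::Binary | Language::Ternary | Language::Pow2 => {
            coefs.iter().map(|&(i, c)| c as f64 * sample[i]).sum::<f64>()
        }
        // Ratio: sum of positive features over sum of negative features.
        Language::Ratio => {
            let pos: f64 = coefs.iter().filter(|&&(_, c)| c > 0).map(|&(i, _)| sample[i]).sum();
            let neg: f64 = coefs.iter().filter(|&&(_, c)| c < 0).map(|&(i, _)| sample[i]).sum();
            // Assumption: epsilon guards against an empty or zero denominator.
            pos / (neg + epsilon)
        }
    }
}
```

A model is then just a short list of `(feature index, coefficient)` pairs, which is what keeps it readable: a ternary model with `[(0, 1), (2, -1)]` reads as "feature 0 minus feature 2".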
Install a recent Rust toolchain and build in release mode for performance on CPU and GPU.
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
At the root of this repository, compile gpredomics:
cargo build --release
The executable loads param.yaml from the current working directory on startup. This configuration file contains information about the inputs and experiments to be launched.
To launch gpredomics, simply type:
cargo run --release
Below are minimal TSV schemas that match the loader’s expectations.
X.tsv: features in rows and samples in columns; the first column contains feature names, and each subsequent column contains the numeric values for one sample.
| feature | sample_a | sample_b | sample_c |
|---|---|---|---|
| feature_1 | 0.10 | 0.20 | 0.30 |
| feature_2 | 0.00 | 0.05 | 0.10 |
| feature_3 | 0.90 | 0.80 | 0.70 |
y.tsv: two‑column TSV mapping sample to class; header line is ignored; classes: 0 (negative), 1 (positive), 2 (unknown, ignored in metrics).
| sample | class |
|---|---|
| sample_a | 0 |
| sample_b | 1 |
| sample_c | 1 |
The loader also accepts a transposed X (samples in rows, features in columns), which is the standard layout in ML. To use it, set features_in_rows to false in param.yaml.
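
To make the two X layouts concrete, here is a minimal sketch of a parser that accepts either orientation and always returns a features-by-rows matrix. It is not the gpredomics loader: `Matrix` and `parse_x` are hypothetical names, and only the `features_in_rows` flag comes from the text above.

```rust
/// Parsed matrix, always stored with features in rows (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct Matrix {
    pub features: Vec<String>,
    pub samples: Vec<String>,
    /// values[f][s] is the value of feature f in sample s.
    pub values: Vec<Vec<f64>>,
}

/// Parse the text of an X.tsv file in either orientation.
pub fn parse_x(text: &str, features_in_rows: bool) -> Result<Matrix, String> {
    let mut lines = text.lines().filter(|l| !l.trim().is_empty());
    let header = lines.next().ok_or("empty input")?;
    // Skip the corner cell; the remaining cells label the columns.
    let col_names: Vec<String> = header
        .split('\t')
        .skip(1)
        .map(|s| s.trim().to_string())
        .collect();
    let mut row_names = Vec::new();
    let mut rows: Vec<Vec<f64>> = Vec::new();
    for line in lines {
        let mut cells = line.split('\t');
        let name = cells.next().unwrap_or("").trim().to_string();
        let row = cells
            .map(|c| c.trim().parse::<f64>().map_err(|e| format!("{name}: {e}")))
            .collect::<Result<Vec<f64>, String>>()?;
        if row.len() != col_names.len() {
            return Err(format!(
                "{name}: expected {} values, found {}",
                col_names.len(),
                row.len()
            ));
        }
        row_names.push(name);
        rows.push(row);
    }
    if features_in_rows {
        return Ok(Matrix { features: row_names, samples: col_names, values: rows });
    }
    // Transposed input: rows are samples, so flip to features-by-rows.
    let values: Vec<Vec<f64>> = (0..col_names.len())
        .map(|f| rows.iter().map(|r| r[f]).collect())
        .collect();
    Ok(Matrix { features: col_names, samples: row_names, values })
}
```

Both orientations of the same data should produce an identical `Matrix`, which is a convenient check when converting an existing dataset.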
CLI commands can be specified to reload a saved experiment or evaluate a new dataset using the models selected during the experiment:
- Default run: execute the binary in a directory that contains param.yaml; the program initializes logging and dispatches GA/Beam/MCMC according to general.algo.
- Specific configuration: execute the binary using another configuration file using --config config.yaml.
- Reload and display: use --load <experiment.(json|mp|bin)> to deserialize a saved Experiment; the format is auto‑detected at load time.
- Evaluate on external data: combine --load with --evaluate and provide --x-test and --y-test to score the saved run on a new dataset.
Flags are defined with clap:
# default execution (param.yaml in CWD)
gpredomics
# specific config
gpredomics --config ./path/config.yaml
# reload a saved experiment and print results
gpredomics --load 2025-01-01_12-00-00_run.msgpack
# evaluate on an external test set
gpredomics --load 2025-01-01_12-00-00_run.msgpack \
--evaluate --x-test /path/X_test.tsv --y-test /path/y_test.tsv
Note that --evaluate requires --load and needs --x-test and --y-test.
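
The dependency between these flags can be sketched with a small hand-written parser that uses only the standard library. This is not how gpredomics defines its flags (it uses clap); `Cli` and `parse_cli` are hypothetical names, and only the flag names and the rule that --evaluate needs --load, --x-test and --y-test come from the text above.

```rust
/// Hypothetical container for the flags described above.
#[derive(Debug, Default, PartialEq)]
pub struct Cli {
    pub config: Option<String>,
    pub load: Option<String>,
    pub evaluate: bool,
    pub x_test: Option<String>,
    pub y_test: Option<String>,
}

/// Parse arguments (without the program name) and enforce the flag dependencies.
pub fn parse_cli<I: IntoIterator<Item = String>>(args: I) -> Result<Cli, String> {
    let mut cli = Cli::default();
    let mut it = args.into_iter();
    while let Some(flag) = it.next() {
        // --evaluate is a switch and takes no value.
        if flag == "--evaluate" {
            cli.evaluate = true;
            continue;
        }
        // Every other flag stores the next argument as its value.
        let slot = match flag.as_str() {
            "--config" => &mut cli.config,
            "--load" => &mut cli.load,
            "--x-test" => &mut cli.x_test,
            "--y-test" => &mut cli.y_test,
            other => return Err(format!("unknown flag: {other}")),
        };
        *slot = Some(it.next().ok_or_else(|| format!("{flag} expects a value"))?);
    }
    // Evaluation only makes sense on a saved experiment with a full test set.
    if cli.evaluate && (cli.load.is_none() || cli.x_test.is_none() || cli.y_test.is_none()) {
        return Err("--evaluate requires --load, --x-test and --y-test".to_string());
    }
    Ok(cli)
}
```

In a real binary the arguments would come from `std::env::args().skip(1)`; rejecting the incomplete combination up front avoids loading a large experiment only to fail later.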
Termination signals are handled for clean shutdown.
Please note that there is an R package, GpredomicsR, which provides an R interface around the gpredomics engine for ecosystem integration and workflows in R.
- Genetic Algorithm: evaluates and evolves a population with selection, crossover, mutation, and GPU‑accelerated scoring when general.gpu is true.
- Beam search (beta): supports LimitedExhaustive and ParallelForward modes.
- MCMC (beta): explores models probabilistically, enforcing single data_type per run, and retains the best log‑likelihood SBS trajectory.
- CV creates stratified folds, runs the chosen algorithm per training split, and reports training/validation metrics, collecting per‑fold populations.
- OOB permutation importance is computed on Family of Best Models and aggregated across folds by mean or median.
- Runs are seeded by general.seed for reproducibility and use rayon threading and an optional GPU for performance; results are intended to be deterministic where applicable.
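
As an illustration of seeded, stratified fold creation, here is a minimal sketch using only the standard library. It is not the gpredomics implementation: `stratified_folds` is a hypothetical name, SplitMix64 stands in for whatever generator the engine uses, and leaving class 2 (unknown) samples out of every fold is an assumption.

```rust
/// Small deterministic generator (SplitMix64) so the sketch needs no external crate.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Assign each labelled sample (class 0 or 1) to one of `k` folds so that class
/// proportions stay roughly equal across folds. Returns sample indices per fold.
pub fn stratified_folds(y: &[u8], k: usize, seed: u64) -> Vec<Vec<usize>> {
    assert!(k > 0, "k must be positive");
    let mut rng = SplitMix64(seed);
    let mut folds = vec![Vec::new(); k];
    let mut offset = 0;
    for class in [0u8, 1u8] {
        let mut idx: Vec<usize> = (0..y.len()).filter(|&i| y[i] == class).collect();
        // Fisher-Yates shuffle driven by the seeded generator.
        for i in (1..idx.len()).rev() {
            let j = (rng.next() % (i as u64 + 1)) as usize;
            idx.swap(i, j);
        }
        // Deal indices round-robin; carrying the offset over keeps fold sizes balanced.
        for (n, &i) in idx.iter().enumerate() {
            folds[(offset + n) % k].push(i);
        }
        offset += idx.len();
    }
    folds
}
```

Each fold serves once as the validation split while the remaining folds form the training split; calling the function twice with the same seed returns the same assignment.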
The supported GPUs are:
- Apple Silicon (Metal),
- All GPUs supported by Vulkan.
On Apple hardware, Metal is supported out of the box, but you need a version of LLVM more recent than the default one. Here is the procedure:
You are expected to already have the developer tools (installed with xcode-select --install, which you need for Rust anyway).
The recommended procedure is to use Homebrew (at least for LLVM, and probably for rustup).
brew install rustup llvm
# the following line is only required if you had already installed Rust with the https://sh.rustup.rs script
mv ~/.cargo ~/.cargo.backup # optionally remove the line in .zshrc that loads the .cargo/env environment file
export PATH="/opt/homebrew/opt/llvm/bin:$PATH"
cat << 'EOF' >> ~/.zshrc
export PATH="/opt/homebrew/opt/llvm/bin:$PATH"
EOF
rustup default nightly
rustup update
Then build normally with cargo build --release
For Linux, you must install Vulkan:
sudo apt install vulkan-tools libvulkan1
For Nvidia cards, you will also need a driver, for instance:
sudo apt install libnvidia-gl-550-server nvidia-driver-550 nvidia-utils-550
Check with vulkaninfo that your card is correctly detected.
NB: under Linux, I always do a fully optimized build, but that is not mandatory; a simple cargo build --release is enough:
RUSTFLAGS="-C target-cpu=native -C opt-level=3" cargo build --release
Last updated: v0.7.3