pharmacoml

pharmacoml is a benchmark-backed hybrid AI/ML covariate screening toolkit for population PK/PD, combining explainable ML discovery, penalized confirmation, and SCM-style bridging in an estimation-tool-agnostic Python workflow.

What It Is

pharmacoml helps pharmacometricians use a hybrid AI/ML screening workflow to identify and prioritize likely covariates from subject-level EBEs or individual parameters before formal model confirmation. It is designed to work with outputs from NONMEM, nlmixr2, Monolix, Pumas, or similar mixed-effects workflows.
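
As a rough sketch of the kind of input this implies, the snippet below collapses a NONMEM-style output table (one row per observation) into one row per subject, split into a parameter table and a covariate table. The column names (ID, TIME, CL, V, WT, AGE) and the in-memory data are illustrative assumptions, not a layout required by pharmacoml; check the package documentation for the exact format it expects.

```python
import pandas as pd

# Hypothetical NONMEM-style table: one row per observation, with individual
# parameters and covariates repeated on every row of a subject.
sdtab = pd.DataFrame(
    {
        "ID": [1, 1, 2, 2, 3, 3],
        "TIME": [0.5, 2.0, 0.5, 2.0, 0.5, 2.0],
        "CL": [4.1, 4.1, 6.3, 6.3, 5.2, 5.2],
        "V": [31.0, 31.0, 48.5, 48.5, 40.2, 40.2],
        "WT": [58.0, 58.0, 92.0, 92.0, 74.0, 74.0],
        "AGE": [34, 34, 61, 61, 47, 47],
    }
)

# Time-invariant columns are identical across a subject's rows, so keeping
# the first record per ID yields one row per individual.
per_subject = sdtab.drop_duplicates(subset="ID", keep="first").reset_index(drop=True)

# Split into the two subject-level tables (column choice is an assumption).
ebes = per_subject[["ID", "CL", "V"]]
covariates = per_subject[["ID", "WT", "AGE"]]

print(ebes)
print(covariates)
```

Time-varying covariates would need an explicit summary rule (baseline, median, and so on) before this step; the sketch assumes every covariate is constant within a subject.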

The current release is evaluated against a fixed public benchmark suite that includes real public PK examples and simulated scenarios patterned on published covariate-selection studies.

What It Is Not

In the current release, pharmacoml does not replace final NLME estimation, full model search, or pharmacometric confirmation. It is a hybrid AI/ML covariate screening and preselection tool that narrows the search space before SCM, backward elimination, or final model fitting.

Why It Is Different

  • Uses a hybrid AI/ML screening workflow that combines explainable ML discovery, penalized confirmation, and SCM-style bridging instead of relying on a single method.
  • Works with EBEs or individual parameters from any solver, including NONMEM, nlmixr2, Monolix, and Pumas, so screening is not tied to a single estimation engine.
  • Supports several screening backends, including explainable boosting, AALASSO, Stochastic Gates (STG), and an SCM-style bridge.
  • Includes pharmacometric screening features such as shrinkage-aware logic, biology-aware proxy preservation, and optional interaction screening.
  • Ships with a public benchmark suite, pinned baselines, and generated benchmark reports so workflow changes can be evaluated against fixed reference cases.

Installation

From PyPI:

pip install pharmacoml

For development:

git clone https://github.com/s-rani1/pharmacoml.git
cd pharmacoml
pip install -e ".[dev]"

Optional extras:

pip install -e ".[dev,dl,symbolic]"

Quick Start

import pandas as pd
from pharmacoml.covselect import HybridScreener

ebes = pd.read_csv("individual_parameters.csv")
covariates = pd.read_csv("covariates.csv")

report = HybridScreener(
    include_scm=True,
).fit(
    ebes=ebes,
    covariates=covariates,
    parameter_shrinkage={"CL": 0.12, "V": 0.28},
)

report.confirmed_covariates()   # recommended daily-use answer
report.candidate_covariates()   # shortlist to carry forward
report.core_covariates()        # strongest ML-supported signals
report.proxy_groups()           # correlated alternatives
print(report.to_nonmem_candidates())

For reproducible runs, set a fixed random_state. In the current release, the default hybrid workflow is stable across repeated runs when the data, settings, and random seed are unchanged. Experimental or more stochastic paths may vary more across runs and environments.
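
The parameter_shrinkage values in the Quick Start look like per-parameter eta-shrinkage fractions. If your estimation tool does not report them directly, one common SD-based definition is 1 - SD(eta_i) / omega. The sketch below computes that quantity with the standard library; the eta values and the OMEGA element are made up for illustration, and whether pharmacoml expects exactly this definition is an assumption to verify against its documentation.

```python
import math
import statistics


def eta_shrinkage(etas, omega_sq):
    """SD-based eta-shrinkage: 1 - SD(individual etas) / omega."""
    return 1.0 - statistics.stdev(etas) / math.sqrt(omega_sq)


# Illustrative values only: individual eta estimates for CL and the
# estimated between-subject variance (OMEGA diagonal element) for CL.
eta_cl = [-0.30, 0.10, 0.25, -0.05, 0.20, -0.20]
omega_sq_cl = 0.09  # omega = 0.30

shrinkage_cl = eta_shrinkage(eta_cl, omega_sq_cl)
print(round(shrinkage_cl, 3))
```

statistics.stdev uses the sample (n - 1) denominator; some tools use the population form instead, which gives slightly higher shrinkage for small n.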

Example confirmed_covariates() output:

  parameter covariate functional_form confirmation_status
0        CL        WT           power                 scm
1         V        WT           power                 scm
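
A functional_form of power conventionally denotes a relationship such as CL_i = CL_ref * (WT_i / WT_ref)^beta. The sketch below is not the package's confirmation step; it only shows what that form means by recovering beta with ordinary least squares on the log scale. The 70 kg reference weight, the helper name fit_power, and the noise-free data are all assumptions for illustration.

```python
import math

WT_REF = 70.0  # assumed reference weight in kg


def fit_power(weights, params):
    """Least-squares fit of log(param) = log(ref) + beta * log(WT / WT_REF)."""
    x = [math.log(w / WT_REF) for w in weights]
    y = [math.log(p) for p in params]
    x_mean = sum(x) / len(x)
    y_mean = sum(y) / len(y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    beta = sxy / sxx
    # The intercept on the log scale is the parameter value at WT_REF.
    ref_value = math.exp(y_mean - beta * x_mean)
    return ref_value, beta


# Synthetic, noise-free clearances generated with exponent 0.75.
weights = [50.0, 60.0, 70.0, 85.0, 100.0]
clearances = [5.0 * (w / WT_REF) ** 0.75 for w in weights]

cl_ref, beta = fit_power(weights, clearances)
print(f"CL at {WT_REF:.0f} kg: {cl_ref:.2f}, exponent: {beta:.2f}")
```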

Typical Outputs

  • confirmed_covariates(): compact answer after SCM-style confirmation
  • candidate_covariates(): practical shortlist for downstream PMx confirmation
  • core_covariates(): strongest ML-supported signals
  • proxy_groups(): correlated or overlapping covariate groups
  • interaction_covariates(): screened interactions when enabled
  • to_nonmem_candidates(): export-ready candidate block for downstream workflows

How to Read the Outputs

  • confirmed_covariates(): start here for the most compact daily-use answer. These are the covariates that survive the package's confirmation layer and are the clearest candidates to carry forward.
  • candidate_covariates(): use this as the practical shortlist for formal PMx confirmation. It is intentionally broader than confirmed_covariates() and is often the right input to SCM or backward elimination.
  • core_covariates(): the strongest ML-supported signals before confirmation. This is useful when you want to inspect what the AI/ML layer found most strongly, even if not every signal is retained in the final confirmed set.
  • proxy_groups(): review this whenever correlated covariates are plausible. It shows which variables are acting as correlated alternatives so you can make a pharmacometrically sensible choice downstream.
  • interaction_covariates(): only relevant when interaction screening is enabled. These are pairwise interaction terms that survived the screening workflow.
  • to_nonmem_candidates(): use this when you want a direct candidate block to carry into a downstream modeling workflow.

For most users, the practical reading order is:

  1. confirmed_covariates()
  2. candidate_covariates()
  3. proxy_groups()
  4. to_nonmem_candidates()

For reproducibility, keep random_state fixed when comparing runs or benchmarking workflow changes.
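
To make the idea behind proxy_groups() concrete, the sketch below groups covariates whose pairwise Pearson correlation exceeds a threshold. It illustrates the concept only: the 0.7 cutoff, the helper names, and the data are assumptions, and the package's own grouping logic may differ.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(var_a * var_b)


def correlated_groups(columns, threshold=0.7):
    """Group column names linked by |r| >= threshold (connected components)."""
    names = list(columns)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                parent[find(b)] = find(a)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    # Only groups with more than one member are proxy candidates.
    return [members for members in groups.values() if len(members) > 1]


# Illustrative subject-level covariates: BSA tracks WT closely, AGE does not.
covs = {
    "WT": [52.0, 60.0, 71.0, 80.0, 95.0, 104.0],
    "BSA": [1.55, 1.66, 1.82, 1.93, 2.10, 2.21],
    "AGE": [45, 30, 62, 38, 55, 41],
}

groups = correlated_groups(covs)
print(groups)
```

When a group like this appears, the choice between its members is usually a pharmacological one (for example, preferring WT for allometric scaling) rather than a statistical one.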

Benchmarks

pharmacoml includes a fixed public benchmark suite for release calibration:

  • pheno (Pharmpy phenobarbital example)
  • Eleveld/Wahlquist public propofol data
  • ggPMX Monolix theophylline example
  • Asiimwe-style correlated-covariate simulation
  • Shap-Cov-style collinear simulation
  • optional Kekic public synthetic scenarios when available locally

Current agreement snapshot for the benchmark-backed default workflow:

Dataset                     Agreement  Source / data
pheno                       Exact      Pharmpy example model/data
eleveld_union               Exact      Wahlquist public propofol benchmark repo
ggpmx_theophylline          Exact      ggPMX theophylline example files
high_shrinkage_user_input   Exact      Generated in package
age_pma_distinct            Exact      Generated in package
interaction_xor_screening   Exact      Generated in package
asiimwe_correlated_small_n  Partial    Generated in package
shapcov_collinear           Partial    Generated in package

The current fixed benchmark suite shows exact agreement on the real/public PK cases and targeted shrinkage, proxy, and interaction checks, with remaining errors concentrated in the hardest collinearity-heavy synthetic scenarios.

Run the benchmark suite:

PYTHONPATH=. python benchmarks/run_public_benchmarks.py --check

That command generates a reusable report bundle under benchmarks/reports/fixed_public/ by default:

  • public_benchmark_report.md
  • public_benchmark_summary.csv
  • public_benchmark_details.csv
  • public_benchmark_report.json

Use --no-report to skip artifact generation, or --report-dir <path> to write the bundle somewhere else.

Experimental Consensus

For advanced benchmarking and model-family comparison, the experimental namespace exposes a curated multi-model consensus workflow:

from pharmacoml.covselect.experimental import MultiModelConsensusScreener

report = MultiModelConsensusScreener(
    top_k=3,
    n_bootstrap=8,
    include_neural=False,
).fit(ebes, covariates)

report.consensus_covariates()
report.selection_frequency_table()
report.compare_with_hybrid(ebes, covariates)
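
For intuition about what a selection-frequency table summarizes, the sketch below resamples subjects with replacement, ranks covariates in each resample, and counts how often each one lands in the top few. It is a generic bootstrap illustration, not the package's implementation: absolute Pearson correlation stands in for model-based importance, and the names n_keep and n_resamples are my own and do not mirror the arguments of MultiModelConsensusScreener.

```python
import math
import random
from collections import Counter


def abs_pearson(a, b):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return abs(cov / math.sqrt(var_a * var_b))


def selection_frequency(target, covariates, n_keep=2, n_resamples=8, seed=0):
    """Fraction of bootstrap resamples in which each covariate ranks in the top n_keep."""
    rng = random.Random(seed)
    n = len(target)
    counts = Counter()
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample subjects
        y = [target[i] for i in idx]
        scores = {
            name: abs_pearson([values[i] for i in idx], y)
            for name, values in covariates.items()
        }
        top = sorted(scores, key=scores.get, reverse=True)[:n_keep]
        counts.update(top)
    return {name: counts[name] / n_resamples for name in covariates}


# Synthetic data: log(CL) depends on WT only; the other covariates are noise.
data_rng = random.Random(42)
n_subjects = 40
wt = [data_rng.uniform(40, 120) for _ in range(n_subjects)]
covs = {
    "WT": wt,
    "AGE": [data_rng.uniform(18, 80) for _ in range(n_subjects)],
    "ALB": [data_rng.uniform(30, 50) for _ in range(n_subjects)],
    "CRCL": [data_rng.uniform(40, 130) for _ in range(n_subjects)],
}
log_cl = [0.75 * math.log(w / 70.0) + data_rng.gauss(0.0, 0.05) for w in wt]

freq = selection_frequency(log_cl, covs)
print(freq)
```

A covariate with a frequency near 1.0 is selected regardless of which subjects are drawn, while the noise covariates share the remaining slot inconsistently across resamples.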

Documentation

Rendered docs are available via GitHub Pages.

Methodological References

The default hybrid workflow implements and combines approaches described in recent pharmacometric ML literature on covariate screening, including Sibieude et al. (2021), Asiimwe et al. (2024), Brooks et al. (2025), Karlsen et al. (2025), and Kekic et al. (2026). The broader package also includes additional experimental screening and benchmarking capabilities.

How to Cite

If you use pharmacoml in your work, please cite the software repository. GitHub also exposes citation metadata in the repository's citation panel.

Suggested citation:

Rani S. pharmacoml: Benchmark-backed hybrid AI/ML covariate screening toolkit
for population PK/PD. Version 0.1.1. GitHub.
https://github.com/s-rani1/pharmacoml

When relevant, also cite the methodological papers that informed the workflow, especially Sibieude et al. (2021), Asiimwe et al. (2024), Brooks et al. (2025), Karlsen et al. (2025), and Kekic et al. (2026).

Roadmap

Potential future expansion includes:

  • backend integration for formal model-confirmation workflows such as nlmixr2 and NONMEM
  • estimation-driven SCM and backward elimination
  • simulation and reporting layers for broader pharmacometric workflows
  • possible R integration paths via subprocess-based execution or rpy2

License

MIT
