pharmacoml

pharmacoml is a benchmark-backed hybrid AI/ML covariate screening toolkit for population PK/PD, combining explainable ML discovery, penalized confirmation, and SCM-style bridging in an estimation-tool-agnostic Python workflow.

What It Is

pharmacoml helps pharmacometricians use a hybrid AI/ML screening workflow to identify and prioritize likely covariates from subject-level empirical Bayes estimates (EBEs) or individual parameters before formal model confirmation. It is designed to work with outputs from NONMEM, nlmixr2, Monolix, Pumas, or similar mixed-effects workflows.
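
Screening works on one row per subject, but solver table outputs are often written with one row per observation record. The sketch below collapses such a table to subject level with pandas. The column names (ID, TIME, CL, V, WT, AGE) are hypothetical, and whether pharmacoml expects the subject ID as an index or as a column is an assumption not stated here.

```python
import pandas as pd

# Hypothetical long-format solver output: one row per observation, with
# individual parameters and covariates repeated on every row of a subject.
long_table = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3],
    "TIME": [0.5, 2.0, 8.0, 1.0, 4.0, 2.0],
    "CL": [2.1, 2.1, 2.1, 3.4, 3.4, 1.8],
    "V": [31.0, 31.0, 31.0, 40.2, 40.2, 27.5],
    "WT": [62.0, 62.0, 62.0, 81.0, 81.0, 55.0],
    "AGE": [34, 34, 34, 51, 51, 28],
})

PARAMETERS = ["CL", "V"]    # individual parameters (or EBE etas)
COVARIATES = ["WT", "AGE"]  # baseline (time-invariant) covariates


def to_subject_level(table: pd.DataFrame, id_col: str = "ID"):
    """Return one-row-per-subject parameter and covariate frames indexed by ID."""
    cols = PARAMETERS + COVARIATES
    # A subject-level value must be constant within each subject.
    varying = table.groupby(id_col)[cols].nunique().gt(1).any()
    if varying.any():
        raise ValueError(f"Columns vary within subject: {list(varying[varying].index)}")
    subjects = table.drop_duplicates(subset=id_col).set_index(id_col).sort_index()
    return subjects[PARAMETERS], subjects[COVARIATES]


ebes, covariates = to_subject_level(long_table)
print(ebes.shape, covariates.shape)
```

Raising on within-subject variation is a deliberate choice: a time-varying covariate silently reduced to its first value would distort any subject-level screen.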

The current release is evaluated against a fixed public benchmark suite that includes real public PK datasets and simulated scenarios modeled on published benchmark studies.

What It Is Not

pharmacoml is not a replacement for final NLME estimation, full model search, or pharmacometric confirmation in the current release. It is a hybrid AI/ML covariate screening and preselection tool designed to reduce search space before SCM, backward elimination, or final model fitting.
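
To see why a smaller search space matters, consider the worst-case cost of forward stepwise covariate modeling. With N candidate parameter-covariate relationships, the first step tests all N, the next step N - 1, and so on, so a search that ends up adding every relationship fits N(N + 1)/2 models, multiplied by the number of functional forms tried (backward elimination not counted). The sketch below only illustrates that arithmetic; the counts are not pharmacoml output.

```python
def forward_scm_fits(n_relationships: int, n_forms: int = 1) -> int:
    """Worst-case model fits for a forward stepwise covariate search.

    Each step tests every remaining relationship in every functional form,
    and in the worst case every relationship is eventually added.
    """
    remaining = n_relationships
    total = 0
    while remaining > 0:
        total += remaining * n_forms
        remaining -= 1
    return total


full_search = forward_scm_fits(2 * 10)  # 2 parameters x 10 covariates
shortlist = forward_scm_fits(4)         # e.g. 4 pairs kept after screening
print(full_search, shortlist)
```

Here 20 candidate pairs give up to 210 forward fits, while a shortlist of 4 pairs gives up to 10.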

Why It Is Different

Uses a hybrid AI/ML screening workflow that combines explainable ML discovery, penalized confirmation, and SCM-style bridging instead of relying on a single method.
Works with EBEs or individual parameters from any solver, including NONMEM, nlmixr2, Monolix, and Pumas, so screening is not tied to a single estimation engine.
Supports multiple screening backends, including explainable boosting, AALASSO, Stochastic Gates (STG), and an SCM-style bridge.
Includes pharmacometric screening features such as shrinkage-aware logic, biology-aware proxy preservation, and optional interaction screening.
Ships with a public benchmark suite, pinned baselines, and generated benchmark reports so workflow changes can be evaluated against fixed reference cases.
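
Shrinkage-aware logic needs per-parameter shrinkage values, such as the parameter_shrinkage mapping passed to fit() in the Quick Start. The sketch below uses the standard SD-based definition of eta-shrinkage, 1 - SD(EBE eta) / omega. It assumes you have the EBE etas and the population omega (as a standard deviation) from your solver, and it is not necessarily how pharmacoml computes shrinkage internally.

```python
import numpy as np
import pandas as pd


def eta_shrinkage(etas: pd.DataFrame, omega_sd: dict) -> dict:
    """Return SD-based eta-shrinkage per parameter: 1 - SD(EBE eta) / omega."""
    return {
        name: float(1.0 - etas[name].std(ddof=1) / omega_sd[name])
        for name in omega_sd
    }


rng = np.random.default_rng(0)
# Simulated EBE etas whose spread is smaller than the population omega,
# as happens when individual data are sparse.
etas = pd.DataFrame({
    "CL": rng.normal(0.0, 0.27, size=200),
    "V": rng.normal(0.0, 0.18, size=200),
})
shrinkage = eta_shrinkage(etas, omega_sd={"CL": 0.30, "V": 0.25})
print({name: round(value, 2) for name, value in shrinkage.items()})
```

Higher values mean the EBEs carry less individual information, which is why covariate signals found on high-shrinkage parameters deserve more caution.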

Installation

From PyPI:

pip install pharmacoml

For development:

git clone https://github.com/s-rani1/pharmacoml.git
cd pharmacoml
pip install -e ".[dev]"

Optional extras:

pip install -e ".[dev,dl,symbolic]"

Quick Start

import pandas as pd
from pharmacoml.covselect import HybridScreener

ebes = pd.read_csv("individual_parameters.csv")
covariates = pd.read_csv("covariates.csv")

report = HybridScreener(
    include_scm=True,
).fit(
    ebes=ebes,
    covariates=covariates,
    parameter_shrinkage={"CL": 0.12, "V": 0.28},
)

report.confirmed_covariates()   # recommended daily-use answer
report.candidate_covariates()   # shortlist to carry forward
report.core_covariates()        # strongest ML-supported signals
report.proxy_groups()           # correlated alternatives
print(report.to_nonmem_candidates())

For reproducible runs, set a fixed random_state. In the current release, the default hybrid workflow is stable across repeated runs when the data, settings, and random seed are unchanged. Experimental or more stochastic paths may vary more across runs and environments.
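
When comparing two runs, it can be more robust to compare the selected (parameter, covariate) pairs than the raw tables, since row order or score columns may differ. A minimal sketch, assuming the result tables are pandas DataFrames with parameter and covariate columns as in the example confirmed_covariates() output; the run tables here are hypothetical.

```python
import pandas as pd


def selection_pairs(table: pd.DataFrame) -> set:
    """Return the (parameter, covariate) pairs in a covariate result table."""
    return set(zip(table["parameter"], table["covariate"]))


def compare_runs(run_a: pd.DataFrame, run_b: pd.DataFrame) -> dict:
    """Compare two runs' selections, ignoring row order and extra columns."""
    a, b = selection_pairs(run_a), selection_pairs(run_b)
    return {
        "identical": a == b,
        "only_in_a": sorted(a - b),
        "only_in_b": sorted(b - a),
    }


# Two hypothetical runs with the same selections listed in a different order.
run_1 = pd.DataFrame({"parameter": ["CL", "V"], "covariate": ["WT", "WT"]})
run_2 = pd.DataFrame({"parameter": ["V", "CL"], "covariate": ["WT", "WT"]})
print(compare_runs(run_1, run_2))
```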

Example confirmed_covariates() output:

  parameter covariate functional_form confirmation_status
0        CL        WT           power                 scm
1         V        WT           power                 scm
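
A power functional form conventionally means P = THETA_P * (COV / COV_ref)^THETA_COV. The sketch below renders rows like the ones above as NONMEM-style typical-value statements. It is a hypothetical illustration, not the output of to_nonmem_candidates(); using the covariate median as the reference value and numbering THETAs sequentially are both assumptions.

```python
import pandas as pd


def power_model_lines(confirmed: pd.DataFrame, covariates: pd.DataFrame) -> list:
    """Render power-form covariate rows as NONMEM-style TV statements."""
    lines, next_theta = [], 1
    for parameter, rows in confirmed.groupby("parameter", sort=False):
        # Typical value THETA for the parameter, then one THETA per covariate effect.
        expr = f"THETA({next_theta})"
        next_theta += 1
        for _, row in rows.iterrows():
            if row["functional_form"] != "power":
                raise ValueError(f"unsupported form: {row['functional_form']}")
            ref = covariates[row["covariate"]].median()  # assumed reference value
            expr += f" * ({row['covariate']}/{ref:g})**THETA({next_theta})"
            next_theta += 1
        lines.append(f"TV{parameter} = {expr}")
    return lines


confirmed = pd.DataFrame({
    "parameter": ["CL", "V"],
    "covariate": ["WT", "WT"],
    "functional_form": ["power", "power"],
})
covariates = pd.DataFrame({"WT": [55.0, 70.0, 85.0]})
for line in power_model_lines(confirmed, covariates):
    print(line)
```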

Typical Outputs

confirmed_covariates(): compact answer after SCM-style confirmation
candidate_covariates(): practical shortlist for downstream PMx confirmation
core_covariates(): strongest ML-supported signals
proxy_groups(): correlated or overlapping covariate groups
interaction_covariates(): screened interactions when enabled
to_nonmem_candidates(): export-ready candidate block for downstream workflows

How to Read the Outputs

confirmed_covariates(): start here for the most compact daily-use answer. These are the covariates that survive the package's confirmation layer and are the clearest candidates to carry forward.
candidate_covariates(): use this as the practical shortlist for formal PMx confirmation. It is intentionally broader than confirmed_covariates() and is often the right input to SCM or backward elimination.
core_covariates(): the strongest ML-supported signals before confirmation. Use this to inspect which signals the AI/ML layer ranked highest, even when not every one is retained in the final confirmed set.
proxy_groups(): review this whenever correlated covariates are plausible. It shows which variables are acting as correlated alternatives so you can make a pharmacometrically sensible choice downstream.
interaction_covariates(): only relevant when interaction screening is enabled. These are pairwise interaction terms that survived the screening workflow.
to_nonmem_candidates(): use this when you want a direct candidate block to carry into a downstream modeling workflow.
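
The idea behind correlated alternatives can be illustrated by grouping covariates whose absolute rank correlation exceeds a threshold. The sketch below links covariates with |Spearman rho| >= 0.8 and returns the connected groups. It is not pharmacoml's proxy_groups() logic; the threshold and the simulated WT, BSA, LBW, and AGE values are hypothetical.

```python
import numpy as np
import pandas as pd


def correlated_groups(covariates: pd.DataFrame, threshold: float = 0.8) -> list:
    """Group covariates linked by |Spearman rho| >= threshold (connected components)."""
    corr = covariates.corr(method="spearman").abs()
    names = list(corr.columns)
    unvisited, groups = set(names), []
    for start in names:
        if start not in unvisited:
            continue
        # Depth-first walk over the "strongly correlated" graph.
        stack, component = [start], []
        unvisited.remove(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for other in names:
                if other in unvisited and corr.loc[node, other] >= threshold:
                    unvisited.remove(other)
                    stack.append(other)
        if len(component) > 1:  # singletons are not proxy groups
            groups.append(sorted(component))
    return groups


rng = np.random.default_rng(42)
wt = rng.normal(75.0, 12.0, size=300)
covs = pd.DataFrame({
    "WT": wt,
    "BSA": 0.2 * wt**0.7 * np.exp(rng.normal(0.0, 0.02, size=300)),
    "LBW": 0.8 * wt + rng.normal(0.0, 3.0, size=300),
    "AGE": rng.uniform(20.0, 80.0, size=300),
})
print(correlated_groups(covs))
```

Using connected components means two covariates can share a group through a common partner even if their own correlation is slightly below the threshold.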

For most users, the practical reading order is:

confirmed_covariates()
candidate_covariates()
proxy_groups()
to_nonmem_candidates()

For reproducibility, keep random_state fixed when comparing runs or benchmarking workflow changes.

Benchmarks

pharmacoml includes a fixed public benchmark suite for release calibration:

pheno (Pharmpy phenobarbital example)
Eleveld/Wahlquist public propofol data
ggPMX Monolix theophylline example
Asiimwe-style correlated-covariate simulation
Shap-Cov-style collinear simulation
optional Kekic public synthetic scenarios when available locally
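
Correlated-covariate scenarios are hard because a proxy with no true effect can look almost as predictive as the real driver. The sketch below simulates that situation: CL depends only on WT, while PROXY is strongly correlated with WT and AGE is unrelated. It is a hypothetical illustration with invented effect sizes, not the package's benchmark generator or the exact design of the cited studies.

```python
import numpy as np
import pandas as pd


def simulate_correlated_scenario(n: int = 60, rho: float = 0.9, seed: int = 1):
    """Simulate CL driven by WT only, plus a correlated proxy and a null covariate."""
    rng = np.random.default_rng(seed)
    # Correlated standard-normal draws for the true driver and its proxy.
    z = rng.multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]], size=n)
    wt = 70.0 * np.exp(0.2 * z[:, 0])       # true covariate (kg)
    proxy = 1.8 * np.exp(0.2 * z[:, 1])     # correlated, but no true effect on CL
    age = rng.uniform(20.0, 80.0, size=n)   # independent null covariate
    eta_cl = rng.normal(0.0, 0.25, size=n)  # between-subject variability on CL
    cl = 5.0 * (wt / 70.0) ** 0.75 * np.exp(eta_cl)
    ebes = pd.DataFrame({"CL": cl})
    covariates = pd.DataFrame({"WT": wt, "PROXY": proxy, "AGE": age})
    return ebes, covariates


ebes, covariates = simulate_correlated_scenario()
print(covariates.corr().round(2))
```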

Current agreement snapshot for the benchmark-backed default workflow:

Dataset	Agreement	Source / data
`pheno`	Exact	Pharmpy example model/data
`eleveld_union`	Exact	Wahlquist public propofol benchmark repo
`ggpmx_theophylline`	Exact	ggPMX theophylline example files
`high_shrinkage_user_input`	Exact	Generated in package
`age_pma_distinct`	Exact	Generated in package
`interaction_xor_screening`	Exact	Generated in package
`asiimwe_correlated_small_n`	Partial	Generated in package
`shapcov_collinear`	Partial	Generated in package

The current fixed benchmark suite shows exact agreement on the real/public PK cases and on the targeted shrinkage, proxy, and interaction checks. Partial agreement is confined to the two hardest collinearity-heavy synthetic scenarios, asiimwe_correlated_small_n and shapcov_collinear.
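
One plausible reading of the Agreement column is a set comparison between the selected and reference (parameter, covariate) pairs. The sketch below encodes that reading; the exact rules used by run_public_benchmarks.py are not stated here, so these definitions, including the "None" category, are assumptions.

```python
def agreement(selected: set, reference: set) -> str:
    """Classify a selection against a reference set of (parameter, covariate) pairs."""
    if selected == reference:
        return "Exact"
    if selected & reference:  # some overlap, but missed or extra pairs
        return "Partial"
    return "None"


reference = {("CL", "WT"), ("V", "WT")}
print(agreement({("CL", "WT"), ("V", "WT")}, reference))
print(agreement({("CL", "WT"), ("CL", "BSA")}, reference))
```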

Run the benchmark suite:

PYTHONPATH=. python benchmarks/run_public_benchmarks.py --check

That command generates a reusable report bundle under benchmarks/reports/fixed_public/ by default:

public_benchmark_report.md
public_benchmark_summary.csv
public_benchmark_details.csv
public_benchmark_report.json

Use --no-report to skip artifact generation, or --report-dir <path> to write the bundle somewhere else.

Experimental Consensus

For advanced benchmarking and model-family comparison, the experimental namespace exposes a curated multi-model consensus workflow:

from pharmacoml.covselect.experimental import MultiModelConsensusScreener

report = MultiModelConsensusScreener(
    top_k=3,
    n_bootstrap=8,
    include_neural=False,
).fit(ebes, covariates)

report.consensus_covariates()
report.selection_frequency_table()
report.compare_with_hybrid(ebes, covariates)
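
The n_bootstrap setting suggests a resampling-based consensus. A common way to summarize such runs is the fraction of bootstrap fits that select each (parameter, covariate) pair, keeping pairs above a cutoff. The sketch below shows that idea; it assumes, but does not confirm, that selection_frequency_table() reports something similar, and the 0.5 cutoff and the toy selections are hypothetical.

```python
from collections import Counter


def selection_frequency(bootstrap_selections: list, min_frequency: float = 0.5):
    """Return per-pair selection frequencies and the pairs at or above the cutoff."""
    n_runs = len(bootstrap_selections)
    # Count each pair at most once per bootstrap run.
    counts = Counter(pair for selection in bootstrap_selections for pair in set(selection))
    frequency = {pair: count / n_runs for pair, count in counts.items()}
    consensus = sorted(pair for pair, freq in frequency.items() if freq >= min_frequency)
    return frequency, consensus


runs = [
    {("CL", "WT"), ("V", "WT")},
    {("CL", "WT")},
    {("CL", "WT"), ("CL", "AGE")},
    {("CL", "WT"), ("V", "WT")},
]
frequency, consensus = selection_frequency(runs)
print(consensus)
```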

Documentation

Rendered docs are available via GitHub Pages:

Methodological References

The default hybrid workflow implements and combines approaches described in recent pharmacometric ML literature on covariate screening, including Sibieude et al. (2021), Asiimwe et al. (2024), Brooks et al. (2025), Karlsen et al. (2025), and Kekic et al. (2026). The broader package also includes additional experimental screening and benchmarking capabilities.

How to Cite

If you use pharmacoml in your work, please cite the software repository. GitHub also exposes the citation metadata from the repository's CITATION.cff file in its citation panel.

Suggested citation:

Rani S. pharmacoml: Benchmark-backed hybrid AI/ML covariate screening toolkit
for population PK/PD. Version 0.1.1. GitHub.
https://github.com/s-rani1/pharmacoml

When relevant, also cite the methodological papers that informed the workflow, especially Sibieude et al. (2021), Asiimwe et al. (2024), Brooks et al. (2025), Karlsen et al. (2025), and Kekic et al. (2026).

Roadmap

Potential future expansion includes:

backend integration for formal model-confirmation workflows such as nlmixr2 and NONMEM
estimation-driven SCM and backward elimination
simulation and reporting layers for broader pharmacometric workflows
possible R integration paths via subprocess-based execution or rpy2

License

MIT
