pharmacoml is a benchmark-backed hybrid AI/ML covariate screening toolkit for population PK/PD, combining explainable ML discovery, penalized confirmation, and SCM-style bridging in an estimation-tool-agnostic Python workflow.
pharmacoml helps pharmacometricians use a hybrid AI/ML screening workflow
to identify and prioritize likely covariates from subject-level EBEs or
individual parameters before formal model confirmation. It is designed to work
with outputs from NONMEM, nlmixr2, Monolix, Pumas, or similar
mixed-effects workflows.
The current release is evaluated against a fixed public benchmark suite that includes real public PK examples and paper-style benchmark scenarios.
pharmacoml is not a replacement for final NLME estimation, full model
search, or pharmacometric confirmation in the current release. It is a
hybrid AI/ML covariate screening and preselection tool designed to reduce
search space before SCM, backward elimination, or final model fitting.
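To make the search-space reduction concrete, here is a rough count of the model fits in the first forward step of an SCM, assuming every parameter-covariate pair is tested once per functional form. The function name and all numbers are illustrative assumptions and do not come from pharmacoml.

```python
def scm_first_step_fits(n_parameters: int, n_covariates: int, n_forms: int = 1) -> int:
    """Count single-relation models fitted in the first forward SCM step.

    Assumes each parameter-covariate pair is tried with each functional form.
    """
    return n_parameters * n_covariates * n_forms


# Unscreened search: 3 parameters, 12 covariates, 2 functional forms (hypothetical).
full = scm_first_step_fits(n_parameters=3, n_covariates=12, n_forms=2)

# After screening, suppose 4 parameter-covariate pairs remain (hypothetical),
# each still tested with the same 2 functional forms.
screened = 4 * 2

print(f"first-step fits without screening: {full}, with screening: {screened}")
```

Each of those fits is a full NLME estimation, so a shorter candidate list translates directly into fewer estimation runs.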
- Uses a hybrid AI/ML screening workflow that combines explainable ML discovery, penalized confirmation, and SCM-style bridging instead of relying on a single method.
- Works with EBEs or individual parameters from any solver, including NONMEM, nlmixr2, Monolix, and Pumas, so screening is not tied to a single estimation engine.
- Supports many screening backends, including explainable boosting, AALASSO, Stochastic Gates (STG), and an SCM-style bridge, rather than relying on a single screening model.
- Includes pharmacometric screening features such as shrinkage-aware logic, biology-aware proxy preservation, and optional interaction screening.
- Ships with a public benchmark suite, pinned baselines, and generated benchmark reports so workflow changes can be evaluated against fixed reference cases.
From PyPI:

```bash
pip install pharmacoml
```

For development:

```bash
git clone https://github.com/s-rani1/pharmacoml.git
cd pharmacoml
pip install -e ".[dev]"
```

Optional extras:

```bash
pip install -e ".[dev,dl,symbolic]"
```

```python
import pandas as pd
from pharmacoml.covselect import HybridScreener

ebes = pd.read_csv("individual_parameters.csv")
covariates = pd.read_csv("covariates.csv")

report = HybridScreener(
    include_scm=True,
).fit(
    ebes=ebes,
    covariates=covariates,
    parameter_shrinkage={"CL": 0.12, "V": 0.28},
)

report.confirmed_covariates()  # recommended daily-use answer
report.candidate_covariates()  # shortlist to carry forward
report.core_covariates()       # strongest ML-supported signals
report.proxy_groups()          # correlated alternatives
print(report.to_nonmem_candidates())
```

For reproducible runs, set a fixed random_state. In the current release,
the default hybrid workflow is stable across repeated runs when the data,
settings, and random seed are unchanged. Experimental or more stochastic paths
may vary more across runs and environments.
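The `parameter_shrinkage` values in the example above are per-parameter eta-shrinkage fractions. If your estimation tool does not report them directly, a minimal sketch of the common SD-based definition, 1 - SD(EBE etas) / omega, looks like this. The helper name, the eta values, and the OMEGA variance are hypothetical, and the sketch assumes the sample standard deviation is used.

```python
import math
import statistics


def eta_shrinkage(etas: list[float], omega_sd: float) -> float:
    """SD-based eta-shrinkage: 1 - SD(EBE etas) / omega (omega on the SD scale)."""
    return 1.0 - statistics.stdev(etas) / omega_sd


# Hypothetical individual eta estimates for CL from a fitted model.
etas_cl = [-0.31, 0.12, 0.05, -0.18, 0.27, 0.09, -0.22, 0.18]

# Hypothetical estimated OMEGA variance of 0.09, converted to the SD scale.
omega_cl = math.sqrt(0.09)

# Dictionary in the same {parameter: fraction} shape used in the example above.
parameter_shrinkage = {"CL": round(eta_shrinkage(etas_cl, omega_cl), 2)}
print(parameter_shrinkage)
```

Values are expressed as fractions rather than percentages, matching the `{"CL": 0.12, "V": 0.28}` form shown earlier.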
Example `confirmed_covariates()` output:

```text
  parameter covariate functional_form confirmation_status
0        CL        WT           power                 scm
1         V        WT           power                 scm
```
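For readers less familiar with the `power` functional form in the output above, the sketch below assumes it refers to the conventional power covariate model, parameter = typical value × (covariate / reference) ^ exponent. The 70 kg reference weight, the 0.75 exponent, and both helper names are illustrative assumptions, not values or functions from pharmacoml.

```python
import math


def power_covariate(typical_value: float, covariate: float,
                    reference: float, exponent: float) -> float:
    """Conventional power model: typical_value * (covariate / reference) ** exponent."""
    return typical_value * (covariate / reference) ** exponent


def fit_power_exponent(values: list[float], covariates: list[float],
                       reference: float) -> float:
    """Estimate the exponent as the OLS slope of log(value) on log(covariate / reference)."""
    x = [math.log(c / reference) for c in covariates]
    y = [math.log(v) for v in values]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Noise-free illustration: simulate CL from body weight, then recover the exponent.
weights = [52.0, 61.0, 70.0, 84.0, 98.0]
clearances = [power_covariate(5.0, wt, reference=70.0, exponent=0.75) for wt in weights]
theta = fit_power_exponent(clearances, weights, reference=70.0)
print(f"recovered exponent: {theta:.3f}")
```

A log-log regression on EBEs like this is only a quick plausibility check; the exponent should still be estimated in the final NLME model.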
- `confirmed_covariates()`: compact answer after SCM-style confirmation
- `candidate_covariates()`: practical shortlist for downstream PMx confirmation
- `core_covariates()`: strongest ML-supported signals
- `proxy_groups()`: correlated or overlapping covariate groups
- `interaction_covariates()`: screened interactions when enabled
- `to_nonmem_candidates()`: export-ready candidate block for downstream workflows
- `confirmed_covariates()`: start here for the most compact daily-use answer. These are the covariates that survive the package's confirmation layer and are the clearest candidates to carry forward.
- `candidate_covariates()`: use this as the practical shortlist for formal PMx confirmation. It is intentionally broader than `confirmed_covariates()` and is often the right input to SCM or backward elimination.
- `core_covariates()`: the strongest ML-supported signals before confirmation. This is useful when you want to inspect what the AI/ML layer found most strongly, even if not every signal is retained in the final confirmed set.
- `proxy_groups()`: review this whenever correlated covariates are plausible. It shows which variables are acting as correlated alternatives so you can make a pharmacometrically sensible choice downstream.
- `interaction_covariates()`: only relevant when interaction screening is enabled. These are pairwise interaction terms that survived the screening workflow.
- `to_nonmem_candidates()`: use this when you want a direct candidate block to carry into a downstream modeling workflow.
For most users, the practical reading order is:
1. `confirmed_covariates()`
2. `candidate_covariates()`
3. `proxy_groups()`
4. `to_nonmem_candidates()`
For reproducibility, keep random_state fixed when comparing runs or
benchmarking workflow changes.
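To illustrate what "correlated alternatives" in `proxy_groups()` can mean in practice, the sketch below groups covariates whose absolute pairwise Pearson correlation exceeds a threshold. This is a generic technique shown under stated assumptions, not the package's actual proxy logic; the 0.8 threshold, the function names, and the data are hypothetical.

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_groups(columns: dict[str, list[float]], threshold: float = 0.8) -> list[list[str]]:
    """Group covariates linked by |r| >= threshold using a small union-find."""
    parent = {name: name for name in columns}

    def find(name: str) -> str:
        # Follow parent links to the group root, compressing the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(columns, 2):
        if abs(pearson(columns[a], columns[b])) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for name in columns:
        groups.setdefault(find(name), []).append(name)
    # Only groups with at least two members are proxy candidates.
    return [sorted(members) for members in groups.values() if len(members) > 1]


# Hypothetical covariate table: WT and BMI move together, AGE does not.
covariate_columns = {
    "WT": [52, 61, 70, 84, 98, 75],
    "BMI": [19.5, 22.0, 24.1, 28.3, 31.9, 25.0],
    "AGE": [34, 71, 45, 29, 66, 52],
}
groups = correlated_groups(covariate_columns)
print(groups)
```

Within a group like this, the choice of which covariate to carry forward is a pharmacometric judgment rather than a purely statistical one.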
pharmacoml includes a fixed public benchmark suite for release calibration:
- `pheno` (Pharmpy phenobarbital example)
- `Eleveld/Wahlquist` public propofol data
- `ggPMX` Monolix theophylline example
- `Asiimwe-style` correlated-covariate simulation
- `Shap-Cov-style` collinear simulation
- optional `Kekic` public synthetic scenarios when available locally
Current agreement snapshot for the benchmark-backed default workflow:
| Dataset | Agreement | Source / data |
|---|---|---|
| `pheno` | Exact | Pharmpy example model/data |
| `eleveld_union` | Exact | Wahlquist public propofol benchmark repo |
| `ggpmx_theophylline` | Exact | ggPMX theophylline example files |
| `high_shrinkage_user_input` | Exact | Generated in package |
| `age_pma_distinct` | Exact | Generated in package |
| `interaction_xor_screening` | Exact | Generated in package |
| `asiimwe_correlated_small_n` | Partial | Generated in package |
| `shapcov_collinear` | Partial | Generated in package |
The current fixed benchmark suite shows exact agreement on the real/public PK cases and targeted shrinkage, proxy, and interaction checks, with remaining errors concentrated in the hardest collinearity-heavy synthetic scenarios.
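The source does not spell out how the "Exact" and "Partial" labels in the table are computed. One plausible reading, sketched here purely as an assumption, compares the set of selected parameter-covariate pairs with a reference set. The function name, the "None" label, and the example pairs are hypothetical.

```python
Pair = tuple[str, str]  # (parameter, covariate)


def agreement_label(selected: set[Pair], reference: set[Pair]) -> str:
    """Label agreement between a selected set and a reference set of pairs.

    Assumed rule: identical sets are "Exact", any overlap is "Partial",
    and no overlap is "None".
    """
    if selected == reference:
        return "Exact"
    if selected & reference:
        return "Partial"
    return "None"


reference = {("CL", "WT"), ("V", "WT")}

exact_case = agreement_label({("CL", "WT"), ("V", "WT")}, reference)
partial_case = agreement_label({("CL", "WT"), ("CL", "AGE")}, reference)
print(exact_case, partial_case)
```

Under this reading, a "Partial" row can reflect either a missed reference covariate or an extra selected one, which is typical when collinear covariates compete for the same signal.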
Run the benchmark suite:
```bash
PYTHONPATH=. python benchmarks/run_public_benchmarks.py --check
```

That command generates a reusable report bundle under
benchmarks/reports/fixed_public/ by default:

- public_benchmark_report.md
- public_benchmark_summary.csv
- public_benchmark_details.csv
- public_benchmark_report.json

Use --no-report to skip artifact generation, or --report-dir <path> to
write the bundle somewhere else.
For advanced benchmarking and model-family comparison, the experimental namespace exposes a curated multi-model consensus workflow:
```python
from pharmacoml.covselect.experimental import MultiModelConsensusScreener

report = MultiModelConsensusScreener(
    top_k=3,
    n_bootstrap=8,
    include_neural=False,
).fit(ebes, covariates)

report.consensus_covariates()
report.selection_frequency_table()
report.compare_with_hybrid(ebes, covariates)
```

Rendered docs are available via GitHub Pages:
The default hybrid workflow implements and combines approaches described in recent pharmacometric ML literature on covariate screening, including Sibieude et al. (2021), Asiimwe et al. (2024), Brooks et al. (2025), Karlsen et al. (2025), and Kekic et al. (2026). The broader package also includes additional experimental screening and benchmarking capabilities.
If you use pharmacoml in your work, please cite the software repository.
GitHub will also expose citation metadata directly via the repository citation
panel.
Suggested citation:
Rani S. pharmacoml: Benchmark-backed hybrid AI/ML covariate screening toolkit
for population PK/PD. Version 0.1.1. GitHub.
https://github.com/s-rani1/pharmacoml
When relevant, also cite the methodological papers that informed the workflow, especially Sibieude et al. (2021), Asiimwe et al. (2024), Brooks et al. (2025), Karlsen et al. (2025), and Kekic et al. (2026).
Potential future expansion includes:
- backend integration for formal model-confirmation workflows such as nlmixr2 and NONMEM
- estimation-driven SCM and backward elimination
- simulation and reporting layers for broader pharmacometric workflows
- possible R integration paths via subprocess-based execution or rpy2
MIT