TabPFN-TDC

TabPFNv2 for predicting the Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET) properties of drugs on the Therapeutics Data Commons (TDC) benchmark.

This work uses the TabPFNv2 tabular foundation model, with the 217 RDKit molecular descriptors as input features.

For classification tasks, the fine-tuned version of TabPFNv2 (📝) is used, which was further trained on real datasets from the internet after pretraining on synthetic data.
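The feature setup described above can be sketched roughly as follows. This is a minimal illustration, not the code from this repository: the helper names (`smiles_to_descriptors`, `sanitize`, `fit_predict`) are hypothetical, it assumes the `rdkit`, `tabpfn`, and `numpy` packages are installed, and it uses the default TabPFN checkpoints rather than the fine-tuned classifier mentioned above. The number of entries in `Descriptors.descList` depends on the RDKit version. Mapping non-finite descriptor values to NaN is also an assumption about preprocessing.

```python
import math


def smiles_to_descriptors(smiles):
    """Return all RDKit descriptor values for one SMILES string."""
    # Imported lazily so this module loads even without RDKit installed.
    from rdkit import Chem
    from rdkit.Chem import Descriptors

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    # descList is a list of (name, function) pairs, one per descriptor.
    return [float(fn(mol)) for _, fn in Descriptors.descList]


def sanitize(row):
    """Map non-finite values (inf, -inf, NaN) to NaN (assumed preprocessing)."""
    return [v if math.isfinite(v) else float("nan") for v in row]


def fit_predict(train_smiles, y_train, test_smiles, task="regression"):
    """Fit a default TabPFN model on descriptor features and predict the test set."""
    import numpy as np
    from tabpfn import TabPFNClassifier, TabPFNRegressor

    X_train = np.array([sanitize(smiles_to_descriptors(s)) for s in train_smiles])
    X_test = np.array([sanitize(smiles_to_descriptors(s)) for s in test_smiles])

    if task == "regression":
        model = TabPFNRegressor()
        model.fit(X_train, y_train)
        return model.predict(X_test)

    # Binary classification: return the probability of the positive class.
    model = TabPFNClassifier()
    model.fit(X_train, y_train)
    return model.predict_proba(X_test)[:, 1]
```

No hyperparameter tuning appears in this sketch, in line with the tuning-free use described in the abstract: the model is fitted once on the training features and queried directly.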

See the ADMET benchmark for more details about the challenge.

Abstract

Tabular data is one of the most widely used formats in bioinformatics research. Therefore, improving algorithmic baselines for such data has important implications for a wide range of applications. One of these critical applications is the prediction of Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET) properties of drugs, a key step in the early stages of drug development. Failures due to poor pharmacokinetic profiles remain a leading cause of attrition in clinical trials, highlighting the need for reliable predictive tools. In recent years, machine learning has emerged as a powerful approach to model complex ADMET behaviors, enabling faster, more cost-effective, and more ethical drug screening pipelines. While some current state-of-the-art approaches, such as MiniMol or MolE, leverage specialized models pretrained on millions of drug-like molecules, Gradient Boosted Decision Tree algorithms like XGBoost continue to serve as strong baselines for many general-purpose tasks. The objective of this study is to explore the use of novel tabular foundation models as a new baseline for tabular data in bioinformatics, with a focus on ADMET drug prediction. To this end, we used TabPFNv2, a transformer-based in-context learning model that was pretrained on synthetic data. For evaluation, we employed the Therapeutics Data Commons benchmark, comprising 22 datasets that include both regression and classification tasks, and extracted the widely used set of 217 RDKit molecular descriptors. This generic algorithm outperforms XGBoost on 19 out of 22 datasets and surpasses MiniMol on 9 out of 22, despite not relying on any prior, drug-specific knowledge. Notably, TabPFNv2 achieves the top rank in 3 tasks, surpassing specialized methods in the field.
These results suggest that TabPFNv2 is a promising baseline for drug prediction, with potential applications in other bioinformatics tasks, including clinical and small omics datasets that meet TabPFNv2’s size constraints. Furthermore, its independence from domain-specific pretraining and hyperparameter tuning enhances its applicability for non-expert practitioners.

Installation

conda create --prefix ./env python=3.12
conda activate ./env
pip install -r requirements.txt

Usage

python tdc_submission.py
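For orientation, the TDC ADMET benchmark group is normally evaluated over five seeds, collecting one dictionary of test predictions per seed. The sketch below follows that documented protocol; it is an assumption about the general shape of such a run, not the contents of `tdc_submission.py`. The function name `run_admet_group` and the `predict_fn` callback are hypothetical. The `group` argument is expected to behave like the object returned by `admet_group(path="data/")` from `tdc.benchmark_group` (PyTDC).

```python
def run_admet_group(group, predict_fn, names=None, seeds=(1, 2, 3, 4, 5)):
    """Evaluate predict_fn on a TDC benchmark group over several seeds.

    predict_fn(train_df, valid_df, test_df) must return the test-set predictions.
    """
    if names is None:
        names = list(group.dataset_names)

    predictions_list = []
    for seed in seeds:
        predictions = {}
        for name in names:
            benchmark = group.get(name)
            test = benchmark["test"]
            # The train/validation split changes with the seed; the test set is fixed.
            train, valid = group.get_train_valid_split(
                benchmark=benchmark["name"], split_type="default", seed=seed
            )
            predictions[benchmark["name"]] = predict_fn(train, valid, test)
        predictions_list.append(predictions)

    # Aggregates the per-seed predictions into one summary per dataset.
    return group.evaluate_many(predictions_list)
```

In PyTDC, each split is a DataFrame with the SMILES string in the `Drug` column and the label in the `Y` column, so a `predict_fn` would typically featurize `train["Drug"]` and `test["Drug"]` before fitting a model.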

Results

Here is the Critical Difference diagram showing the significant differences between using TabPFNv2 and XGBoost with RDKit molecular descriptors:
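A Critical Difference diagram places each method at its average rank across datasets. As a minimal sketch of that rank computation (the function name is hypothetical and the scores are made-up placeholders, not results from this work):

```python
def average_ranks(scores, higher_is_better=True):
    """Map {dataset: {method: score}} to {method: mean rank}, where rank 1 is best.

    Tied methods share the mean of the positions they occupy.
    """
    totals = {}
    for per_method in scores.values():
        ordered = sorted(
            per_method.items(), key=lambda kv: kv[1], reverse=higher_is_better
        )
        i = 0
        while i < len(ordered):
            # Extend j to the end of the run of scores tied with position i.
            j = i
            while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
                j += 1
            shared_rank = ((i + 1) + (j + 1)) / 2  # mean of the 1-based positions
            for method, _ in ordered[i : j + 1]:
                totals[method] = totals.get(method, 0.0) + shared_rank
            i = j + 1
    n_datasets = len(scores)
    return {method: total / n_datasets for method, total in totals.items()}


# Placeholder scores for illustration only.
toy_scores = {
    "dataset_1": {"method_a": 0.9, "method_b": 0.8},
    "dataset_2": {"method_a": 0.5, "method_b": 0.5},
}
print(average_ranks(toy_scores))
```

For error metrics such as MAE, where lower is better, the ranks for those datasets would be computed with `higher_is_better=False`.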

Citation

If you find this work useful, please consider citing:

@inproceedings{ipas2025exploring,
  title={Exploring TabPFNv2 as a Novel Baseline for ADMET Prediction in Drug Discovery},
  author={Ipas, Oroel and Su{\'a}rez Mart{\'i}n, Ignacio and Gomez-Trenado, Guillermo and Triguero, Isaac and Romero-Zaliz, Roc{\'\i}o},
  booktitle={Proceedings of the Brazilian Symposium on Bioinformatics (BSB 2025)},
  year={2025},
  note={Poster presentation}
}
