Skip to content

depasquale-lab/olfactory_foundation_models

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

47 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Olfactory Foundation Models

This is a repository for the benchmarking section in this paper. The LORAX repo can be found here.

benchmarking models overview

NOTE: MPL <-> MP and MPP <-> ProSmith are used interchangably.

Install

Environments

There are two separate environments to run models: one for preprocessing, MO, and MPL models and one for the MPP model.

conda env create -f ofm.yml  # preprocessing, MO, and MPL models
conda env create -f ofm_MPP.yml  # MPP model

Datasets

The datasets used in this paper can be downloaded here.

These files also contain pre-generated embeddings, the splits used, and a pretrained MPP model. Please unzip and put the data/ and BindingDB/ folders in the top level directory.

How to generate all embeddings

All embeddings are already provided in the data/ folder, but if you would like to regenerate them you can run the following command.

conda activate ofm
pip install -e .
python preprocess/generate_embeddings.py

Running the models

MO and MPL models

The MO and MPL models can be run with the following command.

conda deactivate
conda activate ofm
python model/MO/molecule_model.py # MO model
python model/MPL/combo_model.py # MPL model

Both of these scripts will automatically plot and tabulate results. The tabulated results will populate in a results/ folder.

MPP model

This model is a little more nuanced to run as it is an adaptation of the ProSmith model https://github.com/AlexanderKroll/ProSmith.

First you need to run the transformer model.

conda activate ofm_MPP
pip install -e .
python model/MPP/training.py --train_dir data/CC/rand_splits --embed_path data/CC/embeddings/featurized_mols/MolT5.pkl --save_model_path results/CC/saved_models --binary_task False --log_name CC_MOLT5 --pretrained_model BindingDB/saved_model/pretraining_IC50_6gpus_bs144_1.5e-05_layers6.txt.pkl
  • --train_dir - is the dataset that you want to train the model on.
  • --emb_path - is the molecular embedding you want to test.
  • --save_model_path - is where you would like the output of the model to save.
  • --binary_task - modulates whether the model is predicting a binary task (Lalis et. al.) or not (Carey et. al.)
  • --log_name - The name of a log.txt file that is generated with the training progress of the model.
  • --pretrained_model - The location of a pretrained transformer that can be used in lieu of randomly initialized weights. (The model at BindingDB/saved_model/pretraining_IC50_6gpus_bs144_1.5e-05_layers6.txt.pkl is pretrained on the BindingDB dataset and is the one that the ProSmith paper uses for its' pretraining)

Once the transformer is trained, the results will populate in results/dataset/saved_models. There will be a folder for each split and within the split folder, there will be a folder with the molecule embedding of choice that will be populated with the transformer weights and the log.txt file.

After the transformer is trained, to train the gradient boosting part of the model, run the following command.

python model/MPP/training_GB.py --train_dir data/CC/rand_splits --embed_path data/CC/embeddings/featurized_mols/MolT5.pkl --num_iter 500 --log_name CC_MOLT5_GB --binary_task False
  • --train_dir - is the dataset that you want to train the model on.
  • --emb_path - is the molecular embedding you want to test.
  • --num_iter - the amount of training iterations to run the gradient boosting model for.
  • --log_name - The name of a log.txt file that is generated with the training progress of the model.
  • --binary_task - modulates whether the model is predicting a binary task (Lalis et. al.) or not (Carey et. al.)

This will populate results in the same folder as the transformer and in the log.txt file generated from running this script, you will see the results of the trained model.

About

Chemical foundation model benchmarking study for odorant-OR binding prediction.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages