diff --git a/README.md b/README.md
index ac71954..feb0f90 100644
--- a/README.md
+++ b/README.md
@@ -8,6 +8,12 @@ DRGS-Net implements a hybrid molecular representation model that concatenates:
 
 The combined embedding is passed to a small prediction head for downstream molecular property prediction (classification or regression).
 
+## TL;DR (Quick usage)
+1) Install dependencies (see "Quick start" below for full setup commands).
+2) Copy the default config: `cp DRGS-Net.yaml config_concatenate.yaml`
+3) Edit `config_concatenate.yaml` with your dataset paths and ChemBERTa checkpoint.
+4) Run: `python DRGS-Net_finetune.py`
+
 This repository contains training/finetuning scripts, dataset wrappers, model definitions, and utilities. The hybrid model is implemented in `models/concatenate_model.py` (class `HybridModel`). The finetuning orchestration is in `DRGS-Net_finetune.py` which expects a configuration file named `config_concatenate.yaml` by default — if you only have `DRGS-Net.yaml`, copy/rename it to `config_concatenate.yaml` before running (see "Quick start").
 ## Methodology
 ![DRGS-Net Architecture](./fig/Methodology.png)
@@ -50,7 +56,14 @@ See the model card: https://huggingface.co/DeepChem/ChemBERTa-77M-MLM
 
 ## Quick start
 
-1) Prepare config file
+1) Download the repository
+
+```bash
+git clone https://github.com/thkim-01/DRGS-Net.git
+cd DRGS-Net
+```
+
+2) Prepare config file
 
 The finetune script (`DRGS-Net_finetune.py`) by default loads `config_concatenate.yaml`. If you only have `DRGS-Net.yaml`, create a copy with the expected name:
 
@@ -60,7 +73,7 @@ cp DRGS-Net.yaml config_concatenate.yaml
 
 Edit the YAML to set `task_name`, dataset paths, `fine_tune_from` (pretrained GNN checkpoint folder under `./ckpt/`) and `hybrid_specific.chemberta_model_name` (local path or Hugging Face model id).
 
-2) Create environment & install dependencies (suggested)
+3) Create environment & install dependencies (suggested)
 
 This project requires PyTorch, HuggingFace Transformers, and optional packages like RDKit and NVIDIA Apex for mixed precision.
 
@@ -80,11 +93,11 @@ pip install tensorboard scikit-learn pandas numpy tqdm pyyaml
 
 Note: Install compatible `torch-geometric` packages for your PyTorch/CUDA setup if you use PyG layers in the GNN models.
 
-3) Prepare data
+4) Prepare data
 
 Place downstream datasets under `data//` following the MoleculeNet CSV formats. The default config uses paths like `data/bbbp/BBBP.csv` etc. See `DRGS-Net_finetune.py` for dataset mapping by `task_name`.
 
-4) Run finetuning
+5) Run finetuning
 
 ```bash
 python DRGS-Net_finetune.py