IGlab-VUMC/AbLangRBD1

AbLangRBD1: Contrastive-Learned Antibody Embeddings for SARS-CoV-2 RBD Binding

Epitope-aware antibody embeddings for SARS-CoV-2 RBD therapeutic discovery


AbLangRBD1 generates 1536-dimensional embeddings in which SARS-CoV-2 RBD antibodies targeting similar epitopes cluster together, enabling rapid epitope classification, therapeutic discovery, and vaccine analysis.

Model Description

AbLangRBD1 is a fine-tuned antibody language model specifically designed for SARS-CoV-2 RBD-binding antibodies. Using contrastive learning on paired heavy and light chain sequences, the model learns epitope-specific representations that enable:

  • RBD Epitope Classification: Compare antibodies against reference databases
  • Therapeutic Discovery: Find antibodies with similar epitope specificity
  • Vaccine Analysis: Analyze repertoire shifts after RBD vaccination
  • Cross-reactivity Studies: Identify broadly neutralizing antibodies

Architecture

Heavy Chain Seq → [AbLang Heavy] → 768-dim → |
                                              | → [Concatenate] → [Mixer Network] → 1536-dim Paired Embedding
Light Chain Seq → [AbLang Light] → 768-dim → |
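
The data flow in the diagram can be sketched at the level of vector sizes. This is only an illustration: the layers of the mixer network are not described here, so `placeholder_mixer` (an element-wise tanh) is a hypothetical stand-in, and the function and constant names are assumptions rather than part of the repository's code.

```python
import math
from typing import Callable, List

HEAVY_DIM = 768                      # heavy-chain encoder output size (from the diagram)
LIGHT_DIM = 768                      # light-chain encoder output size (from the diagram)
PAIRED_DIM = HEAVY_DIM + LIGHT_DIM   # 1536

Vector = List[float]


def placeholder_mixer(x: Vector) -> Vector:
    """Hypothetical stand-in for the mixer network; keeps the vector size."""
    return [math.tanh(v) for v in x]


def paired_embedding(
    heavy: Vector,
    light: Vector,
    mixer: Callable[[Vector], Vector] = placeholder_mixer,
) -> Vector:
    """Concatenate the two chain embeddings and pass them through the mixer."""
    if len(heavy) != HEAVY_DIM or len(light) != LIGHT_DIM:
        raise ValueError("expected 768-dim heavy and light chain embeddings")
    concatenated = heavy + light     # 768 + 768 = 1536 values
    mixed = mixer(concatenated)
    if len(mixed) != PAIRED_DIM:
        raise ValueError("mixer must return a 1536-dim vector")
    return mixed


# Dummy inputs standing in for the two encoder outputs.
heavy_vec = [0.1] * HEAVY_DIM
light_vec = [-0.2] * LIGHT_DIM
paired = paired_embedding(heavy_vec, light_vec)
print(len(paired))
```

The real model replaces the dummy vectors with the AbLang heavy and light encoder outputs and the placeholder with the trained mixer.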

Quick Start

Model Access

All model components are hosted on HuggingFace Hub:

clint-holt/AbLangRBD1

# Clone this repository for inference code
git clone https://github.com/Clint-Holt/AbLangRBD1.git
cd AbLangRBD1

# Install dependencies
pip install torch pandas transformers safetensors huggingface_hub

# Run inference examples
cd Inference
python quick_start_example.py

Basic Usage

import torch
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

# Run from the repository root so that the Inference package is importable
from Inference.ablangpaired_model import AbLangPaired, AbLangPairedConfig

# Load model from HuggingFace Hub
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_path = hf_hub_download(repo_id="clint-holt/AbLangRBD1", filename="model.safetensors")

config = AbLangPairedConfig(checkpoint_filename=model_path)
model = AbLangPaired(config, device).eval()

# Load tokenizers from HuggingFace Hub
heavy_tokenizer = AutoTokenizer.from_pretrained("clint-holt/AbLangRBD1", subfolder="heavy_tokenizer")
light_tokenizer = AutoTokenizer.from_pretrained("clint-holt/AbLangRBD1", subfolder="light_tokenizer")

# Your RBD antibody sequences
heavy_chain = "EVQLVESGGGFVQPGRSLRLSCAASGFIMDDYAMHWVRQAPGKGLEWVSGISWNSGTRGYADSVKGRFTVSRDNAKNSFYLQMNSLRAADTAVYYCAKDHGPWIAANGHYFDYWGQGTLVTVSS"
light_chain = "QSVLTQPPSASGTPGQRVTISCSGSKSNIGSNPVNWYQQLPGTAPKLLIYSNNERPSGVPARFSGSKSGTSASLAISGLQSEDEADYYCVTWDDSLNGWVFGGGTKLTVL"

# Tokenize (residues are passed space-separated) and generate the embedding
h_tokens = heavy_tokenizer(" ".join(heavy_chain), return_tensors="pt")
l_tokens = light_tokenizer(" ".join(light_chain), return_tensors="pt")

with torch.no_grad():
    embedding = model(
        h_input_ids=h_tokens['input_ids'].to(device),
        h_attention_mask=h_tokens['attention_mask'].to(device),
        l_input_ids=l_tokens['input_ids'].to(device),
        l_attention_mask=l_tokens['attention_mask'].to(device)
    )

print(f"Generated embedding shape: {embedding.shape}")  # (1, 1536)

📁 Repository Structure

AbLangRBD1/
├── Inference/                          # 🚀 Main inference code
│   ├── ablangpaired_model.py          # Core model implementation
│   ├── quick_start_example.py         # Command-line example
│   ├── rbd_inference_examples.ipynb   # Comprehensive Jupyter examples
│   ├── requirements.txt               # Python dependencies
│   └── README.md                      # Inference documentation
├── README.md                          # This file
├── LICENSE                           # MIT license
└── CLAUDE.md                         # Development notes

Training Data

  • Source: 3,195 SARS-CoV-2 RBD-binding antibodies from deep mutational scanning studies
  • References: Cao et al. 2023, Cao et al. 2022
  • Selection: 3,093 antibodies with confirmed binding to SARS-CoV-2 index strain
  • Data Splits: Clone-group aware splitting (80% train, 10% validation, 10% test)
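
Clone-group aware splitting keeps every member of a clone group in the same split, so closely related sequences do not leak between train and test. The sketch below shows one way this could be done; how clone groups were defined and how the authors assigned them is not stated here, so the `clone_ids` labels, the greedy assignment, and the function name are all assumptions.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def clone_group_split(
    clone_ids: Sequence[str],
    fractions: Sequence[float] = (0.8, 0.1, 0.1),
    seed: int = 0,
) -> Dict[str, List[int]]:
    """Assign whole clone groups to train/validation/test splits."""
    groups = defaultdict(list)
    for index, clone_id in enumerate(clone_ids):
        groups[clone_id].append(index)

    group_keys = sorted(groups)
    random.Random(seed).shuffle(group_keys)   # reproducible group order

    names = ("train", "validation", "test")
    targets = {n: f * len(clone_ids) for n, f in zip(names, fractions)}
    splits: Dict[str, List[int]] = {n: [] for n in names}

    for key in group_keys:
        # Greedy choice: the split that is furthest below its target size.
        name = max(names, key=lambda n: targets[n] - len(splits[n]))
        splits[name].extend(groups[key])
    return splits


# Hypothetical example: 100 antibodies spread over 30 clone groups.
example_clone_ids = [f"clone_{i % 30}" for i in range(100)]
splits = clone_group_split(example_clone_ids)
print({name: len(indices) for name, indices in splits.items()})
```

Because groups are assigned whole, the resulting split sizes only approximate the 80/10/10 targets.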

Use Cases

1. RBD Epitope Classification

Compare antibodies with unknown epitopes against reference databases to predict epitope class.
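
A minimal sketch of this comparison follows, assuming cosine similarity as the metric and a k-nearest-neighbour majority vote as the decision rule; neither choice is specified here. The 3-dimensional vectors stand in for 1536-dimensional embeddings, and the class labels are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_epitope_class(
    query: Sequence[float],
    references: Sequence[Tuple[Sequence[float], str]],
    k: int = 3,
) -> str:
    """Majority vote over the k reference embeddings most similar to the query."""
    ranked = sorted(
        references,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy reference database with hypothetical epitope labels.
reference_db = [
    ([1.0, 0.0, 0.0], "class_A"),
    ([0.9, 0.1, 0.0], "class_A"),
    ([0.0, 1.0, 0.0], "class_B"),
    ([0.0, 0.9, 0.1], "class_B"),
    ([0.0, 0.0, 1.0], "class_C"),
]
predicted = predict_epitope_class([0.8, 0.2, 0.0], reference_db, k=3)
print(predicted)
```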

2. Therapeutic Discovery

Search large antibody databases to find candidates targeting specific RBD epitopes.
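
One way such a search might look is sketched below: the database is streamed as (id, embedding) pairs and only the best `k` hits are kept, so it never has to fit in memory at once. Cosine similarity, the `min_similarity` cutoff, and the antibody IDs are assumptions made for illustration, and the toy 3-dimensional vectors stand in for real embeddings.

```python
import heapq
import math
from typing import Iterable, Iterator, List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def top_k_candidates(
    query: Sequence[float],
    database: Iterable[Tuple[str, Sequence[float]]],
    k: int = 2,
    min_similarity: float = 0.0,
) -> List[Tuple[str, float]]:
    """Return the k database entries most similar to the query embedding."""
    q = normalize(query)

    def scored() -> Iterator[Tuple[float, str]]:
        for antibody_id, embedding in database:
            # Dot product of unit vectors equals cosine similarity.
            similarity = sum(a * b for a, b in zip(q, normalize(embedding)))
            if similarity >= min_similarity:
                yield similarity, antibody_id

    best = heapq.nlargest(k, scored())
    return [(antibody_id, similarity) for similarity, antibody_id in best]


# Hypothetical database entries.
toy_database = [
    ("ab_001", [1.0, 0.0, 0.0]),
    ("ab_002", [0.7, 0.7, 0.0]),
    ("ab_003", [0.0, 0.0, 1.0]),
    ("ab_004", [0.9, 0.1, 0.0]),
]
hits = top_k_candidates([1.0, 0.05, 0.0], toy_database, k=2, min_similarity=0.5)
for antibody_id, similarity in hits:
    print(antibody_id, round(similarity, 3))
```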

3. Vaccine Analysis

Analyze B cell repertoire shifts following RBD vaccination by comparing pre/post vaccination samples.
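
As a rough illustration of such a comparison, the sketch below measures what fraction of a repertoire lies close to a reference antibody's embedding before and after vaccination. The cosine-similarity metric, the 0.9 threshold, and the toy 3-dimensional vectors are all assumptions; a real analysis would use embeddings produced by the model and a threshold chosen from validation data.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fraction_near_reference(
    repertoire: Sequence[Sequence[float]],
    reference: Sequence[float],
    threshold: float = 0.9,
) -> float:
    """Fraction of repertoire embeddings at or above the similarity threshold."""
    if not repertoire:
        return 0.0
    hits = sum(
        1 for emb in repertoire if cosine_similarity(emb, reference) >= threshold
    )
    return hits / len(repertoire)


# Hypothetical reference embedding and pre/post-vaccination repertoires.
reference_embedding = [1.0, 0.0, 0.0]
pre_vaccination = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.1, 0.0], [0.0, 1.0, 1.0]]
post_vaccination = [[1.0, 0.0, 0.0], [0.9, 0.2, 0.0], [1.0, 0.0, 0.1], [0.0, 1.0, 0.0]]

pre_fraction = fraction_near_reference(pre_vaccination, reference_embedding)
post_fraction = fraction_near_reference(post_vaccination, reference_embedding)
print(f"pre: {pre_fraction:.2f}, post: {post_fraction:.2f}")
```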

4. Cross-reactivity Prediction

Identify antibodies likely to cross-react with SARS-CoV-2 variants or related coronaviruses.

🔗 Resources

Citation

If you use AbLangRBD1 in your research, please cite our paper:

@article{Holt2025.02.25.640114,
    author = {Holt, Clinton M. and Janke, Alexis K. and Amlashi, Parastoo and Jamieson, Parker J. and Marinov, Toma M. and Georgiev, Ivelin S.},
    title = {Contrastive Learning Enables Epitope Overlap Predictions for Targeted Antibody Discovery},
    elocation-id = {2025.02.25.640114},
    year = {2025},
    doi = {10.1101/2025.02.25.640114},
    publisher = {Cold Spring Harbor Laboratory},
    URL = {https://www.biorxiv.org/content/early/2025/04/01/2025.02.25.640114},
    eprint = {https://www.biorxiv.org/content/early/2025/04/01/2025.02.25.640114.full.pdf},
    journal = {bioRxiv}
}

Institution

Vanderbilt Center for Antibody Therapeutics, Vanderbilt University Medical Center, Nashville, TN 37232, USA

License

This project is licensed under the MIT License; see the LICENSE file for details.
