Epitope-aware antibody embeddings for SARS-CoV-2 RBD therapeutic discovery
AbLangRBD1 generates 1536-dimensional embeddings in which SARS-CoV-2 RBD antibodies targeting similar epitopes cluster together, enabling rapid epitope classification, therapeutic discovery, and vaccine analysis.
AbLangRBD1 is a fine-tuned antibody language model specifically designed for SARS-CoV-2 RBD-binding antibodies. Using contrastive learning on paired heavy and light chain sequences, the model learns epitope-specific representations that enable:
- RBD Epitope Classification: Compare antibodies against reference databases
- Therapeutic Discovery: Find antibodies with similar epitope specificity
- Vaccine Analysis: Analyze repertoire shifts after RBD vaccination
- Cross-reactivity Studies: Identify broadly neutralizing antibodies
Heavy Chain Seq → [AbLang Heavy] → 768-dim → |
| → [Concatenate] → [Mixer Network] → 1536-dim Paired Embedding
Light Chain Seq → [AbLang Light] → 768-dim → |
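The data flow in the diagram can be sketched in PyTorch as follows. This is an illustrative sketch only: the class name `PairedMixerSketch`, the hidden width, the two-layer MLP, and the L2 normalization are assumptions, not the released architecture (the actual implementation lives in `Inference/ablangpaired_model.py`). Random tensors stand in for the pooled 768-dimensional AbLang heavy and light outputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PairedMixerSketch(nn.Module):
    """Hypothetical stand-in for the concatenate + mixer stage of the diagram."""

    def __init__(self, chain_dim: int = 768, hidden_dim: int = 1536):
        super().__init__()
        paired_dim = 2 * chain_dim  # 768 (heavy) + 768 (light) = 1536
        # Assumed mixer: a small MLP that keeps the paired dimension at 1536.
        self.mixer = nn.Sequential(
            nn.Linear(paired_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, paired_dim),
        )

    def forward(self, heavy_emb: torch.Tensor, light_emb: torch.Tensor) -> torch.Tensor:
        paired = torch.cat([heavy_emb, light_emb], dim=-1)  # (batch, 1536)
        mixed = self.mixer(paired)
        # Assumption: unit-normalize so dot products equal cosine similarity.
        return F.normalize(mixed, dim=-1)


# Random placeholders for pooled per-chain embeddings of 4 antibodies.
heavy = torch.randn(4, 768)
light = torch.randn(4, 768)
sketch = PairedMixerSketch().eval()
with torch.no_grad():
    paired_embedding = sketch(heavy, light)
print(paired_embedding.shape)
```

Each chain is embedded separately before the two are concatenated and mixed, so the mixer is the only stage that sees both chains at once.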
All model components are hosted on HuggingFace Hub (clint-holt/AbLangRBD1).
# Clone this repository for inference code
git clone https://github.com/Clint-Holt/AbLangRBD1.git
cd AbLangRBD1
# Install dependencies
pip install torch pandas transformers safetensors huggingface_hub
# Run inference examples
cd Inference
python quick_start_example.py
import torch
from transformers import AutoTokenizer
from Inference.ablangpaired_model import AbLangPaired, AbLangPairedConfig
from huggingface_hub import hf_hub_download
# Load model from HuggingFace Hub
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_path = hf_hub_download(repo_id="clint-holt/AbLangRBD1", filename="model.safetensors")
config = AbLangPairedConfig(checkpoint_filename=model_path)
model = AbLangPaired(config, device).eval()
# Load tokenizers from HuggingFace Hub
heavy_tokenizer = AutoTokenizer.from_pretrained("clint-holt/AbLangRBD1", subfolder="heavy_tokenizer")
light_tokenizer = AutoTokenizer.from_pretrained("clint-holt/AbLangRBD1", subfolder="light_tokenizer")
# Your RBD antibody sequences
heavy_chain = "EVQLVESGGGFVQPGRSLRLSCAASGFIMDDYAMHWVRQAPGKGLEWVSGISWNSGTRGYADSVKGRFTVSRDNAKNSFYLQMNSLRAADTAVYYCAKDHGPWIAANGHYFDYWGQGTLVTVSS"
light_chain = "QSVLTQPPSASGTPGQRVTISCSGSKSNIGSNPVNWYQQLPGTAPKLLIYSNNERPSGVPARFSGSKSGTSASLAISGLQSEDEADYYCVTWDDSLNGWVFGGGTKLTVL"
# Generate embedding
h_tokens = heavy_tokenizer(" ".join(heavy_chain), return_tensors="pt")
l_tokens = light_tokenizer(" ".join(light_chain), return_tensors="pt")
with torch.no_grad():
    embedding = model(
        h_input_ids=h_tokens['input_ids'].to(device),
        h_attention_mask=h_tokens['attention_mask'].to(device),
        l_input_ids=l_tokens['input_ids'].to(device),
        l_attention_mask=l_tokens['attention_mask'].to(device)
    )
print(f"Generated embedding shape: {embedding.shape}")  # (1, 1536)
AbLangRBD1/
├── Inference/ # 🚀 Main inference code
│ ├── ablangpaired_model.py # Core model implementation
│ ├── quick_start_example.py # Command-line example
│ ├── rbd_inference_examples.ipynb # Comprehensive Jupyter examples
│ ├── requirements.txt # Python dependencies
│ └── README.md # Inference documentation
├── README.md # This file
├── LICENSE # MIT license
└── CLAUDE.md # Development notes
- Source: 3,195 SARS-CoV-2 RBD-binding antibodies from deep mutational scanning studies
- References: Cao et al. 2023, Cao et al. 2022
- Selection: 3,093 antibodies with confirmed binding to SARS-CoV-2 index strain
- Data Splits: Clone-group aware splitting (80% train, 10% validation, 10% test)
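Clone-group aware splitting keeps all antibodies from one clone group in the same partition, so closely related sequences do not leak between training and evaluation sets. A minimal sketch, assuming each antibody carries a hypothetical `clone_group` label; the function name, the seed, and the greedy assignment rule are illustrative and are not taken from the paper.

```python
import random
from collections import defaultdict


def clone_group_split(records, fractions=(0.8, 0.1, 0.1), seed=0):
    """Assign whole clone groups to train/val/test.

    records: list of (antibody_id, clone_group) tuples (hypothetical format).
    Returns a dict mapping split name -> list of antibody ids.
    """
    groups = defaultdict(list)
    for antibody_id, clone_group in records:
        groups[clone_group].append(antibody_id)

    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)

    names = ("train", "val", "test")
    targets = [f * len(records) for f in fractions]
    splits = {name: [] for name in names}
    for gid in group_ids:
        # Put the group in the split that is furthest below its target size.
        deficits = [targets[i] - len(splits[names[i]]) for i in range(3)]
        best = deficits.index(max(deficits))
        splits[names[best]].extend(groups[gid])
    return splits


# Toy data: 100 antibodies spread across 25 clone groups.
toy = [(f"ab{i}", f"clone{i % 25}") for i in range(100)]
splits = clone_group_split(toy)
print({name: len(ids) for name, ids in splits.items()})
```

Because whole groups are moved at once, the resulting proportions only approximate 80/10/10; the guarantee is that no clone group appears in more than one split.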
Compare antibodies with unknown epitopes against reference databases to predict epitope class.
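One simple way to do this comparison is a nearest-neighbor vote over cosine similarities. The sketch below assumes a reference set of embeddings with known epitope labels; the function name `predict_epitope`, the choice of `k`, and the label names are hypothetical, and synthetic vectors stand in for real 1536-dimensional AbLangRBD1 embeddings.

```python
from collections import Counter

import numpy as np


def predict_epitope(query, ref_embeddings, ref_labels, k=5):
    """Majority vote over the k most cosine-similar reference antibodies."""
    q = query / np.linalg.norm(query)
    refs = ref_embeddings / np.linalg.norm(ref_embeddings, axis=1, keepdims=True)
    sims = refs @ q                      # cosine similarity to every reference
    top = np.argsort(sims)[::-1][:k]     # indices of the k nearest references
    votes = Counter(ref_labels[i] for i in top)
    return votes.most_common(1)[0][0], float(sims[top[0]])


# Synthetic stand-ins for 1536-dim embeddings: two epitope classes,
# each scattered around its own random centroid.
rng = np.random.default_rng(0)
centroids = rng.normal(size=(2, 1536))
ref_embeddings = np.vstack([c + 0.1 * rng.normal(size=(20, 1536)) for c in centroids])
ref_labels = ["class_A"] * 20 + ["class_B"] * 20   # hypothetical label names

query = centroids[1] + 0.1 * rng.normal(size=1536)
label, best_sim = predict_epitope(query, ref_embeddings, ref_labels)
print(label, round(best_sim, 3))
```

Returning the top similarity alongside the label lets a caller reject low-confidence predictions with a threshold of their own choosing.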
Search large antibody databases to find candidates targeting specific RBD epitopes.
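For larger databases, the search can be batched as a single matrix product followed by a partial sort. This is a sketch under assumptions: `top_k_hits` is a hypothetical helper, the database is assumed to fit in memory as one array of precomputed embeddings, and random vectors are used as placeholders.

```python
import numpy as np


def top_k_hits(query_embs, db_embs, k=3):
    """Return (indices, similarities) of the k most similar database entries per query."""
    q = query_embs / np.linalg.norm(query_embs, axis=1, keepdims=True)
    d = db_embs / np.linalg.norm(db_embs, axis=1, keepdims=True)
    sims = q @ d.T                                   # (n_queries, n_db) cosine matrix
    # argpartition avoids fully sorting every row of a large database.
    part = np.argpartition(-sims, k - 1, axis=1)[:, :k]
    part_sims = np.take_along_axis(sims, part, axis=1)
    order = np.argsort(-part_sims, axis=1)           # sort only the k candidates
    idx = np.take_along_axis(part, order, axis=1)
    return idx, np.take_along_axis(sims, idx, axis=1)


rng = np.random.default_rng(1)
db = rng.normal(size=(1000, 1536)).astype(np.float32)       # placeholder database
# Two queries built as slightly perturbed copies of database entries 10 and 500.
queries = db[[10, 500]] + 0.05 * rng.normal(size=(2, 1536)).astype(np.float32)

idx, sims = top_k_hits(queries, db, k=3)
print(idx[:, 0])  # best hit per query
```

If the database is too large for a single similarity matrix, the same function can be applied to chunks of `db_embs` and the per-chunk hits merged.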
Analyze B cell repertoire shifts following RBD vaccination by comparing pre/post vaccination samples.
Identify antibodies likely to cross-react with SARS-CoV-2 variants or related coronaviruses.
- 🤗 HuggingFace Model: clint-holt/AbLangRBD1
- 📄 Paper: bioRxiv
- 💻 Inference Code: Inference/ directory
- 📓 Examples: Inference/rbd_inference_examples.ipynb
If you use AbLangRBD1 in your research, please cite our paper:
@article{Holt2025.02.25.640114,
author = {Holt, Clinton M. and Janke, Alexis K. and Amlashi, Parastoo and Jamieson, Parker J. and Marinov, Toma M. and Georgiev, Ivelin S.},
title = {Contrastive Learning Enables Epitope Overlap Predictions for Targeted Antibody Discovery},
elocation-id = {2025.02.25.640114},
year = {2025},
doi = {10.1101/2025.02.25.640114},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2025/04/01/2025.02.25.640114},
eprint = {https://www.biorxiv.org/content/early/2025/04/01/2025.02.25.640114.full.pdf},
journal = {bioRxiv}
}
Vanderbilt Center for Antibody Therapeutics, Vanderbilt University Medical Center, Nashville, TN 37232, USA
This project is licensed under the MIT License - see the LICENSE file for details.