FoldMark: Safeguarding Protein Structure Generative Models with Distributional and Evolutionary Watermarking
- Science: Built-in safeguards might stop AI from designing bioweapons
- Nature Biotechnology: Watermarking generative AI for protein structure
- Princeton AI Lab: Deep Dive Series: Building Biosecurity Safeguards into AI for Science
We've created an interactive demo on Hugging Face Spaces where you can:
- Input protein sequences and get watermarked structure predictions
- Compare watermarked vs. non-watermarked structures
- Visualize the differences in 3D
- Pretrained checkpoints and inference code
FoldMark is a first-of-its-kind watermarking strategy designed to provide essential biosecurity safeguards for generative protein models against dual-use risks. It:
- Balances Performance and Quality: Employs distributional and evolutionary principles to embed watermarks while maintaining high-fidelity protein structures.
- High Bit Accuracy: Achieves over 95% watermark bit accuracy at 32 bits with minimal impact on structural integrity (maintaining >0.9 scTM scores).
- Broad Compatibility: Works seamlessly with leading models, including AlphaFold3, ESMFold, RFDiffusion, and RFDiffusionAA.
- Robust User Tracing: Traces the source of a generated protein back to one of up to 1 million users.
- Wet Lab Validated: Successfully tested on redesigned eGFP and CRISPR-Cas13, both of which retained wildtype-level function (98% fluorescence, 95% editing efficiency) with >90% watermark detection, demonstrating practical utility.
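As a rough illustration of how the reported bit accuracy and user tracing could be measured, the sketch below compares an embedded 32-bit code against a decoded one, and matches a decoded code against a pool of user codes by minimum Hamming distance. The function names (`bit_accuracy`, `trace_user`) and the random code pool are illustrative assumptions, not FoldMark's actual API.

```python
import numpy as np

def bit_accuracy(embedded: np.ndarray, decoded: np.ndarray) -> float:
    """Fraction of watermark bits recovered correctly."""
    return float((embedded == decoded).mean())

def trace_user(decoded: np.ndarray, user_codes: np.ndarray) -> int:
    """Index of the user whose code has minimum Hamming distance
    to the decoded watermark."""
    dists = (user_codes != decoded).sum(axis=1)
    return int(dists.argmin())

rng = np.random.default_rng(0)
user_codes = rng.integers(0, 2, size=(1_000_000, 32), dtype=np.int8)  # 1M users
embedded = user_codes[42_000]        # code assigned to one user
decoded = embedded.copy()
decoded[3] ^= 1                      # simulate a single bit flip on decoding

print(bit_accuracy(embedded, decoded))  # 0.96875 (31/32 bits correct)
print(trace_user(decoded, user_codes))  # recovers the user unless another
                                        # random code happens to lie closer
```

With random 32-bit codes, even a decoded watermark with a few flipped bits is almost always closest to its true owner, which is what makes million-user tracing feasible.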
# Create and activate conda environment
conda env create -f foldmark.yml
conda activate fm
# Install torch-scatter
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.0.0+cu117.html
# Install local package
pip install -e .
- Download the preprocessed SCOPe dataset (~280MB): Download Link
- Extract the data:
tar -xvzf preprocessed_scope.tar.gz
rm preprocessed_scope.tar.gz
- Pretrain the model:
python -W ignore experiments/pretrain.py
- Finetune with watermarking:
python -W ignore experiments/finetune.py
Step-by-step scripts to reproduce the eGFP and Cas13 wet-lab watermarking
experiments are provided in tutorials/.
Each protein has three scripts covering the full pipeline:
| Step | Script | Description |
|---|---|---|
| 1 | step1_watermarked_structure_prediction.py | Run FoldMark-Protenix to obtain a watermarked backbone |
| 2 | step2_proteinmpnn_inverse_folding.py | Partial inverse folding with ProteinMPNN (100 sequences, T = 0.1) |
| 3 | step3_esm2_ranking.py | Score with ESM2-650M and export top constructs for synthesis |
eGFP (PDB 4EUL) — design regions: residues 15–40 and 160–190 (surface-exposed loops). Chromophore residues and proton-wire residues are fixed. 12 constructs synthesised; 98% fluorescence and >90% watermark bit accuracy.
Cas13 (PDB 7VTI, apo/inactive state) — design region: residues 258–325 (helical lid). HEPN catalytic dyads are fixed. Top constructs showed 95% editing efficiency and over 90% watermark bit accuracy.
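The design-region constraints above amount to a per-residue mask: positions inside the design regions are redesignable, while functionally critical residues stay fixed. The helper below is an illustrative sketch of building such a mask; it is not the actual ProteinMPNN input format, and the chromophore positions used in the example are assumed for illustration.

```python
def design_mask(length, design_regions, fixed_positions=frozenset()):
    """Build a per-residue mask: True = redesignable, False = kept fixed.
    Residue numbering is 1-based and region bounds are inclusive."""
    mask = [False] * length
    for start, end in design_regions:
        for i in range(start, end + 1):
            mask[i - 1] = True
    for i in fixed_positions:  # critical residues always stay fixed
        mask[i - 1] = False
    return mask

# eGFP (4EUL, 239 residues): surface loops 15-40 and 160-190 designable;
# chromophore residues (assumed 65-67 here for illustration) stay fixed.
mask = design_mask(239, [(15, 40), (160, 190)], fixed_positions={65, 66, 67})
print(sum(mask))  # 57 designable positions
```

Keeping the fixed set separate from the design regions mirrors the eGFP/Cas13 setups above, where chromophore, proton-wire, and HEPN catalytic residues are excluded regardless of where the design windows fall.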
# Quick start (run from the FoldMark HuggingFace Space root after installation)
python tutorials/egfp/step1_watermarked_structure_prediction.py
python tutorials/egfp/step2_proteinmpnn_inverse_folding.py --mpnn_dir ./ProteinMPNN
python tutorials/egfp/step3_esm2_ranking.py
The demo has three tabs: Structure Predictor (JSON Upload), Structure Predictor (Manual Input), and Watermark Detector.
1. Upload a sequence JSON (or type it manually) and check Add Watermark.
2. View the overlaid 3D result and download the watermarked CIF.
Upload the downloaded CIF to the Watermark Detector tab to verify the embedded signal.
See the full step-by-step guide with all screenshots: tutorials/huggingface_tutorial/
If you find this work helpful, please cite our paper:
@article{zhang2024foldmark,
title={FoldMark: Protecting Protein Generative Models with Watermarking},
author={Zhang, Zaixi and Jin, Ruofan and Fu, Kaidi and Cong, Le and Zitnik, Marinka and Wang, Mengdi},
journal={bioRxiv},
pages={2024--10},
year={2024},
publisher={Cold Spring Harbor Laboratory}
}
We thank the following open-source projects for their valuable contributions:
This project is licensed under the MIT License - see the LICENSE file for details.