FoldMark: Safeguarding Protein Structure Generative Models with Distributional and Evolutionary Watermarking
- Science: Built-in safeguards might stop AI from designing bioweapons
- Nature Biotechnology: Watermarking generative AI for protein structure
- Princeton AI Lab: Deep Dive Series: Building Biosecurity Safeguards into AI for Science
We've created an interactive demo on Hugging Face Spaces where you can:
- Input protein sequences and get watermarked structure predictions
- Compare watermarked vs. non-watermarked structures
- Visualize the differences in 3D
- Pretrained checkpoints and inference code
FoldMark is a first-of-its-kind watermarking strategy designed to provide essential biosecurity safeguards for generative protein models against dual-use risks. It:
- Balances Performance and Quality: Employs distributional and evolutionary principles to embed watermarks while maintaining high-fidelity protein structures.
- High Bit Accuracy: Achieves over 95% watermark bit accuracy at 32 bits with minimal impact on structural integrity (maintaining >0.9 scTM scores).
- Broad Compatibility: Works seamlessly with leading models, including AlphaFold3, ESMFold, RFDiffusion, and RFDiffusionAA.
- Robust User Tracing: Traces the source of a generated protein back to one of up to 1 million users.
- Wet Lab Validated: Successfully tested on redesigned eGFP and CRISPR-Cas13, both of which retained wildtype-level function (98% fluorescence, 95% editing efficiency) with >90% watermark detection, demonstrating practical utility.
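As a rough illustration of how the reported bit accuracy and user tracing could be measured, the sketch below compares an embedded 32-bit code against a decoded one, and matches a decoded code against a pool of user codes by minimum Hamming distance. The function names (`bit_accuracy`, `trace_user`) and the random code pool are illustrative assumptions, not FoldMark's actual API.

```python
import numpy as np

def bit_accuracy(embedded: np.ndarray, decoded: np.ndarray) -> float:
    """Fraction of watermark bits recovered correctly."""
    return float((embedded == decoded).mean())

def trace_user(decoded: np.ndarray, user_codes: np.ndarray) -> int:
    """Index of the user whose code has minimum Hamming distance
    to the decoded watermark."""
    dists = (user_codes != decoded).sum(axis=1)
    return int(dists.argmin())

rng = np.random.default_rng(0)
user_codes = rng.integers(0, 2, size=(1_000_000, 32), dtype=np.int8)  # 1M users
embedded = user_codes[42_000]        # code assigned to one user
decoded = embedded.copy()
decoded[3] ^= 1                      # simulate a single bit flip on decoding

print(bit_accuracy(embedded, decoded))  # 0.96875 (31/32 bits correct)
print(trace_user(decoded, user_codes))  # recovers the user unless another
                                        # random code happens to lie closer
```

With random 32-bit codes, even a decoded watermark with a few flipped bits is almost always closest to its true owner, which is what makes million-user tracing feasible.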
# Create and activate conda environment
conda env create -f foldmark.yml
conda activate fm
# Install torch-scatter
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.0.0+cu117.html
# Install local package
pip install -e .
- Download the preprocessed SCOPe dataset (~280MB): Download Link
- Extract the data:
tar -xvzf preprocessed_scope.tar.gz
rm preprocessed_scope.tar.gz
- Pretrain the model:
python -W ignore experiments/pretrain.py
- Finetune with watermarking:
python -W ignore experiments/finetune.py
Step-by-step scripts to reproduce the eGFP and Cas13 wet-lab watermarking
experiments are provided in tutorials/.
Each protein has three scripts covering the full pipeline:
| Step | Script | Description |
|---|---|---|
| 1 | step1_watermarked_structure_prediction.py | Run FoldMark-Protenix to obtain a watermarked backbone |
| 2 | step2_proteinmpnn_inverse_folding.py | Partial inverse folding with ProteinMPNN (100 sequences, T = 0.1) |
| 3 | step3_esm2_ranking.py | Score with ESM2-650M and export top constructs for synthesis |
eGFP (PDB 4EUL) — design regions: residues 15–40 and 160–190 (surface-exposed loops). Chromophore residues and proton-wire residues are fixed. 12 constructs synthesised; 98% fluorescence and >90% watermark bit accuracy.
Cas13 (PDB 7VTI, apo/inactive state) — design region: residues 258–325 (helical lid). HEPN catalytic dyads are fixed. Top constructs showed 95% editing efficiency and over 90% watermark bit accuracy.
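The design-region constraints above amount to a per-residue mask: positions inside the design regions are redesignable, while functionally critical residues stay fixed. The helper below is an illustrative sketch of building such a mask; it is not the actual ProteinMPNN input format, and the chromophore positions used in the example are assumed for illustration.

```python
def design_mask(length, design_regions, fixed_positions=frozenset()):
    """Build a per-residue mask: True = redesignable, False = kept fixed.
    Residue numbering is 1-based and region bounds are inclusive."""
    mask = [False] * length
    for start, end in design_regions:
        for i in range(start, end + 1):
            mask[i - 1] = True
    for i in fixed_positions:  # critical residues always stay fixed
        mask[i - 1] = False
    return mask

# eGFP (4EUL, 239 residues): surface loops 15-40 and 160-190 designable;
# chromophore residues (assumed 65-67 here for illustration) stay fixed.
mask = design_mask(239, [(15, 40), (160, 190)], fixed_positions={65, 66, 67})
print(sum(mask))  # 57 designable positions
```

Keeping the fixed set separate from the design regions mirrors the eGFP/Cas13 setups above, where chromophore, proton-wire, and HEPN catalytic residues are excluded regardless of where the design windows fall.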
# Quick start (run from the FoldMark HuggingFace Space root after installation)
python tutorials/egfp/step1_watermarked_structure_prediction.py
python tutorials/egfp/step2_proteinmpnn_inverse_folding.py --mpnn_dir ./ProteinMPNN
python tutorials/egfp/step3_esm2_ranking.py
The demo has three tabs: Structure Predictor (JSON Upload), Structure Predictor (Manual Input), and Watermark Detector.
1. Upload a sequence JSON (or type it manually) and check Add Watermark.
2. View the overlaid 3D result and download the watermarked CIF.
Upload the downloaded CIF to the Watermark Detector tab to verify the embedded signal.
See the full step-by-step guide with all screenshots: tutorials/huggingface_tutorial/
If you find this work helpful, please cite our paper:
@article{zhang2024foldmark,
title={FoldMark: Protecting Protein Generative Models with Watermarking},
author={Zhang, Zaixi and Jin, Ruofan and Fu, Kaidi and Cong, Le and Zitnik, Marinka and Wang, Mengdi},
journal={bioRxiv},
pages={2024--10},
year={2024},
publisher={Cold Spring Harbor Laboratory}
}
We thank the following open-source projects for their valuable contributions:
This project is licensed under the MIT License - see the LICENSE file for details.