zaixizhang/FoldMark


FoldMark: Safeguarding Protein Structure Generative Models with Distributional and Evolutionary Watermarking


📰 Media Coverage

🌟 Try Our Demo!

We've created an interactive demo on Hugging Face Spaces where you can:

  • Input protein sequences and get watermarked structure predictions
  • Compare watermarked vs. non-watermarked structures
  • Visualize the differences in 3D
  • Access pretrained checkpoints and inference code

Try the Demo →

🚀 Overview

FoldMark is a first-of-its-kind watermarking strategy designed to provide essential biosecurity safeguards for generative protein models against dual-use risks. It:

  • Balances Performance and Quality: Employs distributional and evolutionary principles to embed watermarks while maintaining high-fidelity protein structures.
  • High Bit Accuracy: Achieves over 95% watermark bit accuracy at 32 bits with minimal impact on structural integrity (maintaining >0.9 scTM scores).
  • Broad Compatibility: Works seamlessly with leading models, including AlphaFold3, ESMFold, RFDiffusion, and RFDiffusionAA.
  • Robust User Tracing: Traces a generated protein back to its source among up to 1 million users.
  • Wet Lab Validated: Tested on redesigned eGFP and CRISPR-Cas13, which retained wild-type-level function (98% fluorescence, 95% editing efficiency) with >90% watermark detection, demonstrating practical utility.
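Bit accuracy, as quoted above, can be read as the fraction of decoded bits that match the embedded code. The sketch below is illustrative only and is not FoldMark's decoder: it assumes the decoder yields a list of 0/1 bits and that each user holds a distinct 32-bit code. The names `bit_accuracy` and `trace_user` and the two sample codes are hypothetical.

```python
from typing import Dict, List, Sequence


def bit_accuracy(decoded: Sequence[int], embedded: Sequence[int]) -> float:
    """Fraction of positions where the decoded bit equals the embedded bit."""
    if len(decoded) != len(embedded) or len(embedded) == 0:
        raise ValueError("bit strings must be non-empty and of equal length")
    matches = sum(d == e for d, e in zip(decoded, embedded))
    return matches / len(embedded)


def trace_user(decoded: Sequence[int], user_codes: Dict[str, List[int]]) -> str:
    """Return the user whose code agrees with the decoded bits most often,
    i.e. the code at the smallest Hamming distance (an assumed tracing rule)."""
    return max(user_codes, key=lambda user: bit_accuracy(decoded, user_codes[user]))


# Hypothetical 32-bit codes for two users.
alice = [1, 0] * 16
bob = [1, 1, 0, 0] * 8
user_codes = {"alice": alice, "bob": bob}

# Simulate decoding Alice's code with a single flipped bit.
decoded = alice.copy()
decoded[5] ^= 1

print(bit_accuracy(decoded, alice))     # 31 of 32 bits match
print(trace_user(decoded, user_codes))  # closest code belongs to "alice"
```

A 32-bit code space holds 2^32 (about 4.3 billion) distinct codes, so 1 million users occupy only a small fraction of it.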

📊 Results

Structure Prediction with Watermarking

De Novo Protein Structure Design with Watermarking

🛠️ Installation

# Create and activate conda environment
conda env create -f foldmark.yml
conda activate fm

# Install torch-scatter
pip install torch-scatter -f https://data.pyg.org/whl/torch-2.0.0+cu117.html

# Install local package
pip install -e .
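
The torch-scatter wheel index above is specific to PyTorch 2.0.0 built against CUDA 11.7. A minimal sketch of a sanity check, assuming the installed PyTorch reports a version string of the form `2.0.0+cu117` (some builds omit the `+cu117` suffix, in which case this check returns False); `parse_wheel_index` and `matches_wheel_index` are hypothetical helper names:

```python
import re
from typing import Tuple

WHEEL_INDEX = "https://data.pyg.org/whl/torch-2.0.0+cu117.html"


def parse_wheel_index(url: str) -> Tuple[str, str]:
    """Extract (torch_version, build_tag) from a wheel index URL like the one above."""
    match = re.search(r"torch-(\d+\.\d+\.\d+)\+(\w+)\.html$", url)
    if match is None:
        raise ValueError(f"unrecognised wheel index URL: {url}")
    return match.group(1), match.group(2)


def matches_wheel_index(torch_version: str, url: str = WHEEL_INDEX) -> bool:
    """True if a version string such as '2.0.0+cu117' matches the index URL."""
    version, _, build = torch_version.partition("+")
    return (version, build) == parse_wheel_index(url)


print(parse_wheel_index(WHEEL_INDEX))
print(matches_wheel_index("2.0.0+cu117"))
print(matches_wheel_index("2.1.0+cu118"))
```

Inside the activated environment, the check can be run as `matches_wheel_index(torch.__version__)` after `import torch`.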

📊 Training Pipeline

Data Setup

  1. Download preprocessed SCOPe dataset (~280MB): Download Link
  2. Extract the data:
    tar -xvzf preprocessed_scope.tar.gz
    rm preprocessed_scope.tar.gz
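
The same extract-then-delete step can be sketched in Python, which may be convenient on systems without `tar`. This is an illustrative equivalent, not part of the repository; `extract_and_remove` is a hypothetical helper, and the path check is an added precaution rather than something the original commands perform.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_and_remove(archive: Path, dest: Path) -> List[str]:
    """Extract a .tar.gz archive into dest, then delete the archive.

    Refuses to extract if any member would land outside dest.
    Returns the names of the archive members.
    """
    dest = dest.resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest)
    # Only reached if extraction succeeded, mirroring `tar ... && rm ...`.
    archive.unlink()
    return [member.name for member in members]
```

For the dataset above, the call would be `extract_and_remove(Path("preprocessed_scope.tar.gz"), Path("."))`.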

Training Steps

  1. Pretrain the model:
    python -W ignore experiments/pretrain.py
  2. Finetune with watermarking:
    python -W ignore experiments/finetune.py

🔬 Wet Lab Verifications on GFP and Cas13 Redesign

📖 Reproduction Tutorials

Step-by-step scripts to reproduce the eGFP and Cas13 wet-lab watermarking experiments are provided in tutorials/.

Each protein has three scripts covering the full pipeline:

| Step | Script | Description |
|------|--------|-------------|
| 1 | step1_watermarked_structure_prediction.py | Run FoldMark-Protenix to obtain a watermarked backbone |
| 2 | step2_proteinmpnn_inverse_folding.py | Partial inverse folding with ProteinMPNN (100 sequences, T = 0.1) |
| 3 | step3_esm2_ranking.py | Score with ESM2-650M and export top constructs for synthesis |
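
Step 2 samples sequences at T = 0.1. As a general illustration of what a low sampling temperature does, the sketch below applies a temperature-scaled softmax to a set of logits. It is a generic example, not ProteinMPNN's code; the logits are hypothetical and `softmax_with_temperature` is an illustrative name.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate amino acids at one position.
logits = [2.0, 1.5, 0.5]
for t in (1.0, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(f"T = {t}:", [round(p, 4) for p in probs])
```

At T = 0.1 almost all probability mass falls on the highest-scoring option, so sampled sequences stay close to the model's top choice while still varying slightly between samples.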

eGFP (PDB 4EUL) — design regions: residues 15–40 and 160–190 (surface-exposed loops). Chromophore residues and proton-wire residues are fixed. 12 constructs synthesised; 98% fluorescence and >90% watermark bit accuracy.

Cas13 (PDB 7VTI, apo/inactive state) — design region: residues 258–325 (helical lid). HEPN catalytic dyads are fixed. Top constructs showed 95% editing efficiency and over 90% watermark bit accuracy.
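
The design regions above translate into sets of designable and fixed residue positions for partial inverse folding. A minimal sketch, assuming inclusive residue ranges and 1-based numbering; the helper names are hypothetical, and residues that must stay fixed inside a region (for example chromophore residues) would need to be removed separately:

```python
from typing import Iterable, List, Set, Tuple

# Inclusive residue ranges taken from the descriptions above.
EGFP_REGIONS = [(15, 40), (160, 190)]
CAS13_REGIONS = [(258, 325)]


def designable_positions(regions: Iterable[Tuple[int, int]]) -> List[int]:
    """Expand inclusive (start, end) ranges into a sorted list of positions."""
    positions: Set[int] = set()
    for start, end in regions:
        if start > end:
            raise ValueError(f"invalid range: {start}-{end}")
        positions.update(range(start, end + 1))
    return sorted(positions)


def fixed_positions(length: int, regions: Iterable[Tuple[int, int]]) -> List[int]:
    """Positions 1..length that fall outside every design region."""
    design = set(designable_positions(regions))
    return [i for i in range(1, length + 1) if i not in design]


print(len(designable_positions(EGFP_REGIONS)))   # 26 + 31 positions
print(len(designable_positions(CAS13_REGIONS)))  # 68 positions
print(fixed_positions(10, [(3, 5)]))             # toy chain of length 10
```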

# Quick start (run from the FoldMark HuggingFace Space root after installation)
python tutorials/egfp/step1_watermarked_structure_prediction.py
python tutorials/egfp/step2_proteinmpnn_inverse_folding.py --mpnn_dir ./ProteinMPNN
python tutorials/egfp/step3_esm2_ranking.py
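
The three commands above can also be chained from Python so that a failure in one step stops the run. This is a sketch, not a script shipped with the repository: `build_commands` and `run_pipeline` are hypothetical helpers, and only the script paths and the `--mpnn_dir` flag come from the commands above.

```python
import subprocess
import sys
from typing import List


def build_commands(tutorial_dir: str = "tutorials/egfp",
                   mpnn_dir: str = "./ProteinMPNN") -> List[List[str]]:
    """Build the three tutorial commands in execution order."""
    return [
        [sys.executable, f"{tutorial_dir}/step1_watermarked_structure_prediction.py"],
        [sys.executable, f"{tutorial_dir}/step2_proteinmpnn_inverse_folding.py",
         "--mpnn_dir", mpnn_dir],
        [sys.executable, f"{tutorial_dir}/step3_esm2_ranking.py"],
    ]


def run_pipeline(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; unless dry_run is set, run it and stop on failure."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a step exits non-zero.
            subprocess.run(cmd, check=True)


# Dry run: only prints the commands that would be executed.
run_pipeline(build_commands(), dry_run=True)
```

Calling `run_pipeline(build_commands(), dry_run=False)` from the same root directory as the commands above would execute the steps in order.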

Quick start on HuggingFace

The demo has three tabs: Structure Predictor (JSON Upload), Structure Predictor (Manual Input), and Watermark Detector.

  1. Upload a sequence JSON (or type the sequence manually) and check Add Watermark.
  2. View the overlaid 3D result (gray = unwatermarked, cyan = watermarked) and download the watermarked CIF.

Upload the downloaded CIF to the Watermark Detector tab to verify the embedded signal.

See the full step-by-step guide with all screenshots: tutorials/huggingface_tutorial/

📝 Citation

If you find this work helpful, please cite our paper:

@article{zhang2024foldmark,
  title={FoldMark: Protecting Protein Generative Models with Watermarking},
  author={Zhang, Zaixi and Jin, Ruofan and Fu, Kaidi and Cong, Le and Zitnik, Marinka and Wang, Mengdi},
  journal={bioRxiv},
  pages={2024--10},
  year={2024},
  publisher={Cold Spring Harbor Laboratory}
}

🙏 Acknowledgments

We thank the following open-source projects for their valuable contributions:

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
