Retraining of the ProteinMPNN model specifically on acid-stable structures and sequences
Trained with the HyperMPNN training scripts, using most of the same sequence selection logic (i.e., the same clustering and quality cutoffs).
Structure Selection
Sequences were filtered by organism, selecting for acidophiles, and then for the presence of a secretion tag. Because an organism can maintain a pH inside the cell that differs from the pH outside, proteins with secretion tags from acidophiles were the most likely to be present in low-pH environments. After clustering at 50% sequence identity, AF2 structures were gathered and filtered by quality (pLDDT > 70).
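The selection criteria above can be sketched as a simple filter. This is a minimal illustration, not the scripts actually used: the `Candidate` record, its field names, and the example values are hypothetical, a single per-structure pLDDT score is assumed, and the clustering at 50% sequence identity is omitted.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record; the field names are illustrative assumptions.
    accession: str
    from_acidophile: bool      # source organism is annotated as an acidophile
    has_secretion_tag: bool    # a secretion tag was found in the sequence
    plddt: float               # AF2 confidence on the 0-100 scale


def select_candidates(candidates, min_plddt=70.0):
    """Keep secreted acidophile proteins whose AF2 pLDDT exceeds min_plddt."""
    return [
        c for c in candidates
        if c.from_acidophile and c.has_secretion_tag and c.plddt > min_plddt
    ]


examples = [
    Candidate("A", True, True, 91.2),
    Candidate("B", True, False, 88.0),   # no secretion tag: likely intracellular
    Candidate("C", False, True, 95.0),   # not from an acidophile
    Candidate("D", True, True, 62.5),    # low-confidence structure
]
selected = select_candidates(examples)
print([c.accession for c in selected])
```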
Training
A total of 21,129 sequence/structure pairs were used, with an 80-10-10 training-validation-test split.
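For scale, an 80-10-10 split of this dataset works out to roughly 16,900 training, 2,100 validation, and 2,100 test examples. The sketch below shows one way to produce such a split. The random shuffle, the seed, and the rounding convention are assumptions; the actual HyperMPNN scripts may split differently (for example, by cluster).

```python
import random


def split_80_10_10(items, seed=0):
    """Shuffle items and split them 80/10/10 into train, validation, test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_valid = len(shuffled) // 10
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]   # remainder goes to the test set
    return train, valid, test


# Integer IDs stand in for the real sequence/structure entries.
train, valid, test = split_80_10_10(range(21129))
print(len(train), len(valid), len(test))
```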
Testing Results
Note: The model has not been experimentally tested yet. Please try it!
Generated sequences were folded with high confidence by AF2. Their amino acid compositions are distinct from the ProteinMPNN and HyperMPNN distributions. More analysis to follow.
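A composition comparison of this kind can be reproduced with a few lines of Python. The sketch below computes per-residue frequencies over a set of sequences; the two example sequences are made-up placeholders, not model output.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequences):
    """Return the fraction of each of the 20 standard amino acids."""
    counts = Counter()
    for seq in sequences:
        # Ignore gaps, chain separators, and non-standard residue codes.
        counts.update(res for res in seq.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


composition = aa_composition(["MKTAYIAK", "DDEEKK"])
for aa, frac in sorted(composition.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{aa}: {frac:.3f}")
```

Running the same function over designs from each model gives frequency tables that can be compared residue by residue.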
Use
Point ProteinMPNN to the acidompnn .pt file.
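As a sketch of what this might look like with the upstream ProteinMPNN script `protein_mpnn_run.py`: that script takes a weights folder (`--path_to_model_weights`) and a checkpoint name without the `.pt` extension (`--model_name`). The folder, checkpoint name, and input/output paths below are hypothetical placeholders; substitute the actual file shipped with this model, and check the flags against your ProteinMPNN version.

```python
import subprocess


def build_command(weights_dir, model_name, pdb_path, out_folder, num_seqs=8):
    """Assemble a protein_mpnn_run.py call that loads custom weights."""
    return [
        "python", "protein_mpnn_run.py",
        "--path_to_model_weights", weights_dir,   # folder containing the .pt file
        "--model_name", model_name,               # file name without ".pt"
        "--pdb_path", pdb_path,
        "--out_folder", out_folder,
        "--num_seq_per_target", str(num_seqs),
        "--sampling_temp", "0.1",
    ]


# All paths and names here are placeholders, not files from this repository.
cmd = build_command("weights/", "acidompnn", "input.pdb", "designs/")
print(" ".join(cmd))

# To actually run it from a ProteinMPNN checkout:
# subprocess.run(cmd, check=True)
```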