Open
Description
I attempted to train the DeepMVP phosphorylation site prediction model, but the performance is far below the results reported in the original paper. I am opening this issue to clarify whether there are missing preprocessing steps or other implementation details.
Steps to Reproduce
- Dataset
  - Training data: phosphorylation_st_train.tsv
  - Testing data: phosphorylation_st_test.tsv
- Preprocessed UniProt database
  Only retain the UniProt ID in FASTA headers:
- Model files
- Training code
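# Assumed import: load_model was used but not imported in the original snippet.
from tensorflow.keras.models import load_model
from lib.PTModels import train_model

# Load the provided model_0.h5 and pass it to train_model.
model = load_model('/home/huch/DeepMVP/DeepMVP-main/models/phosphorylation_st/model_0.h5')

train_model(
    input_data='data/raw/phos_st_training.tsv',
    test_file='data/raw/phos_st_testing.tsv',
    db='data/raw/DeepMVP/swiss_prot_human_20190214_processed.fasta',
    out_dir='./mytrain',
    peptide_length=28*2+1,  # 57
    p_model=None,
    model=model,
)

Expected behavior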
- According to the paper, the model should achieve substantially higher accuracy (>0.9).
Observed behavior
- Training log shows:
5155/5155 [==============================] - ETA: 0s - loss: 0.2907 - accuracy: 0.8761
Epoch 48: val_accuracy did not improve from 0.83340
- This performance is significantly below the reported results, and the accuracy does not improve in the following epochs.
Additional context / Questions
- Are there additional preprocessing steps for the dataset or FASTA file that are not documented?
- Are there hyperparameters or model initialization details missing in the repository that affect performance?
- Has anyone successfully reproduced the reported results? If so, could you share your training logs or parameters?
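For reference, the FASTA header preprocessing mentioned in the steps above can be sketched roughly as follows. This is a minimal sketch, not code from the DeepMVP repository: it assumes standard UniProt headers of the form `>sp|ACCESSION|ENTRY_NAME ...`, the function name `simplify_fasta_headers` is hypothetical, and the sequences in the example are placeholders. If DeepMVP expects a different header format, that alone could explain a mismatch between sites and protein sequences.

```python
import re

# Standard UniProt header, e.g. ">sp|P31749|AKT1_HUMAN RAC-alpha ...".
# The capture group is the accession (the UniProt ID).
_UNIPROT_HEADER = re.compile(r"^>(?:sp|tr)\|([^|]+)\|")


def simplify_fasta_headers(lines):
    """Yield FASTA lines with each header reduced to '>ACCESSION'.

    Hypothetical helper for illustration; not part of DeepMVP.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            match = _UNIPROT_HEADER.match(line)
            if match:
                yield ">" + match.group(1)
            else:
                # Header is not in sp|...|... form: keep only the first token.
                yield line.split()[0]
        else:
            # Sequence lines pass through unchanged.
            yield line


# Placeholder sequences, for illustration only.
example = [
    ">sp|P31749|AKT1_HUMAN RAC-alpha serine/threonine-protein kinase OS=Homo sapiens",
    "ACDEFGHIKLMNPQRSTVWY",
    ">P04637 already simplified",
    "MKTAYIAKQR",
]
cleaned = list(simplify_fasta_headers(example))
```

Comparing a few protein IDs from the training TSV against the headers produced this way is a quick check that every site can be mapped to a sequence in the database.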