
Model performance substantially lower than reported in the paper #7

@chenghui03

Description

I attempted to train the DeepMVP phosphorylation site (S/T) prediction model, but its performance is far below the results reported in the original paper. I am opening this issue to ask whether any preprocessing steps or other implementation details are missing from the repository.

Steps to Reproduce

  1. Dataset

  2. Preprocessed UniProt database
    Only retain the UniProt ID in FASTA headers:

  3. Model files

  4. Training code

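# Assumption: model_0.h5 is a standard Keras model, loaded with tf.keras
from tensorflow.keras.models import load_model
from lib.PTModels import train_model

# Load model_0.h5 from the repository's models/phosphorylation_st directory
model = load_model('/home/huch/DeepMVP/DeepMVP-main/models/phosphorylation_st/model_0.h5')

train_model(
    input_data='data/raw/phos_st_training.tsv',  # training sites
    test_file='data/raw/phos_st_testing.tsv',    # held-out test sites
    db='data/raw/DeepMVP/swiss_prot_human_20190214_processed.fasta',  # headers reduced to UniProt IDs
    out_dir='./mytrain',
    peptide_length=28*2+1,  # 57 residues: the site plus 28 on each side
    p_model=None,
    model=model,
)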
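One way to narrow down preprocessing differences is to check, independently of the repository, that the FASTA headers reduce to bare accessions and that each site maps to a 57-residue window (`28*2+1`). The sketch below is hypothetical: `accession_from_header`, `read_fasta`, and `site_window` are not DeepMVP functions, and it assumes Swiss-Prot style headers (`>sp|ACCESSION|ENTRY_NAME ...`), 1-based site positions, and `_` as the padding character near protein termini. Any of these may differ from what `lib.PTModels` actually does.

```python
from typing import Dict, Iterable, List, Optional


def accession_from_header(header: str) -> str:
    """Reduce a FASTA header line to its UniProt accession.

    Assumes Swiss-Prot style headers ('>sp|ACCESSION|ENTRY_NAME ...');
    other headers are reduced to their first whitespace-separated token.
    """
    token = header.lstrip(">").split()[0]
    parts = token.split("|")
    return parts[1] if len(parts) >= 3 else token


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text lines into {accession: sequence}."""
    seqs: Dict[str, str] = {}
    acc: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if acc is not None:
                seqs[acc] = "".join(chunks)  # flush the previous record
            acc, chunks = accession_from_header(line), []
        else:
            chunks.append(line)
    if acc is not None:
        seqs[acc] = "".join(chunks)  # flush the last record
    return seqs


def site_window(seq: str, position: int, flank: int = 28, pad: str = "_") -> str:
    """Return the (2 * flank + 1)-residue window centred on a 1-based site.

    With flank=28 this gives 57 residues, matching peptide_length=28*2+1.
    The padding character used near the termini is an assumption.
    """
    if not 1 <= position <= len(seq):
        raise ValueError(f"position {position} outside sequence of length {len(seq)}")
    center = position - 1
    left = seq[max(0, center - flank):center].rjust(flank, pad)    # pad N-terminal side
    right = seq[center + 1:center + 1 + flank].ljust(flank, pad)   # pad C-terminal side
    return left + seq[center] + right


if __name__ == "__main__":
    # Invented toy record, not real UniProt data.
    toy = [">sp|P00000|TOY_HUMAN Toy protein", "MSTPK", "RSLV"]
    seqs = read_fasta(toy)
    print(seqs)                                      # {'P00000': 'MSTPKRSLV'}
    print(site_window(seqs["P00000"], 2, flank=3))   # __MSTPK
    print(len(site_window(seqs["P00000"], 7)))       # 57
```

Comparing a handful of windows produced this way against what the training pipeline feeds the model would show whether header parsing or window construction differs from what is expected.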

Expected behavior

  • According to the paper, the model should achieve substantially higher accuracy (> 0.9).

Observed behavior

  • Training log shows:
5155/5155 [==============================] - ETA: 0s - loss: 0.2907 - accuracy: 0.8761
Epoch 48: val_accuracy did not improve from 0.83340
  • Training accuracy plateaus at about 0.876 and the best validation accuracy stays at 0.8334, well below the reported results; neither improves in subsequent epochs.
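Accuracy depends on the decision threshold and on the positive/negative ratio of the test set, so it may help to confirm which metric the > 0.9 figure refers to and to compare against a majority-class baseline. A minimal pure-Python sketch, assuming binary labels (1 = phosphorylated site, 0 = not) and predicted probabilities from the trained model; the helper names are hypothetical and not part of DeepMVP:

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of sites whose thresholded score matches the 0/1 label."""
    correct = sum((s >= threshold) == (y == 1) for y, s in zip(labels, scores))
    return correct / len(labels)


def majority_baseline(labels: Sequence[int]) -> float:
    """Accuracy of always predicting the more frequent class."""
    n_pos = sum(1 for y in labels if y == 1)
    return max(n_pos, len(labels) - n_pos) / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the rank-sum (Mann-Whitney U) formula; tied scores get average ranks."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC needs at least one positive and one negative label")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1  # extend the group of tied scores
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j + 1) if pairs[k][1] == 1)
        i = j + 1
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    p = [0.1, 0.4, 0.35, 0.8]
    print(accuracy(y, p), majority_baseline(y), roc_auc(y, p))  # 0.75 0.5 0.75
```

The rank-sum form is used instead of comparing every positive/negative pair so that it stays O(n log n) on a full test set.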

Additional context / Questions

  • Are there additional preprocessing steps for the dataset or FASTA file that are not documented?
  • Are there hyperparameters or model initialization details missing in the repository that affect performance?
  • Has anyone successfully reproduced the reported results? If so, could you share your training logs or parameters?
