Unable to reproduce the prediction score for T0967 #1

@yutake27

Content

I regenerated the input features myself and ran predictions for T0967, the target included in the examples.

After that, I compared the predicted score in this repository (data/CASP13_stage2/T0967.pkl.bz2) with the newly predicted score, and there was a significant difference.

Could you tell me how I can reproduce the prediction score in this repository?

The following is a detailed description of the situation.

How to generate input features

I tried generating the features in two different ways:

  1. using RaptorX-3DModeling (https://github.com/j3xugit/RaptorX-3DModeling)
  2. using the RaptorX web server (http://raptorx.uchicago.edu/ContactMap/)

input sequence

The sequence of T0967, obtained from the CASP13 website:

T0967 MamB, Magnetosome protein , [Candidatus Desulfamplus magnetomortis BW-1] , 81 residues|
EDYIEAIANVLEKTPSISDVKDIIARELGQVLEFEIDLYVPPDITVTTGERIKKEVNQIIKEIVDRKSTVKVRLFAAQEEL
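For anyone retracing these steps, the sequence above can be written to the FASTA file used in the commands below with a small script like this one. It is only a sketch: the short header `T0967` and the helper name `write_fasta` are my own choices, not something RaptorX requires, and the length check simply confirms the 81 residues stated in the CASP13 header.

```python
from pathlib import Path

# Sequence copied from the CASP13 target page quoted above.
SEQUENCE = (
    "EDYIEAIANVLEKTPSISDVKDIIARELGQVLEFEIDLYVPPDITVTTGERIKKEVNQIIKEIVDRKSTVKVRLFAAQEEL"
)
# The 20 standard amino acid one-letter codes.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def write_fasta(path, header, sequence):
    """Write a single-record FASTA file and return the sequence length.

    Hypothetical helper for illustration; raises if the sequence
    contains anything other than the 20 standard residues.
    """
    unknown = set(sequence) - VALID_RESIDUES
    if unknown:
        raise ValueError(f"unexpected residue codes: {sorted(unknown)}")
    Path(path).write_text(f">{header}\n{sequence}\n")
    return len(sequence)


if __name__ == "__main__":
    length = write_fasta("T0967.fasta", "T0967", SEQUENCE)
    print(f"wrote T0967.fasta with {length} residues")
```

A truncated or mistyped sequence would change every downstream feature, so this is a cheap thing to rule out first.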

How to generate features using RaptorX-3DModeling

Set up

  • I used the same database as the paper (Uniclust30, October 2017).

  • EVcouplings and the metagenome database were not used.

Execution command

$ cd RaptorX-3DModeling 
$ ./Server/RaptorXFolder.sh -n 0 -o output_dir T0967.fasta 

Generated files used as input features for ResNetQA

  • sequence feature

    T0967_OUT/T0967_contact/feat_T0967_uce5/T0967.inputFeatures.pkl

  • distance potential

    T0967_OUT/DistancePred/T0967.pairPotential.DFIRE16.pkl

How to generate features using the RaptorX web server

  1. Submit the sequence to the web server (http://raptorx.uchicago.edu/ContactMap/)
  2. Download the results once the prediction is done (JOB_ID.all_in_one)

Generated files used as input features for ResNetQA

  • sequence feature

    JOB_ID.all_in_one/JOB_ID.inputFeatures.pkl

  • distance potential

    JOB_ID.all_in_one/JOB_ID.pairPotential.pkl
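To narrow down where regenerated feature files differ from the ones shipped in the repository, one option is to compare their structure before comparing values. The sketch below assumes nothing about what the pickles contain: `describe` and `diff_descriptions` are hypothetical helpers that walk any nested dict and report keys, types, lengths, and array shapes (arrays are detected by duck typing on `shape` and `dtype`).

```python
def describe(obj, prefix=""):
    """Flatten a nested object into {path: short description}."""
    out = {}
    if isinstance(obj, dict):
        # Recurse into dictionaries, building a slash-separated path.
        for key in sorted(obj, key=str):
            out.update(describe(obj[key], f"{prefix}/{key}"))
    elif hasattr(obj, "shape") and hasattr(obj, "dtype"):
        # Array-like objects (e.g. NumPy arrays): record shape and dtype.
        out[prefix] = f"array shape={tuple(obj.shape)} dtype={obj.dtype}"
    elif isinstance(obj, (list, tuple, str, bytes)):
        out[prefix] = f"{type(obj).__name__} len={len(obj)}"
    else:
        out[prefix] = f"{type(obj).__name__} value={obj!r}"
    return out


def diff_descriptions(a, b):
    """Return {path: (description_in_a, description_in_b)} where they differ."""
    da, db = describe(a), describe(b)
    return {
        path: (da.get(path), db.get(path))
        for path in sorted(set(da) | set(db))
        if da.get(path) != db.get(path)
    }


if __name__ == "__main__":
    # Toy stand-ins for two loaded feature files.
    repo = {"sequence": "EDYIE", "profile": [0.1, 0.2, 0.3]}
    regenerated = {"sequence": "EDYIE", "profile": [0.1, 0.2], "extra": 1}
    for path, (left, right) in diff_descriptions(repo, regenerated).items():
        print(path, "|", left, "|", right)
```

A missing key or a mismatched shape would point to a pipeline or version difference, whereas identical structure with different values would point to the database or the alignment.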

Running ResNetQA

Execution command

$ cd ResNetQA/main
$ python ResNetQA.py T0967.inputFeatures.pkl T0967.pairPotential.DFIRE16.pkl ../examples/T0967_stage2/ ../examples/T0967_stage2.QA.pkl GDTTS

The input model structures were obtained from the CASP13 download page (https://predictioncenter.org/download_area/CASP13/server_predictions/T0967.stage2.3D.srv.tar.gz).

Comparison of prediction results

I compared the following four sets of scores:

  1. Original score

    data/CASP13_stage2/T0967.pkl.bz2

  2. Predicted score using features included in the repository

    (examples/T0967.inputFeatures.pkl and examples/T0967.distPotential.DFIRE16.pkl)

  3. Predicted score using features generated using RaptorX-3DModeling

  4. Predicted score using features generated using webserver
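The score and feature files listed above are pickles, some of them bz2-compressed, and they can be opened with a small loader like the following. This is a sketch under assumptions: `load_pickle` is a hypothetical helper, `encoding="latin1"` is included only in case a file was written by Python 2, and the internal layout of the files is not assumed, so the script just prints the top-level type. Files that contain NumPy arrays also need NumPy installed to unpickle.

```python
import bz2
import pickle


def load_pickle(path):
    """Load a pickle file, transparently handling a .bz2 suffix."""
    opener = bz2.open if str(path).endswith(".bz2") else open
    with opener(path, "rb") as handle:
        # latin1 lets Python 3 read pickles written by Python 2;
        # it has no effect on pickles written by Python 3.
        return pickle.load(handle, encoding="latin1")


if __name__ == "__main__":
    for name in ("data/CASP13_stage2/T0967.pkl.bz2",
                 "examples/T0967_stage2.QA.pkl"):
        try:
            content = load_pickle(name)
        except FileNotFoundError:
            print(f"{name}: not found (run from the ResNetQA repository root)")
            continue
        print(f"{name}: {type(content).__name__}")
```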

The following table shows some of the predicted scores.

model structure        original score   features included in repository   features by RaptorX-3DModeling   features by RaptorX web server
YASARA_TS3             0.830            0.8297                            0.4552                           0.4219
Zhang-CEthreader_TS5   0.753            0.7533                            0.4619                           0.4020
QUARK_TS5              0.657            0.6573                            0.3995                           0.3853
RBO-Aleph_TS4          0.839            0.8393                            0.4668                           0.4255
Distill_TS3            0.828            0.8280                            0.4456                           0.4092

The original scores and the scores predicted using the features in the repository were consistent (they agree to three decimal places).
Therefore, ResNetQA itself appears to be working correctly in my environment.

However, when the features were regenerated using RaptorX-3DModeling or the RaptorX web server, the predicted scores were much lower than the original scores (roughly 0.39 to 0.47 versus 0.66 to 0.84).
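To put a number on the gap, the five rows shown in the table can be summarized by the mean absolute difference from the original score. The values below are copied from the table; the dictionary layout and the function name `mean_abs_diff` are just for illustration, and five models are of course only a small sample of the target.

```python
# model: (original, repository features, RaptorX-3DModeling, RaptorX web server)
ROWS = {
    "YASARA_TS3":           (0.830, 0.8297, 0.4552, 0.4219),
    "Zhang-CEthreader_TS5": (0.753, 0.7533, 0.4619, 0.4020),
    "QUARK_TS5":            (0.657, 0.6573, 0.3995, 0.3853),
    "RBO-Aleph_TS4":        (0.839, 0.8393, 0.4668, 0.4255),
    "Distill_TS3":          (0.828, 0.8280, 0.4456, 0.4092),
}
LABELS = ("repository features", "RaptorX-3DModeling", "RaptorX web server")


def mean_abs_diff(rows, column):
    """Mean absolute difference between column 0 (original) and `column`."""
    return sum(abs(r[0] - r[column]) for r in rows.values()) / len(rows)


if __name__ == "__main__":
    for column, label in enumerate(LABELS, start=1):
        print(f"{label}: {mean_abs_diff(ROWS, column):.4f}")
```

On these rows the repository features differ from the original scores only by rounding, while both regenerated feature sets are off by more than 0.3 on average.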

Question

Could you tell me how to reproduce the original score?
Also, what kind of environment (database, etc.) did you use when you generated the features?