seDSM

seDSM is a model for predicting deleterious synonymous mutations based on a selective ensemble scheme.

Figure 1. Experimental flowchart. (A) Base classifier training. Multiple balanced training subsets are generated from the imbalanced benchmark training sets by random under-sampling, and the balanced subsets are then used to construct base classifiers with random feature selection. Three machine learning algorithms are used in this process: support vector machine, decision tree and logistic regression. (B) Model selection. A diversity measure is calculated for each model in the model pool, and the models with better diversity are selected for integration. Finally, the integrated model is evaluated on the validation data.
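The sampling step in panel (A) can be sketched as follows. This is a minimal illustration rather than the seDSM implementation: the function name `make_balanced_subsets`, its parameters, the choice to keep every positive sample in each subset, and the toy label vector are all assumptions made for the example.

```python
import random


def make_balanced_subsets(labels, n_features, n_subsets, n_selected_features, seed=0):
    """Return (row_indices, feature_indices) pairs, one per base classifier.

    Assumption: positives (label 1) are the minority class. Each subset keeps
    every positive sample plus an equally sized random draw of negatives
    (random under-sampling), and a random subset of the feature columns.
    """
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    if len(positives) > len(negatives):
        raise ValueError("expected negatives to be the majority class")

    subsets = []
    for _ in range(n_subsets):
        # Random under-sampling: draw as many negatives as there are positives.
        sampled_negatives = rng.sample(negatives, len(positives))
        rows = sorted(positives + sampled_negatives)
        # Random feature selection: a different column subset for each model.
        features = sorted(rng.sample(range(n_features), n_selected_features))
        subsets.append((rows, features))
    return subsets


# Toy data (hypothetical): 5 deleterious and 50 benign samples, 20 features.
labels = [1] * 5 + [0] * 50
subsets = make_balanced_subsets(labels, n_features=20, n_subsets=10, n_selected_features=8)
```

Each `(rows, features)` pair would then be used to fit one support vector machine, decision tree or logistic regression model, so that the pool covers many different views of the abundant negative samples.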

Abstract

Although previous studies have suggested that synonymous mutations drive or participate in various complex human diseases, accurately distinguishing deleterious synonymous mutations from benign ones remains a challenge in medical genomics. Several computational tools have been developed to predict the harmfulness of synonymous mutations. However, most of these tools were built on balanced training sets and ignore the abundant negative samples, which may lead to deficient performance. In this study, we propose seDSM, a novel model for predicting deleterious synonymous mutations that makes full use of the abundant negative samples through a selective ensemble scheme based on pairwise diversity. First, we built a model pool containing a large number of candidate classifiers, trained on balanced subsets that were randomly sampled from the imbalanced training sets. Second, we selected a number of base classifiers from the model pool based on pairwise diversity measures and integrated them by soft voting. Finally, we constructed seDSM and compared its performance with that of other tools. On two independent test sets, seDSM surpasses other tools in this field on multiple evaluation indicators, suggesting outstanding predictive performance for deleterious synonymous mutations. We hope that our model can contribute to the further study of deleterious synonymous mutation prediction.
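The selection and integration steps described above can be illustrated with a small sketch. The abstract does not name the pairwise diversity measure, so the disagreement measure used here is an assumption chosen for illustration, as are the function names, the top-k selection rule and the toy predictions.

```python
from itertools import combinations


def disagreement(pred_a, pred_b):
    """Fraction of samples on which two classifiers predict different labels."""
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)


def select_diverse(predictions, k):
    """Keep the k models with the highest mean pairwise disagreement."""
    n = len(predictions)
    totals = [0.0] * n
    for i, j in combinations(range(n), 2):
        d = disagreement(predictions[i], predictions[j])
        totals[i] += d
        totals[j] += d
    mean_diversity = [t / (n - 1) for t in totals]
    ranked = sorted(range(n), key=lambda i: mean_diversity[i], reverse=True)
    return sorted(ranked[:k])


def soft_vote(probabilities, threshold=0.5):
    """Average the positive-class probabilities of the models, then threshold."""
    n_models = len(probabilities)
    averaged = [sum(column) / n_models for column in zip(*probabilities)]
    labels = [int(p >= threshold) for p in averaged]
    return labels, averaged


# Toy validation predictions (hypothetical) from three candidate models.
val_predictions = [
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
]
selected = select_diverse(val_predictions, k=2)

# Toy positive-class probabilities (hypothetical) from the selected models.
selected_probabilities = [
    [0.9, 0.2, 0.8, 0.4],
    [0.3, 0.1, 0.7, 0.9],
]
ensemble_labels, ensemble_scores = soft_vote(selected_probabilities)
```

Soft voting averages probabilities rather than counting label votes, so a confident model can outweigh an uncertain one on a given sample.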

Installation

Install Python 3.9 on Linux or Windows.
Because the program is written in Python 3.9, Python 3.9 with the pip tool must be installed first.
seDSM uses the following dependencies: numpy, pandas, scikit-learn (imported as sklearn) and DESlib. You can install these packages with the following commands:

pip install numpy
pip install pandas
pip install scikit-learn
pip install deslib

If pip is not available on Linux, install it first with the following command:

sudo apt install python3-pip

After that, run the pip install commands listed above.
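To confirm that the dependencies are visible to the interpreter, a short check such as the one below may help. It is a convenience sketch and not part of seDSM; the helper name `missing_modules` is hypothetical, and the module names are the import names of the packages listed above.

```python
import importlib.util
import sys

# Import names of the dependencies (scikit-learn is imported as "sklearn").
REQUIRED_MODULES = ["numpy", "pandas", "sklearn", "deslib"]


def missing_modules(names):
    """Return the module names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if sys.version_info < (3, 9):
    print("Warning: seDSM was written for Python 3.9; found", sys.version.split()[0])

missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All dependencies found.")
```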

Running seDSM

Open cmd on Windows or a terminal on Linux, then cd to the BBBPred-master/codes folder, which contains predict.py.

To predict general synonymous mutations using our model, run:

python predict.py --input [custom prediction data in CSV format] --output [prediction results in CSV format]

Example: python predict.py --input ./example/example.csv --output ./results/results.csv

Here, --input is the CSV file containing the data to be predicted, and --output is the CSV file in which the predicted results are stored.
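The same call can be scripted from Python, for example when predicting several files. The sketch below only wraps the command line shown above; the helper names `build_command` and `run_prediction` are hypothetical, and creating the output folder in advance is a precaution based on the assumption that predict.py may not create it.

```python
import subprocess
import sys
from pathlib import Path


def build_command(input_csv, output_csv, script="predict.py"):
    """Assemble the predict.py command line with --input and --output."""
    return [sys.executable, script, "--input", str(input_csv), "--output", str(output_csv)]


def run_prediction(input_csv, output_csv, script="predict.py"):
    """Run predict.py on one CSV file and return the path of the results file."""
    input_csv, output_csv = Path(input_csv), Path(output_csv)
    if not input_csv.is_file():
        raise FileNotFoundError(f"input file not found: {input_csv}")
    # Assumption: create the output folder up front in case predict.py does not.
    output_csv.parent.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if predict.py exits with an error.
    subprocess.run(build_command(input_csv, output_csv, script), check=True)
    return output_csv
```

For the example above, this would be called from the folder containing predict.py as `run_prediction("./example/example.csv", "./results/results.csv")`.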
