AFPK_finder

AFPK_finder quickly identifies animal FAS-like PKSs (AFPKs). Each input protein sequence is scored against a series of ketosynthase-related hidden Markov models (HMMs), and the resulting alignment scores are reduced in dimensionality with t-distributed stochastic neighbor embedding (t-SNE). The HDBSCAN clustering method is then used to assign each input sequence to a cluster and to identify the likely AFPK clusters.
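To make the idea of "alignment scores as features" concrete, the sketch below collects per-sequence bit scores from HMMER `--tblout` files into a long-format table (sequence, HMM, score), the kind of table a dimensionality-reduction step could take as input. It is an illustration only and not the code AFPK_finder runs: the function name `tblout_to_scores` and the demo rows are hypothetical. The column positions (target name in column 1, query name in column 3, full-sequence score in column 6) follow the standard HMMER 3 tabular output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "sequence<TAB>hmm<TAB>score" for every hit
# in one or more HMMER --tblout files. Lines starting with '#' are comments.
tblout_to_scores() {
    awk '!/^#/ && NF >= 6 { printf "%s\t%s\t%s\n", $1, $3, $6 }' "$@"
}

# Demo with a made-up two-hit table (illustrative values, not real results).
demo_tbl=$(mktemp)
cat > "$demo_tbl" <<'EOF'
# target name  accession  query name  accession  E-value  score  bias
INPUT_KS1      -          KS_model_A  -          1.2e-50  170.3  0.1
INPUT_KS1      -          KS_model_B  -          3.4e-10   40.2  0.0
EOF
tblout_to_scores "$demo_tbl"
rm -f "$demo_tbl"
```

A sequence with no hit to a given HMM simply has no row for it, so a downstream step would need to choose a fill value for missing scores.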

Requirements

Bash
R >= 3.2
R packages: ggplot2, Rtsne, getopt, dbscan
To install the R packages, start R and run: install.packages(c("ggplot2","Rtsne","getopt","dbscan"))
HMMER 3.3.2
prodigal (only if the input is a DNA file)

AFPK_finder was tested on macOS 11, Ubuntu 16.04, and Ubuntu 18.04.
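Before the first run it can help to confirm that the command-line requirements are reachable. The following is a minimal sketch, assuming the tools are expected on PATH under the names `Rscript`, `hmmsearch`, and `prodigal`; the helper name `missing_tools` is hypothetical and not part of AFPK_finder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of every command that is not on PATH.
# Prints nothing when all commands are found.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools named in the requirements list; prodigal matters only for DNA input.
missing=$(missing_tools bash Rscript hmmsearch prodigal)
if [ -n "$missing" ]; then
    printf 'Not found on PATH:\n%s\n' "$missing"
fi
```

This checks only that the commands exist; it does not verify the R version, the HMMER version, or the installed R packages.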

How to use

No installation is needed.

First, clone the repository, make the files executable, and add the folder containing the executables to your PATH variable:

git clone https://github.com/linzhenjian/AFPK_finder.git

cd AFPK_finder

chmod 771 *

export PATH="/path_to_AFPK_finder/":$PATH

Then run "run_AFPK_finder.sh -p testing_data/ar-clade1.fa -o output -t 16" for a protein input, or "run_AFPK_finder.sh -d DNA_KS.fa -o output -t 16" for a DNA input.

The running time for an input of 100 sequences is about 5 min on a typical desktop computer.
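The two invocations above differ only in the input flag. When the type of a FASTA file is not known in advance, a rough heuristic can pick the flag. The sketch below assumes that `-p` is meant for protein input and `-d` for DNA input, as the example commands suggest; the helper name `guess_input_flag` is hypothetical and not part of AFPK_finder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "-d" if every sequence line of a FASTA file uses
# only nucleotide letters (A, C, G, T, U, N), otherwise print "-p".
guess_input_flag() {
    awk '
        { sub(/\r$/, "") }                         # tolerate Windows line endings
        /^>/ || /^[[:space:]]*$/ { next }          # skip headers and blank lines
        toupper($0) ~ /[^ACGTUN]/ { protein = 1; exit }
        END { print (protein ? "-p" : "-d") }
    ' "$1"
}

# Usage (file names taken from the commands above):
#   run_AFPK_finder.sh "$(guess_input_flag DNA_KS.fa)" DNA_KS.fa -o output -t 16
```

This is a heuristic only: nucleotide files that use other IUPAC ambiguity codes would be reported as protein, so check the result when in doubt.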

input:

Ideal inputs are intact protein sequences of the KS domains of iterative PKSs, which can be generated with tools such as antiSMASH or InterPro. Keeping the number of input sequences below 100 gives better resolution in the output plot. We found that truncated KS sequences can cause misidentification.
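To stay under the 100-sequence guideline, a larger FASTA file can be counted and split before running the tool. This is a minimal sketch using only `grep` and `awk`; the helper names `count_seqs` and `split_fasta`, the output naming scheme, and the default chunk size of 99 are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Count sequences in a FASTA file (one header line per sequence).
count_seqs() {
    grep -c '^>' "$1"
}

# Hypothetical helper: split a FASTA file into chunks of at most MAX sequences.
# Writes PREFIX_1.fa, PREFIX_2.fa, ... and keeps every record intact.
split_fasta() {
    local src=$1 prefix=$2 max=${3:-99}
    awk -v max="$max" -v prefix="$prefix" '
        /^>/ {
            # Start a new chunk file every MAX headers.
            if (n % max == 0) {
                if (out) close(out)
                out = sprintf("%s_%d.fa", prefix, n / max + 1)
            }
            n++
        }
        out { print > out }
    ' "$src"
}

# Usage (hypothetical file names):
#   count_seqs my_KS.fa
#   split_fasta my_KS.fa my_KS_chunk 99
```

Each chunk can then be passed to the tool separately with its own output folder.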

results:

This plot shows that the INPUT_KSs (purple) are adjacent to the AFPKs (light blue), so the INPUT_KSs are identified as AFPKs.

Output table example: INPUT_KS1 is assigned to cluster 4, and cluster 4 contains 100% of the training AFPKs.

