Sequence-based methods miss structurally conserved regions with low sequence conservation, so this pipeline applies the convex hull algorithm to identify conserved local structural topologies directly from 3D protein structures. It builds polyhedrons from 30-residue sliding window segments and compares them using Shared Triangles (ST) and Adjacent Shared Triangles (AST) metrics to detect potential functional domains that sequence-based approaches such as PROSITE cannot capture.
Built by Guan-Yu Chen and Yuh-Chyang Charng.
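As an illustration of the idea described above, here is a minimal, dependency-free sketch. It is not the pipeline's own code. It assumes one 3D coordinate per residue (for example the C-alpha atom), identifies each hull triangle by window-relative residue indices, and uses a brute-force hull that requires no four coplanar points. The function names and the ST/AST definitions used here (ST: triangles present in both hulls; AST: shared triangles that share an edge with another shared triangle) are assumptions made for this example and may differ from the definitions the scripts use.

```python
from itertools import combinations


def sliding_windows(items, size=30):
    """Return every run of `size` consecutive items (hypothetical helper)."""
    return [items[i:i + size] for i in range(len(items) - size + 1)]


def hull_triangles(points, eps=1e-9):
    """Brute-force 3D convex hull faces.

    A triple of points is a hull face when all remaining points lie strictly
    on one side of the plane through it. Assumes no four points are coplanar.
    Faces are returned as frozensets of point indices.
    """
    faces = set()
    for i, j, k in combinations(range(len(points)), 3):
        a, b, c = points[i], points[j], points[k]
        u = [b[t] - a[t] for t in range(3)]
        v = [c[t] - a[t] for t in range(3)]
        # Plane normal: cross product of the two edge vectors.
        n = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
        sides = [sum(n[t] * (p[t] - a[t]) for t in range(3))
                 for m, p in enumerate(points) if m not in (i, j, k)]
        if all(s > eps for s in sides) or all(s < -eps for s in sides):
            faces.add(frozenset((i, j, k)))
    return faces


def shared_triangles(faces_a, faces_b):
    """ST under this sketch's assumption: triangles found in both hulls."""
    return faces_a & faces_b


def adjacent_shared_triangles(shared):
    """AST under this sketch's assumption: shared triangles that share an
    edge (two vertices) with at least one other shared triangle."""
    return {t for t in shared
            if any(t != other and len(t & other) == 2 for other in shared)}


# Toy data: a tetrahedron plus a fifth point, inside in A and outside in B.
window_a = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (0.1, 0.1, 0.1)]
window_b = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
st = shared_triangles(hull_triangles(window_a), hull_triangles(window_b))
ast = adjacent_shared_triangles(st)
```

For real structures, each item passed to sliding_windows would be a residue coordinate, and every window from one protein would be compared against every window from the other.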
Install all required dependencies with:
    pip install -r requirements.txt
Quick guide:
Use your own PDB files according to the following workflow:
01_extract_residues → 02_convexhull_30aa → 03_compare_convexhull_30aa → 04_summary → 05_rearrange_freq_match
Workspace:
All scripts accept -w/--workspace to set the root working directory, which defaults to the script's own location. All other paths are resolved relative to it, so in most cases -w is the only argument you need to set.
The examples below assume <workspace>/pdb_files/ contains two PDB files: proteinA.pdb and proteinB.pdb.
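The workspace-relative path handling described above could look roughly like the following sketch. It is an assumption about how such resolution might be written, not the scripts' actual code: the short flags -w, -p and -o come from this README, while the long option names --pdb-dir and --output and the helper names are hypothetical.

```python
import argparse
from pathlib import Path


def build_parser(script_dir):
    """Build a parser whose workspace defaults to the script's own folder."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-w", "--workspace", type=Path, default=Path(script_dir))
    # Hypothetical long names; only the short flags appear in this README.
    parser.add_argument("-p", "--pdb-dir", default="pdb_files")
    parser.add_argument("-o", "--output", default="info")
    return parser


def resolve(workspace, relative):
    """Resolve a path relative to the workspace root."""
    return Path(workspace) / relative


args = build_parser("/opt/scripts").parse_args(["-w", "/data/ws"])
pdb_dir = resolve(args.workspace, args.pdb_dir)
info_dir = resolve(args.workspace, args.output)
```

With only -w given, both folders end up under the workspace, which is why the other arguments can usually be left at their defaults.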
Workflow:
1. Place your PDB files in <workspace>/pdb_files/ and run 01_extract_residues.py. It outputs one info Excel file per PDB into <workspace>/info/.
    python 01_extract_residues.py -w /path/to/workspace
    # or override individual paths:
    python 01_extract_residues.py -w /path/to/workspace -p pdb_files -o info
2. Run 02_convexhull_30aa.py once per PDB. It reads the info Excel file from step 1 and outputs results into <workspace>/info/<pdb_name>/.
    python 02_convexhull_30aa.py -w /path/to/workspace -i info/proteinA_30aa_info.xlsx
    python 02_convexhull_30aa.py -w /path/to/workspace -i info/proteinB_30aa_info.xlsx
    # or override individual paths:
    python 02_convexhull_30aa.py -w /path/to/workspace -i info/proteinA_30aa_info.xlsx -p pdb_files -o info
   This produces info/proteinA/ and info/proteinB/.
3. Point to the two subfolders generated in step 2 and specify an output folder.
    python 03_compare_convexhull_30aa.py -w /path/to/workspace -1 info/proteinA -2 info/proteinB -o output
4. Summarize the comparison results from step 3 into a single Excel file.
    python 04_summary.py -w /path/to/workspace -i output -o output/summary.xlsx
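5. Run 05_rearrange_freq_match.py on the summary file from step 4. Results are written to the output folder.
    python 05_rearrange_freq_match.py -w /path/to/workspace -i output/summary.xlsx -o output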
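To run the five steps in one go, a small driver along these lines may help. This is a sketch, not part of the repository: the function names are hypothetical, it assumes exactly two PDB names, that the info files follow the <name>_30aa_info.xlsx pattern shown above, and that it is launched from the folder containing the scripts.

```python
import subprocess
import sys


def pipeline_commands(workspace, pdb_names, out_dir="output"):
    """Build the command lines for steps 1 to 5, in order."""
    py = sys.executable
    w = ["-w", str(workspace)]
    first, second = pdb_names  # the comparison step takes exactly two inputs
    summary = f"{out_dir}/summary.xlsx"
    cmds = [[py, "01_extract_residues.py", *w]]
    # Step 2 runs once per PDB.
    for name in pdb_names:
        cmds.append([py, "02_convexhull_30aa.py", *w,
                     "-i", f"info/{name}_30aa_info.xlsx"])
    cmds.append([py, "03_compare_convexhull_30aa.py", *w,
                 "-1", f"info/{first}", "-2", f"info/{second}", "-o", out_dir])
    cmds.append([py, "04_summary.py", *w, "-i", out_dir, "-o", summary])
    cmds.append([py, "05_rearrange_freq_match.py", *w, "-i", summary, "-o", out_dir])
    return cmds


def run_pipeline(workspace, pdb_names):
    """Run each step and stop at the first failure."""
    for cmd in pipeline_commands(workspace, pdb_names):
        subprocess.run(cmd, check=True)


commands = pipeline_commands("/path/to/workspace", ["proteinA", "proteinB"])
```

Calling run_pipeline("/path/to/workspace", ["proteinA", "proteinB"]) would then execute the same commands listed in the workflow, one after another.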