OrcaFold is a protein structure prediction system built on deep learning architectures inspired by AlphaFold.
- Advanced MSA (Multiple Sequence Alignment) processing
- Pair representation handling
- Multi-head attention mechanisms
- Iterative refinement pipeline
- 3D coordinate prediction
- Backbone and side chain refinement
- Atomic structure assembly
- Iterative structure optimization
- High-performance data pipeline
- Configuration management
- Confidence scoring (pLDDT & TM-score)
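To make the MSA attention features above more concrete, here is a minimal sketch of attention applied along the rows versus the columns of an MSA representation. It is not OrcaFold's implementation: the function names (`self_attention`, `row_attention`, `column_attention`) are hypothetical, and it uses a single head with identity projections (queries, keys, and values are all the input vectors) in pure Python so that only the axis of attention differs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Single-head scaled dot-product self-attention.

    Assumption for illustration: no learned projections, so queries,
    keys, and values are the input vectors themselves.
    """
    dim = len(vectors[0])
    output = []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        output.append(
            [sum(w * v[c] for w, v in zip(weights, vectors)) for c in range(dim)]
        )
    return output


def row_attention(msa):
    """Attend across residues within each sequence; msa[s][r] is a feature vector."""
    return [self_attention(row) for row in msa]


def column_attention(msa):
    """Attend across sequences at each residue position."""
    n_seq, n_res = len(msa), len(msa[0])
    columns = [
        self_attention([msa[s][r] for s in range(n_seq)]) for r in range(n_res)
    ]
    # Transpose back to the [sequence][residue] layout.
    return [[columns[r][s] for r in range(n_res)] for s in range(n_seq)]
```

Both functions keep the `[sequence][residue][channel]` layout, so in this sketch they can be applied one after the other; a real model would add learned projections, multiple heads, and a bias from the pair representation.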
OrcaFold/
├── orcafold/
│   ├── __init__.py
│   ├── config/
│   │   ├── __init__.py
│   │   ├── model_config.py
│   │   └── train_config.py
│   ├── modules/
│   │   ├── __init__.py
│   │   ├── evoformer/
│   │   │   ├── __init__.py
│   │   │   ├── msa_processor.py
│   │   │   ├── pair_processor.py
│   │   │   └── attention.py
│   │   ├── structure/
│   │   │   ├── __init__.py
│   │   │   ├── backbone.py
│   │   │   ├── sidechain.py
│   │   │   └── recycling.py
│   │   └── confidence/
│   │       ├── __init__.py
│   │       ├── plddt.py
│   │       └── tm_score.py
│   ├── data/
│   │   ├── __init__.py
│   │   ├── pipeline.py
│   │   ├── msa_tools.py
│   │   └── templates.py
│   └── utils/
│       ├── __init__.py
│       ├── geometry.py
│       ├── cuda_utils.py
│       └── visualization.py
├── scripts/
│   ├── train.py
│   ├── predict.py
│   └── evaluate.py
├── tests/
│   ├── __init__.py
│   ├── test_evoformer.py
│   ├── test_structure.py
│   └── test_pipeline.py
├── examples/
│   ├── single_prediction.py
│   ├── batch_processing.py
│   └── visualization.py
├── docs/
│   ├── architecture.md
│   ├── installation.md
│   ├── pipeline.md
│   └── api.md
├── requirements/
│   ├── base.txt
│   ├── dev.txt
│   └── gpu.txt
├── setup.py
├── README.md
└── LICENSE
- MSA Processor: Handles multiple sequence alignment processing
- Pair Processor: Manages residue pair representations
- Attention Mechanisms: Implements various attention patterns
  - Row-wise attention
  - Column-wise attention
  - Triangle multiplication updates
- Backbone Generator: Creates initial backbone trace
- Side Chain Placement: Predicts side chain conformations
- Structure Refinement: Iteratively improves predictions
- Recycling Handler: Manages prediction recycling
- MSA Generation: Interfaces with JackHMMER/HHblits
- Template Search: Finds and processes template structures
- Feature Processing: Prepares features for model input
- Model architecture settings
- Training parameters
- Runtime configurations
- Prediction modes (monomer/multimer)
- pLDDT score calculation
- TM-score estimation
- Per-residue confidence metrics
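The confidence metrics above can be sketched in a few lines. This is an illustration under stated assumptions, not the code in `plddt.py` or `tm_score.py`: the function names are hypothetical, pLDDT is taken as the expected value of a per-residue distribution over equal-width lDDT bins (the AlphaFold-style formulation), and the TM-score function is the standard reference definition computed from distances between already aligned residues, with `d0` held at 0.5 Å for short chains as is common in TM-score tools. A model that estimates TM-score without a reference structure would predict it rather than compute it this way.

```python
import math


def plddt_from_logits(logits):
    """Expected lDDT (0 to 100) for one residue from its per-bin logits.

    Assumes equal-width bins covering [0, 1], each represented by its center.
    """
    n_bins = len(logits)
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    width = 1.0 / n_bins
    centers = [(i + 0.5) * width for i in range(n_bins)]
    return 100.0 * sum(p * c for p, c in zip(probs, centers))


def tm_score(distances, target_length):
    """TM-score from distances (angstroms) between aligned residue pairs.

    distances holds one entry per aligned pair; unaligned residues
    contribute nothing, but the sum is still normalized by target_length.
    """
    if target_length > 21:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5  # assumed floor for short chains
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

A perfectly superimposed full-length model gives a TM-score of 1.0, while a flat distribution over bins gives a pLDDT of 50, which is a quick sanity check for either function.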
# Clone the repository
git clone https://github.com/yourusername/OrcaFold.git
cd OrcaFold
# Create conda environment
conda create -n orcafold python=3.8
conda activate orcafold
# Install dependencies
pip install -r requirements/base.txt
pip install -r requirements/gpu.txt # for CUDA support
# Install in development mode
pip install -e .

from orcafold import OrcaFold
from orcafold.data import Pipeline
# Initialize pipeline and model
pipeline = Pipeline()
model = OrcaFold(device='cuda')
# Prepare input data
features = pipeline.process_sequence('SEQUENCE.fasta')
# Generate prediction
structure, confidence = model.predict(features)
# Save results
structure.save('predicted_structure.pdb')

- CUDA-accelerated computations
- Mixed precision training
- Efficient MSA processing
- Optimized attention mechanisms
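As background for the mixed precision item above, the sketch below shows why mixed precision training is usually paired with loss scaling: very small gradients underflow to zero in half precision unless they are scaled up first. OrcaFold's training loop is not shown in this README, so this is a generic illustration using only the standard library; `to_fp16`, `scaled_fp16_gradient`, and the scale factor are hypothetical choices.

```python
import struct


def to_fp16(x):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack('e', struct.pack('e', x))[0]


def scaled_fp16_gradient(grad, loss_scale):
    """Store grad * loss_scale in half precision, then unscale in full precision."""
    return to_fp16(grad * loss_scale) / loss_scale


tiny_grad = 1e-8

# Stored directly, the gradient is below the half-precision range and is lost.
naive = to_fp16(tiny_grad)

# Scaled by 2**16 before the cast, it survives with a small rounding error.
recovered = scaled_fp16_gradient(tiny_grad, 65536.0)
```

Frameworks typically automate this step and adjust the scale factor dynamically, so the fixed factor here is only for demonstration.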
Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.
Known bugs: visualizations.
This project is licensed under the MIT License - see LICENSE for details.