A gene expression analysis toolkit that combines statistical normalization (TMM, RLE, quantile) with PCA and multi-algorithm clustering, with GPU acceleration and parallel processing for large datasets.
- Features
- Project Structure
- Technical Overview
- Installation
- Quick Start
- Usage Example
- Configuration
- Changelog
- Roadmap
- Contributing
- License
- Citation
- Contact
- High Performance: GPU-accelerated computations with CUDA support
- Robust Statistics: Implements TMM/RLE/Quantile normalization
- Advanced ML: Multi-algorithm clustering with automatic hyperparameter optimization
- Quality Control: Comprehensive metrics and validation tools
- Scalability: Parallel processing and distributed computing support
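To make the normalization feature concrete, here is a minimal sketch of quantile normalization, one of the three methods listed above. It is an illustration under stated assumptions, not the code in utils.py: the function name `quantile_normalize` and the genes-by-samples NumPy layout are hypothetical, and tied values are ranked by position rather than averaged.

```python
import numpy as np


def quantile_normalize(expr):
    """Quantile-normalize a genes x samples matrix (hypothetical helper).

    Every sample (column) is forced onto the same reference distribution:
    the mean of the sorted columns.
    """
    expr = np.asarray(expr, dtype=float)
    # 0-based rank of each value within its own column; a stable sort makes
    # the handling of ties deterministic (earlier rows get the lower rank).
    order = np.argsort(expr, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")
    # Reference distribution: one value per rank, averaged across samples.
    reference = np.sort(expr, axis=0).mean(axis=1)
    # Replace each value by the reference value at its rank.
    return reference[ranks]


# Toy matrix: 4 genes (rows) x 3 samples (columns).
counts = np.array([
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
])
normalized = quantile_normalize(counts)
```

After this step every column of `normalized` contains the same set of values, so differences between samples reflect rank changes rather than differences in overall scale.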
GeneExpressionAnalysisTool/
├── Core/
│   ├── utils.py             # Core statistical and preprocessing utilities
│   ├── model_training.py    # ML model implementations and training logic
│   ├── model_evaluation.py  # Evaluation metrics and validation tools
│   ├── main.py              # Pipeline orchestration and CLI interface
│   └── hyperparams.yaml     # Configuration and hyperparameters
- Statistical Engine (utils.py)
  - TMM/RLE/Quantile normalization
  - Robust outlier detection
  - Quality metrics computation
  - Expression data validation
- Model Training (model_training.py)
  - GPU-accelerated PCA implementation
  - Multi-algorithm clustering (K-means, DBSCAN, Spectral)
  - Automatic hyperparameter optimization
  - CUDA support for large-scale computations
- Model Evaluation (model_evaluation.py)
  - Bootstrap-based stability assessment
  - GSEA implementation
  - Comprehensive clustering metrics
  - Permutation testing for PCA
- Pipeline Orchestration (main.py)
  - Parallel processing support
  - GPU acceleration
  - Progress logging
  - Error handling
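The PCA step can be illustrated with a small sketch of how a component count might be chosen from a variance threshold and a cap, mirroring the `min_variance_explained` and `max_components` keys in hyperparams.yaml. How model_training.py actually does this is not shown in this README, so the function name `select_n_components` and the samples-by-features layout are assumptions, and the sketch uses plain NumPy on the CPU rather than CUDA.

```python
import numpy as np


def select_n_components(X, min_variance_explained=0.9, max_components=50):
    """Smallest number of principal components whose cumulative explained
    variance reaches the threshold, capped at max_components.

    Hypothetical helper; assumes X is samples x features and not constant.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Squared singular values of the centered matrix are proportional to
    # the variance captured by each principal component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    cumulative = np.cumsum(variances) / variances.sum()
    # Index of the first component at which the threshold is met,
    # converted to a 1-based component count.
    n_needed = int(np.searchsorted(cumulative, min_variance_explained)) + 1
    return min(n_needed, max_components, len(variances))


# All variance lies along the first feature, so one component suffices.
line_data = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]])
n_line = select_n_components(line_data, min_variance_explained=0.9)

# Isotropic noise needs every component, but the cap limits the result.
rng = np.random.default_rng(0)
noise = rng.normal(size=(200, 5))
n_capped = select_n_components(noise, min_variance_explained=0.999,
                               max_components=2)
```

Keeping the threshold and the cap as separate parameters lets the cap bound memory and runtime on wide expression matrices even when the variance target would ask for more components.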
- Clone the repository:
  git clone https://github.com/YanCotta/GeneExpressionAnalysisTool.git
- Create a virtual environment:
  python -m venv venv
  source venv/bin/activate # Linux/Mac
  # or
  .\venv\Scripts\activate # Windows
- Install dependencies:
  pip install -r requirements.txt

from Core.main import GeneExpressionAnalysis
# One-line analysis
results = GeneExpressionAnalysis().analyze("data.csv")
# Advanced usage with custom settings
analysis = GeneExpressionAnalysis(
    config_path="Core/hyperparams.yaml",
    use_gpu=True
)
results = analysis.run_pipeline(
    data_path="expression_data.csv",
    metadata_path="metadata.csv",
    output_dir="results"
)

from Core.main import GeneExpressionAnalysis
# Initialize with GPU support
analysis = GeneExpressionAnalysis(
    config_path="Core/hyperparams.yaml",
    use_gpu=True
)
# Run analysis pipeline
results = analysis.run_pipeline(
    data_path="expression_data.csv",
    metadata_path="metadata.csv",
    output_dir="results"
)

The hyperparams.yaml file controls all major parameters:
data_processing:
  normalization_method: 'tmm'
  missing_value_threshold: 0.3
pca:
  min_variance_explained: 0.9
  max_components: 50
clustering:
  kmeans:
    max_clusters: 10
    n_init: 10

🔨 Maintenance Release
- Fixed numerous inconsistencies
- Better integrated project structure and modules
- Updated changelog with v4.0.0 improvements
- Implemented GPU-accelerated PCA
- Added parallel processing support
- Enhanced clustering algorithms
- Introduced GSEA functionality
- Added comprehensive error handling
- Improved documentation and type hints
- ComBat-based batch effect correction
- Comprehensive QC metrics suite
- Hierarchical clustering with dendrograms
- Enhanced visualization pipeline
- Multiple testing correction (FDR)
- Silhouette score analysis for clustering
- Type hints for all functions
- Comprehensive error handling
- Input validation checks
- Improved PCA visualization
- Enhanced documentation
- Optimized data preprocessing
- Updated dependency requirements
🎯 Major Release Focus: Complete Framework Overhaul
GeneExpressionAnalysisTool/
├── tests/         # Comprehensive test suite
├── examples/      # Example notebooks
├── docs/          # Detailed documentation
└── requirements/  # Environment-specific requirements
- Batch effect correction
- Multiple testing correction
- Gene set enrichment analysis
- Quality control metrics
- Differential expression analysis
- Unit tests for all components
- Integration tests
- Test with real RNA-seq datasets
- Add docstrings with biological context
- Create user guide
- Add example workflows
- Document statistical methods
- Implement parallel processing
- Add progress tracking
- Optimize memory usage
- Add checkpointing
- PCA plots
- Heatmaps
- Volcano plots
- Quality control visualizations
We welcome contributions! Please see our Contributing Guidelines for details.
This project is licensed under the MIT License - see the LICENSE file for details.
@software{gene_expression_tool,
  author = {Cotta, Yan P.},
  title = {Gene Expression Analysis Tool},
  version = {3.0.0},
  year = {2025},
  url = {https://github.com/YanCotta/GeneExpressionAnalysisTool}
}
- Author: Yan P. Cotta
- Email: yanpcotta@gmail.com
- GitHub: @YanCotta