YanCotta/GeneExpressionAnalysisTool

Gene Expression Analysis Tool 🧬

Python 3.8+ | License: MIT | Code Style: Black | Documentation Status | Testing Status

A professional-grade gene expression analysis toolkit implementing state-of-the-art statistical methods and machine learning algorithms, optimized for high-performance computing environments.

✨ Features

  • High Performance: GPU-accelerated computations with CUDA support
  • Robust Statistics: Implements TMM/RLE/Quantile normalization
  • Advanced ML: Multi-algorithm clustering with automatic hyperparameter optimization
  • Quality Control: Comprehensive metrics and validation tools
  • Scalability: Parallel processing and distributed computing support

📁 Project Structure

GeneExpressionAnalysisTool/
├── Core/
│   ├── utils.py            # Core statistical and preprocessing utilities
│   ├── model_training.py   # ML model implementations and training logic
│   ├── model_evaluation.py # Evaluation metrics and validation tools
│   ├── main.py             # Pipeline orchestration and CLI interface
│   └── hyperparams.yaml    # Configuration and hyperparameters

Technical Overview

Core Components

  1. Statistical Engine (utils.py)

    • TMM/RLE/Quantile normalization
    • Robust outlier detection
    • Quality metrics computation
    • Expression data validation
  2. Model Training (model_training.py)

    • GPU-accelerated PCA implementation
    • Multi-algorithm clustering (K-means, DBSCAN, Spectral)
    • Automatic hyperparameter optimization
    • CUDA support for large-scale computations
  3. Model Evaluation (model_evaluation.py)

    • Bootstrap-based stability assessment
    • GSEA implementation
    • Comprehensive clustering metrics
    • Permutation testing for PCA
  4. Pipeline Orchestration (main.py)

    • Parallel processing support
    • GPU acceleration
    • Progress logging
    • Error handling
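
As an illustration of what the Statistical Engine's RLE option refers to, here is a minimal sketch of median-of-ratios (RLE) normalization in plain Python. It is not the code in utils.py: the function names `rle_size_factors` and `rle_normalize`, the list-of-rows input layout, and the toy counts are all assumptions made for this example.

```python
import math
from statistics import median


def rle_size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: list of gene rows; each row holds raw counts for every sample.
    (Hypothetical helper, not part of the tool's documented API.)
    """
    n_samples = len(counts[0])
    # Use only genes observed in every sample: a zero count makes the
    # geometric mean zero and the ratio undefined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has a non-zero count in every sample")
    # Per-gene log geometric mean across samples (the pseudo-reference).
    log_ref = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        # Log ratio of each usable gene in sample j to the pseudo-reference.
        log_ratios = [math.log(row[j]) - ref for row, ref in zip(usable, log_ref)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def rle_normalize(counts):
    """Divide each sample's counts by that sample's size factor."""
    factors = rle_size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Toy data (made up): sample 2 was sequenced twice as deeply as sample 1.
toy_counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
toy_factors = rle_size_factors(toy_counts)
toy_normalized = rle_normalize(toy_counts)
```

In the toy data the second size factor comes out twice the first, so the depth difference cancels for the first three genes after normalization. The gene with a zero count is excluded from the size-factor estimate but is still scaled.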

🚀 Installation

  1. Clone the repository:
git clone https://github.com/YanCotta/GeneExpressionAnalysisTool.git
cd GeneExpressionAnalysisTool
  2. Create a virtual environment:
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or
.\venv\Scripts\activate  # Windows
  3. Install dependencies:
pip install -r requirements.txt

⚡ Quick Start

from Core.main import GeneExpressionAnalysis

# One-line analysis
results = GeneExpressionAnalysis().analyze("data.csv")

# Advanced usage with custom settings
analysis = GeneExpressionAnalysis(
    config_path="Core/hyperparams.yaml",
    use_gpu=True
)
results = analysis.run_pipeline(
    data_path="expression_data.csv",
    metadata_path="metadata.csv",
    output_dir="results"
)

Usage Example

from Core.main import GeneExpressionAnalysis

# Initialize with GPU support
analysis = GeneExpressionAnalysis(
    config_path="Core/hyperparams.yaml",
    use_gpu=True
)

# Run analysis pipeline
results = analysis.run_pipeline(
    data_path="expression_data.csv",
    metadata_path="metadata.csv",
    output_dir="results"
)

Configuration

The hyperparams.yaml file controls all major parameters:

data_processing:
  normalization_method: 'tmm'
  missing_value_threshold: 0.3

pca:
  min_variance_explained: 0.9
  max_components: 50

clustering:
  kmeans:
    max_clusters: 10
    n_init: 10
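
Because a typo in these settings may only surface deep inside a pipeline run, it can help to check the parsed configuration up front. The sketch below validates a plain dictionary of the shape a YAML parser such as PyYAML's `yaml.safe_load` would return for the file above. The function `validate_config`, the lowercase spellings `rle` and `quantile`, and the accepted ranges are assumptions for illustration, not behavior confirmed by the tool.

```python
# Assumed spellings: only 'tmm' appears in the sample config above.
ALLOWED_NORMALIZATION = {"tmm", "rle", "quantile"}


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _is_int(value):
    return isinstance(value, int) and not isinstance(value, bool)


def validate_config(config):
    """Return a list of problems; an empty list means the config passed.

    Hypothetical helper: the ranges below are reasonable guesses,
    not limits documented by the project.
    """
    problems = []
    data = config.get("data_processing", {})
    pca = config.get("pca", {})
    kmeans = config.get("clustering", {}).get("kmeans", {})

    if data.get("normalization_method") not in ALLOWED_NORMALIZATION:
        problems.append("data_processing.normalization_method must be one of "
                        + ", ".join(sorted(ALLOWED_NORMALIZATION)))

    threshold = data.get("missing_value_threshold")
    if not (_is_number(threshold) and 0 <= threshold <= 1):
        problems.append("data_processing.missing_value_threshold must be in [0, 1]")

    variance = pca.get("min_variance_explained")
    if not (_is_number(variance) and 0 < variance <= 1):
        problems.append("pca.min_variance_explained must be in (0, 1]")

    max_components = pca.get("max_components")
    if not (_is_int(max_components) and max_components >= 1):
        problems.append("pca.max_components must be a positive integer")

    max_clusters = kmeans.get("max_clusters")
    if not (_is_int(max_clusters) and max_clusters >= 2):
        problems.append("clustering.kmeans.max_clusters must be an integer >= 2")

    n_init = kmeans.get("n_init")
    if not (_is_int(n_init) and n_init >= 1):
        problems.append("clustering.kmeans.n_init must be a positive integer")

    return problems


# The sample configuration from this section, written as a Python dict.
example_config = {
    "data_processing": {"normalization_method": "tmm",
                        "missing_value_threshold": 0.3},
    "pca": {"min_variance_explained": 0.9, "max_components": 50},
    "clustering": {"kmeans": {"max_clusters": 10, "n_init": 10}},
}
example_problems = validate_config(example_config)
```

Returning a list of messages rather than raising on the first failure lets a caller report every misconfigured key at once.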

📋 Changelog

v3.5.0 (current)

🔨 Maintenance Release

  • Fixed numerous inconsistencies
  • Better integrated project structure and modules
  • Updated changelog with v4.0.0 improvements

v3.0.0 (01/05)

  • Implemented GPU-accelerated PCA
  • Added parallel processing support
  • Enhanced clustering algorithms
  • Introduced GSEA functionality
  • Added comprehensive error handling
  • Improved documentation and type hints

v2.0.0 (12/2024)

  • ComBat-based batch effect correction
  • Comprehensive QC metrics suite
  • Hierarchical clustering with dendrograms
  • Enhanced visualization pipeline
  • Multiple testing correction (FDR)
  • Silhouette score analysis for clustering
  • Type hints for all functions
  • Comprehensive error handling
  • Input validation checks
  • Improved PCA visualization
  • Enhanced documentation
  • Optimized data preprocessing
  • Updated dependency requirements

🗺️ Roadmap

v4.0.0 (Planned)

🎯 Major Release Focus: Complete Framework Overhaul

🏗️ Improve Project Structure

GeneExpressionAnalysisTool/
├── tests/                   # Comprehensive test suite
├── examples/                # Example notebooks
├── docs/                    # Detailed documentation
└── requirements/            # Environment-specific requirements

Add Essential Bioinformatics Features:

  • Batch effect correction
  • Multiple testing correction
  • Gene set enrichment analysis
  • Quality control metrics
  • Differential expression analysis

Enhance Testing:

  • Unit tests for all components
  • Integration tests
  • Test with real RNA-seq datasets

Improve Documentation:

  • Add docstrings with biological context
  • Create user guide
  • Add example workflows
  • Document statistical methods

Optimize Performance:

  • Implement parallel processing
  • Add progress tracking
  • Optimize memory usage
  • Add checkpointing

Add Visualization:

  • PCA plots
  • Heatmaps
  • Volcano plots
  • Quality control visualizations

🀝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

📝 Citation

@software{gene_expression_tool,
  author = {Cotta, Yan P.},
  title = {Gene Expression Analysis Tool},
  version = {3.5.0},
  year = {2025},
  url = {https://github.com/YanCotta/GeneExpressionAnalysisTool}
}

📫 Contact


Built with ❤️ using Python 3.8+, PyTorch, and scikit-learn.
