Skip to content

JhuangLab/BMHICML

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 

Repository files navigation

BMHICML: Bone Marrow Histopathology Image Classification via Machine Learning

License: MIT R-4.3.0 Python 3.9+ Nextflow Snakemake PyTorch Version

Repository: https://github.com/JhuangLab/BMHICML

Principal Investigator: Jhuanglab

Contact: hiekeen $$at$$ gmail.com

Project Overview

BMHICML (Bone Marrow Histopathology Image Classification via Machine Learning) is an open-source project dedicated to developing and benchmarking machine learning models for automatic classification of bone marrow histopathology images. It addresses the clinical challenge of labor-intensive manual diagnosis by providing efficient, scalable, and interpretable algorithms that assist pathologists in identifying bone marrow disorders.

The project integrates a curated dataset, state-of-the-art classification models, and evaluation toolkits to support both clinical research and algorithm innovation in hematopathology.

Key Features

  • Multi-Category Classification: Supports classification of common bone marrow disorders (e.g., acute myeloid leukemia, myelodysplastic syndrome, normal marrow) with high granularity.

  • Model Zoo: Implements classic and cutting-edge ML/DL models, including CNN-based (ResNet, DenseNet, EfficientNet), transformer-based (ViT, Swin Transformer), and traditional ML (SVM, Random Forest) for comparison.

  • Data Preprocessing Pipeline: Provides built-in functions for image normalization, augmentation (rotation, flipping, zooming), patch extraction, and stain normalization to handle histopathology image variability.

  • Comprehensive Evaluation: Computes key metrics (accuracy, precision, recall, F1-score, AUC-ROC) and generates confusion matrices, classification reports, and ROC curves for result analysis.

  • Interpretability Tools: Integrates Grad-CAM and LIME to visualize model attention regions, helping pathologists validate model decisions.

  • Easy Deployment: Supports model export to ONNX format and provides a lightweight inference script for clinical application.

Dataset

Dataset Description

The project uses a combined dataset of bone marrow histopathology images from public sources and clinical collaborations, including:

  • Public Dataset: Kaggle Bone Marrow Classification Dataset, TCIA Hematological Malignancy Collection

  • Clinical Dataset: De-identified images from [Collaborating Hospital/Institution] (compliant with HIPAA and IRB regulations)

Total Samples: ~10,000 images (8,000 for training, 1,000 for validation, 1,000 for testing)

Image Specifications: 2048×2048 pixels, H&E stained, 3-channel RGB

Classification Categories: 8 types (Normal, Acute Myeloid Leukemia (AML), Myelodysplastic Syndromes (MDS), Chronic Myeloid Leukemia (CML), Multiple Myeloma (MM), Lymphoblastic Leukemia, Aplastic Anemia, Myelofibrosis)

Installation

Prerequisites

  • Python 3.8, 3.9, or 3.10

  • PyTorch 1.10+ (with CUDA support recommended)

  • CUDA 11.3+ (for GPU acceleration)

Clone the Repository

git clone https://github.com/[JhuangLab]/BMHICML.git

cd BMHICML

Install Dependencies

Install via pip:

pip install -r requirements.txt

requirements.txt includes:

torch==1.13.1 torchvision==0.14.1 numpy==1.24.3 pandas==2.0.2 matplotlib==3.7.1

scikit-learn==1.2.2 opencv-python==4.7.0.72 pillow==9.5.0 tqdm==4.65.0

lime==0.2.0.1 grad-cam==1.4.6 onnx==1.14.0 onnxruntime==1.15.1 seaborn==0.12.2

Quick Start

  1. Data Preparation

Organize your dataset into the data/ directory as per the Dataset Structure section.

  1. Model Training

Run the training script with default parameters (uses EfficientNet-B4 as the base model):

python train.py --config configs/efficientnet_b4.yaml

Key training parameters (modify via config file or command line):

--model: Model name (e.g., resnet50, vit_base_patch16_224) --epochs: Number of training epochs (default: 50) --batch_size: Batch size (default: 16) --lr: Initial learning rate (default: 1e-4) --data_dir: Path to dataset (default: ./data) --save_dir: Path to save models and logs (default: ./output)

  1. Model Evaluation

Evaluate a trained model on the test set:

python evaluate.py --model_path ./output/best_model.pth --test_dir ./data/test

The evaluation will generate a results/ directory containing:

  • Classification report (precision, recall, F1-score)

  • Confusion matrix plot

  • ROC curves for each category

  • Evaluation metrics CSV file

  1. Inference with Pretrained Models

Use the pretrained model for single image inference:

python infer.py --model_path ./pretrained/efficientnet_b4_best.pth --image_path ./examples/aml_sample.jpg

Sample output:

Image Path: ./examples/aml_sample.jpg Predicted Category: Acute Myeloid Leukemia (AML) Confidence Score: 0.986 Interpretability Map Saved to: ./output/grad_cam_aml_sample.png

Interpretability

To generate interpretability maps (Grad-CAM) for model predictions:

python visualize.py --model_path ./output/best_model.pth --image_path ./examples/mds_sample.jpg --method grad-cam

The output will be a heatmap overlay on the original image, highlighting the regions that the model used to make its classification decision.

Contribution Guidelines

We welcome contributions to improve BMHICML! Please follow these steps:

  1. Fork the repository.

  2. Create a feature branch (git checkout -b feature/your-feature).

  3. Commit your changes (git commit -m 'Add some feature').

  4. Push to the branch (git push origin feature/your-feature).

  5. Open a Pull Request.

Please ensure your code adheres to the project's coding style and includes appropriate tests.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact & Citation

Contact

For questions, issues, or collaboration requests, please contact:

Citation

If you use this project in your research, please cite it as:

@misc{BMHICML2024, author = {[JhuangLab], [Co-Authors]}, title = {BMHICML: Bone Marrow Histopathology Image Classification via Machine Learning}, year = {2024}, publisher = {GitHub}, journal = {GitHub Repository}, howpublished = {\url{https://github.com/[JhuangLab]/BMHICML}}, }

Acknowledgements

  • We thank our collaborator for providing clinical data.

  • We acknowledge the open-source community for the foundation models and tools used in this project.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages