TME-WSI: Deep Learning-based Prediction of Breast Cancer Tumor and Immune Phenotypes from Histopathology
This repository contains the code for predicting Tumor Microenvironment (TME) phenotypes from Whole Slide Images (WSIs) using deep learning, specifically Multi-Instance Learning (MIL) models.
The project is structured as a Python package tme_wsi with supporting scripts:
tme-wsi/
├── tme_wsi/
│ ├── preprocessing/ # WSI Segmentation and Patching
│ ├── features/ # Feature Extraction (CLAM, PLIP)
│ ├── models/ # MIL Models (CLAM, TransMIL)
│ ├── engine/ # Training and Evaluation Engine
│ ├── visualization/ # Attention Heatmap Generation
│ └── utils/ # Datasets and Utilities
├── scripts/
│ ├── run_preprocessing.py # Script to patch WSIs
│ ├── run_feature_extraction.py # Script to extract features
│ ├── train_model.py # Script to train models
│ └── eval_model.py # Script to evaluate models
├── configs/ # Configuration files (YAML)
└── README.md
- Clone the repository.
- Install dependencies:
pip install -r requirements.txt
(Note: Ensure you have openslide-python and the appropriate system libraries installed.)
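A quick way to confirm the Python side of the installation is to check that the required modules can be located. The sketch below is not part of this repository; `missing_modules` is a hypothetical helper name. It relies on the fact that the openslide-python package is imported as `openslide`. It only checks that the Python package is present, not that the underlying OpenSlide system library loads correctly.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# openslide-python is imported as "openslide".
missing = missing_modules(["openslide"])
if missing:
    print("Missing Python modules:", ", ".join(missing))
else:
    print("All listed Python modules were found.")
```

If `openslide` is reported as found but importing it still fails, the system library is the likely cause.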
Patch WSIs into small images (tiles).
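To make the `--patch_size` and `--step_size` flags used in this step concrete, here is a minimal sketch of how a regular tiling grid can be computed. It is an illustration only, not the code in `run_preprocessing.py`: the function name `patch_coordinates` is hypothetical, the grid is assumed to start at the top-left corner, partial tiles at the right and bottom edges are assumed to be dropped, and tissue segmentation is ignored entirely.

```python
def patch_coordinates(width, height, patch_size=256, step_size=256):
    """Top-left (x, y) corners of every full patch that fits inside the image.

    Illustrative assumption: the grid starts at (0, 0) and partial edge
    tiles are discarded.
    """
    if patch_size <= 0 or step_size <= 0:
        raise ValueError("patch_size and step_size must be positive")
    xs = range(0, width - patch_size + 1, step_size)
    ys = range(0, height - patch_size + 1, step_size)
    return [(x, y) for y in ys for x in xs]


# A 1000 x 600 pixel region with non-overlapping 256-pixel tiles.
coords = patch_coordinates(1000, 600)
print(len(coords))   # 6 (3 columns x 2 rows)
print(coords[:4])    # [(0, 0), (256, 0), (512, 0), (0, 256)]
```

Setting `step_size` equal to `patch_size`, as in the command for this step, gives non-overlapping tiles; a smaller step produces overlapping ones. The repository's own script for this step is run as follows: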
python scripts/run_preprocessing.py \
--source /path/to/wsis \
--save_dir /path/to/output/patches \
--patch_size 256 --step_size 256 --patch_level 0
Extract features from patches using CLAM (ResNet50) or PLIP.
python scripts/run_feature_extraction.py \
--source_dir /path/to/wsis \
--h5_dir /path/to/output/patches \
--save_dir /path/to/output/features \
--model_type clam \
--gpu_id 0
Train a MIL model to predict phenotypes (e.g., Angiogenesis, Glycolysis).
- Edit configs/default_config.yaml to point to your data and choose the label.
- Run training:
python scripts/train_model.py \
--config configs/default_config.yaml \
--fold 0 \
--gpu_id 0
Evaluate a trained model.
python scripts/eval_model.py \
--config configs/default_config.yaml \
--checkpoint results/experiment_name/fold_0/model_fold_0.pt \
--fold 0
The pipeline follows these steps:
- Preprocessing: Automatic segmentation of tissue from background and patching into 256x256 tiles at 20x magnification.
- Feature Extraction:
  - CLAM: ResNet50 pre-trained on ImageNet (or custom weights).
  - PLIP: Pathology Language-Image Pre-training model.
- MIL Modeling: Aggregating patch features into slide-level predictions using attention-based MIL (CLAM-SB, CLAM-MB) or Transformer-based MIL (TransMIL).
- Prediction: Predicting gene expression signatures (ssGSEA scores) or clinical subtypes.
- Visualization: Generating attention heatmaps to identify morphological regions associated with the phenotypes.
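The MIL Modeling and Visualization steps both rest on attention pooling: each patch receives a weight, the slide-level representation is the weighted sum of patch features, and the same weights can be drawn as a heatmap. The sketch below shows the basic, ungated form of attention pooling in plain Python. It is a simplified illustration and not the models in `tme_wsi/models/`: CLAM and TransMIL are more elaborate, and the function name `attention_pool` and the parameters `V` and `w` are hypothetical, hand-picked values rather than learned weights.

```python
import math


def attention_pool(features, V, w):
    """Aggregate patch features into one slide vector with attention weights.

    features: list of N patch vectors, each of length D
    V:        H x D projection matrix (list of rows), illustrative parameter
    w:        length-H scoring vector, illustrative parameter
    Returns (slide_vector, attention_weights).
    """
    # Unnormalised score per patch: w . tanh(V h)
    scores = []
    for h in features:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * zi for wi, zi in zip(w, hidden)))

    # Numerically stable softmax over patches, so the weights sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Slide-level vector: attention-weighted sum of the patch features.
    dim = len(features[0])
    slide = [sum(a * h[j] for a, h in zip(weights, features)) for j in range(dim)]
    return slide, weights


# Three toy patches with 2-dimensional features.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
slide, weights = attention_pool(patches, V=[[1.0, -1.0]], w=[2.0])
print(weights)  # the first patch receives the largest weight
print(slide)
```

In a trained model, `V` and `w` are learned together with the classifier, and the per-patch weights are mapped back to patch coordinates to produce the attention heatmap.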
[License Information]