This project focuses on classification with Vision-Language Models (VLMs), particularly for medical imaging analysis of Thyroid-Associated Orbitopathy (TAO). It uses deep learning models such as Swin Transformers and foundation models to classify medical scans.
VLM_classification/
├── main_Swin_TAO_CLS.py
├── trainer_TAO_CLS.py
├── dataset/
│   └── mm_tao_cls/
├── model/
│   └── CLS/
├── optimizers/
│   └── segment_anything/
├── pretrained_models/
├── runs/
├── Text-emmbedding-gen/
└── utils/
main_Swin_TAO_CLS.py: The main script that starts training and evaluation for the Swin Transformer-based TAO classification model. It likely handles argument parsing, sets up the dataset, model, and trainer, and initiates the training loop.
trainer_TAO_CLS.py: Contains the core training logic. It defines the training loop, validation loop, loss calculation, optimizer steps, and performance metric logging.
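The metric logging mentioned above is not specified further in this README. As a minimal sketch, assuming integer class labels (the name mm_tao_cls_4_label.csv suggests four classes, but that is an assumption), overall accuracy and per-class recall could be derived from a confusion matrix as shown below. The function names are hypothetical and are not taken from trainer_TAO_CLS.py.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count predictions; rows index the true class, columns the predicted class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix: List[List[int]]) -> float:
    """Fraction of samples on the diagonal (0.0 for an empty matrix)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_recall(matrix: List[List[int]]) -> List[float]:
    """Recall for each class; classes with no samples report 0.0."""
    recalls = []
    for i, row in enumerate(matrix):
        support = sum(row)
        recalls.append(row[i] / support if support else 0.0)
    return recalls


if __name__ == "__main__":
    # Toy labels for illustration only, not project data.
    y_true = [0, 0, 1, 1, 2, 2, 3, 3]
    y_pred = [0, 1, 1, 1, 2, 0, 3, 3]
    cm = confusion_matrix(y_true, y_pred, num_classes=4)
    print("accuracy:", accuracy(cm))
    print("recall per class:", per_class_recall(cm))
```

Per-class recall is reported alongside accuracy because medical classification datasets are often imbalanced, and accuracy alone can hide poor performance on a rare class.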
dataset/: Contains all datasets and data-related scripts.
- dataset/mm_tao_cls/: Holds the specific multi-modal dataset for TAO classification.
  - data_split_with_csv.py: A script that splits the main mm_tao_cls_4_label.csv into train.csv, val.csv, and test.csv for training, validation, and testing.
  - *.csv and *.json: These files contain metadata, labels, and file lists that define the different data splits (e.g., train.csv, mm_tao_cls_4_label.json).
  - train_data/, val_data/, test_data/: These folders contain the actual .nii image files (likely MRI/CT scans) and the corresponding masks for each data split.
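The contents of data_split_with_csv.py are not shown in this README. As a rough sketch of how such a split might work, the code below shuffles rows within each label so that every split keeps roughly the same class balance. The column name `label`, the 70/15/15 percentages, the fixed seed, and all function names are assumptions chosen for illustration.

```python
import csv
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, str]


def read_rows(path: str) -> List[Row]:
    """Load a CSV file with a header row into a list of dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def write_rows(path: str, rows: List[Row]) -> None:
    """Write rows to a CSV file, taking the column order from the first row."""
    if not rows:
        return  # nothing to write for an empty split
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def stratified_split(rows: List[Row], label_key: str = "label",
                     percents: Tuple[int, int, int] = (70, 15, 15),
                     seed: int = 0) -> Tuple[List[Row], List[Row], List[Row]]:
    """Split rows into (train, val, test), stratified by label_key."""
    if sum(percents) != 100:
        raise ValueError("percents must sum to 100")
    rng = random.Random(seed)  # local RNG keeps the split reproducible

    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        # Integer arithmetic avoids float rounding; leftovers go to test.
        n_train = len(group) * percents[0] // 100
        n_val = len(group) * percents[1] // 100
        train.extend(group[:n_train])
        val.extend(group[n_train:n_train + n_val])
        test.extend(group[n_train + n_val:])
    return train, val, test
```

Under these assumptions, the rows returned by `read_rows("mm_tao_cls_4_label.csv")` could be passed to `stratified_split`, and each resulting list written out with `write_rows` as train.csv, val.csv, and test.csv.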
model/: Contains the definitions for all neural network models used in the project.
- model/CLS/: Contains various classification models.
  - mm_classification_Foundation_model.py: A multi-modal classification model that likely integrates a pre-trained foundation model (e.g., CLIP, BERT).
  - mm_classification_SwinUnter.py: A classification model based on the SwinUNETR architecture.
  - resnet.py: A standard ResNet implementation, for baseline comparisons or for use as a feature extractor.
  - transformer_decoder.py: A transformer decoder component, possibly for integrating text and vision features.
optimizers/: Contains custom optimizers, learning rate schedulers, and related components.
- lr_scheduler.py: Implements learning rate scheduling strategies that adjust the learning rate during training.
- segment_anything/: Contains code from the "Segment Anything Model" (SAM), which might be used for pre-processing, data augmentation, or as part of a larger model architecture.
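This README does not say which scheduling strategy lr_scheduler.py implements. One common choice for transformer training is linear warmup followed by cosine decay; the sketch below expresses that rule as a plain function. The name `warmup_cosine_lr` and all parameter values are illustrative assumptions, not the project's settings.

```python
import math


def warmup_cosine_lr(step: int, base_lr: float, warmup_steps: int,
                     total_steps: int, min_lr: float = 0.0) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if step < warmup_steps:
        # Linear ramp from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    # Cosine factor goes from 1 (start of decay) to 0 (end of training).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (base_lr - min_lr) * cosine


if __name__ == "__main__":
    # Hypothetical settings: 10 warmup steps out of 100 total.
    for s in (0, 9, 10, 55, 100):
        print(s, warmup_cosine_lr(s, base_lr=1e-4, warmup_steps=10, total_steps=100))
```

The warmup phase keeps early updates small while optimizer statistics are still unreliable, and the cosine phase lowers the rate smoothly toward `min_lr` by the final step.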
pretrained_models/: Stores pre-trained model weights that can be loaded for fine-tuning or inference.
- Foundation_model.pth: An example of a pre-trained weight file for a foundation model.
runs/: The default output directory where training logs, model checkpoints, and evaluation results are saved for each experiment.
Text-emmbedding-gen/: Holds pre-computed text embeddings.
- TAO_bert_txt_encoding.pth: A file containing text embeddings generated by a BERT model, likely representing the class labels or descriptive text for the TAO dataset.
utils/: A collection of helper scripts and utility functions used across the project.
- data_utils.py, MM_CLS_TAO_data_utils.py, etc.: Data loading and pre-processing pipelines. They load .nii files, apply transformations, and prepare batches for the model. Different versions exist for different datasets or experiments (TAO, Brain, Liver).
- loss.py, Focal_Loss.py: Implementations of the loss functions used to train the classification models.
- utils.py: General utility functions, such as setting up loggers, saving checkpoints, or calculating metrics.
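The contents of Focal_Loss.py are not shown in this README. For reference, focal loss as commonly defined scales the cross-entropy of the true class by (1 - p_t)^gamma, so that well-classified samples contribute less to the total. The following framework-free sketch computes it for a single sample; it is not the project's implementation, and the default gamma of 2.0 and the optional per-class `alpha` weights are assumptions.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits: Sequence[float], target: int, gamma: float = 2.0,
               alpha: Optional[Sequence[float]] = None) -> float:
    """Focal loss for one sample: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    probs = softmax(logits)
    p_t = max(probs[target], 1e-12)  # guard against log(0)
    weight = alpha[target] if alpha is not None else 1.0
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # Hypothetical logits for a 4-class problem; the true class is 0.
    uncertain = [0.0, 0.0, 0.0, 0.0]
    confident = [5.0, 0.0, 0.0, 0.0]
    print("uncertain:", focal_loss(uncertain, target=0))
    print("confident:", focal_loss(confident, target=0))
```

With `gamma=0` and no `alpha`, the expression reduces to ordinary cross-entropy, which makes it easy to compare the two losses on the same inputs.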