A novel self-supervised framework for fetal movement detection from extended ultrasound video recordings
Overview
CURL (Contrastive Ultrasound Video Representation Learning) is a self-supervised framework designed specifically for fetal movement assessment from ultrasound videos. Our method employs a dual-contrastive loss that captures both spatial (anatomical) and temporal (motion-based) features, enabling robust representation learning of fetal movement dynamics.
- Dual-Contrastive Learning: Combines spatial (SimCLR-style NT-Xent) and temporal contrastive objectives
- Task-Specific Sampling: A sampling strategy that distinguishes movement from non-movement segments
- Flexible Inference: Supports ultrasound recordings of arbitrary length through probabilistic fine-tuning
- Modular Architecture: Supports both SlowFast and Vision Transformer (ViT) backbones
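The task-specific sampling idea can be illustrated with a minimal sketch. It assumes (hypothetically) that each label file is a per-frame binary array (1 = movement), that a clip counts as a movement clip when more than half of its frames are movement frames, and that both classes are drawn in equal numbers. The function name and these rules are illustrative only; the repository's actual sampler may work differently.

```python
import numpy as np


def sample_balanced_clips(frame_labels, clip_len, n_per_class, rng=None):
    """Hypothetical sampler: start indices of movement and non-movement clips.

    Assumption: a clip is a "movement" clip if more than half of its frames
    are labelled 1. Clips are drawn with replacement so each class
    contributes exactly n_per_class start indices.
    """
    rng = np.random.default_rng() if rng is None else rng
    frame_labels = np.asarray(frame_labels)
    n_clips = len(frame_labels) - clip_len + 1
    if n_clips <= 0:
        raise ValueError("video is shorter than clip_len")

    # Fraction of movement frames in every sliding window via a cumulative sum.
    csum = np.concatenate([[0], np.cumsum(frame_labels)])
    frac = (csum[clip_len:] - csum[:-clip_len]) / clip_len  # length n_clips

    move_starts = np.flatnonzero(frac > 0.5)
    still_starts = np.flatnonzero(frac <= 0.5)
    if move_starts.size == 0 or still_starts.size == 0:
        raise ValueError("video lacks clips of one of the two classes")

    return (rng.choice(move_starts, n_per_class, replace=True),
            rng.choice(still_starts, n_per_class, replace=True))
```

Sampling with replacement keeps the two classes balanced even when movement segments are rare in a long recording.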
Pipeline Overview: Starting from expertly annotated ultrasound videos (A), CURL splits clips into spatiotemporal patches (B), uses transformer backbones with dual-contrastive learning to extract robust features, fine-tunes with lightweight classifiers (C), and delivers clinically reliable fetal movement detection (D).
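Step (B), splitting a clip into spatiotemporal patches, follows the tubelet-style tokenization used by video ViT/MAE models. The sketch below shows only the reshaping step, assuming a clip of shape (T, H, W, C) and a hypothetical tubelet size of 2 frames x 16 x 16 pixels; the patch sizes CURL actually uses are not stated here.

```python
import numpy as np


def to_tubelets(clip, t=2, p=16):
    """Split a (T, H, W, C) clip into non-overlapping t x p x p tubelets.

    Returns an array of shape (num_tubelets, t * p * p * C): one flattened
    token per tubelet, ordered by time, then row, then column.
    """
    T, H, W, C = clip.shape
    if T % t or H % p or W % p:
        raise ValueError("clip dimensions must be divisible by the tubelet size")
    x = clip.reshape(T // t, t, H // p, p, W // p, p, C)
    # Move the three grid axes to the front and the within-tubelet axes to the back.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, t * p * p * C)
```

For a 16-frame, 224 x 224 grayscale clip this yields 8 x 14 x 14 = 1568 tokens of length 2 x 16 x 16 = 512, which a transformer backbone would then project to its embedding dimension.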
- Python 3.8+
- CUDA-capable GPU (recommended)
- 16GB+ RAM for video processing
- Clone the repository:
```bash
git clone https://github.com/Mr-TalhaIlyas/CURL.git
cd CURL
```
- Create a virtual environment:
```bash
# Using conda (recommended)
conda create -n curl python=3.8
conda activate curl

# Or using venv
python -m venv curl
source curl/bin/activate   # Linux/macOS
curl\Scripts\activate      # Windows
```
- Install dependencies:
```bash
# Using pip
pip install -r requirements.txt

# Or using conda, inside the activated environment
# (requires every package in requirements.txt to be available on conda channels)
conda install --file requirements.txt
```
- Organize your data structure:
```
data/
├── videos/          # Raw ultrasound videos (.mp4)
├── optical_flow/    # Optical flow videos (.mp4)
├── labels/          # Label files (.npy)
└── folds/           # Train/test split files
    ├── train_fold_1.txt
    ├── test_fold_1.txt
    └── ...
```
- Update the configuration:
```python
# In configs/config.py
config = dict(
    vid_dir="path/to/videos/",
    flow_dir="path/to/optical_flow/",
    lbl_dir="path/to/labels/",
    folds="path/to/folds/",
)
```

Choose between two backbone architectures:
SlowFast backbone (SimCLR-style dual-contrastive training):

```bash
# Train with both spatial and temporal losses
python dual_contrastive_main.py \
    --enable_temporal_loss \
    --spatial_loss_weight 1.0 \
    --temporal_loss_weight 0.5 \
    --dual_loss_mode both \
    --epochs 100

# Spatial-only training
python dual_contrastive_main.py \
    --spatial_loss_weight 1.0 \
    --dual_loss_mode spatial_only
```

Vision Transformer backbone (MAE-style contrastive training):

```bash
# MAE-style contrastive learning
python run_mae_contrastive.py \
    --enable_temporal_loss \
    --spatial_loss_weight 1.0 \
    --temporal_loss_weight 0.7 \
    --embed_dim 1024 \
    --depth 24
```

Fine-tuning a pre-trained model for movement detection:

```bash
# Fine-tune pre-trained contrastive model
python run_finetune.py \
    --model_type contrastive_mae \
    --checkpoint_path /path/to/pretrained_model.pth \
    --epochs 30 \
    --lr 2e-4 \
    --loss_type focal

# Fine-tune standard MAE model
python run_finetune.py \
    --model_type standard_mae \
    --checkpoint_path /path/to/mae_model.pth \
    --epochs 30
```

The dual-contrastive objective, where α and β are the spatial and temporal loss weights (`spatial_loss_weight`, `temporal_loss_weight`):

```python
# Spatial contrastive loss (NT-Xent)
spatial_loss = NT_XentLoss(spatial_features_i, spatial_features_j)

# Temporal contrastive loss (TC)
temporal_loss = temporal_contrastive_loss(
    temporal_features_i,
    temporal_features_j,
    temperature,
    clusters=8,
)

# Combined loss
total_loss = α * spatial_loss + β * temporal_loss
```

Supported models:

| Model | Backbone | Key Features |
|---|---|---|
| SimCLR + SlowFast | SlowFast ResNet | Two-stream processing for spatial-temporal features |
| Contrastive MAE | Vision Transformer | Patch-based processing with attention mechanisms |
| Hybrid Models | Custom | Combine benefits of both approaches |
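The SlowFast backbone's two-stream processing feeds two views of the same clip: a fast pathway that sees every frame and a slow pathway that sees a temporally subsampled copy. A minimal sketch of that frame packing, assuming a speed ratio `alpha=4` (a hypothetical value; the ratio used in CURL is not given here) and a clip stored as a (T, H, W, C) array:

```python
import numpy as np


def pack_pathways(frames, alpha=4):
    """Return [slow, fast] views of a (T, H, W, C) clip.

    The fast pathway keeps all T frames; the slow pathway keeps T // alpha
    frames sampled uniformly from first to last (alpha is an assumed ratio).
    """
    T = frames.shape[0]
    if T < alpha:
        raise ValueError("clip must contain at least alpha frames")
    idx = np.linspace(0, T - 1, T // alpha).astype(int)
    slow = frames[idx]
    fast = frames
    return [slow, fast]
```

Sampling the slow pathway uniformly across the whole clip, rather than taking the first T // alpha frames, keeps both pathways aligned in time, so the slow stream can focus on anatomy while the fast stream tracks motion over the same interval.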
Key configuration parameters:

```python
# Dual contrastive learning
enable_temporal_loss = True
spatial_loss_weight = 1.0
temporal_loss_weight = 0.5
temperature_spatial = 0.5
temperature_temporal = 0.1

# Temporal contrastive loss
tc_clusters = 8
tc_num_iters = 10
tc_do_entro = True  # Enable IID regularization

# Model architecture
mae_contrastive = dict(
    embed_dim=1024,
    depth=24,
    num_heads=16,
    projection_dim=256,
    temporal_projection_dim=128,
)
```

Project structure:

```
CURL/
├── README.md
├── requirements.txt
├── scripts/
│   ├── configs/
│   │   └── config.py
│   ├── data/
│   │   ├── simclr_loader.py
│   │   ├── dataloader.py
│   │   └── utils.py
│   ├── models/
│   │   ├── mae/
│   │   ├── slowfast/
│   │   └── contrastive_mae.py
│   ├── tools/
│   │   ├── nt_xnet.py            # Spatial contrastive loss
│   │   ├── tc_loss.py            # Temporal contrastive loss
│   │   └── simclr_training.py
│   ├── Training Scripts
│   │   ├── main_simclr.py
│   │   ├── main_mae_contrastive.py
│   │   └── finetune_contrastive_mae.py
│   └── Run Scripts
│       ├── dual_contrastive_main.py
│       ├── run_mae_contrastive.py
│       └── run_finetune.py
└── screens/
    └── summary.jpg
```
Spatial contrastive loss:
- Based on the SimCLR framework
- Learns anatomical feature representations
- Uses a temperature-scaled InfoNCE (NT-Xent) loss
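For reference, the standard SimCLR NT-Xent objective fits in a few lines of NumPy. This is a generic sketch of the textbook loss, not the code in `tools/nt_xnet.py`; the default temperature of 0.5 simply mirrors `temperature_spatial` in the configuration.

```python
import numpy as np


def nt_xent(z_i, z_j, temperature=0.5):
    """NT-Xent loss for N positive pairs (z_i[k], z_j[k]).

    Each of the 2N embeddings is contrasted against the other 2N - 1; its
    positive is the other augmented view of the same clip. Returns the mean.
    """
    z = np.concatenate([z_i, z_j], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity below
    sim = z @ z.T / temperature
    n = z_i.shape[0]
    np.fill_diagonal(sim, -np.inf)  # exclude each embedding's similarity to itself
    # Row k's positive is row k + n (and vice versa).
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob_pos = sim[np.arange(2 * n), pos] - log_denom
    return -log_prob_pos.mean()
```

Lower temperatures sharpen the softmax, so the loss concentrates on the hardest negatives in the batch.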
Temporal contrastive loss:
- Novel clustering-based approach
- Learns motion dynamics
- Combines Cross-Level Distillation (CLD) and IID regularization
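The clustering-based idea can be illustrated with a simplified instance-to-cluster contrast: temporal features of one view are grouped with k-means, and each feature of the other view should score highest against the centroid of its partner's cluster, symmetrized over both views. This is a hypothetical NumPy sketch, not the implementation in `tools/tc_loss.py`. `k` and `num_iters` stand in for `tc_clusters` and `tc_num_iters`, the default temperature mirrors `temperature_temporal`, and the IID regularization enabled by `tc_do_entro` is omitted.

```python
import numpy as np


def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def spherical_kmeans(x, k, num_iters=10, rng=None):
    """Cluster unit-norm rows of x into k groups; returns (centroids, labels)."""
    rng = np.random.default_rng(0) if rng is None else rng  # fixed seed for reproducibility
    centroids = x[rng.choice(len(x), k, replace=False)]
    for _ in range(num_iters):
        labels = np.argmax(x @ centroids.T, axis=1)  # nearest centroid by cosine
        for c in range(k):
            members = x[labels == c]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
        centroids = l2_normalize(centroids)
    labels = np.argmax(x @ centroids.T, axis=1)
    return centroids, labels


def cross_level_loss(feat_i, feat_j, k=8, num_iters=10, temperature=0.1):
    """Hypothetical loss: rows of feat_i classify their partner's cluster in feat_j."""
    z_i, z_j = l2_normalize(feat_i), l2_normalize(feat_j)
    centroids, labels = spherical_kmeans(z_j, k, num_iters)
    logits = z_i @ centroids.T / temperature  # (N, k) instance-to-centroid scores
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(z_i)), labels].mean()


def temporal_cluster_loss(feat_i, feat_j, **kw):
    """Symmetric version: each view is clustered and contrasted against the other."""
    return 0.5 * (cross_level_loss(feat_i, feat_j, **kw)
                  + cross_level_loss(feat_j, feat_i, **kw))
```

Contrasting instances against group centroids, rather than only against other instances, encourages clips with similar motion patterns to share a cluster even when they come from different recordings.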
Data augmentation:
- Spatial: random cropping, color jittering, Gaussian blur
- Temporal: frame dropping, temporal jittering
- Domain-specific: ultrasound-aware transformations
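The two temporal augmentations can be sketched as operations on frame indices. The function names, the drop probability, and the jitter range are illustrative assumptions; here frame dropping repeats the most recent kept frame so the clip length stays fixed, which may differ from CURL's own pipeline.

```python
import numpy as np


def temporal_jitter(num_frames, clip_len, max_shift, center, rng):
    """Return clip_len consecutive frame indices whose start is shifted by up to
    +/- max_shift frames around `center`, clipped to the video bounds."""
    start = center + rng.integers(-max_shift, max_shift + 1)
    start = int(np.clip(start, 0, num_frames - clip_len))
    return np.arange(start, start + clip_len)


def frame_drop(indices, drop_prob, rng):
    """Replace each dropped frame with the most recent kept frame.

    The first frame is always kept, so the output length equals the input length.
    """
    out = np.array(indices, copy=True)
    keep = rng.random(len(out)) >= drop_prob
    keep[0] = True
    for t in range(1, len(out)):
        if not keep[t]:
            out[t] = out[t - 1]
    return out
```

A typical use jitters the clip position first and then drops frames, e.g. `idx = frame_drop(temporal_jitter(300, 16, 4, 100, rng), 0.2, rng)` followed by `clip = video[idx]`, so the two views of a positive pair differ in both timing and frame continuity.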
If you find this work useful, please cite our paper. The paper is currently under review.

Found a bug? Please open an issue with:
- Detailed description
- Steps to reproduce
- Environment details
- Expected vs actual behavior
This project is licensed under the MIT License - see the LICENSE file for details.
- Thanks to the medical imaging community for inspiration
- Built upon excellent work in self-supervised learning
- Special thanks to SimCLR and MAE teams
