This repository implements self-supervised contrastive learning for 12-lead ECG signals using several augmentation strategies and encoder architectures. After pretraining on a large unlabeled ECG dataset, the learned encoder is fine-tuned for a downstream binary classification task.
- Self-supervised contrastive learning is performed using positive pairs from augmented views of the same ECG signal.
- Pretraining uses a large unlabeled ECG dataset and an NT-Xent (or similar) loss, which pulls representations of similar signals together and pushes representations of dissimilar signals apart.
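
The NT-Xent objective mentioned above can be sketched in NumPy as follows. This is a minimal illustration, not the repository's training code: the function name `nt_xent_loss`, the `temperature` default, and the convention that `z1[i]` and `z2[i]` are embeddings of two augmented views of the same ECG are all assumptions.

```python
import numpy as np


def nt_xent_loss(z1, z2, temperature=0.1):
    """NT-Xent loss for N positive pairs (z1[i], z2[i]); inputs have shape (N, D)."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0).astype(float)   # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)      # unit vectors -> cosine similarity
    sim = (z @ z.T) / temperature                         # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                        # a view is never its own negative

    # Row i (< N) has its positive at i + N, and row i + N has its positive at i.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-sum-exp over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_denominator = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob_pos = sim[np.arange(2 * n), pos] - log_denominator
    return float(-log_prob_pos.mean())


rng = np.random.default_rng(0)
view_a = rng.normal(size=(8, 16))
view_b = view_a + 0.01 * rng.normal(size=(8, 16))   # nearly identical second view
print(nt_xent_loss(view_a, view_b))
```

The loss is small when each embedding is closest to its own augmented partner and grows when other samples in the batch are equally or more similar.
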
Defined in CL_augmentations.py, these augmentations provide diverse views of the same ECG signal:
- Time Warping: Alternating segments of the ECG are stretched or compressed to simulate temporal warping.
- Permutation: ECG signals are split into m segments and randomly shuffled.
- Zero Masking: Consecutive portions of the ECG are set to zero.
- Dropout Masking: Randomly zeros out 10% of signal values per lead in each batch.
- Gaussian Noise: Adds noise scaled to signal magnitude for robustness.
- CLOCS Augmentation: Implements spatial, temporal, and patient-level contrast based on CLOCS (Kiyasseh et al., ICML 2021).
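
The simpler augmentations in this list can be sketched in NumPy as below, assuming each ECG is an array of shape `(leads, samples)`. The function names, parameter names, and default values are illustrative assumptions and are not taken from CL_augmentations.py; the CLOCS-style contrast is not covered because it depends on how patients and leads are paired.

```python
import numpy as np


def _get_rng(rng):
    return np.random.default_rng() if rng is None else rng


def _resample(x, new_len):
    """Linearly resample every lead of x to new_len samples."""
    old_grid = np.linspace(0.0, 1.0, x.shape[-1])
    new_grid = np.linspace(0.0, 1.0, new_len)
    return np.stack([np.interp(new_grid, old_grid, lead) for lead in x])


def time_warp(x, n_segments=4, stretch=1.5):
    """Stretch even-numbered segments, compress odd ones, then restore the length."""
    pieces = []
    for k, seg in enumerate(np.array_split(x, n_segments, axis=-1)):
        factor = stretch if k % 2 == 0 else 1.0 / stretch
        pieces.append(_resample(seg, max(2, int(round(seg.shape[-1] * factor)))))
    return _resample(np.concatenate(pieces, axis=-1), x.shape[-1])


def permute_segments(x, m=4, rng=None):
    """Split the signal into m segments along time and shuffle their order."""
    segments = np.array_split(x, m, axis=-1)
    order = _get_rng(rng).permutation(m)
    return np.concatenate([segments[i] for i in order], axis=-1)


def zero_mask(x, mask_fraction=0.2, rng=None):
    """Set one consecutive window (mask_fraction of the length) to zero."""
    t = x.shape[-1]
    width = int(t * mask_fraction)
    start = int(_get_rng(rng).integers(0, t - width + 1))
    out = x.copy()
    out[..., start:start + width] = 0.0
    return out


def dropout_mask(x, p=0.1, rng=None):
    """Zero out a fraction p of randomly chosen samples in each lead."""
    rng = _get_rng(rng)
    out = x.copy()
    n_drop = int(round(p * x.shape[-1]))
    for lead in range(x.shape[0]):
        idx = rng.choice(x.shape[-1], size=n_drop, replace=False)
        out[lead, idx] = 0.0
    return out


def gaussian_noise(x, scale=0.05, rng=None):
    """Add noise whose standard deviation is scale times each lead's std."""
    sigma = scale * x.std(axis=-1, keepdims=True)
    return x + _get_rng(rng).normal(size=x.shape) * sigma


rng = np.random.default_rng(0)
ecg = rng.normal(size=(12, 1000))
view_1 = gaussian_noise(permute_segments(ecg, rng=rng), rng=rng)
view_2 = dropout_mask(time_warp(ecg), rng=rng)
print(view_1.shape, view_2.shape)
```

Two differently augmented views of the same recording, such as `view_1` and `view_2`, form one positive pair for the contrastive loss.
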
Implemented in models.py, multiple encoders are supported to extract meaningful ECG representations:
- CNN – Temporal filters for local pattern learning.
- CNN-LSTM – Combines convolution with temporal memory.
- CNN-Attention-LSTM – Adds attention over LSTM outputs.
- CNN-Transformer – Combines convolutional front-end with self-attention layers.
After pretraining:
- A classifier is added on top of the pretrained encoder.
- The full model is fine-tuned end-to-end using a limited labeled ECG dataset.
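
A minimal PyTorch sketch of this fine-tuning setup is shown below, using a small CNN encoder as a stand-in. The class names `CNNEncoder` and `FineTuneModel`, the layer sizes, the learning rate, and the commented-out checkpoint path are hypothetical and do not reflect the actual definitions in models.py.

```python
import torch
from torch import nn


class CNNEncoder(nn.Module):
    """Hypothetical 1D CNN encoder: (batch, leads, time) -> (batch, emb_dim)."""

    def __init__(self, n_leads=12, emb_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_leads, 32, kernel_size=7, padding=3),
            nn.BatchNorm1d(32),
            nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=7, padding=3),
            nn.BatchNorm1d(64),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # global average over time
        )
        self.proj = nn.Linear(64, emb_dim)

    def forward(self, x):
        return self.proj(self.features(x).squeeze(-1))


class FineTuneModel(nn.Module):
    """Pretrained encoder plus a single-logit head for binary classification."""

    def __init__(self, encoder, emb_dim=128):
        super().__init__()
        self.encoder = encoder
        self.classifier = nn.Linear(emb_dim, 1)

    def forward(self, x):
        return self.classifier(self.encoder(x)).squeeze(-1)


encoder = CNNEncoder()
# Hypothetical path; load the contrastively pretrained weights here.
# encoder.load_state_dict(torch.load("pretrained_encoder.pt"))
model = FineTuneModel(encoder)

# End-to-end fine-tuning: encoder and classifier parameters are both optimized.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.BCEWithLogitsLoss()

x = torch.randn(8, 12, 1000)                 # dummy batch of 12-lead ECGs
y = torch.randint(0, 2, (8,)).float()        # dummy binary labels
loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Because the optimizer receives `model.parameters()`, the encoder weights are updated together with the classifier rather than being frozen.
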
Implemented in train.py, all experiments follow a repeated random sub-sampling protocol:
- Randomly split patients into train, validation, and test sets.
- Train the model on the training set.
- Use validation performance to select the best checkpoint.
- Evaluate on the held-out test set.
- Repeat the full process K times with different seeds.
- Report mean ± confidence interval for test performance.
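
The protocol above can be sketched with the standard library as follows. This is an outline under stated assumptions, not the code in train.py: the split fractions, the `train_and_evaluate` callback, and the 95% normal-approximation interval (`z=1.96`) are illustrative choices, since the source does not specify them.

```python
import random
import statistics


def split_patients(patient_ids, seed, val_frac=0.1, test_frac=0.2):
    """Shuffle unique patient IDs and split them so no patient spans two sets."""
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_frac)
    n_val = int(len(ids) * val_frac)
    test = ids[:n_test]
    val = ids[n_test:n_test + n_val]
    train = ids[n_test + n_val:]
    return train, val, test


def mean_and_ci(scores, z=1.96):
    """Mean and half-width of a normal-approximation confidence interval."""
    mean = statistics.mean(scores)
    half_width = z * statistics.stdev(scores) / len(scores) ** 0.5
    return mean, half_width


def run_protocol(patient_ids, train_and_evaluate, k=5):
    """Repeat split, train, and test k times with different seeds."""
    scores = []
    for seed in range(k):
        train, val, test = split_patients(patient_ids, seed)
        # The callback is assumed to train on `train`, pick the best checkpoint
        # on `val`, and return a single test metric such as AUROC.
        scores.append(train_and_evaluate(train, val, test, seed))
    return mean_and_ci(scores)


def fake_run(train, val, test, seed):
    # Placeholder metric for demonstration only; not a real result.
    return 0.80 + 0.01 * seed


mean, half_width = run_protocol(list(range(100)), fake_run, k=5)
print(f"{mean:.3f} ± {half_width:.3f}")
```

Splitting by patient ID rather than by recording keeps all ECGs from one patient in a single set, which avoids leakage between training and test data. With a small K, a Student's t interval may be a more appropriate choice than the normal approximation used here.
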