
Add CrackSAM support and enable multi-GPU training#46

Merged
eclipsehl merged 17 commits into main from support-cracksam
Dec 9, 2025

Conversation

@keyprocedure keyprocedure commented Dec 8, 2025

Collaborator

Summary

This PR integrates CrackSAM support, enables distributed training, and adds associated framework improvements. These changes are bundled into a single PR because of their interdependence and the pace of this sprint.

CrackSAM:

  • Integrates the SAM (Segment Anything Model) encoder.
  • Adds a custom CrackSAM loss.
  • Adds support for multi-channel model outputs.
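
As a rough illustration of what multi-channel output support can involve, the sketch below turns raw model outputs into a per-pixel mask. It is an assumption, not the code in this PR: the helper name `logits_to_mask` and the reduction rule (threshold for one channel, argmax across channels otherwise) are hypothetical, and plain Python lists stand in for tensors.

```python
from typing import List


def logits_to_mask(logits: List[List[float]]) -> List[int]:
    """Hypothetical helper: reduce per-channel logits to one label per pixel.

    `logits` is indexed as [channel][pixel]. This layout is an assumption
    made for the sketch, not the layout used by the repository.
    """
    if not logits:
        raise ValueError("expected at least one output channel")

    n_pixels = len(logits[0])
    if any(len(channel) != n_pixels for channel in logits):
        raise ValueError("all channels must cover the same pixels")

    if len(logits) == 1:
        # Single channel: sigmoid(x) > 0.5 is equivalent to x > 0.
        return [1 if x > 0 else 0 for x in logits[0]]

    # Multiple channels: pick the index of the highest-scoring channel.
    return [
        max(range(len(logits)), key=lambda c: logits[c][p])
        for p in range(n_pixels)
    ]
```

With a real model the same branching would typically operate on a tensor's channel dimension; the point of the sketch is only that single-channel and multi-channel heads need different post-processing.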

Framework Enhancements:

  • Adds PyTorch DistributedDataParallel (DDP) support, enabling multi-GPU training.
  • Adds per-model img_size config to support dynamic dataset resolutions.
  • Corrects loss selection for non-Dice loss configurations.
  • Adds support for the Khanhha dataset.
  • Updates usage documentation.
  • Removes epoch loss printing during training.
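
Two of the items above, the per-model img_size setting and the corrected loss selection, are configuration lookups, so a small sketch may help. Everything here is hypothetical: the function names, the `loss` config key, the registry contents, and the pure-Python loss implementations are assumptions for illustration, not the code merged in this PR.

```python
import math
from typing import Callable, Dict, Sequence

LossFn = Callable[[Sequence[float], Sequence[float]], float]


def dice_loss(pred: Sequence[float], target: Sequence[float], eps: float = 1e-6) -> float:
    """Soft Dice loss on flat probability and label sequences."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)


def bce_loss(pred: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy with probabilities clamped away from 0 and 1."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(pred)


# Hypothetical registry: config names mapped to loss functions.
LOSSES: Dict[str, LossFn] = {"dice": dice_loss, "bce": bce_loss}


def select_loss(cfg: dict) -> LossFn:
    """Look up the configured loss and fail loudly on an unknown name,
    rather than silently falling back to Dice."""
    name = cfg.get("loss", "dice")
    try:
        return LOSSES[name]
    except KeyError:
        raise ValueError(f"unknown loss {name!r}; expected one of {sorted(LOSSES)}") from None


def resolve_img_size(model_cfg: dict, dataset_default: int) -> int:
    """Prefer the model's own img_size; otherwise use the dataset default."""
    size = model_cfg.get("img_size", dataset_default)
    if not isinstance(size, int) or size <= 0:
        raise ValueError(f"img_size must be a positive integer, got {size!r}")
    return size
```

For example, `resolve_img_size({"img_size": 1024}, dataset_default=448)` would return the model's value, while an empty model config would fall back to 448. Both numbers are placeholders.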

Fixes #34
Fixes #43
Contributes to #3, #14, #42

Test Plan

Verified complete training, validation, and test pipeline using:
python -m main -c cracksam.yml --dataset khanhha

Verified multi-GPU training on 4×H100 GPUs with:
torchrun --nproc_per_node=4 main.py -c cracksam.yml --dataset khanhha
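
For context on the torchrun launch above: torchrun starts one process per GPU and sets `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` in each worker's environment. The sketch below shows one common way a training script can read those variables and wrap its model in DDP. The names `DistEnv`, `read_dist_env`, and `wrap_for_ddp` are hypothetical and are not taken from this repository.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DistEnv:
    rank: int
    local_rank: int
    world_size: int

    @property
    def is_distributed(self) -> bool:
        return self.world_size > 1

    @property
    def is_main(self) -> bool:
        # Rank 0 is conventionally the only process that logs and saves checkpoints.
        return self.rank == 0


def read_dist_env(environ: Mapping[str, str] = os.environ) -> DistEnv:
    """Read the variables torchrun sets; default to a single-process run."""
    return DistEnv(
        rank=int(environ.get("RANK", 0)),
        local_rank=int(environ.get("LOCAL_RANK", 0)),
        world_size=int(environ.get("WORLD_SIZE", 1)),
    )


def wrap_for_ddp(model, env: DistEnv):
    """Hypothetical helper: wrap the model in DDP only for multi-process runs."""
    if not env.is_distributed:
        return model

    # Imported lazily so single-process runs do not need a distributed setup.
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    torch.cuda.set_device(env.local_rank)
    dist.init_process_group(backend="nccl")  # reads MASTER_ADDR/PORT from the env
    model = model.to(env.local_rank)
    return DDP(model, device_ids=[env.local_rank])
```

In a setup like this the training data loader would usually also use `torch.utils.data.distributed.DistributedSampler` so each worker sees a distinct shard of the dataset. Whether this PR does so is not stated in the description.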

@eclipsehl eclipsehl merged commit 37d3856 into main Dec 9, 2025
2 checks passed
@keyprocedure keyprocedure deleted the support-cracksam branch December 30, 2025 16:27

Development

Successfully merging this pull request may close these issues.

  • Remove epoch loss print during training
  • Add model support for CrackSAM

2 participants