feat(losses): add optional fused_lncc LNCC backend (loss_type='fused_lncc')#104
Open
minsuk00 wants to merge 1 commit into
Add an optional `fused_lncc` LNCC backend

Wires up `fused_lncc` as an optional LNCC backend, selectable via `loss_type='fused_lncc'`. It is a standalone fused rectangular-LNCC CUDA kernel; this PR plugs it in as an opt-in dependency that falls back to `cc` when not installed, so existing behavior is unchanged.
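A minimal sketch of the opt-in dispatch with fallback described above. `resolve_loss_type` is a hypothetical helper, not the actual code in `fireants/registration/abstract.py`; it only illustrates the optional-import pattern and the fallback to `cc`.

```python
import warnings


def resolve_loss_type(loss_type: str) -> str:
    """Return the loss backend that will actually be used (hypothetical helper)."""
    if loss_type != "fused_lncc":
        return loss_type  # every other loss type is passed through untouched
    try:
        import fused_lncc  # noqa: F401  optional CUDA extension
    except ImportError:
        # Package missing (or not importable): keep the existing behavior.
        warnings.warn("fused_lncc is not installed; falling back to loss_type='cc'")
        return "cc"
    return "fused_lncc"
```

On a machine without the package, `resolve_loss_type("fused_lncc")` emits a warning and returns `"cc"`, so callers that never request `fused_lncc` see no change at all.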
Changes
- `fireants/losses/fused_lncc_backend.py`: `FusedLNCCLoss`, a thin adapter matching the `FusedLocalNormalizedCrossCorrelationLoss` interface (constructor args, `forward`, and the multiscale hooks). It calls the kernel; no new math.
- `fireants/registration/abstract.py`: a `loss_type == 'fused_lncc'` branch mirroring the `fusedcc` branch (optional import + fallback to `cc`).
- `pyproject.toml`: declares `fused_lncc` as an optional dependency. (As a CUDA extension it must be installed with `pip install fused_lncc --no-build-isolation` against a matching PyTorch; see its README.)
- `tests/test_fused_lncc_backend.py`: auto-skips without CUDA or the package.

Behavior
Returns `-mean(ncc)`, the same convention as `fusedcc`. It uses the exact gradient (matching `FusedLocalNormalizedCrossCorrelationLoss(use_ants_gradient=False)`), rescaled so that the same optimizer hyperparameters converge identically. Scope is deliberately narrow: rectangular windows only, k ∈ {3, 5, 7, 9}, mean reduction, pred-only gradient, single GPU. Other configurations (gaussian windows, masking, sum/none reduction, symmetric/SyN gradients, grid-parallel sharding) raise a clear error pointing to `fusedcc`.

Symmetric (dual-image / SyN) gradient support is a straightforward follow-up: it runs the kernel a second time with pred and target swapped (no CUDA change), at roughly 2× the backward cost. It is left out of this PR for now.
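For reviewers, a plain-NumPy reference of what rectangular LNCC with mean reduction computes may help when reading the parity tests. It is a sketch under stated assumptions, not the kernel's code: it uses the squared-correlation form `cross² / (var_p · var_t + eps)`, zero padding at the borders, and `eps = 1e-5`; the real kernel and `fusedcc` may differ in these details. `lncc_loss` and `SUPPORTED_K` are hypothetical names, and the `ValueError` only imitates the scope guard described above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

SUPPORTED_K = (3, 5, 7, 9)  # kernel sizes in scope for this backend


def box_mean(x: np.ndarray, k: int) -> np.ndarray:
    """Mean over a k x k x k window; zero padding (an assumption) keeps x's shape."""
    padded = np.pad(x, k // 2)  # constant (zero) padding on every side
    windows = sliding_window_view(padded, (k, k, k))
    return windows.mean(axis=(-3, -2, -1))


def lncc_loss(pred: np.ndarray, target: np.ndarray, k: int = 3, eps: float = 1e-5) -> float:
    """Return -mean(ncc) for rectangular windows (hypothetical reference)."""
    if k not in SUPPORTED_K:
        raise ValueError(f"kernel size {k} is out of scope; use loss_type='fusedcc'")
    mu_p, mu_t = box_mean(pred, k), box_mean(target, k)
    cross = box_mean(pred * target, k) - mu_p * mu_t  # local covariance
    var_p = box_mean(pred * pred, k) - mu_p ** 2  # local variance of pred
    var_t = box_mean(target * target, k) - mu_t ** 2  # local variance of target
    ncc = cross ** 2 / (var_p * var_t + eps)  # in [0, 1) per voxel
    return -float(ncc.mean())


rng = np.random.default_rng(0)
a, b = rng.random((8, 8, 8)), rng.random((8, 8, 8))
print(lncc_loss(a, a), lncc_loss(a, b))  # the self-pair scores lower (better)
```

Because this formulation is symmetric in its two arguments (`lncc_loss(a, b) == lncc_loss(b, a)`), the gradient with respect to the target equals the pred-gradient of the call with the arguments swapped. That is why the symmetric/SyN follow-up can reuse the pred-only kernel without a CUDA change.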
Performance (A40)
The loss kernel is ~3.4× faster and ~3× lighter than `fusedcc`. End-to-end registration is ~1.1–1.3× faster per iteration than with `fusedcc` (the loss is ~38% of a fused step) and ~2.7× faster than the non-fused `cc` path, at equal VRAM and equal registration quality.
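As a rough consistency check on these numbers, Amdahl's law gives the end-to-end ceiling when only the loss gets faster. This assumes the non-loss ~62% of a `fusedcc` step is unchanged, which the measurements above do not state explicitly; `amdahl_speedup` is just an illustrative helper.

```python
def amdahl_speedup(fraction: float, local_speedup: float) -> float:
    """Overall speedup when only `fraction` of the step time is sped up by `local_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# Figures quoted above: the loss is ~38% of a fused step and ~3.4x faster.
ceiling = amdahl_speedup(0.38, 3.4)
print(f"{ceiling:.2f}x")  # prints "1.37x"
```

The resulting ceiling of about 1.37× sits just above the measured ~1.1–1.3× per iteration, so the end-to-end gain is in line with the kernel-level speedup.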
Tests
`pytest tests/test_fused_lncc_backend.py` passes on an A40, covering forward and gradient parity vs `fusedcc` (k = 3/5/7/9), sign/range, batched gradient scale, gradient routing, scope guards, multiscale hooks, and an end-to-end registration through the dispatcher.
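The auto-skip noted for `tests/test_fused_lncc_backend.py` can be expressed as a small gating helper. This is a sketch, not the file's actual contents: `fused_lncc_skip_reason` is a hypothetical name, and `find_spec` only checks that a package is present, so a build that is installed but fails to import would surface as a test error rather than a skip.

```python
import importlib.util
from typing import Optional


def fused_lncc_skip_reason() -> Optional[str]:
    """Return why the fused_lncc tests cannot run here, or None if they can."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch  # imported only after confirming it is installed

    if not torch.cuda.is_available():
        return "CUDA is not available"
    if importlib.util.find_spec("fused_lncc") is None:
        return "fused_lncc is not installed"
    return None
```

In a pytest module this could gate every test via `pytestmark = pytest.mark.skipif(fused_lncc_skip_reason() is not None, reason="fused_lncc requirements missing")`; `pytest.importorskip("fused_lncc")` is the built-in alternative when only the package check is needed.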