feat(losses): add optional fused_lncc LNCC backend (loss_type='fused_lncc') by minsuk00 · Pull Request #104 · rohitrango/FireANTs

minsuk00 · 2026-06-26T03:05:11Z

Add an optional `fused_lncc` LNCC backend

Wires up fused_lncc as an optional LNCC backend, selectable via loss_type='fused_lncc'.
fused_lncc is a standalone fused rectangular-LNCC CUDA kernel; this PR plugs it in as an opt-in
dependency that falls back to cc when the package is not installed, so existing behavior is
unchanged.

Changes

- `fireants/losses/fused_lncc_backend.py`: FusedLNCCLoss, a thin adapter matching the
  FusedLocalNormalizedCrossCorrelationLoss interface (constructor args, forward, and the
  multiscale hooks). Calls the kernel; no new math.
- `fireants/registration/abstract.py`: a loss_type == 'fused_lncc' branch mirroring the
  fusedcc branch (optional import + fallback to cc).
- `pyproject.toml`: declares fused_lncc as an optional dependency. (As a CUDA extension it must be
  installed with `pip install fused_lncc --no-build-isolation` against a matching PyTorch; see its README.)
- `tests/test_fused_lncc_backend.py`: auto-skips without CUDA or the package.
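The "optional import + fallback to cc" pattern can be sketched as follows. This is a minimal illustration, not the code in `abstract.py`: the function name `resolve_loss_type` and its `module_name` parameter are hypothetical, and the real branch may detect the package differently (for example with a `try`/`except ImportError`).

```python
import importlib.util


def resolve_loss_type(requested: str, module_name: str = "fused_lncc") -> str:
    """Return the loss type to use (hypothetical helper, for illustration only).

    Any loss type other than 'fused_lncc' is passed through untouched, so
    existing behavior is unchanged. 'fused_lncc' is honored only when the
    optional package can be found; otherwise the plain 'cc' loss is used.
    """
    if requested != "fused_lncc":
        return requested
    # find_spec returns None when a top-level module is not installed.
    if importlib.util.find_spec(module_name) is None:
        return "cc"
    return "fused_lncc"
```

Checking availability without importing keeps the probe cheap; a real dispatcher would still import the package afterwards and would probably warn the user when it falls back.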

Behavior

Returns -mean(ncc), the same convention as fusedcc. Uses the exact gradient (matching
FusedLocalNormalizedCrossCorrelationLoss(use_ants_gradient=False)), rescaled so that the same
optimizer hyperparameters give identical convergence. Scope is deliberately narrow: rectangular
kernel only, k ∈ {3,5,7,9}, mean reduction, pred-only gradient, single GPU. Other configurations
(gaussian kernel, masking, sum/none reduction, symmetric/SyN gradients, grid-parallel sharding)
raise a clear error pointing to fusedcc.
Symmetric (dual-image / SyN) gradient support is a straightforward follow-up: it runs the kernel a
second time with pred and target swapped (no CUDA change), at roughly 2x the backward cost. It is
left out of this PR for now.
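The scope guards described above could look roughly like the sketch below. The function name `check_fused_lncc_scope` and all of its parameter names are hypothetical, chosen only to mirror the listed restrictions; the adapter's real constructor arguments and error text may differ.

```python
SUPPORTED_KERNEL_SIZES = (3, 5, 7, 9)


def check_fused_lncc_scope(kernel_type="rectangular", kernel_size=3,
                           reduction="mean", masked=False,
                           symmetric=False, sharded=False):
    """Raise ValueError for configurations outside the narrow supported scope.

    All parameter names here are illustrative assumptions, not the real API.
    """
    problems = []
    if kernel_type != "rectangular":
        problems.append(f"kernel_type={kernel_type!r}")
    if kernel_size not in SUPPORTED_KERNEL_SIZES:
        problems.append(f"kernel_size={kernel_size!r}")
    if reduction != "mean":
        problems.append(f"reduction={reduction!r}")
    if masked:
        problems.append("masking")
    if symmetric:
        problems.append("symmetric/SyN gradients")
    if sharded:
        problems.append("grid-parallel sharding")
    if problems:
        # Collect every violation so the user sees them all in one message.
        raise ValueError(
            "fused_lncc does not support " + ", ".join(problems)
            + "; use loss_type='fusedcc' instead"
        )
```

Failing at construction time with a pointer to fusedcc is friendlier than silently computing a different loss.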

Performance (A40)

The loss kernel is ~3.4× faster / ~3× lighter than fusedcc. End-to-end registration is ~1.1–1.3×
faster per iteration than with fusedcc (the loss is ~38% of a fused step) and ~2.7× faster than the
non-fused cc path, at equal VRAM and equal registration quality.
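As a rough consistency check on these numbers, Amdahl's law gives an estimate of the per-iteration speedup over fusedcc. It assumes that ~38% is the loss's share of a fusedcc step and that the ~3.4× kernel speedup carries over unchanged inside the registration loop; both are assumptions, and the share will vary with image size and configuration.

$$
S = \frac{1}{(1 - f) + f / s} = \frac{1}{0.62 + 0.38 / 3.4} \approx 1.37, \qquad f = 0.38,\quad s = 3.4
$$

The estimate sits just above the measured ~1.1–1.3× range, which is plausible if the loss share is at or below ~38% in most runs.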

Tests

pytest tests/test_fused_lncc_backend.py passes on an A40: forward and gradient parity vs fusedcc
(k=3/5/7/9), sign/range, batched gradient scale, gradient routing, scope guards, multiscale hooks,
and an end-to-end registration through the dispatcher.
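To make the sign/range check concrete, here is a pure-Python 1D reference for a rectangular-window LNCC loss. It is only a sketch under stated assumptions, not the kernel or the test file: it assumes the squared form `ncc = cross² / (var_pred · var_target + eps)`, evaluates only full windows (no padding), and the name `lncc_loss_1d` and the `eps` default are hypothetical.

```python
def lncc_loss_1d(pred, target, k=3, eps=1e-5):
    """Return -mean(ncc) over all full length-k windows (rectangular kernel).

    Assumes the squared NCC form; by Cauchy-Schwarz each window's ncc lies
    in [0, 1), so the loss lies in (-1, 0].
    """
    if len(pred) != len(target) or len(pred) < k:
        raise ValueError("pred and target must have equal length >= k")
    values = []
    for start in range(len(pred) - k + 1):
        p = pred[start:start + k]
        t = target[start:start + k]
        mean_p = sum(p) / k
        mean_t = sum(t) / k
        # Unnormalized local covariance and variances over the window.
        cross = sum((a - mean_p) * (b - mean_t) for a, b in zip(p, t))
        var_p = sum((a - mean_p) ** 2 for a in p)
        var_t = sum((b - mean_t) ** 2 for b in t)
        values.append(cross * cross / (var_p * var_t + eps))
    return -sum(values) / len(values)


signal = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
other = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
loss_same = lncc_loss_1d(signal, signal)   # close to -1: perfect local match
loss_diff = lncc_loss_1d(signal, other)    # strictly between -1 and 0
```

A parity test against a GPU kernel would compare values like these, and their gradients, within a tolerance rather than exactly.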

feat(losses): add optional fused_lncc LNCC backend (loss_type='fused_…

90e36b6

minsuk00 wants to merge 1 commit into rohitrango:main from minsuk00:fused-lncc-backend
