Skip to content

Add RTMaskConformanceTest accuracy gate for convert_rtstruct#315

Open
brianmanderson wants to merge 3 commits into
pyplati:masterfrom
brianmanderson:feat/rtmask-conformance-gate
Open

Add RTMaskConformanceTest accuracy gate for convert_rtstruct#315
brianmanderson wants to merge 3 commits into
pyplati:masterfrom
brianmanderson:feat/rtmask-conformance-gate

Conversation

@brianmanderson
Copy link
Copy Markdown

@brianmanderson brianmanderson commented May 21, 2026

Summary

Adds an opt-in conformance test that gates platipy's RTSTRUCT→mask path
(convert_rtstruct in platipy/dicom/io/rtstruct_to_nifti.py) against
analytical ground truth from brianmanderson/RTMaskConformanceTest.
Pure addition — four new files, no existing file is modified.

The same suite is already wired into DicomRTTool, PyRaDiSe, rt-utils, and a
C# converter. Adding it to platipy brings the rasterizer under the same
accuracy gate and lets numbers be compared apples-to-apples across tools.

Motivation

convert_rtstruct is platipy's only RTSTRUCT→mask path. Today it has two
tests:

Neither catches an accuracy regression in the rasterizer itself.
RTMaskConformanceTest generates synthetic CT + RTSTRUCT fixtures whose
per-ROI NIfTI ground truth is computed by sub-voxel quadrature against
closed-form shape definitions (sphere, cube, cylinder, ellipsoid, torus,
hollow sphere, hollow cylinder). A Dice / surface-DSC / HD95 / MSD failure
there is a real accuracy regression, not a discretization artifact.

Together with #314, this closes the loop: that PR fixes one specific
real-world bug; this PR provides the standing accuracy gate that would
have caught the next such regression automatically.

Scope

Four new files. No existing files change.

File Purpose
requirements-conformance.txt git-URL dep for rtmask-conformance. Kept out of pyproject.toml because PyPI metadata forbids PEP 508 direct URLs — the published wheel is unaffected.
platipy/dicom/tests/test_conformance.py Opt-in pytest module. Uses pytest.importorskip("rtmask_conformance"), so default poetry run pytest skips it cleanly. Generates the fixture, runs convert_rtstruct(prefix=""), parametrizes one test per ROI against evaluate_one. CT directory and RTSTRUCT are located by DICOM Modality tag so we don't couple to fixture layout.
platipy/dicom/tests/conformance.yaml Threshold overrides. Currently only cube is relaxed; see below.
.github/workflows/conformance.yml Separate workflow on Python 3.12 (rtmask-conformance requires ≥3.10; existing matrix tops out at 3.10). Surfaces as its own PR check. Installs via plain pip into a fresh venv so it doesn't touch the existing poetry-managed Build workflow.

Threshold calibration

Six of the seven ROIs pass against the published defaults out of the box.
The cube ROI is relaxed from dice ≥ 0.99 / vol_err ≤ 0.03 to
dice ≥ 0.98 / vol_err ≤ 0.04 and the relaxation is documented inline with
measured numbers. Root cause: skimage.draw.polygon is boundary-inclusive
(every voxel touched by the polygon is filled), so a 60 mm axis-aligned
cube on 1 mm voxels picks up ~3.4% volume across its six faces. Surface
metrics (Surface DSC 0.9986, HD95 1.0 mm, MSD 0.33 mm) confirm the
geometry is right; only the volume convention is off. This is the same
artifact DicomRTTool exhibits with cv2.fillPoly and has the same
relaxation values. The yaml comment records the path back to the default
if transform_point_set_from_dicom_struct ever adopts a half-voxel-shrink
contour or half-open boundary convention.

What this does NOT change

  • pyproject.toml, poetry.lock, or any runtime dependency
  • The published wheel (the git-URL dep stays in a separate requirements file)
  • The existing Build workflow or its 3.8 / 3.9 / 3.10 matrix
  • Any production code path — only tests and CI config are added
  • Default pytest behaviour: the new module skips cleanly when
    rtmask-conformance isn't installed

How to verify locally

poetry install --with dev
poetry run pip install -r requirements-conformance.txt
poetry run pytest platipy/dicom/tests/test_conformance.py -v

Expected: 7 PASS lines (one per ROI). Any future failure prints the
violated metric, measured value, and threshold.

Notes for reviewers

  • The conformance dep is installed from a git URL today. If
    rtmask-conformance is later published to PyPI, the dep could move into
    pyproject.toml as a dev extra and requirements-conformance.txt
    could be retired.
  • The 3.12 conformance job is intentionally separate from the existing
    Build matrix so accuracy regressions land as their own PR check —
    easy to spot, easy to gate independently of the unit-test suite.
  • All threshold relaxations in conformance.yaml should carry a comment
    explaining what was measured and the path back to the default; this keeps
    the gate honest as the rasterizer evolves.

Relationship to #314

This PR is independent of #314 and can be reviewed in parallel. The
conformance fixture uses uniform Z spacing, so the gate would pass on
master with or without #314 today. The cross-reference above is about
the kind of regression each PR addresses, not a merge-order dependency.

Test plan

  • Conformance workflow green (Python 3.12) on this PR
  • Build workflow green (Python 3.8 / 3.9 / 3.10) on this PR — the
    new test module skips cleanly when rtmask-conformance isn't
    installed, so the existing matrix is unaffected

🤖 Generated with Claude Code

brianmanderson and others added 3 commits May 21, 2026 10:10
Gates platipy's RTSTRUCT->mask path (convert_rtstruct in
platipy/dicom/io/rtstruct_to_nifti.py) against analytical ground truth
from brianmanderson/RTMaskConformanceTest. Today that path is covered
only by a real-world LCTSC fixture (rasterizer-derived reference, so
it can't detect rasterizer drift) and a focused non-uniform-Z test;
neither catches accuracy regressions in the rasterizer itself.

The conformance suite generates synthetic CT+RTSTRUCT fixtures whose
per-ROI NIfTI ground truth is computed by sub-voxel quadrature against
closed-form shape definitions (sphere, cube, cylinder, ellipsoid, torus,
hollow sphere, hollow cylinder). A Dice/HD95/MSD failure there is a real
accuracy regression, not a discretization artifact. Same gate is already
running on DicomRTTool, PyRaDiSe, rt-utils, and a C# converter -- adding
it here makes cross-converter comparisons meaningful.

Integration mirrors DicomRTTool's "PyPI-published" pattern:

- requirements-conformance.txt: git-URL dep kept out of pyproject.toml
  (PyPI metadata forbids PEP 508 direct URLs), so the published wheel
  is unaffected.
- platipy/dicom/tests/test_conformance.py: opt-in pytest module
  (pytest.importorskip) that generates the fixture, runs convert_rtstruct
  with prefix="" so output filenames match the verifier's expectations,
  and parametrizes one test per ROI. CT directory and RTSTRUCT are
  located by DICOM Modality tag so we don't couple to fixture layout.
- platipy/dicom/tests/conformance.yaml: empty (schema_version: 1 only);
  any threshold relaxations will be added after the first CI run, each
  documented with what was measured and the path back to defaults.
- .github/workflows/conformance.yml: separate workflow on Python 3.12
  (rtmask-conformance requires >=3.10; existing matrix tops out at 3.10),
  surfaces as its own PR check. Installs via poetry then pip into the
  same venv for the git-URL dep.

Default pytest runs are unaffected -- the module skips cleanly when
rtmask-conformance isn't installed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
poetry.lock pins pyzmq 24.0.1 (transitive via jupyterlab/ipykernel in
the dev group), which fails to compile on Python 3.12 + gcc 13:

  /usr/include/c++/13/bits/alloc_traits.h:70:31: error: static assertion
    failed: allocator_traits<A>::rebind_alloc<A::value_type> must be A
  error: command '/usr/bin/g++' failed with exit code 1

The conformance job doesn't need any of the notebook tooling -- only
platipy, pytest, and rtmask-conformance. Use plain pip (poetry-core's
PEP 517 backend handles `pip install -e .` cleanly) so we don't drag
in the dev group at all. Faster too.

The main Build matrix is unaffected; it runs on Python 3.8/3.9/3.10
where the lockfile is compatible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First CI run (26238950416) flagged cube only: dice=0.9835 < 0.99 and
volume_rel_err=0.0336 > 0.03. Surface metrics are clean (sDSC=0.9986,
HD95=1.0 mm, MSD=0.33 mm) -- the geometry is right, the volume gap is
the half-voxel boundary-inclusive convention of skimage.draw.polygon
(every voxel touched by the polygon is filled). On a 60 mm axis-aligned
cube at 1 mm voxels this adds ~3.4% volume across the six faces.

Same artifact DicomRTTool exhibits with cv2.fillPoly, with the same
relaxation values. Documented inline with measured numbers and the path
back to the default (half-voxel-shrink contour or half-open boundary
convention in transform_point_set_from_dicom_struct).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant