Add RTMaskConformanceTest accuracy gate for convert_rtstruct#315
Open
brianmanderson wants to merge 3 commits into
Open
Add RTMaskConformanceTest accuracy gate for convert_rtstruct#315brianmanderson wants to merge 3 commits into
brianmanderson wants to merge 3 commits into
Conversation
Gates platipy's RTSTRUCT->mask path (convert_rtstruct in platipy/dicom/io/rtstruct_to_nifti.py) against analytical ground truth from brianmanderson/RTMaskConformanceTest. Today that path is covered only by a real-world LCTSC fixture (rasterizer-derived reference, so it can't detect rasterizer drift) and a focused non-uniform-Z test; neither catches accuracy regressions in the rasterizer itself. The conformance suite generates synthetic CT+RTSTRUCT fixtures whose per-ROI NIfTI ground truth is computed by sub-voxel quadrature against closed-form shape definitions (sphere, cube, cylinder, ellipsoid, torus, hollow sphere, hollow cylinder). A Dice/HD95/MSD failure there is a real accuracy regression, not a discretization artifact. Same gate is already running on DicomRTTool, PyRaDiSe, rt-utils, and a C# converter -- adding it here makes cross-converter comparisons meaningful. Integration mirrors DicomRTTool's "PyPI-published" pattern: - requirements-conformance.txt: git-URL dep kept out of pyproject.toml (PyPI metadata forbids PEP 508 direct URLs), so the published wheel is unaffected. - platipy/dicom/tests/test_conformance.py: opt-in pytest module (pytest.importorskip) that generates the fixture, runs convert_rtstruct with prefix="" so output filenames match the verifier's expectations, and parametrizes one test per ROI. CT directory and RTSTRUCT are located by DICOM Modality tag so we don't couple to fixture layout. - platipy/dicom/tests/conformance.yaml: empty (schema_version: 1 only); any threshold relaxations will be added after the first CI run, each documented with what was measured and the path back to defaults. - .github/workflows/conformance.yml: separate workflow on Python 3.12 (rtmask-conformance requires >=3.10; existing matrix tops out at 3.10), surfaces as its own PR check. Installs via poetry then pip into the same venv for the git-URL dep. Default pytest runs are unaffected -- the module skips cleanly when rtmask-conformance isn't installed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
poetry.lock pins pyzmq 24.0.1 (transitive via jupyterlab/ipykernel in
the dev group), which fails to compile on Python 3.12 + gcc 13:
/usr/include/c++/13/bits/alloc_traits.h:70:31: error: static assertion
failed: allocator_traits<A>::rebind_alloc<A::value_type> must be A
error: command '/usr/bin/g++' failed with exit code 1
The conformance job doesn't need any of the notebook tooling -- only
platipy, pytest, and rtmask-conformance. Use plain pip (poetry-core's
PEP 517 backend handles `pip install -e .` cleanly) so we don't drag
in the dev group at all. Faster too.
The main Build matrix is unaffected; it runs on Python 3.8/3.9/3.10
where the lockfile is compatible.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First CI run (26238950416) flagged cube only: dice=0.9835 < 0.99 and volume_rel_err=0.0336 > 0.03. Surface metrics are clean (sDSC=0.9986, HD95=1.0 mm, MSD=0.33 mm) -- the geometry is right, the volume gap is the half-voxel boundary-inclusive convention of skimage.draw.polygon (every voxel touched by the polygon is filled). On a 60 mm axis-aligned cube at 1 mm voxels this adds ~3.4% volume across the six faces. Same artifact DicomRTTool exhibits with cv2.fillPoly, with the same relaxation values. Documented inline with measured numbers and the path back to the default (half-voxel-shrink contour or half-open boundary convention in transform_point_set_from_dicom_struct). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds an opt-in conformance test that gates platipy's RTSTRUCT→mask path
(
convert_rtstructinplatipy/dicom/io/rtstruct_to_nifti.py) againstanalytical ground truth from brianmanderson/RTMaskConformanceTest.
Pure addition — four new files, no existing file is modified.
The same suite is already wired into DicomRTTool, PyRaDiSe, rt-utils, and a
C# converter. Adding it to platipy brings the rasterizer under the same
accuracy gate and lets numbers be compared apples-to-apples across tools.
Motivation
convert_rtstructis platipy's only RTSTRUCT→mask path. Today it has twotests:
test_convert.py— real LCTSC data. Good coverage of the I/O path, butthe reference mask is itself produced by a rasterizer, so it can't detect
drift in the rasterizer being tested.
test_nonuniform_z_slice_matching.py(added in Fix non-uniform-Z slice matching in convert_rtstruct via nearest-IPP lookup #314) — focused on therecent nearest-IPP slice-matching fix.
Neither catches an accuracy regression in the rasterizer itself.
RTMaskConformanceTestgenerates synthetic CT + RTSTRUCT fixtures whoseper-ROI NIfTI ground truth is computed by sub-voxel quadrature against
closed-form shape definitions (sphere, cube, cylinder, ellipsoid, torus,
hollow sphere, hollow cylinder). A Dice / surface-DSC / HD95 / MSD failure
there is a real accuracy regression, not a discretization artifact.
Together with #314, this closes the loop: that PR fixes one specific
real-world bug; this PR provides the standing accuracy gate that would
have caught the next such regression automatically.
Scope
Four new files. No existing files change.
requirements-conformance.txtrtmask-conformance. Kept out ofpyproject.tomlbecause PyPI metadata forbids PEP 508 direct URLs — the published wheel is unaffected.platipy/dicom/tests/test_conformance.pypytest.importorskip("rtmask_conformance"), so defaultpoetry run pytestskips it cleanly. Generates the fixture, runsconvert_rtstruct(prefix=""), parametrizes one test per ROI againstevaluate_one. CT directory and RTSTRUCT are located by DICOMModalitytag so we don't couple to fixture layout.platipy/dicom/tests/conformance.yamlcubeis relaxed; see below..github/workflows/conformance.ymlBuildworkflow.Threshold calibration
Six of the seven ROIs pass against the published defaults out of the box.
The cube ROI is relaxed from
dice ≥ 0.99 / vol_err ≤ 0.03todice ≥ 0.98 / vol_err ≤ 0.04and the relaxation is documented inline withmeasured numbers. Root cause:
skimage.draw.polygonis boundary-inclusive(every voxel touched by the polygon is filled), so a 60 mm axis-aligned
cube on 1 mm voxels picks up ~3.4% volume across its six faces. Surface
metrics (Surface DSC 0.9986, HD95 1.0 mm, MSD 0.33 mm) confirm the
geometry is right; only the volume convention is off. This is the same
artifact DicomRTTool exhibits with
cv2.fillPolyand has the samerelaxation values. The yaml comment records the path back to the default
if
transform_point_set_from_dicom_structever adopts a half-voxel-shrinkcontour or half-open boundary convention.
What this does NOT change
pyproject.toml,poetry.lock, or any runtime dependencyBuildworkflow or its 3.8 / 3.9 / 3.10 matrixpytestbehaviour: the new module skips cleanly whenrtmask-conformanceisn't installedHow to verify locally
Expected: 7 PASS lines (one per ROI). Any future failure prints the
violated metric, measured value, and threshold.
Notes for reviewers
rtmask-conformanceis later published to PyPI, the dep could move intopyproject.tomlas adevextra andrequirements-conformance.txtcould be retired.
Buildmatrix so accuracy regressions land as their own PR check —easy to spot, easy to gate independently of the unit-test suite.
conformance.yamlshould carry a commentexplaining what was measured and the path back to the default; this keeps
the gate honest as the rasterizer evolves.
Relationship to #314
This PR is independent of #314 and can be reviewed in parallel. The
conformance fixture uses uniform Z spacing, so the gate would pass on
master with or without #314 today. The cross-reference above is about
the kind of regression each PR addresses, not a merge-order dependency.
Test plan
Conformanceworkflow green (Python 3.12) on this PRBuildworkflow green (Python 3.8 / 3.9 / 3.10) on this PR — thenew test module skips cleanly when
rtmask-conformanceisn'tinstalled, so the existing matrix is unaffected
🤖 Generated with Claude Code