Problem
When the same code is duplicated across N files, the output contains N*(N-1)/2 pairwise findings tagged with a cluster_id, but no consolidated view. Users must mentally reconstruct which functions form a clone group.
For example, if function X is duplicated in files A, B, C, and D, the report shows 6 separate pairwise findings (A-B, A-C, A-D, B-C, B-D, C-D) rather than one group listing all 4 locations.
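The pairwise blow-up can be checked with a few lines of Python. This is only an illustration of the arithmetic; the single-letter file names stand in for real locations, and `pairwise_findings` is a hypothetical helper, not part of the tool.

```python
from itertools import combinations

def pairwise_findings(locations):
    """Return every unordered pair of locations, as a pairwise report would list them."""
    return list(combinations(locations, 2))

locations = ["A", "B", "C", "D"]
pairs = pairwise_findings(locations)
labels = ["-".join(pair) for pair in pairs]

n = len(locations)
print(len(pairs), "pairwise findings for", n, "locations")
print(labels)

# The count matches N*(N-1)/2.
assert len(pairs) == n * (n - 1) // 2
```

The count grows quadratically with N, while a single group entry would only list N locations.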
Desired behavior
- A "clone group" concept that collapses N-way duplicates into a single group listing all locations
- HTML reporter should visually group findings by cluster, showing the group as one expandable section with all N locations rather than separate pairwise entries
- JSON reporter should support a grouped structure (e.g. "groups": [{"locations": [...], "findings": [...]}])
- Stats should include group-level metrics (e.g. "12 clone groups across 47 functions")
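One possible way to build the grouped structure and the group-level stat from existing pairwise findings is sketched below. The finding shape (a dict with `metadata.cluster_id` and a `locations` list of strings) is an assumption made for this example, as are the names `group_findings` and `summarize`; the real reporter data model may differ.

```python
from collections import defaultdict
from itertools import combinations

def group_findings(findings):
    """Collapse pairwise findings into clone groups keyed by cluster_id.

    Assumes every finding carries metadata["cluster_id"] and a list of
    hashable locations (hypothetical shape, for illustration only).
    """
    by_cluster = defaultdict(list)
    for finding in findings:
        by_cluster[finding["metadata"]["cluster_id"]].append(finding)

    groups = []
    for cluster_id in sorted(by_cluster):
        members = by_cluster[cluster_id]
        # Deduplicate locations while keeping first-seen order.
        locations = list(dict.fromkeys(
            loc for finding in members for loc in finding["locations"]
        ))
        groups.append({
            "cluster_id": cluster_id,
            "locations": locations,
            "findings": members,
        })
    return groups

def summarize(groups):
    """Produce a group-level summary line in the style suggested above."""
    total = sum(len(group["locations"]) for group in groups)
    return f"{len(groups)} clone groups across {total} functions"

# Example input: a 4-way clone (6 pairwise findings) plus one 2-way clone.
findings = [
    {"metadata": {"cluster_id": 0}, "locations": list(pair)}
    for pair in combinations(["A", "B", "C", "D"], 2)
]
findings.append({"metadata": {"cluster_id": 1}, "locations": ["E", "F"]})

groups = group_findings(findings)
print(summarize(groups))
```

Keeping the original pairwise findings inside each group means existing consumers can still reach per-pair details such as similarity scores, while the HTML reporter could render one expandable section per group.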
Current behavior
- --cluster uses Union-Find to assign a cluster_id in finding metadata
- Reporters don't use cluster_id for grouping: HTML lists findings flat, JSON/SARIF bury it in metadata
- No group-level summary or metrics
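For readers unfamiliar with Union-Find, here is a minimal sketch of how pairwise matches can be merged into clusters. It is a generic illustration, not the tool's actual implementation; the class, the `assign_cluster_ids` helper, and the integer numbering of cluster ids are all assumptions.

```python
class UnionFind:
    """Minimal disjoint-set structure with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, item):
        # Register unseen items as their own root.
        self.parent.setdefault(item, item)
        root = item
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path at the root.
        while self.parent[item] != root:
            next_item = self.parent[item]
            self.parent[item] = root
            item = next_item
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

def assign_cluster_ids(pairs):
    """Map each location to a cluster id, numbered in first-seen order."""
    uf = UnionFind()
    for a, b in pairs:
        uf.union(a, b)

    root_to_id = {}
    cluster_ids = {}
    for item in list(uf.parent):
        root = uf.find(item)
        cluster_ids[item] = root_to_id.setdefault(root, len(root_to_id))
    return cluster_ids

# A-B, C-D and B-C link four locations into one cluster; E-F forms another.
pairs = [("A", "B"), ("C", "D"), ("B", "C"), ("E", "F")]
cluster_ids = assign_cluster_ids(pairs)
print(cluster_ids)
```

Because the cluster membership is already computed at this stage, grouping in the reporters should only need to read the existing cluster_id rather than re-run any matching.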