
Improve clone group presentation for N-way duplicates #5

@drogers0

Problem

When the same code is duplicated across N files, the output contains N*(N-1)/2 pairwise findings tagged with a cluster_id, but no consolidated view. Users must mentally reconstruct which functions form a clone group.

For example, if function X is duplicated in files A, B, C, and D, the report shows 6 separate pairwise findings (A-B, A-C, A-D, B-C, B-D, C-D) rather than one group listing all 4 locations.
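The pair count in the example above can be checked with a short sketch. The file names are the placeholder labels from the example, and `pairwise_count` is a hypothetical helper, not part of the tool:

```python
from itertools import combinations


def pairwise_count(n: int) -> int:
    """Number of unordered pairs among n duplicate locations: N*(N-1)/2."""
    return n * (n - 1) // 2


# Placeholder file labels taken from the example in this issue.
files = ["A", "B", "C", "D"]

# Every unordered pair of files is reported as a separate finding today.
pairs = ["-".join(p) for p in combinations(files, 2)]

print(pairs)                       # ['A-B', 'A-C', 'A-D', 'B-C', 'B-D', 'C-D']
print(pairwise_count(len(files)))  # 6
```

The count grows quadratically: 10 copies of the same function would already produce 45 pairwise findings.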

Desired behavior

  • A "clone group" concept that collapses N-way duplicates into a single group listing all locations
  • HTML reporter should visually group findings by cluster, showing the group as one expandable section with all N locations rather than separate pairwise entries
  • JSON reporter should support a grouped structure (e.g. "groups": [{"locations": [...], "findings": [...]}])
  • Stats should include group-level metrics (e.g. "12 clone groups across 47 functions")
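One possible shape for the grouped output, as a minimal sketch. The finding layout used here (a `locations` list of two strings and a `metadata.cluster_id` field) is an assumption made for illustration and may not match the tool's real schema; `group_findings` and `summary` are hypothetical names:

```python
from collections import defaultdict


def group_findings(findings):
    """Collapse pairwise findings that share a cluster_id into one group each.

    Assumes every finding looks like:
        {"locations": [loc_a, loc_b], "metadata": {"cluster_id": int}}
    which is an illustrative layout, not the tool's confirmed schema.
    """
    by_cluster = defaultdict(list)
    for finding in findings:
        by_cluster[finding["metadata"]["cluster_id"]].append(finding)

    groups = []
    for cluster_id in sorted(by_cluster):
        members = by_cluster[cluster_id]
        # Deduplicate: each location appears in several pairwise findings.
        locations = sorted({loc for f in members for loc in f["locations"]})
        groups.append(
            {"cluster_id": cluster_id, "locations": locations, "findings": members}
        )
    return groups


def summary(groups):
    """Group-level metric in the style suggested above."""
    functions = sum(len(g["locations"]) for g in groups)
    return f"{len(groups)} clone groups across {functions} functions"


def pair(a, b, cluster_id):
    """Build one illustrative pairwise finding."""
    return {"locations": [a, b], "metadata": {"cluster_id": cluster_id}}


# Six pairwise findings for files A-D, plus one unrelated pair E-F.
findings = [
    pair("A", "B", 0), pair("A", "C", 0), pair("A", "D", 0),
    pair("B", "C", 0), pair("B", "D", 0), pair("C", "D", 0),
    pair("E", "F", 1),
]
groups = group_findings(findings)
print({"groups": [g["locations"] for g in groups]})
print(summary(groups))  # 2 clone groups across 6 functions
```

Keeping the original pairwise findings inside each group means a consumer that needs per-pair detail does not lose anything.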

Current behavior

  • --cluster uses Union-Find to assign a cluster_id in finding metadata
  • Reporters don't use cluster_id for grouping: the HTML reporter lists findings flat, and the JSON and SARIF reporters bury it in metadata
  • No group-level summary or metrics
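For reference, this is roughly how Union-Find can turn pairwise matches into cluster IDs. It is a generic sketch of the technique, not the project's actual `--cluster` implementation; `UnionFind` and `assign_cluster_ids` are hypothetical names:

```python
class UnionFind:
    """Minimal disjoint-set structure over hashable items."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen items as their own root.
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x at its grandparent while walking up.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def assign_cluster_ids(pairs):
    """Map each location to a cluster ID, numbered by first appearance."""
    uf = UnionFind()
    for a, b in pairs:
        uf.union(a, b)

    root_to_id = {}
    cluster_ids = {}
    for node in list(uf.parent):
        root = uf.find(node)
        if root not in root_to_id:
            root_to_id[root] = len(root_to_id)
        cluster_ids[node] = root_to_id[root]
    return cluster_ids


# A-B and C-D start as separate sets; B-C then merges them into one cluster.
ids = assign_cluster_ids([("A", "B"), ("C", "D"), ("B", "C"), ("E", "F")])
print(ids)  # {'A': 0, 'B': 0, 'C': 0, 'D': 0, 'E': 1, 'F': 1}
```

Because the IDs already partition locations into groups, a reporter only needs to bucket findings by `cluster_id` to produce the consolidated view requested above.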
