Cluster embeddings with DBSCAN and collapse duplicates

## Description

Use DBSCAN clustering on the embeddings to group near-duplicate findings and collapse each cluster into a single representative finding with metadata about the duplicates.

## What to implement

Create `backend/app/ml/deduplicator.py`:
- `deduplicate(findings: list[dict], epsilon: float = 0.15) -> list[dict]`
- Embeds findings via `embedder.py`
- Runs `sklearn.cluster.DBSCAN(eps=epsilon, min_samples=2, metric='cosine')`
- For each cluster, keep the finding with the highest `raw_severity` as representative
- Attach `duplicate_count: int` and `related_files: list[str]` to representative
- Noise points (label == -1) are returned as-is with `duplicate_count: 0`

## Acceptance criteria

- [ ] A set of 10 identical SQL injection findings across different files collapses to 1
- [ ] `duplicate_count` accurately reflects the cluster size
- [ ] `related_files` lists all files in the cluster except the representative's own file

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cluster embeddings with DBSCAN and collapse duplicates #84

Description

What to implement

Acceptance criteria

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Cluster embeddings with DBSCAN and collapse duplicates #84

Description

Description

What to implement

Acceptance criteria

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions