feat: integrate deduplicator into scan response by Praneeth2711 · Pull Request #113 · ionfwsrijan/PatchPilot

Praneeth2711 · 2026-06-15T10:46:11Z

Before opening: make sure there is an issue tracking this work, and link it below. PRs without a linked issue may be closed without review.

Linked issue

Closes #85

What this PR does

This PR integrates the deduplication logic into the backend scan process. It runs the deduplicator on aggregated findings to collapse duplicate findings based on text embeddings. It supports configuring the deduplication via environment variables (DISABLE_DEDUP and DEDUP_EPSILON) and handles cases where sentence-transformers is not installed.
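A minimal sketch of what collapsing findings by embedding distance could look like. It assumes `DEDUP_EPSILON` acts as a cosine-distance threshold; the description above does not state the metric or the clustering method. `cosine_distance`, `collapse_duplicates`, and the `embed` callable are hypothetical names for illustration, not the deduplicator's actual API.

```python
import math
from typing import Any, Callable, List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity; zero-length vectors count as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def collapse_duplicates(
    findings: Sequence[Any],
    embed: Callable[[Any], Vector],  # assumption: maps a finding to its text embedding
    epsilon: float,                  # assumption: cosine-distance threshold
) -> List[Any]:
    """Keep the first finding of each near-duplicate group, in input order."""
    kept: List[Any] = []
    kept_vectors: List[Vector] = []
    for finding in findings:
        vector = embed(finding)
        # Drop the finding if it sits within epsilon of one we already kept.
        if any(cosine_distance(vector, seen) <= epsilon for seen in kept_vectors):
            continue
        kept.append(finding)
        kept_vectors.append(vector)
    return kept
```

In this sketch a larger epsilon merges more aggressively, and an epsilon of 0 only collapses findings whose embeddings point in exactly the same direction.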

Type of change

ML tier (if applicable)

Tier 1 — Triage
Tier 2 — Predictive
Tier 3 — Autonomous
Not ML-related

Stack affected

Backend
Frontend
Both

Changes

Backend

Modified backend/app/main.py to integrate deduplication into the scanning background task before writing to the database.
Read environment variables DISABLE_DEDUP and DEDUP_EPSILON to configure deduplication behavior.
Added test coverage in backend/tests/test_scan_dedup.py to verify deduplication with different configurations.
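The configuration and fallback behavior listed above might be wired roughly as follows. This is a sketch under assumptions: the accepted truthy strings, the default epsilon of 0.15, and the helper names (`dedup_disabled`, `dedup_epsilon`, `embeddings_available`, `maybe_deduplicate`) are all hypothetical and are not taken from backend/app/main.py.

```python
import importlib.util
import math
import os
from typing import Any, Callable, List, Mapping, Optional, Sequence

_TRUTHY = {"1", "true", "yes", "on"}  # assumption: values that switch dedup off
DEFAULT_EPSILON = 0.15                # assumption: not the project's real default


def dedup_disabled(env: Optional[Mapping[str, str]] = None) -> bool:
    env = os.environ if env is None else env
    return env.get("DISABLE_DEDUP", "").strip().lower() in _TRUTHY


def dedup_epsilon(env: Optional[Mapping[str, str]] = None) -> float:
    """Parse DEDUP_EPSILON, falling back to the default on bad or missing input."""
    env = os.environ if env is None else env
    raw = env.get("DEDUP_EPSILON")
    if raw is None:
        return DEFAULT_EPSILON
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_EPSILON
    return value if math.isfinite(value) and value > 0 else DEFAULT_EPSILON


def embeddings_available() -> bool:
    # find_spec returns None when the package cannot be located.
    return importlib.util.find_spec("sentence_transformers") is not None


def maybe_deduplicate(
    findings: Sequence[Any],
    deduplicate: Callable[[Sequence[Any], float], List[Any]],
    env: Optional[Mapping[str, str]] = None,
    available: Optional[bool] = None,
) -> List[Any]:
    """Run deduplicate() unless it is disabled or the embedding library is missing."""
    if available is None:
        available = embeddings_available()
    if dedup_disabled(env) or not available:
        return list(findings)  # graceful fallback: findings pass through unchanged
    return deduplicate(findings, dedup_epsilon(env))
```

Passing `env` and `available` explicitly keeps the helpers easy to exercise in tests without touching the real process environment.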

Frontend

New dependencies

Database / schema changes

Testing

How did you test this?

Tested by running the backend pytest suite, specifically targeting test_scan_dedup.py. Checked that the deduplication runs successfully and filters out duplicate findings, and falls back gracefully when sentence-transformers is unavailable or if DISABLE_DEDUP is enabled.
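One way to exercise the "sentence-transformers is unavailable" path regardless of whether the package is installed is to place a `None` entry in `sys.modules`, which makes the import raise `ImportError`. The sketch below illustrates that technique only; `load_encoder_or_none` is a hypothetical stand-in for the guarded import and is not code from test_scan_dedup.py.

```python
import sys
import types
from unittest import mock


def load_encoder_or_none():
    """Hypothetical stand-in for an optional-import guard in the backend."""
    try:
        import sentence_transformers
    except ImportError:
        return None
    return sentence_transformers


def test_falls_back_when_sentence_transformers_missing():
    # A None entry in sys.modules forces `import sentence_transformers` to fail.
    with mock.patch.dict(sys.modules, {"sentence_transformers": None}):
        assert load_encoder_or_none() is None


def test_uses_library_when_present():
    # A stub module simulates an installed package without downloading a model.
    stub = types.ModuleType("sentence_transformers")
    with mock.patch.dict(sys.modules, {"sentence_transformers": stub}):
        assert load_encoder_or_none() is stub
```

Because `mock.patch.dict` restores `sys.modules` on exit, neither test leaks state into the rest of the suite.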

Checklist

Tested locally end-to-end (upload ZIP or GitHub URL → scan → findings returned correctly)
New ML model falls back gracefully when model file is absent
No new console.error or unhandled Python exceptions introduced
Added or updated tests where applicable
[x] requirements.txt / package.json updated if new dependencies added
New model files (.pkl, .pt, etc.) are gitignored, not committed

Anything reviewers should focus on

Please check the integration within _run_single_scan_task background task.

Screenshots (if UI changed)

github-actions · 2026-06-15T10:46:23Z

⚠️ Automated Check: This PR does not strictly follow the required template. Please ensure you have not deleted any checkboxes or mandatory headings, and that you have written explanations under What this PR does and How did you test this?.

ionfwsrijan · 2026-06-15T10:51:35Z

@Praneeth2711 Fix failing checks and PR description.

Join our dc server to connect with fellow contributors and mentors.

https://discord.gg/FcXuyw2Rs

Make sure to star the repo.


Praneeth2711 · 2026-06-15T11:17:32Z

ionfwsrijan · 2026-06-15T11:23:14Z

@Praneeth2711 Update PR description

ionfwsrijan · 2026-06-15T11:42:17Z

ionfwsrijan · 2026-06-18T18:21:37Z

@Praneeth2711 Please fix failing checks


ionfwsrijan · 2026-06-21T07:54:17Z

@Praneeth2711 Fix failing checks and PR desc

github-actions Bot added SSoC26 needs-work Work needed labels Jun 15, 2026

feat: integrate deduplicator into scan response

c2c5dcc

- apply deduplication after scan aggregation
- add raw_finding_count and finding_count
- support DISABLE_DEDUP
- support configurable DEDUP_EPSILON
- gracefully skip deduplication when sentence-transformers is unavailable

Praneeth2711 force-pushed the feat/integrate-deduplicator-scan-response branch from f43229b to c2c5dcc Compare June 15, 2026 11:04

Praneeth2711 added 2 commits June 15, 2026 16:59

test: fix dedup tests when sentence-transformers is unavailable

178a359

chore: retrigger CI

b687ee6

chore: retrigger guardrails

679ccad

chore: retrigger guardrails

d5573c1

github-actions Bot added backend Backend issues feature New feature labels Jun 21, 2026

fix: resolve merge conflict with upstream main and fix Windows permis…

684cfaf

…sion issue

Praneeth2711 wants to merge 6 commits into ionfwsrijan:main from Praneeth2711:feat/integrate-deduplicator-scan-response
