
feat(diagnostics): autonomous-triage enrichments + confidence threshold #170

Merged
kwisschen merged 1 commit into main from feat/diagnostic-enrichments-and-conf-threshold on Jun 1, 2026


@kwisschen
Owner

Why

Two parallel issues addressed in one PR:

  1. Confidence badge invisible on typical drafts — the +25 ML-tree boost in compute_confidence_score only fires when intros_pool > 53 (drafts with 50+ claims). Typical 10-20 claim test drafts never had findings at threshold 75 → badge effectively never shown.

  2. Diagnostic payloads lack autonomous-triage context — previous narrow context windows (30/22/18/12 chars) and limited per-finding shape markers forced per-report context-asking loops. With bulk reports coming, the payload alone needs to be enough.

Changes

Diagnostic enrichments (Privacy §6 compliant — structural metadata, not draft prose)

Antecedent (extract_antecedent_basis) gains:

  • np_boundary_char — single char immediately after captured term (identifies whether an exclusion-set extension is the right fix)
  • ref_marker_before — which definite-reference marker preceded the term (so / 前述 / 該 / said / the) — informs possessive vs bare-intro classification
  • body_cross_refs — list of cited claim numbers in the same claim text (surfaces incorporation-by-reference candidates autonomously)
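A minimal sketch of how the two single-character antecedent fields could be derived from a captured term's character span. The helper names, the `(start, end)` span convention and the longest-first matching order are assumptions for illustration, not the actual internals of `extract_antecedent_basis`; only the marker list itself comes from this PR.

```python
from typing import Optional

# Marker list as named in the PR; matching longest-first is an assumption.
REF_MARKERS = ("said", "the", "前述", "so", "該")


def np_boundary_char(text: str, end: int) -> Optional[str]:
    """Hypothetical helper: the single char right after the captured term, if any."""
    return text[end] if 0 <= end < len(text) else None


def ref_marker_before(text: str, start: int) -> Optional[str]:
    """Hypothetical helper: which definite-reference marker precedes the term."""
    prefix = text[:start].rstrip().lower()  # lower() is a no-op for CJK markers
    for marker in sorted(REF_MARKERS, key=len, reverse=True):
        if not prefix.endswith(marker):
            continue
        before = prefix[: len(prefix) - len(marker)]
        # Latin markers need a word boundary so "bathe" does not count as "the".
        if marker.isascii() and before and before[-1].isalnum():
            continue
        return marker
    return None


claim = "2. The device of claim 1, wherein said lever pivots."
start = claim.index("lever")
end = start + len("lever")
print(ref_marker_before(claim, start), repr(np_boundary_char(claim, end)))  # said ' '
```

Returning `None` instead of an empty string keeps "no marker / end of text" distinguishable from a real value once the finding is serialized.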

Spec-support (extract_spec_support) gains:

  • phrase_charlen / phrase_first_chars / phrase_last_char — shape markers for tokenization-class FP triage
  • has_leading_ref_marker — whether captured phrase retains a qualifier prefix (surfaces normalize-chain failures)
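The spec-support shape markers can be computed from the captured phrase alone. The sketch below assumes `phrase_first_chars` holds the first two characters and uses a hypothetical prefix list for `has_leading_ref_marker`; both are illustrative choices, not taken from `extract_spec_support`.

```python
from dataclasses import dataclass

# Hypothetical qualifier prefixes: a phrase that still starts with one of these
# suggests the normalize chain failed to strip it.
LEADING_REF_MARKERS = ("said ", "the ", "前述", "該")


@dataclass(frozen=True)
class PhraseShape:
    phrase_charlen: int
    phrase_first_chars: str
    phrase_last_char: str
    has_leading_ref_marker: bool


def phrase_shape(phrase: str, first_n: int = 2) -> PhraseShape:
    """Structural markers only: a length, two short boundary slices and a flag."""
    return PhraseShape(
        phrase_charlen=len(phrase),
        phrase_first_chars=phrase[:first_n],
        phrase_last_char=phrase[-1:],  # empty string for an empty phrase
        has_leading_ref_marker=phrase.lower().startswith(LEADING_REF_MARKERS),
    )
```

Keeping the slices to a couple of characters, rather than sending the phrase, is what keeps these fields on the structural-metadata side of the privacy line.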

Context windows widened 30/22/18/12 → 60/45/35/25 (Latin/JA/Hangul/Han) on BOTH client and server. Still under Privacy §6 "full paragraph" threshold; still capped to 5 findings/report.
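One way to pick the per-script excerpt window from the widths listed above. The script-detection rule (any kana means Japanese, then Hangul, then Han, else Latin), classifying the whole claim text, and treating the width as characters on each side of the term are all assumptions for this sketch.

```python
# Widths from this PR (previously 30/22/18/12); per-side semantics is assumed.
WINDOWS = {"latin": 60, "ja": 45, "hangul": 35, "han": 25}


def _in(ch: str, lo: int, hi: int) -> bool:
    return lo <= ord(ch) <= hi


def script_of(text: str) -> str:
    """Hypothetical classifier: kana wins, then Hangul, then Han, else Latin."""
    if any(_in(c, 0x3040, 0x30FF) for c in text):  # Hiragana + Katakana
        return "ja"
    if any(_in(c, 0xAC00, 0xD7AF) or _in(c, 0x1100, 0x11FF) or _in(c, 0x3130, 0x318F)
           for c in text):  # Hangul syllables and jamo
        return "hangul"
    if any(_in(c, 0x4E00, 0x9FFF) or _in(c, 0x3400, 0x4DBF) for c in text):  # CJK ideographs
        return "han"
    return "latin"


def excerpt(text: str, start: int, end: int) -> str:
    """Term plus up to WINDOWS[script] chars on each side; slicing clamps at the ends."""
    w = WINDOWS[script_of(text)]
    return text[max(0, start - w): end + w]
```

Classifying the whole claim rather than the term matters for Japanese: a kanji-only term inside kana-bearing text should still get the Japanese width.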

Confidence badge threshold 75 → 65

Measured 66.4% precision at threshold 65 (vs 38% baseline; +28pp lift) with 11.5% coverage, i.e. roughly 1-3 badges per typical draft. The shortfall against the 70% target is disclosed honestly in the title attribute, alongside the baseline number.
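The precision, coverage and lift figures relate to each other as in the sketch below. The `(score, is_true_positive)` input shape, the `>=` comparison and the toy data are assumptions; the toy numbers are illustrative and are not the PR's measurement.

```python
from typing import Dict, Sequence, Tuple


def badge_stats(findings: Sequence[Tuple[int, bool]], threshold: int) -> Dict[str, float]:
    """Precision/coverage of badged findings vs. the all-findings baseline."""
    total = len(findings)
    badged = [tp for score, tp in findings if score >= threshold]
    baseline = sum(tp for _, tp in findings) / total if total else 0.0
    precision = sum(badged) / len(badged) if badged else 0.0
    coverage = len(badged) / total if total else 0.0
    return {
        "precision": precision,
        "coverage": coverage,
        "baseline": baseline,
        "lift_pp": 100 * (precision - baseline),  # percentage points
    }


# Toy labelled findings: (confidence score, was it a true positive?)
toy = [(70, True), (66, True), (65, False), (50, False),
       (40, True), (30, False), (20, False), (10, False)]
print(badge_stats(toy, 65))
```

Lowering the threshold trades precision for coverage; when no finding clears the threshold, coverage is 0 and the badge never renders.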

Tests

pytest -q: 2704 passed, 11 skipped. Frontend build clean.

Three coordinated changes to reduce the per-report context-asking
loop and make the confidence badge actually visible on typical drafts.

1. Diagnostic payload enrichments (server + client)

Antecedent diagnostics gain:
  - np_boundary_char: single char immediately after captured term —
    helps identify when an exclusion-set extension is the right fix
  - ref_marker_before: which definite-reference marker preceded the
    term (so / 前述 / 該 / said / the) — informs possessive vs bare-
    intro vs reference classification
  - body_cross_refs: numeric refs to other claims in the same claim
    text (`claim N` / `請求項N` shapes) — surfaces incorporation-by-
    reference candidates without requiring the underlying draft

Spec-support diagnostics gain:
  - phrase_charlen / phrase_first_chars / phrase_last_char: shape
    markers for triaging tokenization-class FPs
  - has_leading_ref_marker: whether the captured phrase retains a
    qualifier prefix — surfaces normalize-chain failures

Context windows widened 30/22/18/12 → 60/45/35/25 (Latin/JA/Hangul/
Han) on BOTH client and server. The previous narrow windows
frequently truncated verb-object / Markush / possessive boundaries
needed for classification. Still under Privacy §6 "full paragraph"
threshold; still capped to 5 findings/report.

All new fields are structural metadata (single chars, char counts,
boolean flags, ref number lists). No additional draft prose enters
the payload beyond the widened excerpt windows.
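The structural-metadata rule and the 5-findings cap can both be enforced at one choke point before a report leaves the client. This sketch assumes findings are plain dicts and uses a hypothetical allow-list (`kind` and `excerpt` are illustrative field names); anything not listed, such as full claim text, is dropped.

```python
from typing import Any, Dict, List

MAX_FINDINGS_PER_REPORT = 5

# Hypothetical allow-list: the windowed excerpt plus the new structural fields.
ALLOWED_FIELDS = frozenset({
    "kind", "excerpt",
    "np_boundary_char", "ref_marker_before", "body_cross_refs",
    "phrase_charlen", "phrase_first_chars", "phrase_last_char",
    "has_leading_ref_marker",
})


def build_payload(findings: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep at most 5 findings and strip every field not on the allow-list."""
    return [
        {k: v for k, v in f.items() if k in ALLOWED_FIELDS}
        for f in findings[:MAX_FINDINGS_PER_REPORT]
    ]
```

An allow-list fails closed: a field newly added to a finding stays out of the payload until someone deliberately lists it.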

2. Confidence-badge threshold lowered 75 → 65

Initial post-PR-168 diagnostic on the deployed bundle showed the
+25 ML-decision-tree boost (the only mechanism that pushes scores
into the 75-100 range) only fires when intros_pool > 53 — i.e., on
drafts with 50+ claims. Typical 10-20 claim test drafts never had
findings at the threshold, so the badge was effectively invisible.

At threshold 65: measured ~66% precision (vs 38% baseline; +28pp
lift) with ~11.5% coverage = ~1-3 badges per typical draft. The
shortfall against the 70% target is disclosed honestly in the title
attribute, which also surfaces the baseline for comparison.

3. Tests + harness

pytest 2704 passed, 11 skipped. No walker changes — extractor and
threshold updates only.
@vercel

vercel Bot commented Jun 1, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: patent-lint | Deployment: Ready | Updated (UTC): Jun 1, 2026 7:33am

kwisschen merged commit 76c986f into main on Jun 1, 2026
6 checks passed
kwisschen deleted the feat/diagnostic-enrichments-and-conf-threshold branch on June 1, 2026 at 07:35
