fix(walker): NP-boundary refinements + diagnostic enrichment (batch r3) #182

Merged: kwisschen merged 2 commits into main from walker/batch-2026-06-01-r3 on Jun 1, 2026

@kwisschen
Owner

Batch round 3 — NP-boundary fixes + diagnostic enrichment

This batch fixes three real over-capture / over-stop bugs surfaced by user reports and adds NP-stop diagnostic fields, so that the next bugs in these classes can be classified from the report payload alone (per standing instruction).

Walker fixes

| Issue | Jurisdiction | Mechanism | Class |
| --- | --- | --- | --- |
| #171 | CN | `_TRAILING_VERB_DENYLIST_CN += 相邻` (adjacent) | tw_contamination Q1 over-capture (`限位导槽分别相邻` → `限位导槽`) |
| #177 | CN | `_CN_SPEC_SUPPORT_INTERIOR_REJECTS += 具有一` | Possession-verb predication, mirror of 耦接一/耦合一/连接一 (dedupes `间具有一锐角` against the clean `锐角`) |
| #178 finding 1 | US | Cardinal + `through hole(s)` compound-NP synthesis | `_NP_CORE` truncates at `through` because of the preposition stop; the compound is restored at the definite-ref consumer site |

Diagnostic enrichment

Antecedent-basis findings now carry two new fields alongside the existing `np_boundary_char`:

  • `term_word_count`: the number of whitespace-separated tokens in the matched term. A word count of 1-2 combined with a stop-word-class `next_word_after_term` is the signature of NP truncation mid-compound.
  • `next_word_after_term`: the actual next token after the term in the claim text, truncated to at most 20 characters (privacy-safe).

Together, these fields let future NP over-stop bugs (e.g. `through hole`, `fixing hole`, and similar two-word patent compounds) be classified from the issue payload alone, with no draft needed.
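The sketch below shows how the two fields could be derived and read. It assumes the matched term's character span in the claim text is available. `np_stop_diagnostics`, `looks_like_np_truncation` and the small `STOP_WORDS` set are hypothetical stand-ins, not the walker's actual code.

```python
import re

# Hypothetical stand-in for the preposition-style entries in _STOP_WORDS.
STOP_WORDS = {"through", "of", "in", "on", "with", "to"}

HEAD_LIMIT = 20  # keep only the first 20 characters of the next token


def np_stop_diagnostics(claim_text: str, term_start: int, term_end: int) -> dict:
    """Build both diagnostic fields for the term claim_text[term_start:term_end]."""
    term = claim_text[term_start:term_end]
    # First non-whitespace run after the term, or nothing at end of text.
    match = re.match(r"\s*(\S+)", claim_text[term_end:])
    next_word = match.group(1)[:HEAD_LIMIT] if match else ""
    return {
        "term_word_count": len(term.split()),
        "next_word_after_term": next_word,
    }


def looks_like_np_truncation(diag: dict) -> bool:
    """Signature of a mid-compound stop: a 1-2 word term followed by a stop word."""
    return (
        1 <= diag["term_word_count"] <= 2
        and diag["next_word_after_term"].lower() in STOP_WORDS
    )
```

For `the two through holes`, the term `two` yields a word count of 1 and `through` as the next word, which matches the truncation signature.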

Cross-jurisdiction analysis

  • CN `相邻` — TW already covers via its existing trailing-strip pipeline (parity preserved)
  • CN `具有一` — TW already has `耦接一/耦合一/連接一` equivalents (parity preserved)
  • US `through hole` — CJK doesn't have the same truncation pattern (`通孔` is a single CJK token); deferred per DR-1

Gates

  • pytest: 2704 passed / 11 skipped
  • US harness: 0 / 0 / 0 (unresolved_new / unresolved_removed / protect_violations)
  • CN harness: 0 / 0 / 0
  • Wheel rebuilt

Closes #171
Closes #177

@vercel

vercel Bot commented Jun 1, 2026

The latest updates on your projects.

| Project | Deployment | Updated (UTC) |
| --- | --- | --- |
| patent-lint | Ready | Jun 1, 2026 9:50am |

kwisschen added 2 commits June 1, 2026 17:48
Three real over-capture / over-stop bugs surfaced by user reports, plus
NP-stop diagnostic fields so the next iteration of these classes is
self-classifying from the report payload alone.

CN walker — trailing locative-verb (issue #171):
- _TRAILING_VERB_DENYLIST_CN += `相邻` (adjacent). Pure adjective in
  CN patent diction; appears only at prefix position in legitimate
  compounds (相邻边缘 / 相邻区域 / 相邻通道). Fixes the
  tw_contamination Q1 path over-capture `限位导槽分别相邻` → `限位导槽`.
  Cross-jurisdiction: TW already covers via the existing trailing-
  strip pipeline (parity preserved).
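The sketch below illustrates the trailing-strip behaviour described above. It makes two assumptions: the pipeline strips denylisted suffixes repeatedly, and `分别` (respectively) is removed by a separate adverb list. `strip_trailing` and both tuples are hypothetical stand-ins for the walker's actual code.

```python
# Hypothetical subset of _TRAILING_VERB_DENYLIST_CN after this change.
TRAILING_VERB_DENYLIST_CN = ("相邻",)
# Hypothetical trailing adverbs the strip pipeline is assumed to remove too.
TRAILING_ADVERBS_CN = ("分别",)


def strip_trailing(term: str) -> str:
    """Repeatedly strip denylisted suffixes until none matches."""
    suffixes = TRAILING_VERB_DENYLIST_CN + TRAILING_ADVERBS_CN
    changed = True
    while changed:
        changed = False
        for suffix in suffixes:
            # Never strip a term down to nothing; a bare suffix is left alone.
            if term.endswith(suffix) and len(term) > len(suffix):
                term = term[: -len(suffix)]
                changed = True
    return term
```

Only suffixes are checked, so legitimate compounds such as `相邻边缘`, where `相邻` sits at the prefix position, pass through unchanged.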

CN spec-support — possession-verb interior reject (issue #177):
- _CN_SPEC_SUPPORT_INTERIOR_REJECTS += `具有一`. Mirror of the
  耦接一/耦合一/连接一 family already in this list: a possession verb
  taking an indefinite object (`<noun>具有一<noun>`) is a predication,
  never an element name. Resolves the duplicate inventory candidate
  `间具有一锐角`, where the clean `锐角` was already inventoried.
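The sketch below shows how an interior reject list could drop the predication candidate and leave only the clean element name. `is_predication`, `build_inventory` and the tuple contents are assumptions. The real check may be position-aware rather than a plain substring test.

```python
# Hypothetical subset of _CN_SPEC_SUPPORT_INTERIOR_REJECTS after this change.
CN_SPEC_SUPPORT_INTERIOR_REJECTS = ("耦接一", "耦合一", "连接一", "具有一")


def is_predication(candidate: str) -> bool:
    """A verb + indefinite-object token inside the candidate marks a predication."""
    return any(token in candidate for token in CN_SPEC_SUPPORT_INTERIOR_REJECTS)


def build_inventory(candidates: list[str]) -> list[str]:
    """Drop predications, then de-duplicate while preserving first-seen order."""
    kept = [c for c in candidates if not is_predication(c)]
    return list(dict.fromkeys(kept))
```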

US walker — `through hole(s)` compound-NP synthesis (issue #178 #1):
- New post-extension at the definite-ref consumer site in claims.py.
  _NP_CORE in utils.py truncates at `through` because it's listed
  as a preposition in _STOP_WORDS — but `through hole` is a discrete
  patent compound (a hole that passes through a substrate, MPEP
  § 2173 element-naming convention). When the captured term is
  exactly a bare cardinal AND the source-text continuation is
  `through hole(s)`, restore the compound. Tightly gated: the
  cardinal-only condition prevents over-extension in verb-preposition
  contexts (in `the X mounted through holes in the wall`, `X` is not a
  bare cardinal, so no extension happens). Cross-jurisdiction: CN/TW
  don't have the same compound-NP truncation pattern; CJK already
  treats `通孔` as a single token. Deferred per DR-1.
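The sketch below shows the gated post-extension. It assumes the consumer site has both the captured term and the source text that immediately follows it. `CARDINALS`, `is_bare_cardinal` and `extend_through_hole` are hypothetical names. The real cardinal test may accept more forms than digits and the words one to ten.

```python
import re

# Hypothetical cardinal vocabulary; the walker's real cardinal test may differ.
CARDINALS = {"one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"}
_THROUGH_HOLE = re.compile(r"\s+(through\s+holes?)\b", re.IGNORECASE)


def is_bare_cardinal(term: str) -> bool:
    return term.isdigit() or term.lower() in CARDINALS


def extend_through_hole(term: str, continuation: str) -> str:
    """Restore `<cardinal> through hole(s)` when _NP_CORE stopped at `through`.

    `continuation` is the source text immediately after the captured term.
    """
    if not is_bare_cardinal(term):
        return term  # gate: only bare cardinals are ever extended
    match = _THROUGH_HOLE.match(continuation)
    if match is None:
        return term
    return f"{term} {match.group(1)}"
```

With this gate, `two` followed by ` through holes in the plate` becomes `two through holes`. `X` followed by ` mounted through holes in the wall` is returned unchanged.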

Diagnostic enrichment (covers #178 finding 1, #179, future class):
- Antecedent-basis findings now carry `term_word_count` +
  `next_word_after_term`. Together with the existing
  `np_boundary_char`, these make an over-stop bug (a single-word term
  truncated at a stop word) self-evident from the payload, so the next
  bug in this class can be triaged without the draft. Privacy-safe:
  one short token, truncated to ≤20 chars.

Gates:
- pytest: 2704 passed / 11 skipped
- US harness: 0 / 0 / 0
- CN harness: 0 / 0 / 0
- Wheel rebuilt

Closes #171
Closes #177
Closes #178
Closes #180
kwisschen force-pushed the walker/batch-2026-06-01-r3 branch from 70b934a to c26bedb on June 1, 2026 09:49
kwisschen merged commit ae9957c into main on Jun 1, 2026
6 checks passed
kwisschen added a commit that referenced this pull request Jun 1, 2026
Two walker-rule additions in PR #182 didn't bump current_round or add
round_history entries because no active corpus labels were silenced
(autoship discipline reading: "non-shifting fix, labels JSON untouched
when no labels silenced"). That reading is technically defensible, but
it defeats the cross-jurisdiction discipline pytest at
tests/test_cross_jurisdiction_discipline.py, which audits round_history
for parity decisions. Mechanism additions must be traceable regardless
of whether the local corpus happens to exercise them.

Backfilling:

- CN R38 `r38_locative_adjective_trailing_strip` — adds 相邻 to
  _TRAILING_VERB_DENYLIST_CN. Real user signal: issue #171
  tw_contamination over-capture. Cross-jurisdiction: TW already covers
  via existing trailing-strip pipeline.

- US R10 `r10_through_hole_compound_np_synthesis` — adds bare-cardinal
  + through hole(s) compound-NP post-extension at the _DEFINITE_REF
  consumer site. Real user signal: issue #178 finding 1. Cross-
  jurisdiction: deferred per DR-1 (CJK 通孔 is a single token).

Both entries marked fixtures_silenced: 0 to keep the
zero-corpus-impact context honest. Both name the source GH issue in
user_attested_issues for forward traceability.
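The sketch below illustrates what the backfill records: a hypothetical shape for the two entries plus a simplified parity audit. The key names (`cross_jurisdiction` and `round` in particular) and `audit_parity` are assumptions. They do not reflect the actual schema or the logic of tests/test_cross_jurisdiction_discipline.py.

```python
# Hypothetical shape of the two backfilled round_history entries; the real
# labels JSON schema may use different key names.
ROUND_HISTORY = [
    {
        "jurisdiction": "CN",
        "round": 38,
        "rule": "r38_locative_adjective_trailing_strip",
        "fixtures_silenced": 0,
        "user_attested_issues": [171],
        "cross_jurisdiction": "TW already covers via existing trailing-strip pipeline",
    },
    {
        "jurisdiction": "US",
        "round": 10,
        "rule": "r10_through_hole_compound_np_synthesis",
        "fixtures_silenced": 0,
        "user_attested_issues": [178],
        "cross_jurisdiction": "deferred per DR-1 (CJK 通孔 is a single token)",
    },
]


def audit_parity(history: list[dict]) -> list[str]:
    """Return the rules whose entry lacks a parity decision or a source issue."""
    problems = []
    for entry in history:
        if not entry.get("cross_jurisdiction") or not entry.get("user_attested_issues"):
            problems.append(entry["rule"])
    return problems
```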

Gates:
- pytest: 2704 passed / 11 skipped
- cross-jurisdiction discipline pytest: passes (both entries name
  cross-juris markers + DR-1 single-juris-scope marker)
- US harness: 0 / 0 / 0
- CN harness: 0 / 0 / 0
kwisschen added a commit that referenced this pull request Jun 1, 2026
Two complementary cleanups discovered when auditing whether walker-round
discipline was being applied properly:

1. round_history backfill (PR #182 rule additions)
   - CN R38 `r38_locative_adjective_trailing_strip` — adds 相邻 to
     _TRAILING_VERB_DENYLIST_CN. Real user signal: issue #171.
     Cross-juris: TW already covers via existing trailing-strip pipeline.
   - US R10 `r10_through_hole_compound_np_synthesis` — adds bare-cardinal
     + through hole(s) compound-NP post-extension. Real user signal:
     issue #178 finding 1. Cross-juris: deferred per DR-1 (CJK 通孔 is
     a single token).
   Both marked fixtures_silenced: 0 honestly (rule additions only —
   corpus didn't exercise these patterns).

2. TW spec_support parity (mirror of PR #181 CN additions)
   - _TW_SPEC_SUPPORT_TRAILING_TOKENS += 抵靠 / 穿設 / 穿過 / 分別穿過
   - PR #181 added the SC variants to CN from real CN reports (#174 /
     #175 / #176). All four are Traditional-compatible perforation /
     abutment verbs. Generalized per the user's standing instruction
     to mirror fixes across applicable jurisdictions.
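The sketch below shows the parity check this mirror implies: map each CN token to its Traditional form and report any that TW lacks. The character map covers only the three characters that differ here. The CN token set is inferred from the TW additions rather than copied from PR #181, and `missing_tw_mirrors` is a hypothetical helper.

```python
# Hypothetical Simplified -> Traditional map covering only the characters needed here.
SC_TO_TC = str.maketrans({"设": "設", "过": "過", "别": "別"})

# Assumed SC tokens added to CN by PR #181 (inferred, not quoted from that PR).
CN_TRAILING_TOKENS = {"抵靠", "穿设", "穿过", "分别穿过"}
TW_TRAILING_TOKENS = {"抵靠", "穿設", "穿過", "分別穿過"}


def missing_tw_mirrors(cn_tokens: set[str], tw_tokens: set[str]) -> set[str]:
    """Return the Traditional forms of CN tokens that TW does not list yet."""
    return {token.translate(SC_TO_TC) for token in cn_tokens} - tw_tokens
```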

Discipline lesson: autoship walker fixes that don't silence corpus
labels still need round_history entries, because the cross-jurisdiction
discipline pytest (tests/test_cross_jurisdiction_discipline.py) audits
round_history for parity decisions. Updated triage-report SKILL.md
(local) to enforce this going forward.

Gates:
- pytest: 2704 passed / 11 skipped
- cross-jurisdiction discipline pytest: passes
- US harness: 0 / 0 / 0
- CN harness: 0 / 0 / 0

Development: successfully merging this pull request may close these issues: [report] specSupport, [report] antecedentBasis
