fix(walker): NP-boundary refinements + diagnostic enrichment (batch r3) #182
Merged
Three real over-capture / over-stop bugs surfaced by user reports, plus NP-stop diagnostic fields so the next iteration of these classes is self-classifying from the report payload alone.

CN walker, trailing locative-verb (issue #171):
- _TRAILING_VERB_DENYLIST_CN += `相邻` (adjacent). Pure adjective in CN patent diction; appears only at prefix position in legitimate compounds (相邻边缘 / 相邻区域 / 相邻通道). Fixes the tw_contamination Q1 path over-capture `限位导槽分别相邻` → `限位导槽`. Cross-jurisdiction: TW already covers via the existing trailing-strip pipeline (parity preserved).

CN spec-support, possession-verb interior reject (issue #177):
- _CN_SPEC_SUPPORT_INTERIOR_REJECTS += `具有一`. Mirror of the 耦接一/耦合一/连接一 family already in this list: a possession verb taking an indefinite object (`<noun>具有一<noun>`) is a predication, never an element name. Resolves the duplicate inventory candidate `间具有一锐角` where the clean `锐角` is already inventoried.

US walker, `through hole(s)` compound-NP synthesis (issue #178 #1):
- New post-extension at the definite-ref consumer site in claims.py. _NP_CORE in utils.py truncates at `through` because it's listed as a preposition in _STOP_WORDS, but `through hole` is a discrete patent compound (a hole that passes through a substrate, MPEP § 2173 element-naming convention). When the captured term is exactly a bare cardinal AND the source-text continuation is `through hole(s)`, restore the compound. Tightly gated: cardinal-only prevents over-extension in verb-prep contexts (`the X mounted through holes in the wall`: `X` is not a bare cardinal, so no extension). Cross-jurisdiction: CN/TW don't have the same compound-NP truncation pattern; CJK already treats `通孔` as a single token. Deferred per DR-1.

Diagnostic enrichment (covers #178 finding 1, #179, future class):
- Antecedent-basis findings now carry `term_word_count` + `next_word_after_term`. Together with the existing `np_boundary_char`, an over-stop bug (single-word term truncated at a stop word) is self-evident from the payload, so the next iteration of this class can be triaged without the draft. Privacy-safe: one short token, head ≤20 chars.

Gates:
- pytest: 2704 passed / 11 skipped
- US harness: 0 / 0 / 0
- CN harness: 0 / 0 / 0
- Wheel rebuilt

Closes #171
Closes #177
Closes #178
Closes #180
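The `through hole(s)` gate described above can be sketched roughly as follows. This is a minimal illustration, not the code in claims.py: the function names, the cardinal set, and the regex are all assumptions made for the example, and the real consumer site may track offsets differently.

```python
import re

# Hypothetical helper names; the real logic lives in claims.py / utils.py.
# Assumption: a "bare cardinal" is either digits or one of these number words.
_WORD_CARDINALS = {
    "one", "two", "three", "four", "five",
    "six", "seven", "eight", "nine", "ten",
}

# Matches the source-text continuation ` through hole` / ` through holes`.
_THROUGH_HOLE = re.compile(r"\s+(through\s+holes?)\b", re.IGNORECASE)


def is_bare_cardinal(term: str) -> bool:
    """True when the captured term is nothing but a cardinal."""
    t = term.strip().lower()
    return t.isdigit() or t in _WORD_CARDINALS


def extend_through_hole(term: str, source: str, term_end: int) -> str:
    """Restore the compound only when both gates hold.

    term_end is the offset in `source` just past the captured term.
    """
    if not is_bare_cardinal(term):
        return term  # gate 1: cardinal-only, avoids verb-prep contexts
    m = _THROUGH_HOLE.match(source, term_end)
    if m is None:
        return term  # gate 2: continuation must be `through hole(s)`
    return f"{term} {m.group(1)}"
```

With this shape, a captured `two` followed by ` through holes` is extended, while a captured noun followed by the same words (the verb-prep case from the commit message) is left alone.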
70b934a to c26bedb
kwisschen added a commit that referenced this pull request Jun 1, 2026
Two walker-rule additions in PR #182 didn't bump current_round or add round_history entries because no active corpus labels were silenced (autoship discipline reading: "non-shifting fix, labels JSON untouched when no labels silenced"). That reading is technically defensible, but defeats the cross-jurisdiction discipline pytest at tests/test_cross_jurisdiction_discipline.py, which audits round_history for parity decisions. Mechanism additions must be traceable regardless of whether the local corpus happens to exercise them.

Backfilling:
- CN R38 `r38_locative_adjective_trailing_strip`: adds 相邻 to _TRAILING_VERB_DENYLIST_CN. Real user signal: issue #171 tw_contamination over-capture. Cross-jurisdiction: TW already covers via existing trailing-strip pipeline.
- US R10 `r10_through_hole_compound_np_synthesis`: adds bare-cardinal + through hole(s) compound-NP post-extension at the _DEFINITE_REF consumer site. Real user signal: issue #178 finding 1. Cross-jurisdiction: deferred per DR-1 (CJK 通孔 is a single token).

Both entries marked fixtures_silenced: 0 to keep the zero-corpus-impact context honest. Both name the source GH issue in user_attested_issues for forward traceability.

Gates:
- pytest: 2704 passed / 11 skipped
- cross-jurisdiction discipline pytest: passes (both entries name cross-juris markers + DR-1 single-juris-scope marker)
- US harness: 0 / 0 / 0
- CN harness: 0 / 0 / 0
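To make the traceability requirement concrete, here is a rough sketch of what a round_history audit might look like. Only `fixtures_silenced` and `user_attested_issues` are field names taken from the commit message; the `round`, `jurisdiction`, and `cross_jurisdiction` keys, the list layout, and the check itself are assumptions, and the real test in tests/test_cross_jurisdiction_discipline.py may work quite differently.

```python
# Hypothetical entry shape for illustration only.
ROUND_HISTORY = [
    {
        "round": "r38_locative_adjective_trailing_strip",
        "jurisdiction": "CN",
        "fixtures_silenced": 0,  # rule addition only, no corpus labels silenced
        "user_attested_issues": [171],
        "cross_jurisdiction": "TW already covers via existing trailing-strip pipeline",
    },
    {
        "round": "r10_through_hole_compound_np_synthesis",
        "jurisdiction": "US",
        "fixtures_silenced": 0,
        "user_attested_issues": [178],
        "cross_jurisdiction": "deferred per DR-1",
    },
]


def untraceable_entries(history: list[dict]) -> list[str]:
    """Return the names of entries missing a parity decision or a source issue."""
    return [
        entry["round"]
        for entry in history
        if not entry.get("cross_jurisdiction")
        or not entry.get("user_attested_issues")
    ]
```

An audit of this kind passes when `untraceable_entries(ROUND_HISTORY)` is empty, independent of whether `fixtures_silenced` is zero.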
kwisschen added a commit that referenced this pull request Jun 1, 2026
Two complementary cleanups discovered when auditing whether walker-round discipline was being applied properly:

1. round_history backfill (PR #182 rule additions)
- CN R38 `r38_locative_adjective_trailing_strip`: adds 相邻 to _TRAILING_VERB_DENYLIST_CN. Real user signal: issue #171. Cross-juris: TW already covers via existing trailing-strip pipeline.
- US R10 `r10_through_hole_compound_np_synthesis`: adds bare-cardinal + through hole(s) compound-NP post-extension. Real user signal: issue #178 finding 1. Cross-juris: deferred per DR-1 (CJK 通孔 is a single token).

Both marked fixtures_silenced: 0 honestly (rule additions only; the corpus didn't exercise these patterns).

2. TW spec_support parity (mirror of PR #181 CN additions)
- _TW_SPEC_SUPPORT_TRAILING_TOKENS += 抵靠 / 穿設 / 穿過 / 分別穿過
- PR #181 added the SC variants to CN from real CN reports (#174 / #175 / #176). All four are Traditional-compatible perforation / abutment verbs. Generalized per the user's standing instruction to mirror fixes across applicable jurisdictions.

Discipline lesson: autoship walker fixes that don't silence corpus labels still need round_history entries, because the cross-jurisdiction discipline pytest (tests/test_cross_jurisdiction_discipline.py) audits round_history for parity decisions. Updated triage-report SKILL.md (local) to enforce this going forward.

Gates:
- pytest: 2704 passed / 11 skipped
- cross-jurisdiction discipline pytest: passes
- US harness: 0 / 0 / 0
- CN harness: 0 / 0 / 0
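As an illustration of how trailing tokens like these might be applied, the sketch below strips one trailing verb from a candidate element name. The tuple holds only the four tokens named in this commit (the real _TW_SPEC_SUPPORT_TRAILING_TOKENS presumably contains more), and the function name and the longest-first matching order are assumptions, not the repository's implementation.

```python
# Only the four tokens named in the commit message; hypothetical local name.
TRAILING_TOKENS_SKETCH = ("抵靠", "穿設", "穿過", "分別穿過")


def strip_trailing_token(candidate: str, tokens=TRAILING_TOKENS_SKETCH) -> str:
    """Remove one trailing verb token, trying longer tokens first.

    Longest-first matters because 分別穿過 ends with 穿過: checking the
    short token first would leave a dangling 分別 on the candidate.
    """
    for tok in sorted(tokens, key=len, reverse=True):
        # Never strip the whole candidate down to an empty string.
        if candidate.endswith(tok) and len(candidate) > len(tok):
            return candidate[: -len(tok)]
    return candidate
```

A candidate with no listed trailing verb, such as `通孔`, passes through unchanged.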
Batch round 3 — NP-boundary fixes + diagnostic enrichment
Three real over-capture / over-stop bugs surfaced by user reports, plus NP-stop diagnostic fields so the next iteration of these classes is self-classifying from the report payload alone (per standing instruction).
Walker fixes
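- CN walker, trailing locative-verb (#171): over-capture (`限位导槽分别相邻` → `限位导槽`)
- CN spec-support, possession-verb interior reject (#177): duplicate candidate (`间具有一锐角` against clean `锐角`)
Diagnostic enrichment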
Antecedent-basis findings now carry two new fields, `term_word_count` and `next_word_after_term`, alongside the existing `np_boundary_char`.
Together these let the next iteration of NP over-stop bugs (e.g. `through hole`, `fixing hole`, similar two-word patent compounds) be classified from the issue payload alone — no draft needed.
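A minimal sketch of how the two fields could be derived from a finding's source text follows. The function name, the offset-based signature, and whitespace tokenization are assumptions for illustration; only the field names and the 20-character cap come from this PR.

```python
def np_stop_diagnostics(source: str, term_start: int, term_end: int) -> dict:
    """Compute the two diagnostic fields for a captured term.

    Assumes the term occupies source[term_start:term_end] and that words
    are whitespace-separated (an English-text simplification).
    """
    term = source[term_start:term_end]
    following = source[term_end:].split()
    # Privacy-safe: a single token, truncated to its first 20 characters.
    next_word = following[0][:20] if following else ""
    return {
        "term_word_count": len(term.split()),
        "next_word_after_term": next_word,
    }
```

For example, a captured `two` in "the two through holes are aligned" yields a word count of 1 with `through` as the next word, which is the signature of an over-stop at a stop word.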
Cross-jurisdiction analysis
Gates
Closes #171
Closes #177