fix(tw): parse comma-and-space separated symbol-table numerals #198

Merged
kwisschen merged 1 commit into main from fix/tw-symbol-table-comma-list on Jun 5, 2026

@kwisschen
Owner

Triage finding (issue #184, check.tw.spec.symbolTableCoverage.amend)

A TW 符號說明 (description of symbols) row written with comma-and-space separated reference numerals (`210, 220, 230:欄位說明`) was rejected wholesale, so every numeral in the row was reported as undeclared. The tight form `210,220,230` (no spaces) already parsed correctly. The reporter described exactly this: enumerated numerals separated by a comma (逗點) were read as not listed.

Root cause

The numeral capture group `[A-Za-z0-9~~\-、,,]+` allowed comma characters but not whitespace, so the interior space after each comma broke the match before the separator and name could be found.
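
To illustrate the failure, here is a minimal Python sketch. Only the numeral character class comes from this PR; the surrounding row pattern (`OLD_ROW`, its name-separator alternation, and the helper `old_matches`) is a simplified stand-in assumed for illustration, not the project's actual regex. The full-width `~`, `,` and `:` are likewise assumptions.

```python
import re

# Hypothetical, simplified pre-fix row pattern. The numeral class is taken
# from the PR text; everything after it is an assumed stand-in.
OLD_ROW = re.compile(
    r"^([A-Za-z0-9~~\-、,,]+)"        # numerals: commas allowed, whitespace not
    r"(?:\s*[::]\s*|\t+|\s{2,})"      # assumed name separator: colon, tab, 2+ spaces
    r"(\S.*)$"                         # symbol name
)

def old_matches(line: str) -> bool:
    """Return True if the simplified pre-fix pattern accepts the row."""
    return OLD_ROW.match(line) is not None

print(old_matches("210,220,230:欄位說明"))    # tight form
print(old_matches("210, 220, 230:欄位說明"))  # comma-and-space form
```

In this sketch the tight form matches, while the comma-and-space form does not: the class stops at the first space, and a single space followed by a digit is not a valid name separator, so the whole row is rejected.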

Fix

The numeral group is now a separator-joined list of alphanumeric tokens whose separators (、 , ,) may carry whitespace on either side. An individual token still cannot contain whitespace, so the trailing separator before the name (: / tab / 2-space gap) remains unambiguous. There is no ReDoS risk, because each list step requires a literal separator.
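
The separator-joined list described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged implementation: the names `TOKEN`, `LIST_SEP`, `NAME_SEP`, `ROW` and `parse_row` are hypothetical, the full-width `~` and `:` are assumed, and range expansion (`20~25`) is deliberately left out.

```python
import re

# All names below are hypothetical; the project's real pattern may differ.
TOKEN = r"[A-Za-z0-9~~\-]+"               # one numeral or range, no whitespace
LIST_SEP = r"\s*[、,,]\s*"                 # list separator, whitespace allowed around it
NAME_SEP = r"(?:\s*[::]\s*|\t+|\s{2,})"   # colon, tab, or a gap of 2+ whitespace

ROW = re.compile(
    rf"^\s*(?P<numerals>{TOKEN}(?:{LIST_SEP}{TOKEN})*){NAME_SEP}(?P<name>\S.*)$"
)

def parse_row(line: str) -> list[tuple[str, str]]:
    """Split one symbol-table row into (numeral, name) pairs; [] if no match."""
    m = ROW.match(line)
    if m is None:
        return []
    name = m.group("name").strip()
    # Every list step consumed a literal separator, so splitting on it is safe.
    tokens = re.split(LIST_SEP, m.group("numerals"))
    return [(tok, name) for tok in tokens]

print(parse_row("210, 220, 230:欄位說明"))
print(parse_row("RS11, RS14:訊號"))
```

Because each repetition of the list step must consume a literal separator, whitespace can only be claimed by the list when a separator sits next to it, which keeps the 2-space gap before the name unambiguous. A row such as `210 220:名稱` still fails to match, since a single space is neither a list separator nor a name separator.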

Verification gates

  • Reproducer: `210, 220, 230:欄位說明` parsed to `[]` pre-fix and to 3 entries post-fix. `RS11, RS14:訊號` (the report's second group) behaves likewise. Both fail-then-pass.
  • Anti-corpus: the full existing test_symbol_table_tw.py suite (24 cases) is unchanged; range expansion (`20~25`, `PR1~PRn`) and the tight comma form (`100、100a`) are all preserved.
  • Statute pin: 專利法施行細則 (Enforcement Rules of the Patent Act) §17 / §19 (符號說明 must declare every reference numeral used). The check is correct to flag undeclared numerals; the bug was that correctly declared numerals weren't parsed.

Future-proofing

  1. Pattern coverage — handles comma-and-space, full-width-comma, 、, and whitespace on both sides of the separator, for both digit (210) and letter-prefixed (RS11) numerals.
  2. Cross-jurisdiction: TW-only. CN/US/EPC have no 符號說明 section, so no mirror applies.
  3. Diagnostic enrichment — not needed: the fix covers the whole whitespace-in-list class, so this report class won't recur.

5 regression tests added. Closes #184.

The TW 符號說明 numeral group disallowed interior whitespace, so an
enumerated row written with comma-and-space (`210, 220, 230:欄位`) failed
to match the symbol-table pattern entirely — every numeral in the row was
then reported as undeclared by symbolTableCoverage. The tight form
(`210,220,230`) already worked.

Make the numeral group a separator-joined list whose separators (、 , ,)
may carry whitespace on either side. An individual token still cannot
contain whitespace, so the trailing separator (: / tab / 2-space gap)
before the name stays unambiguous; range expansion (20~25, PR1~PRn) and
the tight comma form are unchanged.

Covers both reported groups (`210, 220, 230` and `RS11, RS14`). TW-only
by architecture — CN/US have no 符號說明 section. 5 regression tests.

Closes #184.
@vercel

vercel Bot commented Jun 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| patent-lint | Ready | Preview, Comment | Jun 5, 2026 4:37am |

@kwisschen kwisschen merged commit 251c747 into main Jun 5, 2026
6 checks passed
@kwisschen kwisschen deleted the fix/tw-symbol-table-comma-list branch June 5, 2026 04:42

Development

Successfully merging this pull request may close these issues.

[report] check.tw.spec.symbolTableCoverage.amend

1 participant