fix(tw): parse comma-and-space separated symbol-table numerals #198
Merged
The TW 符號說明 (symbol description) numeral group disallowed interior whitespace, so an enumerated row written with comma-and-space (`210, 220, 230:欄位`) failed to match the symbol-table pattern entirely, and every numeral in the row was then reported as undeclared by `symbolTableCoverage`. The tight form (`210,220,230`) already worked. Make the numeral group a separator-joined list whose separators (、 , ,) may carry whitespace on either side. An individual token still cannot contain whitespace, so the trailing separator (`:` / tab / 2-space gap) before the name stays unambiguous; range expansion (`20~25`, `PR1~PRn`) and the tight comma form are unchanged. Covers both reported groups (`210, 220, 230` and `RS11, RS14`). TW-only by architecture: CN/US have no 符號說明 section. Adds 5 regression tests. Closes #184.
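A minimal sketch of the separator-joined numeral list described above, assuming Python's `re` module. The names `TOKEN`, `SEP`, `ROW`, and `parse_row` are hypothetical, and the exact character sets (including the full-width comma and colon) are assumptions, not the project's actual pattern. Range expansion is deliberately left out.

```python
import re

# Hypothetical reconstruction; the real pattern in the repository may differ.
TOKEN = r"[A-Za-z0-9~\-]+"      # one numeral or range token, never contains whitespace
SEP = r"\s*[、,,]\s*"          # list separator, optionally padded (full-width comma assumed)
NUMERALS = rf"{TOKEN}(?:{SEP}{TOKEN})*"

# Assumed name separator: colon (ASCII or full-width), tab, or a gap of 2+ spaces.
ROW = re.compile(
    rf"^\s*(?P<nums>{NUMERALS})(?:\s*[::]\s*|\t+| {{2,}})(?P<name>\S.*)$"
)


def parse_row(line):
    """Return (numeral tokens, name) for a symbol-table row, or None if no match."""
    m = ROW.match(line)
    if m is None:
        return None
    tokens = re.split(r"\s*[、,,]\s*", m.group("nums"))
    return tokens, m.group("name").strip()


print(parse_row("210, 220, 230:欄位"))  # (['210', '220', '230'], '欄位')
print(parse_row("210 220:欄位"))        # None: a bare space is not a list separator
```

Because every repetition of the list step must consume a literal separator, two tokens can never be joined by whitespace alone, which is what keeps the 2-space gap before the name unambiguous in this sketch.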
Triage finding (issue #184, `check.tw.spec.symbolTableCoverage.amend`)

A TW 符號說明 row written with comma-and-space separated reference numerals (`210, 220, 230:欄位說明`) was rejected wholesale, so every numeral in the row was reported as undeclared. The tight form `210,220,230` (no spaces) already parsed correctly. The reporter described exactly this: enumerated numerals separated by 逗點 (comma) read as not listed.

Root cause

The numeral capture group `[A-Za-z0-9~~\-、,,]+` allowed comma characters but not whitespace, so the interior space after each comma broke the match before the separator/name could be found.

Fix

The numeral group is now a separator-joined list of alphanumeric tokens whose separators (、 , ,) may carry whitespace on either side. An individual token still cannot contain whitespace, so the trailing separator before the name (`:` / tab / 2-space gap) remains unambiguous. No ReDoS risk: each list step requires a literal separator.

Verification gates

`210, 220, 230:欄位說明` → `[]` pre-fix; → 3 entries post-fix. `RS11, RS14:訊號` (the report's second group) likewise. Both fail-then-pass.
`test_symbol_table_tw.py` suite (24 cases) unchanged; range expansion (`20~25`, `PR1~PRn`) and the tight comma form (`100、100a`) all preserved.

Future-proofing

210) and letter-prefixed (`RS11`) numerals.

5 regression tests added. Closes #184.
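The pre-fix behaviour in the root-cause note can be reproduced with a small sketch. It assumes the old row pattern was the quoted character class followed by a name separator. `OLD_ROW` and `old_entries` are hypothetical names, and the full-width variants in the class are an assumption about what the duplicated characters originally were.

```python
import re

# Assumed pre-fix shape: a single character class with commas but no whitespace.
OLD_NUMERALS = r"[A-Za-z0-9~~\-、,,]+"
OLD_ROW = re.compile(
    rf"^\s*({OLD_NUMERALS})(?:\s*[::]\s*|\t+| {{2,}})(\S.*)$"
)


def old_entries(line):
    """Return the numerals the assumed pre-fix pattern would declare for a row."""
    m = OLD_ROW.match(line)
    if m is None:
        return []  # the whole row is rejected, so nothing is declared
    return [tok for tok in re.split(r"[、,,]", m.group(1)) if tok]


print(old_entries("210,220,230:欄位說明"))    # ['210', '220', '230']
print(old_entries("210, 220, 230:欄位說明"))  # []
```

The class stops at the first interior space, and a single space is not a valid name separator, so the match fails for the whole row instead of degrading to a partial result.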