Fix regex for 'viz' in Latinisms YAML #201
SecondSkoll
left a comment
Current change doesn't do anything. Output against supplied test case is:
test.md
3:24 suggestion Instead of 'viz', use 'specifically' or 'namely'. Canonical.025a-latinisms-with-english-equivalents
4:23 suggestion Instead of 'viz', use 'specifically' or 'namely'. Canonical.025a-latinisms-with-english-equivalents
8:22 suggestion Instead of 'viz.', use 'specifically' or 'namely'. Canonical.025a-latinisms-with-english-equivalents
This still misses a case and flags a false positive. Fix suggested.
  \b(?:versus|vs\.(?!\w)|vs(?![\.\w])): "'compared to/with' or 'opposed to'"
  \bvice\sversa\b: "'the reverse' or 'the other way around'"
- \b(viz\.(?!\w)|viz(?![\w\.])): "'specifically' or 'namely'"
+ \b(?:viz\.(?!\w)|viz(?![\w\.])): "'specifically' or 'namely'"
- \b(?:viz\.(?!\w)|viz(?![\w\.])): "'specifically' or 'namely'"
+ \b(viz(?!\.?\w)): "'specifically' or 'namely'"
The suggested fix doesn't actually do anything (there's some weirdness with non-capturing groups in our Vale implementation); in fact, it's the same logic as the existing code. The existing code should already work properly but, as you have raised, there are some uncaptured cases and some false positives for some reason.
It seems there's some real weirdness with escaped . characters, which is the root of the issue. The easiest way to deal with it looks to be dropping the capture of the . that should follow viz, so the rule matches only viz itself.
This requires a small change to the test cases as well.
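To make the intended behaviour of the two patterns concrete, here is a minimal sketch using Python's re module. Treat it as an illustration only: Vale does not run Python's regex engine, so this shows what each pattern is meant to match and will not reproduce the escaped-dot behaviour seen in Vale. The sample strings and the helper matched_text are hypothetical and are not taken from the repository's test cases.

```python
import re

# Pattern from this PR: match "viz." or bare "viz", using lookaheads
# to skip domains ("viz.com") and longer words ("vizier").
PR_PATTERN = re.compile(r"\b(?:viz\.(?!\w)|viz(?![\w\.]))")

# Reviewer's simplification: match only "viz" and never consume the dot.
REVIEW_PATTERN = re.compile(r"\b(viz(?!\.?\w))")

# Hypothetical sample sentences (assumptions for illustration).
SAMPLES = [
    "three colours, viz. red, green and blue",
    "three colours, viz, red, green and blue",
    "see viz.com for details",
    "the vizier arrived",
]


def matched_text(pattern, text):
    """Return the first matched substring, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(0) if match else None


for sample in SAMPLES:
    print(repr(sample),
          "| PR:", matched_text(PR_PATTERN, sample),
          "| review:", matched_text(REVIEW_PATTERN, sample))
```

Under Python's semantics both patterns flag the same sentences. The only difference is whether the trailing dot is part of the matched text, which is why the expected message in the test cases changes from 'viz.' to 'viz'.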
This PR adds viz. (short for videlicet) to the list of discouraged Latin abbreviations.

Regex Details
The regex is designed to avoid false-positive matches on domains, emails, and overlapping words (e.g. viz.com, vizier):
- viz\.(?!\w): Matches viz. only if the dot is not followed by a word character (catches "viz. ", misses "viz.com").
- viz(?![\w\.]): Matches viz only if it is not followed by a word character or a dot (catches "viz, ", misses "vizier").

Verification
Testing this against the following markdown confirms correct behavior: