Refactor banned-term scanning to check visible text #19
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -35,6 +35,30 @@ function exists(file) { | |
| return fs.existsSync(path.join(root, file)); | ||
| } | ||
|
|
||
| function decodeHtmlEntities(text) { | ||
| return text | ||
| .replace(/&nbsp;/gi, " ") | ||
| .replace(/&amp;/gi, "&") | ||
| .replace(/&lt;/gi, "<") | ||
| .replace(/&gt;/gi, ">") | ||
| .replace(/&quot;/gi, "\"") | ||
| .replace(/&#39;/gi, "'") | ||
| .replace(/&apos;/gi, "'") | ||
| .replace(/&#x2F;/gi, "/") | ||
| .replace(/&#(\d+);/g, (_, codePoint) => String.fromCodePoint(Number(codePoint))) | ||
| .replace(/&#x([a-f0-9]+);/gi, (_, hexCodePoint) => String.fromCodePoint(parseInt(hexCodePoint, 16))); | ||
| } | ||
|
Comment on lines +38 to +50
📝 Info: Entity decoding order causes asymmetric double-decode behavior
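One way to avoid order-dependent double decoding is to decode every entity in a single pass, so that already-decoded output is never rescanned. With the chained `.replace()` calls in the order shown in the diff, `&amp;lt;` would decode twice (to `<`), while `&amp;nbsp;` would decode only once, because the `&nbsp;` step has already run by the time `&amp;` is replaced. The sketch below is an illustration only: `decodeHtmlEntitiesOnce` and `NAMED_ENTITIES` are hypothetical names, not part of the PR, and the entity table is deliberately small.

```javascript
// Hypothetical single-pass alternative to the chained .replace() calls.
// A Map avoids accidental hits on inherited object keys such as "constructor".
const NAMED_ENTITIES = new Map([
  ["nbsp", " "],
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["quot", "\""],
  ["apos", "'"],
]);

function decodeHtmlEntitiesOnce(text) {
  // One regex covers hex, decimal, and named entities. Each match is replaced
  // exactly once, and the replacement text is not scanned again.
  return text.replace(/&(?:#x([0-9a-f]+)|#(\d+)|([a-z]+));/gi, (match, hex, dec, name) => {
    if (name !== undefined) {
      const key = name.toLowerCase();
      // Unknown names are left untouched.
      return NAMED_ENTITIES.has(key) ? NAMED_ENTITIES.get(key) : match;
    }
    const codePoint = hex !== undefined ? parseInt(hex, 16) : Number(dec);
    // Out-of-range values are left untouched instead of throwing a RangeError.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeHtmlEntitiesOnce("&amp;lt;"));   // prints "&lt;"
console.log(decodeHtmlEntitiesOnce("&amp;nbsp;")); // prints "&nbsp;"
```

Both inputs now decode exactly one level, which makes the behavior symmetric across entity types.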
||
|
|
||
| function extractVisibleText(html) { | ||
| const withoutComments = html.replace(/<!--[\s\S]*?-->/g, " "); | ||
| const withoutScriptAndStyle = withoutComments | ||
| .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, " ") | ||
| .replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, " "); | ||
| const withoutTags = withoutScriptAndStyle.replace(/<[^>]+>/g, " "); | ||
| const decodedText = decodeHtmlEntities(withoutTags); | ||
| return decodedText.replace(/\s+/g, " ").trim(); | ||
| } | ||
|
|
||
| for (const file of requiredFiles) { | ||
| if (!exists(file)) { | ||
| failures.push(`Missing required file: ${file}`); | ||
|
|
@@ -74,8 +98,9 @@ for (const [file, formName] of Object.entries(formRequirements)) { | |
|
|
||
| for (const file of fs.readdirSync(root).filter((name) => name.endsWith(".html"))) { | ||
| const html = read(file); | ||
| + const visibleText = extractVisibleText(html); | ||
| for (const pattern of bannedPatterns) { | ||
| - if (pattern.test(html)) { | ||
| + if (pattern.test(visibleText)) { | ||
|
🚩 Banned-term scan no longer covers HTML attribute content (meta tags, alt text, etc.)
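If attribute content should stay in scope, one option is to collect the values of selected user-facing attributes and scan them alongside the visible text. The following is a rough sketch under stated assumptions: `extractAttributeText` and `SCANNED_ATTRIBUTES` are hypothetical names, the attribute list is a guess at what matters for this project, and the regex only handles quoted values.

```javascript
// Hypothetical helper; the attribute list is an assumption, not taken from the PR.
const SCANNED_ATTRIBUTES = ["content", "alt", "title", "aria-label", "placeholder"];

function extractAttributeText(html) {
  const values = [];
  // Matches name="value" or name='value' preceded by whitespace.
  // Unquoted attribute values are not handled in this sketch.
  const attributePattern = /\s([a-z][a-z0-9-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')/gi;
  for (const match of html.matchAll(attributePattern)) {
    const name = match[1].toLowerCase();
    if (SCANNED_ATTRIBUTES.includes(name)) {
      // Group 2 holds double-quoted values, group 3 single-quoted ones.
      values.push(match[2] !== undefined ? match[2] : match[3]);
    }
  }
  return values.join(" ");
}

const sample = '<img src="a.png" alt="Legacy logo"><meta name="description" content="Old brand">';
console.log(extractAttributeText(sample)); // prints "Legacy logo Old brand"
```

The result could be concatenated with the output of `extractVisibleText(html)` before running the banned patterns. Attribute values would still need entity decoding, and the regex can also match attribute-like text outside of tags, so a real HTML parser may be a better fit if precision matters.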
||
| failures.push(`${file} contains banned legacy term: ${pattern}`); | ||
| } | ||
| } | ||
|
|
||
🟡 Unhandled RangeError from String.fromCodePoint on invalid numeric HTML entities
The generic numeric entity handlers on lines 48-49 pass decoded numbers directly to `String.fromCodePoint()` without validating that they are valid Unicode code points (0 to 0x10FFFF). If an HTML file contains a malformed numeric entity whose value exceeds 0x10FFFF, `String.fromCodePoint()` throws an unhandled `RangeError`, crashing the entire check script. Since this function is applied to every `.html` file found in the root directory, a single malformed entity in any file would prevent the entire validation suite from running.
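A minimal sketch of one possible guard, assuming the desired behavior is to leave an out-of-range entity in the text unchanged rather than abort the run. `safeFromCodePoint` and `decodeNumericEntities` are hypothetical names introduced for illustration. The two regular expressions are the ones from the diff.

```javascript
const MAX_CODE_POINT = 0x10ffff;

// Returns the character for a valid code point, or the fallback otherwise.
function safeFromCodePoint(codePoint, fallback) {
  const isValid =
    Number.isInteger(codePoint) && codePoint >= 0 && codePoint <= MAX_CODE_POINT;
  return isValid ? String.fromCodePoint(codePoint) : fallback;
}

function decodeNumericEntities(text) {
  return text
    // Decimal form, e.g. &#65;
    .replace(/&#(\d+);/g, (match, dec) => safeFromCodePoint(Number(dec), match))
    // Hexadecimal form, e.g. &#x41;
    .replace(/&#x([a-f0-9]+);/gi, (match, hex) => safeFromCodePoint(parseInt(hex, 16), match));
}

console.log(decodeNumericEntities("&#65;&#x42;")); // prints "AB"
console.log(decodeNumericEntities("&#1114112;"));  // prints "&#1114112;" (0x110000 is out of range)
```

Passing the original match as the fallback keeps malformed input visible in the scanned text. Wrapping the per-file scan in `try`/`catch` and recording a failure would be another reasonable choice.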