Refactor banned-term scanning to check visible text #19

Open
harbourviewcompany-create wants to merge 1 commit into main from codex/refactor-banned-term-scanning-in-check-site.js-jos70h

harbourviewcompany-create (Owner) commented May 20, 2026

Motivation

  • Ensure banned-term matching targets user-visible content rather than raw HTML source to avoid false positives from markup, comments, or scripts.
  • Strip HTML comments and <script>/<style> blocks as a minimum preprocessing step before pattern matching.
  • Normalize decoded entities and whitespace so pattern tests run on human-readable strings.

Description

  • Added decodeHtmlEntities(text) to perform lightweight decoding of common named HTML entities and of decimal and hexadecimal numeric character references.
  • Added extractVisibleText(html) which strips HTML comments, removes <script> and <style> blocks, strips tags, decodes entities, and normalizes whitespace.
  • Switched banned-term checks to run pattern.test(visibleText) where visibleText is produced by extractVisibleText(html), keeping the bannedPatterns array and failure message format unchanged.
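
The PR's actual extractVisibleText is not quoted in this thread, so the following is only a minimal sketch of the pipeline described above, under stated assumptions: decodeBasicEntities is a hypothetical, trimmed-down stand-in for decodeHtmlEntities, and replacing each tag with a space (rather than an empty string) is an illustrative choice, not something the PR confirms.

```javascript
// Sketch only: approximates the steps listed in the PR description.

// Hypothetical stand-in for the PR's decodeHtmlEntities (named entities only).
function decodeBasicEntities(text) {
  return text
    .replace(/&nbsp;/gi, " ")
    .replace(/&amp;/gi, "&")
    .replace(/&lt;/gi, "<")
    .replace(/&gt;/gi, ">")
    .replace(/&quot;/gi, "\"")
    .replace(/&#39;/gi, "'");
}

function extractVisibleText(html) {
  const withoutMarkup = html
    .replace(/<!--[\s\S]*?-->/g, " ")                        // HTML comments
    .replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, " ")  // <script> blocks
    .replace(/<style\b[^>]*>[\s\S]*?<\/style\s*>/gi, " ")    // <style> blocks
    .replace(/<[^>]+>/g, " ");                               // remaining tags
  return decodeBasicEntities(withoutMarkup)
    .replace(/\s+/g, " ")                                    // collapse whitespace
    .trim();
}

const sample =
  '<p class="summitline-hero">Hello&nbsp;<b>world</b></p>' +
  "<!-- SummitLine --><script>var SummitLine = 1;</script>";

console.log(extractVisibleText(sample));                     // Hello world
console.log(/SummitLine/i.test(extractVisibleText(sample))); // false
```

Replacing tags with a space keeps words in adjacent block elements from being glued together, at the cost of splitting a term that is interrupted by an inline tag; the opposite choice has the opposite trade-off.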

Testing

  • Ran node scripts/check-site.js and the script completed successfully with no failures.
  • Existing CI-style failure format and bannedPatterns behavior were preserved to avoid changing downstream expectations.

@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

netlify Bot commented May 20, 2026

Deploy Preview for wurx-can ready!

Latest commit: 5e99521
Latest deploy log: https://app.netlify.com/projects/wurx-can/deploys/6a0de76fc424fd0009c44ac6
Deploy Preview: https://deploy-preview-19--wurx-can.netlify.app

coderabbitai Bot commented May 20, 2026

Warning

Rate limit exceeded

@harbourviewcompany-create has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 55 minutes and 4 seconds before requesting another review.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: bb74ba2d-071a-45ab-8fe4-ff7fd03909c6

📥 Commits

Reviewing files that changed from the base of the PR and between 6e00de5 and 5e99521.

📒 Files selected for processing (1)
  • scripts/check-site.js

netlify Bot commented May 20, 2026

Deploy Preview for wurx-otta ready!

Latest commit: 5e99521
Latest deploy log: https://app.netlify.com/projects/wurx-otta/deploys/6a0de76f52ce4f00071bea67
Deploy Preview: https://deploy-preview-19--wurx-otta.netlify.app

devin-ai-integration Bot left a comment

Devin Review found 3 potential issues.

Comment thread scripts/check-site.js
Comment on lines +48 to +49
.replace(/&#(\d+);/g, (_, codePoint) => String.fromCodePoint(Number(codePoint)))
.replace(/&#x([a-f0-9]+);/gi, (_, hexCodePoint) => String.fromCodePoint(parseInt(hexCodePoint, 16)));

🟡 Unhandled RangeError from String.fromCodePoint on invalid numeric HTML entities

The generic numeric entity handlers on lines 48-49 pass decoded numbers directly to String.fromCodePoint() without validating that they are valid Unicode code points (0 to 0x10FFFF). If an HTML file contains a malformed entity like &#99999999999; or &#xFFFFFF;, String.fromCodePoint() throws an unhandled RangeError, crashing the entire check script. Since this function is applied to every .html file found in the root directory, a single malformed entity in any file would prevent the entire validation suite from running.

Suggested change
- .replace(/&#(\d+);/g, (_, codePoint) => String.fromCodePoint(Number(codePoint)))
- .replace(/&#x([a-f0-9]+);/gi, (_, hexCodePoint) => String.fromCodePoint(parseInt(hexCodePoint, 16)));
+ .replace(/&#(\d+);/g, (_, codePoint) => { const n = Number(codePoint); return n <= 0x10FFFF ? String.fromCodePoint(n) : ""; })
+ .replace(/&#x([a-f0-9]+);/gi, (_, hexCodePoint) => { const n = parseInt(hexCodePoint, 16); return n <= 0x10FFFF ? String.fromCodePoint(n) : ""; });

Comment thread scripts/check-site.js
+ const visibleText = extractVisibleText(html);
  for (const pattern of bannedPatterns) {
-   if (pattern.test(html)) {
+   if (pattern.test(visibleText)) {

🚩 Banned-term scan no longer covers HTML attribute content (meta tags, alt text, etc.)

The refactor from pattern.test(html) to pattern.test(visibleText) at line 103 means banned terms in HTML attributes are no longer checked. The extractVisibleText function strips all tags via /<[^>]+>/g at line 57, which discards attribute values entirely. This means banned legacy terms like SummitLine or Roofing appearing in <meta name="description" content="...">, <meta property="og:title" content="...">, <img alt="...">, or <input placeholder="..."> would go undetected. These are user-visible in search results, social media previews, and screen readers respectively. The commit message ("use visible HTML text") suggests this narrowing is intentional—likely to avoid false positives from CSS class names, data attributes, or JS identifiers—but the trade-off should be explicitly acknowledged since meta descriptions and OG tags are the most common places legacy branding lingers after a rebrand.
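
One possible way to narrow this gap, sketched under assumptions: collect the values of a small allow-list of human-facing attributes and scan them alongside the visible text. The function extractAttributeText and the VISIBLE_ATTRIBUTES list are hypothetical and do not appear in the PR; the regex handles only quoted attribute values and does not decode entities.

```javascript
// Sketch only: pulls human-facing attribute values out of raw HTML.

// Hypothetical allow-list; adjust to the attributes the site actually uses.
const VISIBLE_ATTRIBUTES = ["content", "alt", "title", "placeholder", "aria-label"];

function extractAttributeText(html) {
  const names = VISIBLE_ATTRIBUTES.join("|");
  // Requires whitespace before the name, so data-content="..." is skipped.
  // Matches name="value" or name='value'; unquoted values are ignored.
  const attrPattern = new RegExp(
    `\\s(?:${names})\\s*=\\s*(?:"([^"]*)"|'([^']*)')`,
    "gi"
  );
  const values = [];
  for (const match of html.matchAll(attrPattern)) {
    values.push(match[1] ?? match[2]);
  }
  return values.join(" ");
}

const page =
  '<meta name="description" content="SummitLine Roofing experts">' +
  "<img class=\"summitline\" alt='Old SummitLine logo'>";

console.log(extractAttributeText(page));
// SummitLine Roofing experts Old SummitLine logo
```

The result could be concatenated with the visible text before running the banned patterns, which would keep class names and data attributes out of the scan while still covering meta descriptions and alt text. Attribute values would still need to pass through entity decoding first.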


Comment thread scripts/check-site.js
Comment on lines +38 to +50
function decodeHtmlEntities(text) {
  return text
    .replace(/&nbsp;/gi, " ")
    .replace(/&amp;/gi, "&")
    .replace(/&lt;/gi, "<")
    .replace(/&gt;/gi, ">")
    .replace(/&quot;/gi, "\"")
    .replace(/&#39;/gi, "'")
    .replace(/&#x27;/gi, "'")
    .replace(/&#x2F;/gi, "/")
    .replace(/&#(\d+);/g, (_, codePoint) => String.fromCodePoint(Number(codePoint)))
    .replace(/&#x([a-f0-9]+);/gi, (_, hexCodePoint) => String.fromCodePoint(parseInt(hexCodePoint, 16)));
}

📝 Info: Entity decoding order causes asymmetric double-decode behavior

In decodeHtmlEntities, &amp; is decoded on line 41 before the generic numeric entity handlers on lines 48-49. This means double-encoded numeric entities like &amp;#82; get decoded in two passes: first &amp; → & producing &#82;, then &#82; → R. However, double-encoded named entities like &amp;nbsp; do NOT get double-decoded because the &nbsp; replacement (line 40) already ran before &amp; was decoded (line 41). This asymmetry is not a practical bug for this use case (it actually helps catch obfuscated banned terms), but it's worth noting the function doesn't faithfully mirror browser entity decoding semantics.
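
The asymmetry can be reproduced directly with the function quoted in this thread. The single-pass variant decodeOnce below is a hypothetical alternative, not part of the PR, shown only as a sketch of how one regex pass would treat every entity the same way; its small NAMED_ENTITIES table is an assumption.

```javascript
// decodeHtmlEntities as quoted in the review thread (chained replacements).
function decodeHtmlEntities(text) {
  return text
    .replace(/&nbsp;/gi, " ")
    .replace(/&amp;/gi, "&")
    .replace(/&lt;/gi, "<")
    .replace(/&gt;/gi, ">")
    .replace(/&quot;/gi, "\"")
    .replace(/&#39;/gi, "'")
    .replace(/&#x27;/gi, "'")
    .replace(/&#x2F;/gi, "/")
    .replace(/&#(\d+);/g, (_, codePoint) => String.fromCodePoint(Number(codePoint)))
    .replace(/&#x([a-f0-9]+);/gi, (_, hexCodePoint) => String.fromCodePoint(parseInt(hexCodePoint, 16)));
}

console.log(decodeHtmlEntities("&amp;#82;oofing")); // Roofing  (decoded twice)
console.log(decodeHtmlEntities("&amp;nbsp;"));      // &nbsp;   (decoded once)
console.log(decodeHtmlEntities("&amp;lt;"));        // <        (decoded twice)

// Hypothetical single-pass alternative: each entity is decoded exactly once.
const NAMED_ENTITIES = new Map([
  ["nbsp", " "], ["amp", "&"], ["lt", "<"], ["gt", ">"], ["quot", "\""],
]);

function decodeOnce(text) {
  return text.replace(/&(?:#x([a-f0-9]+)|#(\d+)|([a-z]+));/gi, (whole, hex, dec, name) => {
    if (name !== undefined) {
      // Unknown names are left untouched.
      return NAMED_ENTITIES.get(name.toLowerCase()) ?? whole;
    }
    const codePoint = hex !== undefined ? parseInt(hex, 16) : Number(dec);
    // Out-of-range values are left untouched instead of throwing.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : whole;
  });
}

console.log(decodeOnce("&amp;#82;oofing")); // &#82;oofing
console.log(decodeOnce("&amp;nbsp;"));      // &nbsp;
```

As the third call shows, named entities handled after &amp; in the chain are double-decoded too, so only &nbsp; is affected by the ordering. A single pass would only be preferable if matching browser behaviour matters more than catching double-encoded terms.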
