Refactor banned-term scanning to check visible text by harbourviewcompany-create · Pull Request #19 · harbourviewcompany-create/contractor

harbourviewcompany-create · 2026-05-20T16:55:08Z

Motivation

Ensure banned-term matching targets user-visible content rather than raw HTML source to avoid false positives from markup, comments, or scripts.
Strip HTML comments and <script>/<style> blocks as a minimum preprocessing step before pattern matching.
Normalize decoded entities and whitespace so pattern tests run on human-readable strings.

Description

Added decodeHtmlEntities(text) to decode a small set of named HTML entities plus decimal and hexadecimal numeric character references.
Added extractVisibleText(html) which strips HTML comments, removes <script> and <style> blocks, strips tags, decodes entities, and normalizes whitespace.
Switched banned-term checks to run pattern.test(visibleText) where visibleText is produced by extractVisibleText(html), keeping the bannedPatterns array and failure message format unchanged.
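
The preprocessing steps listed above could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's code: only the tag-stripping pattern `/<[^>]+>/g` is confirmed by the review comments, the other regular expressions and the helper name decodeBasicEntities are hypothetical, and this sketch decodes `&amp;` last, which differs from the order used in the PR.

```javascript
// Sketch only: regex-based extraction is approximate and is not an HTML parser.
// decodeBasicEntities is a hypothetical, trimmed stand-in for decodeHtmlEntities.
function decodeBasicEntities(text) {
  return text
    .replace(/&nbsp;/gi, " ")
    .replace(/&lt;/gi, "<")
    .replace(/&gt;/gi, ">")
    .replace(/&quot;/gi, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/gi, "&"); // decoded last here so each entity is decoded once
}

function extractVisibleText(html) {
  const stripped = html
    .replace(/<!--[\s\S]*?-->/g, " ")                        // HTML comments
    .replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, " ")  // script blocks
    .replace(/<style\b[^>]*>[\s\S]*?<\/style\s*>/gi, " ")    // style blocks
    .replace(/<[^>]+>/g, " ");                               // remaining tags
  return decodeBasicEntities(stripped).replace(/\s+/g, " ").trim();
}

const sample =
  "<!-- SummitLine --><style>.roofing{}</style>" +
  "<p>Hello&nbsp;<b>world</b> &amp; co</p><script>var Roofing = 1;</script>";

console.log(extractVisibleText(sample)); // "Hello world & co"
```

The terms inside the comment, the style block, and the script block never reach the pattern tests, which is the false-positive reduction the Motivation section describes.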

Testing

Ran node scripts/check-site.js and the script completed successfully with no failures.
Existing CI-style failure format and bannedPatterns behavior were preserved to avoid changing downstream expectations.

chatgpt-codex-connector · 2026-05-20T16:55:13Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

netlify · 2026-05-20T16:55:14Z

✅ Deploy Preview for wurx-can ready!

Name	Link
🔨 Latest commit	`5e99521`
🔍 Latest deploy log	https://app.netlify.com/projects/wurx-can/deploys/6a0de76fc424fd0009c44ac6
😎 Deploy Preview	https://deploy-preview-19--wurx-can.netlify.app

coderabbitai · 2026-05-20T16:55:18Z

Warning

Rate limit exceeded

@harbourviewcompany-create has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 55 minutes and 4 seconds before requesting another review.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: bb74ba2d-071a-45ab-8fe4-ff7fd03909c6

📥 Commits

Reviewing files that changed from the base of the PR and between 6e00de5 and 5e99521.

📒 Files selected for processing (1)

scripts/check-site.js

netlify · 2026-05-20T16:55:24Z

✅ Deploy Preview for wurx-otta ready!

Name	Link
🔨 Latest commit	`5e99521`
🔍 Latest deploy log	https://app.netlify.com/projects/wurx-otta/deploys/6a0de76f52ce4f00071bea67
😎 Deploy Preview	https://deploy-preview-19--wurx-otta.netlify.app

devin-ai-integration

Devin Review found 3 potential issues.

devin-ai-integration · 2026-05-20T16:58:05Z

+    .replace(/&#(\d+);/g, (_, codePoint) => String.fromCodePoint(Number(codePoint)))
+    .replace(/&#x([a-f0-9]+);/gi, (_, hexCodePoint) => String.fromCodePoint(parseInt(hexCodePoint, 16)));


🟡 Unhandled RangeError from String.fromCodePoint on invalid numeric HTML entities

The generic numeric entity handlers on lines 48-49 pass decoded numbers directly to String.fromCodePoint() without validating that they are valid Unicode code points (0 to 0x10FFFF). If an HTML file contains a malformed entity like &#99999999999; or &#xFFFFFF;, String.fromCodePoint() throws an unhandled RangeError, crashing the entire check script. Since this function is applied to every .html file found in the root directory, a single malformed entity in any file would prevent the entire validation suite from running.

Suggested change

-    .replace(/&#(\d+);/g, (_, codePoint) => String.fromCodePoint(Number(codePoint)))
-    .replace(/&#x([a-f0-9]+);/gi, (_, hexCodePoint) => String.fromCodePoint(parseInt(hexCodePoint, 16)));
+    .replace(/&#(\d+);/g, (_, codePoint) => { const n = Number(codePoint); return n <= 0x10FFFF ? String.fromCodePoint(n) : ""; })
+    .replace(/&#x([a-f0-9]+);/gi, (_, hexCodePoint) => { const n = parseInt(hexCodePoint, 16); return n <= 0x10FFFF ? String.fromCodePoint(n) : ""; });
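
To make the failure mode concrete, here is a small standalone sketch of the guarded approach. The helper names safeFromCodePoint and decodeNumericEntities are hypothetical and exist only for this illustration. Dropping out-of-range references to an empty string follows the suggestion above; other policies, such as leaving the raw entity in place, would work as well.

```javascript
// Hypothetical helper: returns "" instead of throwing for invalid code points.
function safeFromCodePoint(n) {
  // String.fromCodePoint throws RangeError for values outside 0..0x10FFFF
  // and for non-integers such as Infinity.
  return Number.isInteger(n) && n >= 0 && n <= 0x10ffff
    ? String.fromCodePoint(n)
    : "";
}

// Hypothetical wrapper covering only the two numeric handlers under review.
function decodeNumericEntities(text) {
  return text
    .replace(/&#(\d+);/g, (_, dec) => safeFromCodePoint(Number(dec)))
    .replace(/&#x([a-f0-9]+);/gi, (_, hex) => safeFromCodePoint(parseInt(hex, 16)));
}

// The unguarded call is what crashes the check script today.
let threw = false;
try {
  String.fromCodePoint(0xffffff);
} catch (error) {
  threw = error instanceof RangeError;
}
console.log(threw); // true

const decoded = decodeNumericEntities("&#82;oof &#x52;oof [&#99999999999;] [&#xFFFFFF;]");
console.log(decoded); // "Roof Roof [] []"
```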

devin-ai-integration · 2026-05-20T16:58:07Z

+  const visibleText = extractVisibleText(html);
  for (const pattern of bannedPatterns) {
-    if (pattern.test(html)) {
+    if (pattern.test(visibleText)) {


🚩 Banned-term scan no longer covers HTML attribute content (meta tags, alt text, etc.)

The refactor from pattern.test(html) to pattern.test(visibleText) at line 103 means banned terms in HTML attributes are no longer checked. The extractVisibleText function strips all tags via /<[^>]+>/g at line 57, which discards attribute values entirely. This means banned legacy terms like SummitLine or Roofing appearing in <meta name="description" content="...">, <meta property="og:title" content="...">, <img alt="...">, or <input placeholder="..."> would go undetected. These are user-visible in search results, social media previews, and screen readers respectively. The commit message ("use visible HTML text") suggests this narrowing is intentional—likely to avoid false positives from CSS class names, data attributes, or JS identifiers—but the trade-off should be explicitly acknowledged since meta descriptions and OG tags are the most common places legacy branding lingers after a rebrand.
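
One possible way to close this gap without returning to raw-HTML matching is to collect the values of a short allowlist of user-facing attributes and scan them alongside the visible text. The sketch below is an assumption-laden illustration: extractAttributeText, SCANNED_ATTRIBUTES, and the chosen attribute names are all hypothetical and are not part of the PR. It also inspects every tag-like match, so comments and script blocks would need to be stripped beforehand.

```javascript
// Hypothetical allowlist of attributes whose values reach users
// (search snippets, social previews, screen readers, form hints).
const SCANNED_ATTRIBUTES = ["content", "alt", "title", "placeholder", "aria-label"];

// Hypothetical helper: returns the allowlisted attribute values joined by spaces.
function extractAttributeText(html) {
  const attrPattern = new RegExp(
    `\\s(?:${SCANNED_ATTRIBUTES.join("|")})\\s*=\\s*(?:"([^"]*)"|'([^']*)')`,
    "gi"
  );
  const values = [];
  for (const tag of html.match(/<[^>]+>/g) ?? []) {
    for (const match of tag.matchAll(attrPattern)) {
      values.push(match[1] ?? match[2]); // double-quoted or single-quoted value
    }
  }
  return values.join(" ");
}

const page =
  '<meta name="description" content="SummitLine Roofing experts">' +
  "<img class=\"roofing-hero\" alt='Crew at work' src=\"a.png\">";

const attributeText = extractAttributeText(page);
console.log(attributeText); // "SummitLine Roofing experts Crew at work"
console.log(/SummitLine/i.test(attributeText)); // true
```

Class names, data attributes, and src values are ignored, so the false positives the refactor set out to avoid stay out of the scan.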

devin-ai-integration · 2026-05-20T16:58:08Z

+function decodeHtmlEntities(text) {
+  return text
+    .replace(/&nbsp;/gi, " ")
+    .replace(/&amp;/gi, "&")
+    .replace(/&lt;/gi, "<")
+    .replace(/&gt;/gi, ">")
+    .replace(/&quot;/gi, "\"")
+    .replace(/&#39;/gi, "'")
+    .replace(/&#x27;/gi, "'")
+    .replace(/&#x2F;/gi, "/")
+    .replace(/&#(\d+);/g, (_, codePoint) => String.fromCodePoint(Number(codePoint)))
+    .replace(/&#x([a-f0-9]+);/gi, (_, hexCodePoint) => String.fromCodePoint(parseInt(hexCodePoint, 16)));
+}


📝 Info: Entity decoding order causes asymmetric double-decode behavior

In decodeHtmlEntities, &amp; is decoded on line 41 before the generic numeric entity handlers on lines 48-49. This means double-encoded numeric entities like &amp;#82; get decoded in two passes: first &amp; → & producing &#82;, then &#82; → R. However, double-encoded named entities like &amp;nbsp; do NOT get double-decoded because the &nbsp; replacement (line 40) already ran before &amp; was decoded (line 41). This asymmetry is not a practical bug for this use case (it actually helps catch obfuscated banned terms), but it's worth noting the function doesn't faithfully mirror browser entity decoding semantics.
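
The asymmetry can be reproduced with a trimmed copy of the function that keeps the same replacement order. For comparison, the sketch also includes a hypothetical single-pass decoder, decodeOnce, which is not part of the PR. It decodes each reference exactly once, which is closer to how a browser treats double-encoded input, at the cost of no longer unmasking terms hidden behind double encoding.

```javascript
// Trimmed copy of the PR's ordering: &nbsp; first, then &amp;, then numeric.
function decodeHtmlEntities(text) {
  return text
    .replace(/&nbsp;/gi, " ")
    .replace(/&amp;/gi, "&")
    .replace(/&#(\d+);/g, (_, cp) => String.fromCodePoint(Number(cp)));
}

console.log(decodeHtmlEntities("&amp;#82;oofing")); // "Roofing" (decoded twice)
console.log(decodeHtmlEntities("&amp;nbsp;"));      // "&nbsp;"  (decoded once)

// Hypothetical alternative: one regex pass, so nothing is decoded twice.
const NAMED = new Map([["nbsp", " "], ["amp", "&"], ["lt", "<"], ["gt", ">"], ["quot", '"']]);

function decodeOnce(text) {
  return text.replace(/&(#x[0-9a-f]+|#\d+|[a-z]+);/gi, (match, body) => {
    const lower = body.toLowerCase();
    if (lower.startsWith("#")) {
      const n = lower[1] === "x" ? parseInt(lower.slice(2), 16) : Number(lower.slice(1));
      return n <= 0x10ffff ? String.fromCodePoint(n) : match; // keep invalid refs as-is
    }
    return NAMED.has(lower) ? NAMED.get(lower) : match; // unknown names stay untouched
  });
}

console.log(decodeOnce("&amp;#82;oofing")); // "&#82;oofing"
console.log(decodeOnce("&amp;nbsp;"));      // "&nbsp;"
```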

Refactor banned-term scan to use visible HTML text

5e99521

harbourviewcompany-create added the codex label May 20, 2026 — with ChatGPT Codex Connector

devin-ai-integration Bot reviewed May 20, 2026

harbourviewcompany-create wants to merge 1 commit into main from codex/refactor-banned-term-scanning-in-check-site.js-jos70h
