Potential fix for code scanning alert no. 1: Incomplete multi-character sanitization #4

Merged
brianhexer merged 1 commit into main from alert-autofix-1 on Jan 11, 2026

Conversation

@brianhexer

Owner

Potential fix for https://github.com/brianhexer/handwritten-prints/security/code-scanning/1

In general, the problem should be fixed by avoiding ad‑hoc multi-character regex sanitization of HTML and instead using a well-defined, robust method to obtain plain text from the stored HTML. In the browser, the best simple approach is to parse the HTML into a detached DOM node and read .textContent, which inherently ignores tags without relying on regexes, and does not reinsert the data as HTML.

Concretely, this code uses paper.dataset.originalHtml only to recover the original plain text that should be translated. Instead of calling paper.dataset.originalHtml.replace(/<[^>]*>/g, '').trim(), we can create a temporary <div>, assign the stored value to its innerHTML, and then read .textContent. This removes the multi-character regex sanitization entirely while preserving behavior (we still get the same visible text, but more robustly). No changes to imports are needed; this uses standard DOM APIs available in the browser. The only change required is in js/app.mjs lines 72–74, replacing the replace(...).trim() call with DOM-based text extraction followed by trim().
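As a rough illustration of the approach described above, the extraction could look like the sketch below. It is not the merged diff: the helper name extractPlainText and the injectable doc parameter are assumptions added for illustration, and only paper.dataset.originalHtml comes from the PR description.

```javascript
// Hypothetical helper (name not taken from the PR): extracts plain text
// from stored HTML by letting the browser parse it instead of using a regex.
// `doc` defaults to the page's document; it is a parameter only so the
// function can be exercised outside a browser.
function extractPlainText(storedHtml, doc = globalThis.document) {
  // Temporary container that is never appended to the page.
  const container = doc.createElement('div');
  container.innerHTML = storedHtml ?? '';
  // textContent ignores tags; innerText is used only as a fallback,
  // as mentioned in the review summary.
  return (container.textContent || container.innerText || '').trim();
}
```

In js/app.mjs this would presumably be called as extractPlainText(paper.dataset.originalHtml). Note that the stored value is still parsed as HTML by the innerHTML assignment, even though the temporary element is never attached to the page.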

Suggested fixes powered by Copilot Autofix. Review carefully before merging.

…er sanitization

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
@gemini-code-assist

Summary of Changes

Hello @brianhexer, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements a crucial security enhancement by refining the process of extracting plain text from HTML content. It transitions from an ad-hoc, potentially insecure regular expression method to a more secure and robust approach utilizing standard DOM APIs. This change specifically targets the translation functionality, ensuring that only the intended plain text is processed, thereby mitigating risks associated with improper HTML sanitization and improving the overall integrity of the application's text handling.

Highlights

  • Security Fix: Replaced a regex-based HTML tag removal method with a more robust DOM-based approach to extract plain text from paper.dataset.originalHtml. This addresses a code scanning alert related to incomplete multi-character sanitization.
  • Improved HTML Sanitization: The change now uses document.createElement('div'), sets its innerHTML to the original HTML, and then reads textContent (with a fallback to innerText) to safely obtain plain text, avoiding potential vulnerabilities of regex-based stripping.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request effectively addresses the 'Incomplete multi-character sanitization' code scanning alert by replacing a fragile regex-based HTML stripping method with a much safer DOM-based approach. This is a great security improvement. I've added one suggestion to make the implementation even more robust by using DOMParser, which is the modern best practice for safely parsing HTML from strings. Overall, this is a solid fix.
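The review's inline suggestion is collapsed on this page, so the following is only a sketch of what a DOMParser-based variant might look like, not the reviewer's actual code. DOMParser and parseFromString(..., 'text/html') are standard browser APIs; the helper name extractPlainTextInert and the injectable ParserCtor parameter are assumptions for illustration.

```javascript
// Hypothetical helper (name not taken from the PR or the review).
// `ParserCtor` defaults to the browser's DOMParser; it is a parameter
// only so the function can be exercised outside a browser.
function extractPlainTextInert(storedHtml, ParserCtor = globalThis.DOMParser) {
  // Parse the string into a separate document rather than into an
  // element that belongs to the live page.
  const parsed = new ParserCtor().parseFromString(storedHtml ?? '', 'text/html');
  // Read the text of the parsed body; tags are dropped by the parser.
  return (parsed.body.textContent || '').trim();
}
```

A commonly cited reason to prefer this form is that a document produced by DOMParser is inert (scripts do not run in it), whereas assigning innerHTML on an element owned by the live document can still trigger resource loads such as an <img> request.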

Comment thread js/app.mjs
@brianhexer brianhexer marked this pull request as ready for review January 11, 2026 00:08
Copilot AI review requested due to automatic review settings January 11, 2026 00:08
@brianhexer brianhexer merged commit a01e8b9 into main Jan 11, 2026
6 of 8 checks passed

Copilot AI left a comment

Pull request overview

This PR attempts to fix a code scanning alert about incomplete multi-character HTML sanitization by replacing a regex-based approach (replace(/<[^>]*>/g, '')) with DOM-based text extraction. The change aims to more robustly extract plain text from stored HTML before sending it for translation.

Changes:

  • Replaced regex HTML tag stripping with DOM-based text extraction using a temporary div element
  • Modified text extraction logic in the font change event handler to use textContent instead of regex
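To make the concern about the regex quoted in the overview more concrete, here is a small sketch that runs it against a few inputs. The function name stripTagsWithRegex and the sample strings are illustrative assumptions and are not taken from the repository.

```javascript
// The regex-based stripping quoted in the overview, wrapped in a
// hypothetical helper so it can be called on sample inputs.
const stripTagsWithRegex = (html) => html.replace(/<[^>]*>/g, '').trim();

// Illustrative inputs (not from the repository).
const samples = [
  '<p>Hello <b>world</b></p>',         // well-formed markup: tags are removed
  '<a title="1 > 0">link</a>',         // '>' inside an attribute ends the match early
  'Hello <img src=x onerror=alert(1)', // no closing '>': nothing is removed
  'Tom &amp; Jerry',                   // entities are left encoded
];

for (const html of samples) {
  console.log(JSON.stringify(html), '->', JSON.stringify(stripTagsWithRegex(html)));
}
```

Only the first input comes out as clean text ("Hello world"). The second leaves attribute debris (0">link), the third passes the unterminated tag through unchanged, and the fourth keeps &amp; where DOM-based extraction would yield a literal &, so the two approaches are not strictly equivalent on every input.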

Comment thread js/app.mjs
Comment thread js/app.mjs
@brianhexer brianhexer deleted the alert-autofix-1 branch May 4, 2026 19:52
2 participants