Potential fix for code scanning alert no. 1: Incomplete multi-character sanitization #4
…er sanitization Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Summary of Changes
Hello @brianhexer, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request implements a security enhancement by refining how plain text is extracted from HTML content. It moves from an ad-hoc, potentially insecure regular expression method to a more robust approach that uses standard DOM APIs. The change specifically targets the translation functionality, ensuring that only the intended plain text is processed, which mitigates risks associated with improper HTML sanitization.
Code Review
This pull request effectively addresses the 'Incomplete multi-character sanitization' code scanning alert by replacing a fragile regex-based HTML stripping method with a much safer DOM-based approach. This is a great security improvement. I've added one suggestion to make the implementation even more robust by using DOMParser, which is the modern best practice for safely parsing HTML from strings. Overall, this is a solid fix.
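The DOMParser suggestion in this review could look roughly like the sketch below. The helper name extractPlainText and its injectable parser parameter are hypothetical (the parameter exists only so the function can be exercised outside a browser); paper.dataset.originalHtml is the value named in the PR description.

```javascript
// Hypothetical helper, not part of the PR as written.
// Parses the HTML string into a separate document and returns its text.
function extractPlainText(html, parser = new DOMParser()) {
  // 'text/html' builds a standalone document; scripts inside it are not run.
  const doc = parser.parseFromString(String(html ?? ''), 'text/html');
  // Guard against a missing body before reading its text.
  const text = doc.body ? doc.body.textContent : '';
  return (text || '').trim();
}

// Assumed usage inside the font change handler:
// const originalText = extractPlainText(paper.dataset.originalHtml);
```

Because the parsed document is separate from the page, the stored markup is never inserted into the live DOM, which is the property the review is asking for.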
Pull request overview
This PR attempts to fix a code scanning alert about incomplete multi-character HTML sanitization by replacing a regex-based approach (replace(/<[^>]*>/g, '')) with DOM-based text extraction. The change aims to more robustly extract plain text from stored HTML before sending it for translation.
Changes:
- Replaced regex HTML tag stripping with DOM-based text extraction using a temporary div element
- Modified text extraction logic in the font change event handler to use textContent instead of regex
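To illustrate why the regex quoted in this overview is considered fragile, here is a small sketch. The wrapper name stripTagsWithRegex and the sample inputs are made up for the example; only the regex itself comes from the PR.

```javascript
// Hypothetical wrapper around the regex quoted in the PR overview.
function stripTagsWithRegex(html) {
  return html.replace(/<[^>]*>/g, '').trim();
}

// Case 1: well-formed markup is stripped as expected.
const simple = stripTagsWithRegex('<p>Hello <b>world</b></p>');

// Case 2: a ">" inside a quoted attribute ends the match early,
// so part of the tag leaks into the "plain text".
const quoted = stripTagsWithRegex('<a title=">" href="x">text</a>');

// Case 3: a tag with no closing ">" is never matched,
// so the output still contains "<img".
const unclosed = stripTagsWithRegex('Hi <img src=x onerror=alert(1)');

console.log(simple);   // Hello world
console.log(quoted);   // " href="x">text
console.log(unclosed); // Hi <img src=x onerror=alert(1)
```

Cases 2 and 3 show the kind of leftover markup the code scanning alert warns about; a real HTML parser does not rely on finding a matching ">" this way.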
Potential fix for https://github.com/brianhexer/handwritten-prints/security/code-scanning/1
In general, the problem should be fixed by avoiding ad-hoc multi-character regex sanitization of HTML and instead using a well-defined, robust method to obtain plain text from the stored HTML. In the browser, the best simple approach is to parse the HTML into a detached DOM node and read .textContent, which inherently ignores tags without relying on regexes, and does not reinsert the data as HTML.
Concretely, for this code, we only use paper.dataset.originalHtml to get the original plain text that should be translated. Instead of paper.dataset.originalHtml.replace(/<[^>]*>/g, '').trim();, we can create a temporary <div>, assign its innerHTML to the stored value, and then read .textContent. This removes the multi-character regex sanitization entirely while preserving behavior (we still get the same visible text, but more robustly). No changes to imports are needed; this uses standard DOM APIs available in the browser. The only change required is in js/app.mjs lines 72–74, replacing the replace(...).trim() call with DOM-based text extraction and trim.
Suggested fixes powered by Copilot Autofix. Review carefully before merging.
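The temporary <div> approach described in this PR description can be sketched as follows. This is a minimal illustration, not the exact diff: the helper name htmlToPlainText and the doc parameter are hypothetical (the parameter defaults to the browser's document and is there only so the function can be called with a stand-in elsewhere).

```javascript
// Hypothetical helper illustrating the temporary-<div> technique.
function htmlToPlainText(html, doc = globalThis.document) {
  // The element is created but never attached to the page.
  const container = doc.createElement('div');
  // Let the browser's HTML parser handle the stored markup.
  container.innerHTML = html;
  // textContent returns only the text nodes, with no tags.
  return (container.textContent || '').trim();
}

// Assumed usage, replacing the replace(...).trim() call:
// const originalText = htmlToPlainText(paper.dataset.originalHtml);
```

One caveat worth checking during review: assigning innerHTML on an element created from the live document can still start resource loads (for example for <img> elements) even though the element is detached, which is likely why a reviewer on this PR suggests DOMParser instead.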