feat: harden body HTML sanitization and PDF remote resource blocking #77

Merged: ShadyUnderLight merged 1 commit into main from feat/html-sanitization-body on May 11, 2026

Conversation

@ShadyUnderLight commented May 11, 2026

Summary

Completes the remaining scope of #68: body HTML sanitization, plus blocking HTTP/HTTPS resources by default during PDF rendering.

Changes

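| File | Change |
| --- | --- |
| scripts/markdown_to_html.py | Adds sanitize_html(), which applies an nh3 allowlist filter: removes <script>, <iframe>, onerror= handlers, javascript: URLs, style attributes, and <img>; only safe tags and attributes are allowed |
| scripts/render_pdf.py | Adds --allow-remote (opt-out flag; HTTP/HTTPS is blocked by default); the route matches **/* and decides by URL scheme |
| scripts/md_to_pdf.py | Passes --allow-remote through (opt-out; blocked by default) |
| requirements.txt | Adds nh3>=0.2 |
| .github/workflows/ci.yml | Test coverage: metadata escaping, script/iframe/onerror purge, inline style removal, img removal, remote-blocked PDF generation |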
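
To make the allowlist idea concrete, here is a minimal sketch of the filtering behavior described for sanitize_html(). It is not the PR's code and it does not use nh3: it is a standard-library stand-in built on html.parser so that it runs without extra dependencies. The tag, attribute, and scheme sets, and the name sanitize_html_sketch, are assumptions made for illustration. Real code should rely on nh3 as the PR does, since a hand-rolled filter like this one is not hardened.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical allowlists for illustration; the PR does not list its real sets.
ALLOWED_TAGS = {"p", "a", "em", "strong", "code", "pre", "ul", "ol", "li",
                "h1", "h2", "h3", "blockquote"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
ALLOWED_SCHEMES = {"", "http", "https", "mailto"}  # "" means a relative URL
DROP_WITH_CONTENT = {"script", "style", "iframe"}  # inner text is dropped too
URL_ATTRS = {"href"}


def _scheme_ok(url: str) -> bool:
    # Browsers ignore whitespace and control characters inside a scheme,
    # so remove them before parsing to catch "java\tscript:" style tricks.
    compact = "".join(ch for ch in url if ch > " ")
    try:
        return urlsplit(compact).scheme.lower() in ALLOWED_SCHEMES
    except ValueError:
        return False  # unparseable URL: drop the attribute


class _AllowlistSanitizer(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self._skip_depth = 0  # > 0 while inside a DROP_WITH_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self._skip_depth += 1
            return
        if self._skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags (e.g. img) are dropped; their text is kept
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, ()) or value is None:
                continue  # drops style, onclick, onerror, and the like
            if name in URL_ATTRS and not _scheme_ok(value):
                continue  # drops javascript: and other unlisted schemes
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif not self._skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize_html_sketch(html: str) -> str:
    parser = _AllowlistSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    dirty = '<p style="color:red">Hi<script>alert(1)</script><img src="x"></p>'
    print(sanitize_html_sketch(dirty))
```

The design point is that everything is denied unless listed: a tag or attribute nobody thought about is removed by default, which is the property the table attributes to the nh3 filter.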

Security model

  • metadata: html.escape() on 4 fields
  • body HTML: nh3 sanitization; style and img removed, dangerous tags stripped
  • PDF rendering: remote HTTP/HTTPS resources blocked by default; --allow-remote explicitly re-enables them
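
As a sketch of the first layer, the snippet below escapes metadata values with html.escape() before they are placed into a template. The field names and the helper name escape_metadata are hypothetical; the PR only states that four fields are escaped, not which ones.

```python
from html import escape

# Hypothetical field names; the PR says 4 fields are escaped but does not name them.
METADATA_FIELDS = ("title", "author", "date", "description")


def escape_metadata(meta: dict[str, str]) -> dict[str, str]:
    """Return a copy of the known metadata fields with HTML special characters escaped."""
    # quote=True also escapes " and ', so values are safe inside attribute values.
    return {key: escape(str(meta.get(key, "")), quote=True) for key in METADATA_FIELDS}


if __name__ == "__main__":
    print(escape_metadata({"title": '<script>alert("x")</script>'})["title"])
```

Escaping suits metadata because those fields are meant to be plain text, whereas the body has to keep some markup and therefore needs an allowlist sanitizer instead.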

Closes #68

@ShadyUnderLight ShadyUnderLight force-pushed the feat/html-sanitization-body branch 3 times, most recently from dc07c5b to 360150a Compare May 11, 2026 03:59
Complete the remaining scope from issue #68:

- scripts/markdown_to_html.py: add sanitize_html() using nh3 with a
  strict allowlist of safe tags and attributes. Strips script, iframe,
  event handlers, inline style, img, and javascript: URLs.
  Integration runs after all post-processing.

- scripts/render_pdf.py: add --allow-remote flag (opt-out, HTTP/HTTPS
  BLOCKED by default). Route uses **/* with scheme check, covering
  all http/https URLs including those with paths.

- scripts/md_to_pdf.py: add --allow-remote passthrough (opt-out).

- requirements.txt: add nh3>=0.2

- scripts/test_remote_block.py: standalone test that spawns local
  HTTP server, verifies default mode blocks all remote requests and
  --allow-remote permits them.

- CI: expand tests to cover style stripping, img removal, and
  remote-blocked PDF generation.

Closes #68
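
The default-deny behavior behind --allow-remote can be sketched as a flag plus a scheme check. This is an illustration under assumptions, not the code in scripts/render_pdf.py: the helper names build_parser and should_block are hypothetical, and the renderer's actual route handler is not shown in this PR page.

```python
import argparse
from urllib.parse import urlsplit

REMOTE_SCHEMES = {"http", "https"}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Render HTML to PDF (sketch)")
    # store_true defaults to False, so remote fetching stays blocked unless requested.
    parser.add_argument("--allow-remote", action="store_true",
                        help="permit HTTP/HTTPS requests (blocked by default)")
    return parser


def should_block(url: str, allow_remote: bool = False) -> bool:
    """Decide whether a request seen by a catch-all route should be aborted."""
    if allow_remote:
        return False
    try:
        scheme = urlsplit(url).scheme.lower()
    except ValueError:
        return True  # unparseable URL: fail closed
    # Only the scheme matters, so URLs with paths or query strings are covered too.
    return scheme in REMOTE_SCHEMES


if __name__ == "__main__":
    args = build_parser().parse_args()
    for url in ("https://example.com/a/b.png", "file:///tmp/doc.html"):
        print(url, "blocked" if should_block(url, args.allow_remote) else "allowed")
```

Matching every request and deciding by scheme, rather than by a host-only URL pattern, is what lets the check cover HTTP/HTTPS URLs that include paths, while file: and data: resources continue to load.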
@ShadyUnderLight ShadyUnderLight force-pushed the feat/html-sanitization-body branch from 360150a to 0570659 Compare May 11, 2026 04:06
@ShadyUnderLight ShadyUnderLight merged commit ed69395 into main May 11, 2026
2 checks passed
Development

Successfully merging this pull request may close these issues.

Clarify the HTML security boundary of markdown_to_html.py to prevent untrusted Markdown from being injected into the delivered HTML
