Add tags parameter to clean_text #122

Merged: messense merged 2 commits into messense:main from gghez:add-tags-to-clean-text on Apr 25, 2026

Conversation

@gghez (Contributor) commented Mar 28, 2026

`clean_text` currently escapes everything with no way to let specific
tags through. When migrating from bleach, a common pattern is sanitizing
text while keeping a small set of custom elements like `` intact.

When `tags` is given, those tags are passed through with no attributes and
everything else is stripped. Omitting `tags` keeps the existing behavior.
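
For illustration only, the semantics described above can be sketched in pure Python. This is not the implementation in this PR (which goes through ammonia); `clean_text_sketch` and `_AllowlistStripper` are hypothetical names, and `html.escape` is a stand-in for nh3's escaping, whose exact output may differ.

```python
from html import escape
from html.parser import HTMLParser


class _AllowlistStripper(HTMLParser):
    """Hypothetical helper: keep allowed tags (without attributes), drop the rest."""

    def __init__(self, allowed):
        super().__init__(convert_charrefs=True)
        self.allowed = allowed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Attributes are intentionally discarded, even on allowed tags.
        if tag in self.allowed:
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in self.allowed:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text content is kept, with markup characters escaped.
        self.parts.append(escape(data, quote=False))


def clean_text_sketch(html, tags=None):
    """Approximate the described behavior; not nh3's real code path."""
    if tags is None:
        # No allowlist: escape everything, as before.
        return escape(html)
    parser = _AllowlistStripper(set(tags))
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Under these assumptions, `clean_text_sketch("<b>hi</b> <i>there</i>", tags={"b"})` returns `<b>hi</b> there`, and calling it without `tags` returns the fully escaped string `&lt;b&gt;hi&lt;/b&gt; &lt;i&gt;there&lt;/i&gt;`.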

Closes #54

Copilot AI left a comment

Pull request overview

Adds an optional `tags` parameter to `nh3.clean_text()` to support a "strip all tags except these" migration path (e.g., from bleach), while keeping the existing "escape everything" behavior when `tags` is omitted.

Changes:

  • Extend clean_text() Python API to accept tags and route to an ammonia cleaner when provided.
  • Add pytest coverage for basic tag preservation/stripping behavior in clean_text(tags=...).
  • Update Python type stubs (nh3.pyi) to reflect the new optional parameter.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| src/lib.rs | Implements `clean_text(html, tags=None)` and updates docstring/example. |
| tests/test_nh3.py | Adds new tests asserting preserved tags pass through and others are stripped. |
| nh3.pyi | Updates the stub signature for `clean_text` to include optional `tags`. |

Override the Config default `link_rel` to None so ammonia does not inject
a `rel` attribute on <a> when tags includes "a". Add a regression test
that asserts `<a href=... rel=...>` is passed through as `<a>`.
@gghez force-pushed the add-tags-to-clean-text branch from 88fd9c5 to bd459ff on April 24, 2026 21:04
@messense merged commit 5225ec2 into messense:main on Apr 25, 2026
19 checks passed
Development

Successfully merging this pull request may close these issues.

Add "tags" option to clean_text
