markdown-security

A small HTTP microservice that validates and sanitizes Markdown payloads against an HTML-tag allowlist. It is meant to sit between an untrusted producer (form, CMS, API caller) and any consumer that will render Markdown as HTML, so the consumer can rely on the body being free of script-bearing or otherwise dangerous tags.

The service exposes a validation endpoint backed by sanitize-html, plus a liveness probe. It does not render Markdown to HTML; it inspects raw Markdown for embedded HTML, strips anything outside the allowlist, and tells the caller whether the input was modified.

How it works

POST /validate accepts JSON of the form { "markdown": "..." } and returns:

{
  "safe": true,
  "message": "Markdown is safe",
  "sanitized": "...",
  "frontMatter": null
}

safe is true only if sanitization made no changes to the body and no HTML-like content was detected in the front matter. Any disallowed tag, attribute, or URL scheme in the body, or any HTML-like token in the front matter, flips it to false.
sanitized contains only the Markdown body, post-sanitization. It never contains the front-matter block.
frontMatter is the raw YAML between the --- markers, or null if no front matter was present. It is returned untouched — see the front-matter section below.
message is a human-readable summary.

The status code is 200 for any request that conforms to the published JSON Schema, and 400 when the body fails schema validation (missing markdown, wrong type, unexpected fields, etc.). 400 responses include a details array listing each per-field violation.
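A caller might consume this contract roughly as follows. This is a minimal sketch, not a client shipped with the service: the function name validateMarkdown, the baseUrl default, and the injectable fetchImpl option are illustrative assumptions, and nothing is assumed about a 400 body beyond its details array.

```javascript
// Hypothetical client helper; names and error handling are illustrative.
async function validateMarkdown(markdown, options = {}) {
  const baseUrl = options.baseUrl ?? 'http://localhost:5001';
  // A fetch implementation can be injected (useful in tests);
  // Node 18+ provides a global fetch.
  const fetchImpl = options.fetchImpl ?? fetch;

  const response = await fetchImpl(`${baseUrl}/validate`, {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify({ markdown }),
  });

  if (response.status === 200) {
    // The body carries safe, message, sanitized and frontMatter.
    return response.json();
  }
  if (response.status === 400) {
    // Schema violation: surface the per-field details to the caller.
    const body = await response.json();
    const error = new Error('Request failed schema validation');
    error.details = body.details ?? [];
    throw error;
  }
  throw new Error(`Unexpected status ${response.status}`);
}
```

Note that a 200 response does not mean the input was clean; callers still need to inspect safe and decide whether to accept the sanitized body or reject the submission.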

GET /health returns 200 with { "status": "ok" }. It is intended for liveness probes (Docker HEALTHCHECK, Kubernetes, load balancers) and does not exercise the sanitizer.

GET /openapi.json returns the OpenAPI 3.1 specification of this service, including request/response schemas, the documented headers, and all status codes. Point Swagger UI, Postman, or Stoplight at it to explore the API or generate clients.

Allowlist

Allowed tags: headings h1-h6, paragraphs and breaks (p, br, hr), lists (ul, ol, li, dl, dt, dd), text emphasis (strong, em, u, s, b, i, mark, sub, sup), code blocks (pre, code, kbd, samp), tables (table, thead, tbody, tr, td, th), blockquotes (blockquote), images (img), and links (a).

Allowed attributes:

a: href, title, target
img: src, alt, width, height
code: class

Allowed URL schemes for hrefs and image sources: http, https, mailto. Anything else (including javascript:, data:, vbscript:) is dropped.
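Expressed as sanitize-html options, the allowlist above would look roughly like the object below. This is a sketch transcribed from the lists in this section; the name exampleAllowlist is illustrative, and the shipped defaults in lib/allowlist.js may differ in detail.

```javascript
// Illustrative sanitize-html options mirroring the allowlist described above.
// Assumption: the real defaults may set further options not shown here.
const exampleAllowlist = {
  allowedTags: [
    'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
    'p', 'br', 'hr',
    'ul', 'ol', 'li', 'dl', 'dt', 'dd',
    'strong', 'em', 'u', 's', 'b', 'i', 'mark', 'sub', 'sup',
    'pre', 'code', 'kbd', 'samp',
    'table', 'thead', 'tbody', 'tr', 'td', 'th',
    'blockquote', 'img', 'a',
  ],
  allowedAttributes: {
    a: ['href', 'title', 'target'],
    img: ['src', 'alt', 'width', 'height'],
    code: ['class'],
  },
  allowedSchemes: ['http', 'https', 'mailto'],
};

// With sanitize-html installed, the options would be applied as:
//   const sanitizeHtml = require('sanitize-html');
//   const clean = sanitizeHtml(input, exampleAllowlist);
```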

YAML front matter

A leading YAML block of the form ---\n...\n---\n is detected and exposed in a separate frontMatter field. The block contents are not run through sanitize-html — they are returned to the caller raw. The reason is that YAML is a data format, not a display format, and trying to sanitize it as HTML produces false positives on legitimate values.

What the service does check: if the front-matter content contains an HTML-like token (< immediately followed by a letter, !, or /), safe is set to false. That covers the realistic threat model — an attacker smuggling <script> or <iframe> past the sanitizer by hiding it in metadata. It does not catch every possible misuse, so:

If you intend to render any front-matter value as HTML, sanitize it on the consumer side. Treat frontMatter as untrusted input.
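The split and the coarse check described above can be sketched as follows. This is an approximation for illustration, not the service's code: the function names and both regular expressions are assumptions derived from the prose (a leading ---\n...\n---\n block, and < immediately followed by a letter, !, or /).

```javascript
// Sketch of the front-matter handling described above; not the service's code.
// Matches a leading block of the form ---\n...\n---\n.
const FRONT_MATTER = /^---\n([\s\S]*?)\n---\n/;
// "<" immediately followed by a letter, "!" or "/".
const HTML_LIKE = /<[A-Za-z!\/]/;

function splitFrontMatter(markdown) {
  const match = FRONT_MATTER.exec(markdown);
  if (!match) return { frontMatter: null, body: markdown };
  // The raw YAML is returned untouched; only the body goes on to the sanitizer.
  return { frontMatter: match[1], body: markdown.slice(match[0].length) };
}

function frontMatterLooksSafe(frontMatter) {
  return frontMatter === null || !HTML_LIKE.test(frontMatter);
}
```

A value such as title: a < b passes the check because the < is followed by a space, while title: <script>alert(1)</script> fails it. That is why the check is described as coarse: it flags tokens, it does not neutralise them.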

Quickstart

npm install
npm start              # listens on http://localhost:5001
npm test               # runs the Jest suite

curl -s -X POST http://localhost:5001/validate \
  -H 'content-type: application/json' \
  -d '{"markdown":"# Hello\n\n<script>alert(1)</script>"}'

{
  "safe": false,
  "message": "Markdown contains unsafe content",
  "sanitized": "# Hello\n\n",
  "frontMatter": null
}

Docker

docker build -t markdown-security .
docker run --rm -p 5001:5001 markdown-security

The image is built on node:24-alpine, runs as the unprivileged node user, and ships a HEALTHCHECK that hits /health. The bundled .dockerignore keeps .git, .env, tests and CI artefacts out of the image.

Configuration

Env var	Default	Description
`PORT`	`5001`	TCP port the HTTP server binds to.
`LOG_LEVEL`	`info`	`pino` log level (`trace`, `debug`, `info`, `warn`, `error`, `fatal`, `silent`). Forced to `silent` under `NODE_ENV=test`.
`ALLOWLIST_FILE`	unset	Path to a JSON file with a custom `sanitize-html` configuration. When set, replaces the built-in allowlist wholesale. See Customising the allowlist.
`RATE_LIMIT_RPM`	unset	Positive integer. When set, enables per-IP rate limiting on `POST /validate` at this many requests per minute. Disabled by default. See Rate limiting.

The JSON body limit is fixed at 256kb. Markdown larger than that is rejected by Express with a 413 before reaching the handler. Adjust express.json({ limit: ... }) in server.js if you need more.

Customising the allowlist

Set ALLOWLIST_FILE to a JSON file whose contents are passed straight to sanitize-html. Useful when different consumers need different policies (e.g. a strict subset for user-generated content, a relaxed superset for trusted authoring tools).

{
  "allowedTags": ["p", "em", "strong", "a"],
  "allowedAttributes": { "a": ["href"] },
  "allowedSchemes": ["https"],
  "disallowedTagsMode": "discard"
}

The file is loaded once at startup. Malformed JSON, a missing file, or a non-array allowedTags causes the process to exit immediately rather than silently fall back. The default allowlist lives in lib/allowlist.js and is exported as DEFAULT_ALLOWLIST for reference.
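The fail-fast loading behaviour can be sketched like this. The function name loadAllowlist and its signature are hypothetical; the actual loader lives in lib/allowlist.js and may validate more than is shown here.

```javascript
const fs = require('node:fs');

// Hypothetical loader; the real one lives in lib/allowlist.js and may differ.
function loadAllowlist(filePath, defaults) {
  // ALLOWLIST_FILE unset: keep the built-in policy.
  if (!filePath) return defaults;
  // readFileSync throws on a missing file, JSON.parse on malformed JSON.
  const config = JSON.parse(fs.readFileSync(filePath, 'utf8'));
  if (config === null || typeof config !== 'object' || !Array.isArray(config.allowedTags)) {
    throw new Error(`${filePath}: allowedTags must be an array`);
  }
  // The custom configuration replaces the defaults wholesale; nothing is merged.
  return config;
}
```

At startup such a function would be called once with process.env.ALLOWLIST_FILE, and any thrown error would be logged before exiting with a non-zero status, so a broken policy file never results in a silently weaker or different allowlist.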

Rate limiting

Set RATE_LIMIT_RPM to a positive integer to enable per-IP rate limiting on POST /validate. The window is 60 seconds and the limit applies only to /validate; /health and /openapi.json are always reachable so that probes and clients can introspect the service even under load. Exceeding the limit returns 429 Too Many Requests with retry-after and ratelimit-* headers (IETF draft-ietf-httpapi-ratelimit-headers).

The limiter keys on req.ip. If the service is deployed behind a reverse proxy, configure app.set('trust proxy', ...) in server.js so the limiter sees the real client address rather than the proxy. The service ships with no trust proxy configuration so that a spoofed X-Forwarded-For header cannot influence req.ip in untrusted topologies.

Invalid values (0, negative, non-integer) cause the process to exit at startup rather than silently disable.
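The startup validation of RATE_LIMIT_RPM might look like the sketch below. The function name parseRateLimitRpm is hypothetical, and treating only an undefined variable as "unset" is an assumption; the service may handle an empty string differently.

```javascript
// Hypothetical parser for RATE_LIMIT_RPM; not taken from server.js.
function parseRateLimitRpm(raw) {
  // Assumption: only a missing variable counts as "unset" (limiter disabled).
  if (raw === undefined) return null;
  // Accept only plain positive decimal integers such as "60".
  if (!/^[1-9]\d*$/.test(raw)) {
    throw new Error(`RATE_LIMIT_RPM must be a positive integer, got "${raw}"`);
  }
  return Number(raw);
}
```

A non-null result would then be passed to the limiter, for example as the limit option of express-rate-limit together with windowMs: 60000, and mounted on the /validate route only. A thrown error would be turned into a startup failure rather than a disabled limiter.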

Logging and request correlation

Every request is logged as a single JSON line on stdout via pino-http. Each request is tagged with an id surfaced in the x-request-id response header and included in every log line. If the caller sends an x-request-id header that matches ^[a-zA-Z0-9_.-]{1,128}$, the service reuses it; otherwise a fresh UUID is generated. Use this id to correlate a client trace with the server log for a given request.
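The reuse-or-generate rule for request ids can be sketched as a small helper. The name resolveRequestId is illustrative and the wiring into the logger is assumed; only the pattern and the UUID fallback come from the description above.

```javascript
const { randomUUID } = require('node:crypto');

// Pattern a caller-supplied x-request-id must match to be reused.
const REQUEST_ID_PATTERN = /^[a-zA-Z0-9_.-]{1,128}$/;

// Hypothetical helper: reuse a well-formed caller id, otherwise mint a UUID.
function resolveRequestId(headerValue) {
  if (typeof headerValue === 'string' && REQUEST_ID_PATTERN.test(headerValue)) {
    return headerValue;
  }
  return randomUUID();
}
```

With pino-http, a helper like this would typically be plugged in through the genReqId option, reading req.headers['x-request-id'] and setting the same value on the response header. Rejecting ids outside the pattern keeps arbitrary caller-controlled strings (newlines, very long values) out of the log stream.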

Security notes

Allowlist, not denylist. New tags are blocked by default. To extend the surface, edit the allowedTags / allowedAttributes entries of the default allowlist in lib/allowlist.js (or supply a custom file via ALLOWLIST_FILE) and add a regression test.
Front matter is exposed raw, not trusted. It is returned in its own frontMatter field, never inside sanitized. A coarse HTML-like check decides safe, but the consumer must sanitize any front-matter value it intends to render as HTML.
query parser is set to simple. The qs-based extended parser (the Express 4 default; Express 5 already defaults to simple) has shipped two array-limit DoS bypasses (GHSA-w7fw-mjwx-w883, GHSA-6rw7-vpxm-498p); the simple parser is not affected. Do not change this without re-reviewing those advisories.
Body size cap. express.json({ limit: '256kb' }) is the first line of defence against payload-amplification attacks against sanitize-html.
Rate limiting is opt-in via RATE_LIMIT_RPM and disabled by default. The service still expects to live behind a gateway for auth and TLS; the in-process limiter is defence-in-depth for /validate, not a substitute for an upstream policy layer.
Property-based fuzzing. tests/fuzzing.test.js runs fast-check against /validate to exercise invariants (no dangerous tags ever leak to sanitized, sanitization is idempotent, front matter never appears inside sanitized). Hundreds of randomized payloads per release.
Schema-validated boundary. POST /validate rejects any body that does not conform to the OpenAPI ValidateRequest schema (ajv). Extra fields, wrong types, and missing/empty values are caught before reaching the sanitizer, with structured details per violation.
SBOM attached to every release. A CycloneDX 1.6 JSON Software Bill of Materials (sbom.cdx.json) is generated from the production lockfile and uploaded as a release asset by .github/workflows/sbom.yml. Run npm run sbom to produce one locally.

npm audit reports zero vulnerabilities at the time of writing (May 2026, against express@5, sanitize-html@2.17, pino@10, pino-http@11, ajv@8, express-rate-limit@8, jest@30, supertest@7.2, fast-check@4).

Project layout

server.js                 Express app + /validate, /health and /openapi.json handlers.
lib/allowlist.js          Default sanitize-html allowlist and ALLOWLIST_FILE loader.
openapi.json              OpenAPI 3.1 contract served by /openapi.json.
tests/validation.test.js  Jest + Supertest suite covering happy path and rejection cases.
tests/fuzzing.test.js     Property-based tests (fast-check) for sanitizer invariants.
tests/request-id.test.js  Coverage for the x-request-id middleware.
tests/openapi.test.js     Coverage for the OpenAPI endpoint and contract.
tests/allowlist.test.js   Unit + integration coverage for the allowlist loader.
tests/rate-limit.test.js  Coverage for the RATE_LIMIT_RPM middleware on /validate.
Dockerfile, .dockerignore Container build.

Changelog

See CHANGELOG.md for the version history.

License

MIT - see LICENSE.
