Skip to content

Guard against overlapping-quantifier ReDoS in subparser regexes #1050

@tivie

Description

@tivie

Summary

A manual security audit of the 922-commonmark-compliance branch found a quadratic-complexity (O(n²)) ReDoS in the setext-heading subparser. It is fixed on that branch (commit ce94570) and is not present in released master/develop (they use the older, linear /^(.+)[ \t]*\n=+[ \t]*\n+/gm form). This issue documents the pattern class so the same mistake is caught in future regex work.

The bug

src/subParsers/makehtml/heading.js matched a setext heading's first line as:

/^( {0,3}([^ \t\n]+.*\n)(.+\n)?(.+\n)?)( {0,3}=+[ \t]*)$/gm

The two greedy quantifiers [^ \t\n]+ and .* can consume the same characters, so a long whitespace-free line with no trailing =/- underline forces O(n²) backtracking. This runs in the default converter, so any long unbroken token (URL, base64 blob, hash) in user input triggers it:

input before after
a×80 000 ~7.7 s ~4 ms
160k-char line (worse) ~18 ms

A ~200 KB field blocks the Node event loop for tens of seconds — a single-request DoS for any service rendering untrusted Markdown.

The fix

[^ \t\n]+.* and [^ \t\n].* match the same set of lines and capture the same substring, but the latter has no nested quantifier:

-/^( {0,3}([^ \t\n]+.*\n)(.+\n)?(.+\n)?)( {0,3}=+[ \t]*)$/gm
+/^( {0,3}([^ \t\n].*\n)(.+\n)?(.+\n)?)( {0,3}=+[ \t]*)$/gm

A 50 000-char single-line regression fixture was added under test/functional/makehtml/cases/issues/; the old quadratic behaviour exceeds mocha's 2 s default timeout, guarding against reintroduction.

Guidance for future regex changes

  • Avoid adjacent/overlapping quantifiers that can match the same input (X+.*, (a+)+, .*.*). Prefer a single quantifier or a fixed-length lead-in ([^ \t\n].*).
  • Be wary of optional multi-line lookback ((.+\n)?(.+\n)?) combined with a required right-anchor.
  • When adding/altering a block-level regex, benchmark a long whitespace-free line ('a'.repeat(80000)); near-instant = OK, seconds = backtracking. Doubling input and seeing ~4× time means O(n²).
  • These patterns are caught by neither npm audit nor CodeQL's current rules — they need manual review.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions