Skip to content

Fix parser out-of-bounds reads#131

Open
laurikari wants to merge 4 commits intomasterfrom
fix/parser-oob
Open

Fix parser out-of-bounds reads#131
laurikari wants to merge 4 commits intomasterfrom
fix/parser-oob

Conversation

@laurikari
Copy link
Copy Markdown
Owner

Updated retest so regncomp is called with a strictly non‑padded input buffer, rather than passing a NUL terminated buffer. With that in place, AddressSanitizer exposed several out‑of‑bounds reads in the parser.

Added a few additional parser test cases around atoms, bracket expressions, and escapes.

Then fixed the parser, adding explicit bounds checks and better error handling.

By passing exact-length unpadded inputs to regncomp, we can catch out-of-bounds
reads using -fsanitize=address.
Many tools expect source files to have some valid encoding (like UTF-8), this
makes them happy.
These were all found with the latest retest.c using the Address Sanitizer with:

  CFLAGS=-O1 -g -fsanitize=address -fno-omit-frame-pointer
  LDFLAGS=-fsanitize=address
@laurikari laurikari requested a review from dag-erling January 31, 2026 22:07
Comment on lines +1244 to +1245
if (ctx->re >= ctx->re_end)
return REG_BADPAT;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is redundant since we perform the exact same check immediately upon entering the loop.


#ifdef REG_LITERAL
if (*(ctx->re + 1) == L'Q')
if (*(ctx->re + 1) == L'Q' || *(ctx->re + 1) == L'U')
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unrelated... What is \U supposed to mean? Also, \Q...\E appears to be undocumented. Also also, the TRE website is down.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants