@Naveed8951

Description

This PR fixes a memory-safety issue in the RE2 Python extension (_re2)
where user-controlled offsets could cause out-of-bounds reads and crash
the hosting process.

The helper functions that convert between character offsets and byte
offsets performed unchecked pointer arithmetic (text.data() + pos and
text.data() + endpos) and then dereferenced the result. If a caller
supplied a negative or out-of-range offset, the code could read outside
the buffer bounds, resulting in a native crash.

Changes

  • Added strict validation in the Python binding helpers:
    • Require a 1D, C-contiguous, byte-addressed buffer (itemsize == 1)
    • Validate pos, endpos, and len ranges before pointer arithmetic
    • Fail safely by raising the module’s Python exception instead of
      performing unsafe memory access
  • Added regression tests covering invalid offsets
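
The checks listed above can be modelled in pure Python. This is only a
sketch of the validation order, not the C++ code from this PR: the
function name validate_range is hypothetical, and ValueError stands in
for the module's own exception type.

```python
def validate_range(buffer, pos, endpos):
    """Hypothetical model of the checks done before any pointer arithmetic."""
    view = memoryview(buffer)

    # Require a 1D, C-contiguous, byte-addressed buffer.
    if view.ndim != 1 or not view.c_contiguous or view.itemsize != 1:
        raise ValueError("expected a 1D, C-contiguous buffer with itemsize == 1")

    size = view.nbytes

    # Reject offsets that would point outside [0, size].
    if pos < 0 or endpos < 0:
        raise ValueError("offsets must be non-negative")
    if pos > endpos:
        raise ValueError("pos must not exceed endpos")
    if endpos > size:
        raise ValueError("endpos is out of range")

    # Only now is it safe to form data + pos and data + endpos.
    return pos, endpos
```

A valid call such as validate_range(b"hello", 1, 4) returns the offsets
unchanged, while validate_range(b"hello", 0, 6) raises before any memory
would be touched.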

Impact

  • Prevents out-of-bounds reads at the Python/C++ boundary
  • Converts a crash condition into safe, predictable Python exceptions
  • No behavior change for valid inputs

Testing

  • Added unit tests asserting that invalid offsets raise exceptions:
    • negative pos
    • negative len
    • pos > endpos
    • endpos out of range
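
A sketch of what such regression tests might look like. To stay
self-contained it runs against pure-Python stand-ins rather than the
_re2 extension: char_len_to_bytes and bytes_to_char_len are hypothetical
models of the conversion helpers, and ValueError stands in for the
module's exception type.

```python
import unittest


def char_len_to_bytes(text, pos, length):
    """Model: bytes spanned by `length` UTF-8 characters from byte offset `pos`."""
    if pos < 0 or pos > len(text):
        raise ValueError("pos out of range")
    if length < 0:
        raise ValueError("len must be non-negative")
    ptr = pos
    while ptr < len(text) and length > 0:
        ptr += 1
        # Skip UTF-8 continuation bytes (0b10xxxxxx).
        while ptr < len(text) and (text[ptr] & 0xC0) == 0x80:
            ptr += 1
        length -= 1
    return ptr - pos


def bytes_to_char_len(text, pos, endpos):
    """Model: number of UTF-8 characters in the byte range [pos, endpos)."""
    if pos < 0 or endpos < 0 or pos > endpos or endpos > len(text):
        raise ValueError("invalid byte range")
    # Count every byte that is not a continuation byte.
    return sum(1 for b in text[pos:endpos] if (b & 0xC0) != 0x80)


class InvalidOffsetTest(unittest.TestCase):
    TEXT = "héllo".encode("utf-8")  # 6 bytes, 5 characters

    def test_negative_pos(self):
        with self.assertRaises(ValueError):
            char_len_to_bytes(self.TEXT, -1, 1)

    def test_negative_len(self):
        with self.assertRaises(ValueError):
            char_len_to_bytes(self.TEXT, 0, -1)

    def test_pos_greater_than_endpos(self):
        with self.assertRaises(ValueError):
            bytes_to_char_len(self.TEXT, 4, 2)

    def test_endpos_out_of_range(self):
        with self.assertRaises(ValueError):
            bytes_to_char_len(self.TEXT, 0, len(self.TEXT) + 1)

    def test_valid_inputs_unchanged(self):
        self.assertEqual(char_len_to_bytes(self.TEXT, 0, 2), 3)
        self.assertEqual(bytes_to_char_len(self.TEXT, 0, 3), 2)
```

Saved to a file, this can be run with python -m unittest. The last test
reflects the "no behavior change for valid inputs" claim: valid offsets
still convert as before.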
