fix: add integer overflow and memory safety guards to C parser layer#32

Open
billdenney wants to merge 5 commits into main from memory-safety-fixes

Conversation

@billdenney
Contributor

  • rc_dup_str (shared.c): make ptrdiff_t→int narrowing explicit; add bounds check so diff<0 or diff>INT_MAX calls Rf_error() instead of silently truncating. Add thread-safety comment documenting that the global parser state is intentionally not mutex-protected (R is single-threaded).

  • sbuf.c: add signed-integer overflow guards before each size-arithmetic expression in sAppendN, sAppend, and addLine (both the string-buffer and line-pointer-array growth paths), preventing R_Realloc from receiving a negative or near-zero size.

  • parseSyntaxErrors.h (getLine): change col accumulator from int to size_t and add an explicit INT_MAX guard, preventing R_Calloc from receiving a wrapped-negative size when a source line exceeds INT_MAX bytes.

  • All 13 trans_*() parser entry-points: check strlen(gBuf)>INT_MAX before casting to int for the dparse() length argument; R itself caps strings at INT_MAX (2^31-1) bytes, so this guard protects against direct C-level misuse.

  • tests/testthat/test-memory-safety.R: two regression tests (no skip) confirming normal-sized inputs still parse correctly; three skip() tests documenting the >2 GB boundary conditions with strrep()-based allocations (avoiding paste0+rep() intermediate vectors that caused memory exhaustion).

  • NEWS.md: add v0.0.7 entry.
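
For reviewers, the three guard patterns described in the bullets above (checked ptrdiff_t→int narrowing, overflow-checked size arithmetic before a realloc, and a checked strlen→int cast) can be sketched roughly as follows. This is a minimal standalone illustration, not the code in this PR: the helper names `checked_ptrdiff_to_int`, `checked_grow`, and `checked_strlen_int` are hypothetical, and each returns 0 on failure so the sketch compiles outside R, where the real code would presumably raise an error via Rf_error() instead.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: narrow a pointer difference to int.
   Returns 1 and stores the value in *out on success; returns 0 if the
   difference is negative or does not fit in an int. */
int checked_ptrdiff_to_int(ptrdiff_t diff, int *out) {
  assert(out != NULL);
  if (diff < 0 || diff > INT_MAX) return 0;
  *out = (int)diff; /* explicit narrowing, now known to be safe */
  return 1;
}

/* Hypothetical helper: compute cur + add + extra for a buffer-growth
   request without ever evaluating an overflowing signed expression.
   Each comparison is arranged so the subtraction cannot overflow. */
int checked_grow(int cur, int add, int extra, int *out) {
  assert(out != NULL);
  if (cur < 0 || add < 0 || extra < 0) return 0;
  if (add > INT_MAX - cur) return 0;         /* cur + add would overflow */
  if (extra > INT_MAX - cur - add) return 0; /* full sum would overflow */
  *out = cur + add + extra;
  return 1;
}

/* Hypothetical helper: obtain a string length as int, as needed for an
   API that takes an int length argument. Returns 0 if it does not fit. */
int checked_strlen_int(const char *s, int *out) {
  assert(s != NULL && out != NULL);
  size_t n = strlen(s);
  if (n > (size_t)INT_MAX) return 0;
  *out = (int)n;
  return 1;
}
```

The point of checking before the arithmetic, rather than testing the result afterwards, is that signed overflow is undefined behavior in C, so a post-hoc `if (newSize < 0)` test cannot be relied on.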

billdenney and others added 4 commits April 2, 2026 21:41
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@billdenney
Contributor Author

I'm breaking this into smaller parts for easier review. I intend to close this when the separated PRs are generated.

# Conflicts:
#	tests/testthat/_snaps/data-import/multiple-endpoint-theo-p1.svg
#	tests/testthat/_snaps/data-import/multiple-endpoint-theo-pall.svg
#	tests/testthat/_snaps/data-import/multiple-endpoint-theo.svg
#	tests/testthat/_snaps/data-import/single-endpoint-theo-pall.svg