fix: add integer overflow and memory safety guards to C parser layer #32
Open
billdenney wants to merge 5 commits into
- rc_dup_str (shared.c): make ptrdiff_t→int narrowing explicit; add bounds check so diff<0 or diff>INT_MAX calls Rf_error() instead of silently truncating. Add thread-safety comment documenting that the global parser state is intentionally not mutex-protected (R is single-threaded).
- sbuf.c: add signed-integer overflow guards before each size-arithmetic expression in sAppendN, sAppend, and addLine (both the string-buffer and line-pointer-array growth paths), preventing R_Realloc from receiving a negative or near-zero size.
- parseSyntaxErrors.h (getLine): change col accumulator from int to size_t and add an explicit INT_MAX guard, preventing R_Calloc from receiving a wrapped-negative size when a source line exceeds INT_MAX bytes.
- All 13 trans_*() parser entry-points: check strlen(gBuf)>INT_MAX before casting to int for the dparse() length argument; R strings are capped at INT_MAX-1 bytes by R itself, so this guard protects against direct C-level misuse.
- tests/testthat/test-memory-safety.R: two regression tests (no skip) confirming normal-sized inputs still parse correctly; three skip() tests documenting the >2 GB boundary conditions with strrep()-based allocations (avoiding paste0+rep() intermediate vectors that caused memory exhaustion).
- NEWS.md: add v0.0.7 entry.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
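For reviewers, the three guard patterns described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the helper names (checked_diff_to_int, checked_grow, checked_strlen_int) are hypothetical, and they return -1 where the PR calls Rf_error(), so the sketch compiles without the R headers.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the PR): narrow a pointer difference to int.
   Returns 0 and stores the value in *out on success; returns -1 when the
   difference is negative or larger than INT_MAX. The PR signals this case
   with Rf_error(); a return code is used here to stay independent of R. */
static int checked_diff_to_int(const char *start, const char *end, int *out) {
    ptrdiff_t diff = end - start;
    if (diff < 0 || diff > INT_MAX) return -1;
    *out = (int)diff; /* explicit narrowing, known to be in range */
    return 0;
}

/* Hypothetical helper: compute used + extra + pad for a buffer growth
   without ever evaluating an int expression that could overflow. */
static int checked_grow(int used, int extra, int pad, int *out) {
    if (used < 0 || extra < 0 || pad < 0) return -1;
    if (extra > INT_MAX - used) return -1;       /* used + extra would overflow */
    if (pad > INT_MAX - used - extra) return -1; /* adding pad would overflow */
    *out = used + extra + pad;
    return 0;
}

/* Hypothetical helper: obtain a string length as int for an API whose
   length parameter is int, rejecting lengths above INT_MAX. */
static int checked_strlen_int(const char *s, int *out) {
    size_t n = strlen(s);
    if (n > (size_t)INT_MAX) return -1;
    *out = (int)n;
    return 0;
}
```

In each helper the range check runs before the cast or the addition, so the out-of-range value is never computed. That ordering is what keeps a negative or wrapped size from reaching an allocator.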
I'm breaking this into smaller parts for easier review. I intend to close this when the separated PRs are generated.
# Conflicts:
#	tests/testthat/_snaps/data-import/multiple-endpoint-theo-p1.svg
#	tests/testthat/_snaps/data-import/multiple-endpoint-theo-pall.svg
#	tests/testthat/_snaps/data-import/multiple-endpoint-theo.svg
#	tests/testthat/_snaps/data-import/single-endpoint-theo-pall.svg