Range-check rc_dup_str length to prevent silent int truncation #34
Open
billdenney wants to merge 2 commits into
`rc_dup_str` (`src/shared.c`) computed the segment length as
    int l = e ? e-s : (int)strlen(s);
where the source pointers come from the dparser parse tree. When the
source span exceeds `INT_MAX` bytes, the implicit `ptrdiff_t`-to-`int`
conversion or the explicit `size_t`-to-`int` cast silently truncates the length to a wrong
value (often negative). Downstream `addLine(&_dupStrs, "%.*s", l, s)`
then either reads past the buffer or copies the wrong number of bytes.
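
To illustrate the downstream effect, here is a minimal sketch, assuming `addLine` hands its format string to a printf-style formatter (this PR does not confirm that). `format_segment` is a hypothetical stand-in, not code from `src/shared.c`. In standard C, a negative precision passed to `%.*s` is treated as if the precision were omitted, so a length that has wrapped negative no longer bounds the copy and the formatter runs to the next NUL byte.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a printf-style consumer such as addLine:
 * writes at most `l` bytes of `s` into `out` and returns the number of
 * bytes the formatter produced (excluding the terminating NUL). */
static int format_segment(char *out, size_t outsz, int l, const char *s) {
    assert(out != NULL && s != NULL && outsz > 0);
    /* A non-negative l bounds the copy; a negative l is ignored by "%.*s",
     * so the whole NUL-terminated string is copied instead. */
    return snprintf(out, outsz, "%.*s", l, s);
}
```

With a valid length such as `3`, only the first three bytes of the segment are copied. With `-1`, the intended segment end is ignored and the whole string is copied.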
Replace the silent cast with an explicit range check on both branches
(`ptrdiff_t > INT_MAX` and `size_t > INT_MAX`) and raise a clean R
error when out of range. Also add a thread-safety comment documenting
that the file-scope parser globals are intentionally not
mutex-protected because R's interpreter is single-threaded.
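
The range check described above might look roughly like the following sketch. It is not the actual patch. `checked_segment_len`, `fits_int_ptrdiff`, and `fits_int_size` are hypothetical names, the helper returns a status code where the real code would raise an R error, and the rejection of negative spans is an extra assumption that the description does not mention.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Range tests in isolation, so they can be exercised without a 2 GB buffer. */
static int fits_int_ptrdiff(ptrdiff_t d) { return d >= 0 && d <= INT_MAX; }
static int fits_int_size(size_t n) { return n <= (size_t)INT_MAX; }

/* Hypothetical helper: computes the length of the segment [s, e), or of the
 * NUL-terminated string s when e is NULL, and stores it in *out only if it
 * fits in an int. Returns 1 on success and 0 when the length is out of
 * range (the point where the real code would raise an R error). */
static int checked_segment_len(const char *s, const char *e, int *out) {
    assert(s != NULL && out != NULL);
    if (e) {
        ptrdiff_t d = e - s;          /* pointer-difference branch */
        if (!fits_int_ptrdiff(d)) return 0;
        *out = (int)d;                /* safe: 0 <= d <= INT_MAX */
    } else {
        size_t n = strlen(s);         /* strlen branch */
        if (!fits_int_size(n)) return 0;
        *out = (int)n;                /* safe: n <= INT_MAX */
    }
    return 1;
}
```

Checking the wide value before narrowing keeps both casts well defined, and keeping the two predicates separate lets the boundary at `INT_MAX` be tested without allocating an oversized input.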
Add `tests/testthat/test-mem-rc-dup-str.R` as a regression test. The
`INT_MAX`-input test is `skip()`ed because it requires ~2 GB of free RAM.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Force-pushed from 4cefd28 to e30fb16
Codecov Report
❌ Patch coverage is
Additional details and impacted files

    @@            Coverage Diff            @@
    ##             main     #34     +/-   ##
    =========================================
    - Coverage   89.44%   89.44%   -0.01%
    =========================================
      Files          60       60
      Lines        6360     6368      +8
    =========================================
    + Hits         5689     5696      +7
    - Misses        671      672      +1