Skip to content

Range-check rc_dup_str length to prevent silent int truncation#34

Open
billdenney wants to merge 2 commits into
mainfrom
fix/mem-rc-dup-str
Open

Range-check rc_dup_str length to prevent silent int truncation#34
billdenney wants to merge 2 commits into
mainfrom
fix/mem-rc-dup-str

Conversation

@billdenney
Copy link
Copy Markdown
Contributor

rc_dup_str (src/shared.c) computed the segment length as
int l = e ? e-s : (int)strlen(s);
where the source pointers come from the dparser parse tree. When the source span exceeds INT_MAX bytes, the implicit ptrdiff_t-to-int or size_t-to-int cast silently truncates the length to a wrong value (often negative). Downstream addLine(&_dupStrs, "%.*s", l, s) then either reads past the buffer or copies the wrong number of bytes.

Replace the silent cast with an explicit range check on both branches (ptrdiff_t > INT_MAX and size_t > INT_MAX) and raise a clean R error when out of range. Also add a thread-safety comment documenting that the file-scope parser globals are intentionally not mutex-protected because R's interpreter is single-threaded.

Adds tests/testthat/test-mem-rc-dup-str.R as a regression test. The INT_MAX-input test is skip()ed because it requires ~2 GB of free RAM.

`rc_dup_str` (`src/shared.c`) computed the segment length as
    int l = e ? e-s : (int)strlen(s);
where the source pointers come from the dparser parse tree.  When the
source span exceeds `INT_MAX` bytes, the implicit `ptrdiff_t`-to-`int`
or `size_t`-to-`int` cast silently truncates the length to a wrong
value (often negative).  Downstream `addLine(&_dupStrs, "%.*s", l, s)`
then either reads past the buffer or copies the wrong number of bytes.

Replace the silent cast with an explicit range check on both branches
(`ptrdiff_t > INT_MAX` and `size_t > INT_MAX`) and raise a clean R
error when out of range.  Also add a thread-safety comment documenting
that the file-scope parser globals are intentionally not
mutex-protected because R's interpreter is single-threaded.

Adds tests/testthat/test-mem-rc-dup-str.R as a regression test.  The
INT_MAX-input test is `skip()`ed because it requires ~2 GB of free RAM.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@billdenney billdenney force-pushed the fix/mem-rc-dup-str branch from 4cefd28 to e30fb16 Compare May 1, 2026 23:00
@codecov
Copy link
Copy Markdown

codecov Bot commented May 3, 2026

Codecov Report

❌ Patch coverage is 77.77778% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.44%. Comparing base (ca1ffea) to head (e68fe97).

Files with missing lines Patch % Lines
src/shared.c 77.77% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #34      +/-   ##
==========================================
- Coverage   89.44%   89.44%   -0.01%     
==========================================
  Files          60       60              
  Lines        6360     6368       +8     
==========================================
+ Hits         5689     5696       +7     
- Misses        671      672       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant