Range-check dparse input length to prevent (int)strlen truncation #36
Open
billdenney wants to merge 3 commits into
Force-pushed: af47a46 to 158516e
Each of the 13 `trans_*` parser entry-points (dataSettings, equation,
longDef, longOutput, mlxtranContent, mlxtranFileinfo, mlxtranFit,
mlxtranInd, mlxtranIndDefinition, mlxtranOp, mlxtranParameter,
mlxtranTask, summaryData) previously called
    _pn = dparse(curP, gBuf, (int)strlen(gBuf));
which silently truncated `strlen` to a wrong (often negative) value
when the input exceeded INT_MAX bytes; dparser then read past the
buffer.
Switch each call site to the new `udparse(D_Parser*, char*, unsigned int)`
entry introduced in dparser 1.3.2 and let the unsigned-int parameter
carry the length without a signed `(int)` cast or inline guard:
    _pn = udparse(curP, gBuf, (unsigned int)strlen(gBuf));
This is simpler than the per-call-site INT_MAX guard previously
attempted on this branch, because the safety contract is now part of
the dparser API rather than something every caller must remember to
check.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
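To make the ranges in this commit message concrete, here is a minimal sketch of which `strlen` results fit an `int` length parameter and which fit an `unsigned int` one. The helper names `len_fits_int` and `len_fits_uint` are hypothetical and are not part of dparser or this package. Converting an out-of-range `size_t` to `int` is implementation-defined in C, while converting to `unsigned int` wraps modulo `UINT_MAX + 1`, so on platforms where `size_t` is wider than `unsigned int` the `(unsigned int)` cast still narrows, only at a higher limit.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helpers for illustration; not part of dparser. */

/* Nonzero when len can be passed through an `int` length parameter
 * (the dparse() style call) without changing its value. */
static int len_fits_int(size_t len) {
  return len <= (size_t)INT_MAX;
}

/* Nonzero when len can be passed through an `unsigned int` length
 * parameter (the udparse() style call) without changing its value. */
static int len_fits_uint(size_t len) {
  return len <= UINT_MAX;
}
```

Under these assumptions, a length of `INT_MAX + 1` is rejected by `len_fits_int` but accepted by `len_fits_uint`; on a platform with a 64-bit `size_t`, a length of `UINT_MAX + 1` is rejected by both.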
Force-pushed: 158516e to c40a2b1
udparse() is not yet exported by the CRAN dparser-R package, so the previous commit that switched to udparse() caused an undefined symbol at load time on any system without a pre-release dparser-R build.

Per project decision, the fix for the (int)strlen truncation belongs in dparser-R itself. This commit reverts all 13 trans_* entry-points to plain dparse() and adds a TODO comment pointing to the long-term udparse migration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main      #36      +/-   ##
==========================================
+ Coverage   89.44%   89.46%   +0.01%
==========================================
  Files          60       60
  Lines        6360     6360
==========================================
+ Hits         5689     5690       +1
+ Misses        671      670       -1
All 13 `trans_*` parser entry-points (dataSettings, equation, longDef, longOutput, mlxtranContent, mlxtranFileinfo, mlxtranFit, mlxtranInd, mlxtranIndDefinition, mlxtranOp, mlxtranParameter, mlxtranTask, summaryData) feed the input buffer to dparser via `_pn = dparse(curP, gBuf, (int)strlen(gBuf));`

When `gBuf` is longer than INT_MAX bytes the `(int)` cast silently truncates to a wrong (often negative) length and dparser then reads past the buffer.

Wrap the call so the `strlen` result is computed in `size_t`, checked against `(size_t)INT_MAX`, and only then cast to `int`. Out-of-range input gets a clean R error instead.

Adds tests/testthat/test-mem-dparse-int-cast.R as a regression test. The boundary test is `skip()`ed because it needs ~3 GB of free RAM (R's own CHARSXP is capped at INT_MAX bytes so direct R-level exploitation is impractical; the guard primarily protects future C-level callers that bypass that cap).
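The guard described above (compute the length in `size_t`, compare against `(size_t)INT_MAX`, then cast) could look roughly like the sketch below. The helper names `checked_int_len` and `checked_strlen_int` are hypothetical and are not taken from the PR diff; the range check is split from the `strlen` call only so that the boundary can be exercised without allocating a multi-gigabyte buffer.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if len fits in an int, store it in *out and
 * return 0; otherwise leave *out untouched and return -1. */
static int checked_int_len(size_t len, int *out) {
  if (len > (size_t)INT_MAX) {
    return -1;            /* caller should report an error, not parse */
  }
  *out = (int)len;        /* safe: value is known to be in range */
  return 0;
}

/* Hypothetical wrapper for a NUL-terminated buffer such as gBuf. */
static int checked_strlen_int(const char *buf, int *out) {
  return checked_int_len(strlen(buf), out);
}
```

At a call site, a return value of -1 would be turned into an R error (for instance with R's `Rf_error()`) before `dparse(curP, gBuf, len)` is reached; the exact error message used in this PR is not shown here.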