Skip to content

Range-check dparse input length to prevent (int)strlen truncation#36

Open
billdenney wants to merge 3 commits into
mainfrom
fix/mem-dparse-int-cast
Open

Range-check dparse input length to prevent (int)strlen truncation#36
billdenney wants to merge 3 commits into
mainfrom
fix/mem-dparse-int-cast

Conversation

@billdenney
Copy link
Copy Markdown
Contributor

All 13 trans_* parser entry-points (dataSettings, equation, longDef, longOutput, mlxtranContent, mlxtranFileinfo, mlxtranFit, mlxtranInd, mlxtranIndDefinition, mlxtranOp, mlxtranParameter, mlxtranTask, summaryData) feed the input buffer to dparser via
_pn = dparse(curP, gBuf, (int)strlen(gBuf));
When gBuf is longer than INT_MAX bytes the (int) cast silently truncates to a wrong (often negative) length and dparser then reads past the buffer.

Wrap the call so the strlen result is computed in size_t, checked against (size_t)INT_MAX, and only then cast to int. Out-of-range input gets a clean R error instead.

Adds tests/testthat/test-mem-dparse-int-cast.R as a regression test. The boundary test is skip()ed because it needs ~3 GB of free RAM (R's own CHARSXP is capped at INT_MAX bytes so direct R-level exploitation is impractical; the guard primarily protects future C-level callers that bypass that cap).

@billdenney billdenney force-pushed the fix/mem-dparse-int-cast branch from af47a46 to 158516e Compare May 1, 2026 23:01
Each of the 13 `trans_*` parser entry-points (dataSettings, equation,
longDef, longOutput, mlxtranContent, mlxtranFileinfo, mlxtranFit,
mlxtranInd, mlxtranIndDefinition, mlxtranOp, mlxtranParameter,
mlxtranTask, summaryData) previously called
    _pn = dparse(curP, gBuf, (int)strlen(gBuf));
which silently truncated `strlen` to a wrong (often negative) value
when the input exceeded INT_MAX bytes; dparser then read past the
buffer.

Switch each call site to the new `udparse(D_Parser*, char*, unsigned int)`
entry introduced in dparser 1.3.2 and let the unsigned-int parameter
carry the length without any cast or inline guard:

    _pn = udparse(curP, gBuf, (unsigned int)strlen(gBuf));

This is simpler than the per-call-site INT_MAX guard previously
attempted on this branch, because the safety contract is now part of
the dparser API rather than something every caller must remember to
check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@billdenney billdenney force-pushed the fix/mem-dparse-int-cast branch from 158516e to c40a2b1 Compare May 2, 2026 01:29
billdenney and others added 2 commits May 2, 2026 13:22
udparse() is not yet exported by the CRAN dparser-R package, so the
previous commit that switched to udparse() caused an undefined symbol
at load time on any system without a pre-release dparser-R build.

Per project decision, the fix for the (int)strlen truncation belongs in
dparser-R itself.  This commit reverts all 13 trans_* entry-points to
plain dparse() and adds a TODO comment pointing to the long-term udparse
migration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented May 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.46%. Comparing base (ca1ffea) to head (00256ea).

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #36      +/-   ##
==========================================
+ Coverage   89.44%   89.46%   +0.01%     
==========================================
  Files          60       60              
  Lines        6360     6360              
==========================================
+ Hits         5689     5690       +1     
+ Misses        671      670       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant