Conversation
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

This pull request adds comprehensive tooling and infrastructure for ZSTD: diagnostic utilities, parallel compression support (pzstd), seekable format handlers, dictionary training modules, benchmarking frameworks, file I/O subsystems, an educational decompression reference, legacy format support, and gzip wrapper integration across the contrib/, lib/, doc/, programs/, tests/, and zlibWrapper/ directories.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant pzstd as pzstd<br/>Framework
    participant SharedState as SharedState<br/>Context
    participant CPool as CStream<br/>Pool
    participant Executor as ThreadPool<br/>Executor
    User->>pzstd: pzstdMain(options)
    pzstd->>SharedState: constructor(options)
    SharedState->>CPool: createResourcePool()
    SharedState->>Executor: initialize logging
    pzstd->>Executor: asyncCompressChunks()
    loop Per Chunk
        Executor->>CPool: acquireResource()
        Executor->>Executor: compress via ZSTD_compressStream()
        Executor->>CPool: releaseResource()
        Executor->>pzstd: push result to queue
    end
    pzstd->>pzstd: writeFile(output queue)
    pzstd->>SharedState: destructor
    SharedState->>CPool: free pools
    SharedState->>SharedState: cleanup
```
```mermaid
sequenceDiagram
    participant App as Application
    participant Seek as Seekable<br/>Compress
    participant FrameLog as Frame Log
    participant CStream as ZSTD CStream
    App->>Seek: createCStream()
    Seek->>Seek: allocate context
    Seek->>FrameLog: createFrameLog()
    App->>Seek: initCStream()
    App->>Seek: compressStream(input)
    Seek->>CStream: ZSTD_compressStream()
    loop Per Frame
        Seek->>FrameLog: logFrame(cSize,dSize,checksum)
    end
    App->>Seek: endStream()
    Seek->>Seek: writeSeekTable()
    Seek->>Seek: emit seek table footer
    App->>Seek: freeCStream()
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
```c
    ZSTD_freeDDict(stuff->ddict);
    free(stuff->dict);
    ZSTD_freeDCtx(stuff->dctx);
}
```

```c
    memset(stuff->error_counts, 0, sizeof(stuff->error_counts));

    return 1;
}
```
Bug: Incomplete Cleanup Leaks Memory
Memory leak on error paths in init_stuff. When initialization fails partway through (e.g., after allocating stuff->input and stuff->perturbed but before completing), the function returns 0 without freeing already-allocated resources. The caller in main exits immediately without calling free_stuff, leaking the partially-initialized allocations.
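A standard remedy for this class of leak is a single-exit cleanup label that every failure path jumps to, freeing whatever was allocated so far. The sketch below is illustrative only: `ctx_t` and `init_ctx` are hypothetical stand-ins, not the actual `init_stuff` signature from the tool.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char* input;
    char* perturbed;
} ctx_t;

/* Returns 1 on success, 0 on failure; never leaks on the failure path,
 * because the struct is zeroed first and free(NULL) is a no-op. */
static int init_ctx(ctx_t* ctx, size_t size) {
    memset(ctx, 0, sizeof(*ctx));
    ctx->input = malloc(size);
    if (ctx->input == NULL) goto fail;
    ctx->perturbed = malloc(size);
    if (ctx->perturbed == NULL) goto fail;
    return 1;
fail:
    free(ctx->input);
    free(ctx->perturbed);
    memset(ctx, 0, sizeof(*ctx));
    return 0;
}
```

With this shape, the caller never needs to know how far initialization got before it failed.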
```c
        *dict_id = ZSTD_getDictID_fromDDict(ddict);
    }
    return ddict;
}
```
Bug: Incomplete Initialization Leaks Resources
Memory leak in readDict when dictionary creation fails. If readFile succeeds but ZSTD_createDDict_advanced fails, the function returns NULL without freeing the buffer allocated by readFile and assigned to *buf. The caller cannot free this memory since it only checks the return value, not the output parameter.
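The ownership rule this comment implies can be stated simply: on failure, a function that fills an out-parameter frees the buffer and nulls the pointer, so the caller only ever has to check the return value. A minimal sketch with hypothetical names (`make_thing`, `load_buffer`), not the actual `readDict` code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static char* load_buffer(size_t n) { return malloc(n); }

/* Returns 1 on success (caller owns *buf), 0 on failure (*buf is NULL). */
static int make_thing(int should_fail, char** buf, size_t n) {
    *buf = load_buffer(n);
    if (*buf == NULL) return 0;
    if (should_fail) {      /* e.g. the second allocation/creation step failed */
        free(*buf);
        *buf = NULL;        /* caller sees a consistent NULL, nothing leaks */
        return 0;
    }
    memset(*buf, 0, n);
    return 1;
}
```

Applying the same discipline in `readDict` (free the `readFile` buffer and null `*buf` before returning NULL) closes the leak without changing the caller.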
Actionable comments posted: 35
🧹 Nitpick comments (24)
programs/util.c (1)
99-105: Consider renaming UTIL_realloc to reflect non-standard behavior.

UTIL_realloc has non-standard semantics: it frees the original pointer on failure, unlike standard `realloc()`. While documented in a comment, this could lead to subtle bugs if developers expect standard realloc behavior.

Consider renaming to make the behavior explicit:

```diff
-UTIL_STATIC void* UTIL_realloc(void *ptr, size_t size)
+UTIL_STATIC void* UTIL_reallocOrFree(void *ptr, size_t size)
 {
     void *newptr = realloc(ptr, size);
     if (newptr) return newptr;
     free(ptr);
     return NULL;
 }
```

Or add a more prominent warning in the function documentation.
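To see why the rename matters, here is a standalone sketch of the same "realloc or free" semantics (the name `realloc_or_free` is illustrative). Unlike standard `realloc()`, the original block is released when growth fails, so the classic leak pattern `p = realloc(p, n)` cannot leak; the trade-off is that a caller who kept a copy of the old pointer must not touch it after a NULL return.

```c
#include <assert.h>
#include <stdlib.h>

/* Grow ptr to size; on failure, frees the original block and returns NULL.
 * Non-standard: after a NULL return, the old pointer is already freed. */
static void* realloc_or_free(void* ptr, size_t size) {
    void* newptr = realloc(ptr, size);
    if (newptr) return newptr;
    free(ptr);
    return NULL;
}
```

With standard `realloc`, `p = realloc(p, n)` leaks the old block on failure; with this variant it does not, which is exactly the asymmetry the suggested rename is meant to surface.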
contrib/pzstd/utils/Buffer.h (1)
13-13: Remove unused include.

The `<array>` header is included but not used anywhere in this file.

Apply this diff to remove the unused include:

```diff
-#include <array>
 #include <cstddef>
 #include <memory>
```

contrib/pzstd/Pzstd.h (2)
46-57: Consider logging or propagating errors from the CStream factory.

When `ZSTD_initCStream_advanced` fails (line 52), the factory lambda returns `nullptr` without logging the error or setting `errorHolder`. This makes it difficult to diagnose why stream creation failed.

Apply this diff to add error logging:

```diff
 auto zcs = ZSTD_createCStream();
 if (zcs) {
   auto err = ZSTD_initCStream_advanced(
       zcs, nullptr, 0, parameters, 0);
   if (ZSTD_isError(err)) {
+    this->log(kLogInfo, "ZSTD_initCStream_advanced failed: %s\n",
+              ZSTD_getErrorName(err));
     ZSTD_freeCStream(zcs);
     return nullptr;
   }
+} else {
+  this->log(kLogInfo, "ZSTD_createCStream failed\n");
 }
```
64-74: Consider logging or propagating errors from the DStream factory.

Similar to the CStream factory, when `ZSTD_initDStream` fails (line 69), the factory returns `nullptr` without logging the error. This symmetry issue makes debugging difficult.

Apply this diff to add error logging:

```diff
 auto zds = ZSTD_createDStream();
 if (zds) {
   auto err = ZSTD_initDStream(zds);
   if (ZSTD_isError(err)) {
+    this->log(kLogInfo, "ZSTD_initDStream failed: %s\n",
+              ZSTD_getErrorName(err));
     ZSTD_freeDStream(zds);
     return nullptr;
   }
+} else {
+  this->log(kLogInfo, "ZSTD_createDStream failed\n");
 }
```

programs/benchfn.c (2)
179-179: Use `uintptr_t` for pointer arithmetic instead of `size_t`.

Casting a pointer to `size_t` for alignment checking is not portable. Use `uintptr_t` (from `<stdint.h>`) instead, which is specifically designed for pointer-to-integer conversions.

```diff
+#include <stdint.h>
+
 BMK_timedFnState_t*
 BMK_initStatic_timedFnState(void* buffer, size_t size, unsigned total_ms, unsigned run_ms)
 {
     typedef char check_size[ 2 * (sizeof(BMK_timedFnState_shell) >= sizeof(struct BMK_timedFnState_s)) - 1];
     typedef struct { check_size c; BMK_timedFnState_t tfs; } tfs_align;
     size_t const tfs_alignment = offsetof(tfs_align, tfs);
     BMK_timedFnState_t* const r = (BMK_timedFnState_t*)buffer;
     if (buffer == NULL) return NULL;
     if (size < sizeof(struct BMK_timedFnState_s)) return NULL;
-    if ((size_t)buffer % tfs_alignment) return NULL;
+    if ((uintptr_t)buffer % tfs_alignment) return NULL;
     BMK_resetTimedFnState(r, total_ms, run_ms);
     return r;
 }
```
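The portable form of the check can be isolated into a small helper. This is a generic sketch (the `is_aligned` helper is not part of benchfn.c): `uintptr_t` is the type C99 guarantees can round-trip an object pointer, whereas `size_t` carries no such guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if p is aligned to `alignment` bytes.
 * Converting through uintptr_t is the portable way to inspect address bits. */
static int is_aligned(const void* p, size_t alignment) {
    return ((uintptr_t)p % alignment) == 0;
}
```

On mainstream platforms `size_t` and `uintptr_t` happen to be the same width, which is why the original code works in practice; the change is about writing the intent the standard actually guarantees.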
192-192: Consider using a named constant for the "infinity" sentinel value.

The expression `(double)TIMELOOP_NANOSEC * 2000000000` evaluates to approximately 63 years, serving as an effective "maximum possible value" for minimum-time tracking. While functionally correct, a named constant would clarify intent.

```diff
+#define BMK_TIMELOOP_INFINITY_NS ((double)TIMELOOP_NANOSEC * 2000000000)
+
 void BMK_resetTimedFnState(BMK_timedFnState_t* timedFnState, unsigned total_ms, unsigned run_ms)
 {
     if (!total_ms) total_ms = 1;
     if (!run_ms) run_ms = 1;
     if (run_ms > total_ms) run_ms = total_ms;
     timedFnState->timeSpent_ns = 0;
     timedFnState->timeBudget_ns = (PTime)total_ms * TIMELOOP_NANOSEC / 1000;
     timedFnState->runBudget_ns = (PTime)run_ms * TIMELOOP_NANOSEC / 1000;
-    timedFnState->fastestRun.nanoSecPerRun = (double)TIMELOOP_NANOSEC * 2000000000;
+    timedFnState->fastestRun.nanoSecPerRun = BMK_TIMELOOP_INFINITY_NS;
     timedFnState->fastestRun.sumOfReturn = (size_t)(-1LL);
     timedFnState->nbLoops = 1;
     timedFnState->coolTime = UTIL_getTime();
 }
```

programs/benchzstd.c (5)
67-71: Consider adding a comment to clarify the `maxMemory` calculation.

The expression `1ULL << ((sizeof(size_t) * 8) - 31)` evaluates to 8 GB on 64-bit systems. While correct, a comment explaining the formula (e.g., "2^33 bytes = 8GB") would improve readability.
471-489: Strengthen overflow and allocation checks.

The overflow check at line 474 is good, but the subsequent `malloc(decodedSize)` at line 480 doesn't verify that the allocation is reasonable. While line 481 checks for NULL, consider adding a maximum size sanity check.

```diff
 {   size_t const decodedSize = (size_t)totalDSize64;
     assert((U64)decodedSize == totalDSize64);   /* check overflow */
     free(*resultBufferPtr);
     if (totalDSize64 > decodedSize) {  /* size_t overflow */
         RETURN_ERROR(
                 32,
                 BMK_benchOutcome_t,
                 "decompressed size is too large for local system");
     }
+    if (decodedSize > maxMemory) {
+        RETURN_ERROR(
+                32,
+                BMK_benchOutcome_t,
+                "decompressed size exceeds maximum memory limit");
+    }
     *resultBufferPtr = malloc(decodedSize);
     if (!(*resultBufferPtr)) {
         RETURN_ERROR(
                 33,
                 BMK_benchOutcome_t,
                 "allocation error: not enough memory");
     }
```
785-786: Consider clarifying the XOR-based parameter validation.

The expression `!dstBuffer ^ !dstCapacity` correctly validates that both parameters are NULL or both are non-NULL, but the intent is not immediately obvious. A more explicit check would improve readability.

```diff
-    int const dstParamsError =
-            !dstBuffer ^ !dstCapacity; /* must be both NULL or none */
+    int const dstParamsError =
+            (dstBuffer == NULL) != (dstCapacity == 0); /* must be both NULL or both non-NULL */
```
1032-1049: Memory probing approach is functional but crude.

The current implementation probes for available memory by attempting allocations. While this works for a benchmarking utility, be aware that:

- The probe may be slow on memory-constrained systems
- Memory availability can change between the probe and actual use
- Modern systems offer better APIs (e.g., `sysconf(_SC_PHYS_PAGES)` on POSIX)

For the current use case (benchmarking tool), this approach is acceptable.
1196-1196: Consider adding a comment to explain the memory calculation.

The expression `BMK_findMaxMem(totalSizeToLoad * 3) / 3` divides available memory by 3 to account for source, compressed, and decompressed buffers. A brief comment would clarify this intent.

```diff
     /* Memory allocation & restrictions */
+    /* Divide available memory by 3 to account for source, compressed, and result buffers */
     benchedSize = BMK_findMaxMem(totalSizeToLoad * 3) / 3;
```

contrib/recovery/recover_directory.c (3)
60-60: Consider checking the return value of fclose.

While less critical for read operations, checking `fclose()` return values is good practice as it can still fail in certain scenarios.

```diff
-    fclose(file);
+    int closeResult = fclose(file);
+    CHECK(closeResult == 0, "fclose failed");
```
62-62: Remove redundant cast.

The variable `data` is already of type `char *`, so the cast to `(char *)` is unnecessary.

```diff
-    frames.data = (char *)data;
+    frames.data = data;
```
141-141: Consider checking the return value of fclose.

Unlike the earlier read-only `fclose`, this one is closing a file opened for writing. The `fclose` call flushes any buffered data, and this operation can fail (e.g., disk full, I/O errors). Checking the return value ensures data integrity.

```diff
-    fclose(outFile);
+    int closeResult = fclose(outFile);
+    CHECK(closeResult == 0, "fclose failed for %s", outFileName);
```

zlibWrapper/gzread.c (1)
291-365: Clarify the intent of `n = -1` in gz_read.

`n` is initialized with `n = -1;` and then compared/assigned as an `unsigned`. This is relying on wrap-around to `UINT_MAX`, which is correct but slightly opaque.

Consider making it explicit for future readers:

```diff
-    n = -1;
+    n = (unsigned)-1;
```

This keeps the behavior identical while making the intent clearer.
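The reason this idiom is well-defined, and why the two spellings are equivalent, is C's conversion rule: converting -1 to an unsigned type yields that type's maximum value (reduction modulo 2^N). A tiny standalone demonstration (the `max_sentinel` helper is illustrative, not gzread code):

```c
#include <assert.h>
#include <limits.h>

/* Converting -1 to unsigned is defined behavior: the result is UINT_MAX,
 * i.e. the value reduced modulo 2^N where N is the width of unsigned. */
static unsigned max_sentinel(void) {
    unsigned n = -1;   /* same value as (unsigned)-1 */
    return n;
}
```

So the rename to `(unsigned)-1` changes nothing at runtime; it only documents that the wrap-around is intentional.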
zlibWrapper/zstd_zlibwrapper.c (2)
85-99: Global compression/decompression toggles are not thread-safe.

The wrapper uses process-wide globals:

```c
static int g_ZWRAP_useZSTDcompression = ZWRAP_USE_ZSTD;   /* 0 = don't use ZSTD */
void ZWRAP_useZSTDcompression(int turn_on) { g_ZWRAP_useZSTDcompression = turn_on; }

static ZWRAP_decompress_type g_ZWRAPdecompressionType = ZWRAP_AUTO;
void ZWRAP_setDecompressionType(ZWRAP_decompress_type type) { g_ZWRAPdecompressionType = type; }
```

If these setters are called while other threads are actively compressing/decompressing, you can get data races and non-deterministic behavior.
If multi‑threaded use with dynamic reconfiguration is a target, consider:
- Making these flags atomic, or
- Moving configuration to per‑stream state (preferred), or
- Documenting them as “set once at process startup, before creating any streams”.
Given this is upstream‑style code, adjusting or at least documenting the threading expectations might be the pragmatic first step.
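As a sketch of the first option (making the flags atomic), C11 `<stdatomic.h>` is enough for a simple on/off toggle; relaxed ordering suffices because the flag carries no data dependency. The names below mirror the wrapper's globals but this is an illustrative sketch, not the actual patch:

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical atomic version of the wrapper's toggle. Concurrent readers
 * and a writer no longer race; each read sees either the old or new value. */
static atomic_int g_useZSTDcompression = 1;

static void set_useZSTDcompression(int turn_on) {
    atomic_store_explicit(&g_useZSTDcompression, turn_on, memory_order_relaxed);
}

static int get_useZSTDcompression(void) {
    return atomic_load_explicit(&g_useZSTDcompression, memory_order_relaxed);
}
```

Note that atomics only remove the data race on the flag itself; streams created before a toggle still keep their original mode, which is why per-stream configuration remains the cleaner design.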
735-825: Disabled debug logging currently references `res` instead of `result`.

In the zlib-stream branch of `z_inflate`, the debug log uses an undeclared `res`:

```c
int const result = (flush == Z_INFLATE_SYNC) ? inflateSync(strm) : inflate(strm, flush);
LOG_WRAPPERD("- inflate3 ... res=%d\n", ..., res);
```

This compiles today only because `LOG_WRAPPERD` is defined as a no-op comment macro:

```c
#define LOG_WRAPPERD(...)   /* fprintf(stderr, __VA_ARGS__) */
```

If you ever re-enable logging (e.g., change the macro to call `fprintf`), this will stop compiling. Consider aligning the name now:

```diff
-LOG_WRAPPERD("- inflate3 ... res=%d\n", ..., res);
+LOG_WRAPPERD("- inflate3 ... res=%d\n", ..., result);
```

Same pattern appears later in the other inflate logging block.
contrib/seekable_format/zstdseek_decompress.c (2)
137-154: `fflush` on FILE after seek may be unnecessary and non-portable.

`ZSTD_seekable_seek_FILE()` uses:

```c
int const ret = LONG_SEEK((FILE*)opaque, offset, origin);
if (ret) return ret;
return fflush((FILE*)opaque);
```

Calling `fflush()` on a read-only stream is implementation-defined/unspecified and is generally discouraged; for pure read paths it isn't needed after `fseek`/`fseeko`/`_fseeki64`. This might behave oddly on some libcs.

Given this routine is used only as a read-side callback, consider simply:

```diff
 static int ZSTD_seekable_seek_FILE(void* opaque, long long offset, int origin)
 {
-    int const ret = LONG_SEEK((FILE*)opaque, offset, origin);
-    if (ret) return ret;
-    return fflush((FILE*)opaque);
+    return LONG_SEEK((FILE*)opaque, offset, origin);
 }
```

Please confirm whether you rely on mixed read/write usage of `FILE*` here; if so, we can discuss a more targeted solution.
171-195: Defensive handling of unexpected `origin` in `ZSTD_seekable_seek_buff`.

The `default` case in the `switch(origin)` only has an `assert(0)`:

```c
default:
    assert(0);  /* not possible */
}
if (newOffset > buff->size) {
    return -1;
}
```

If `origin` is ever outside `{SEEK_SET, SEEK_CUR, SEEK_END}` in a release build (asserts disabled), `newOffset` will be uninitialized, leading to undefined behavior.

It's cheap to add a defensive fallback:

```diff
 default:
-    assert(0);  /* not possible */
+    assert(0);  /* not possible */
+    return -1;
 }
```

contrib/seekable_format/zstdseek_compress.c (3)
233-236: Check return value of `ZSTD_CCtx_setParameter`.

`ZSTD_CCtx_setParameter` returns a `size_t` error code, but its result is currently ignored:

```c
assert(zcs->maxFrameSize < INT_MAX);
ZSTD_CCtx_setParameter(zcs->cstream, ZSTD_c_srcSizeHint, (int)zcs->maxFrameSize);
```

Although `ZSTD_c_srcSizeHint` is expected to be valid, ignoring errors can hide misconfiguration or version mismatches. Suggested change:

```c
assert(zcs->maxFrameSize < INT_MAX);
{   size_t const err = ZSTD_CCtx_setParameter(
            zcs->cstream, ZSTD_c_srcSizeHint, (int)zcs->maxFrameSize);
    if (ZSTD_isError(err)) return err;
}
```

This keeps error propagation consistent with the rest of the code.
Please double-check against your zstd version’s API docs to ensure this handling matches the expected semantics.
66-75: Frame log allocation/free patterns look good; minor safety suggestion.

The frame log vector management (`ZSTD_seekable_frameLog_allocVec`, `ZSTD_seekable_frameLog_freeVec`, and `ZSTD_seekable_logFrame`) follows a standard pattern (initial capacity, exponential growth, bounds check on `ZSTD_SEEKABLE_MAXFRAMES`).

One small optional improvement is to null out `fl->entries` after free to make accidental double-free safer:

```diff
 static size_t ZSTD_seekable_frameLog_freeVec(ZSTD_frameLog* fl)
 {
-    if (fl != NULL) free(fl->entries);
+    if (fl != NULL) {
+        free(fl->entries);
+        fl->entries = NULL;
+        fl->capacity = 0;
+        fl->size = 0;
+    }
     return 0;
 }
```

Not required for correctness with current usage, but it hardens the API against misuse.
Also applies to: 167-195
353-365: `ZSTD_seekable_endStream` flow is reasonable; be aware of size-hint semantics.

The `ZSTD_seekable_endStream()` control flow:

- Finishes the last frame via `ZSTD_seekable_endFrame()` (which may return a positive size hint until fully flushed).
- Then emits the seek table in a streaming-friendly way using `ZSTD_seekable_writeSeekTable()` and `zcs->writingSeekTable` state.

The pattern of returning the pending size (`endFrame + ZSTD_seekable_seekTableSize(&zcs->framelog)`) as a hint before the final frame is logged is subtle but consistent. Callers must treat any non-error, non-zero return as "need more output space; call again with a larger buffer" rather than as bytes written.

No changes required, but it's worth documenting this behavior in the header so downstream users don't misinterpret these return values.
contrib/diagnose_corruption/check_flipped_bits.c (2)
31-36: Prefer `unsigned` for dict IDs to avoid type/format mismatches.

`dict_id` is currently `int32_t`, but it is:

- Filled from `ZSTD_getDictID_fromFrame()`/`ZSTD_getDictID_fromDDict()`, which return an `unsigned` dict ID.
- Printed with `%u` in `snprintf()`/`fprintf()`.

Mixing `int32_t` with `%u` is technically undefined behavior and can misrepresent large IDs. It also diverges from the ZSTD API's unsigned dict ID type.

Consider switching to the `unsigned` type the API uses and updating signatures accordingly, e.g.:

```diff
-    int32_t dict_id;
+    unsigned dict_id;
```

and similarly:

```diff
-static ZSTD_DDict* readDict(const char* filename, char **buf, size_t* size, int32_t* dict_id) {
+static ZSTD_DDict* readDict(const char* filename, char **buf, size_t* size, unsigned* dict_id) {

-static ZSTD_DDict* readDictByID(stuff_t *stuff, int32_t dict_id, char **buf, size_t* size) {
+static ZSTD_DDict* readDictByID(stuff_t *stuff, unsigned dict_id, char **buf, size_t* size) {
```

This keeps everything consistent with ZSTD's public API and avoids UB around format specifiers. Based on library context.

Also applies to: 139-140, 144-169, 246-259, 281-289
44-50: Free `perturbed` and tighten cleanup on early exits.

- `stuff->perturbed` is allocated in `init_stuff()` but never freed in `free_stuff()`, so it's leaked on the normal success path.
- When `init_stuff()` fails or when the original blob already decompresses (the early return after `test_decompress()`), `free_stuff()` is not called at all. The OS will reclaim memory on process exit, but explicit cleanup keeps the ownership model clear and future-proof.

Minimal improvements:

```diff
 static void free_stuff(stuff_t* stuff) {
   free(stuff->input);
+  free(stuff->perturbed);
   free(stuff->output);
   ZSTD_freeDDict(stuff->ddict);
   free(stuff->dict);
   ZSTD_freeDCtx(stuff->dctx);
 }
```

Optionally, zero-initialize `stuff` at declaration and always call `free_stuff()` on all exit paths:

```diff
 int main(int argc, char* argv[]) {
-  stuff_t stuff;
+  stuff_t stuff = {0};
@@
   if(!init_stuff(&stuff, argc, argv)) {
     fprintf(stderr, "Failed to init.\n");
-    return 1;
+    free_stuff(&stuff);
+    return 1;
   }
@@
   if (test_decompress(&stuff)) {
     fprintf(stderr, "Blob already decompresses successfully!\n");
-    return 1;
+    free_stuff(&stuff);
+    return 1;
   }
@@
   free_stuff(&stuff);
   return 0;
```

This makes resource ownership explicit and safe even if the code evolves.
Also applies to: 191-218, 378-399
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (27)
- contrib/diagnose_corruption/check_flipped_bits.c (1 hunks)
- contrib/externalSequenceProducer/main.c (1 hunks)
- contrib/largeNbDicts/largeNbDicts.c (1 hunks)
- contrib/pzstd/Pzstd.h (1 hunks)
- contrib/pzstd/utils/Buffer.h (1 hunks)
- contrib/recovery/recover_directory.c (1 hunks)
- contrib/seekable_format/zstdseek_compress.c (1 hunks)
- contrib/seekable_format/zstdseek_decompress.c (1 hunks)
- contrib/seqBench/seqBench.c (1 hunks)
- doc/educational_decoder/harness.c (1 hunks)
- doc/educational_decoder/zstd_decompress.c (1 hunks)
- lib/dictBuilder/cover.c (1 hunks)
- lib/dictBuilder/divsufsort.c (1 hunks)
- lib/dictBuilder/fastcover.c (1 hunks)
- lib/dictBuilder/zdict.c (1 hunks)
- lib/legacy/zstd_v01.c (1 hunks)
- programs/benchfn.c (1 hunks)
- programs/benchzstd.c (1 hunks)
- programs/datagen.c (1 hunks)
- programs/dibio.c (1 hunks)
- programs/fileio_asyncio.c (1 hunks)
- programs/util.c (1 hunks)
- tests/bigdict.c (1 hunks)
- tests/largeDictionary.c (1 hunks)
- zlibWrapper/gzlib.c (1 hunks)
- zlibWrapper/gzread.c (1 hunks)
- zlibWrapper/zstd_zlibwrapper.c (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (21)
tests/bigdict.c (2)
- tests/largeDictionary.c (2): `compress` (21-61), `main` (63-128)
- programs/datagen.c (1): `RDG_genBuffer` (144-152)

programs/dibio.c (2)
- programs/util.c (1): `UTIL_getFileSize` (545-558)
- lib/dictBuilder/zdict.c (3): `ZDICT_trainFromBuffer_legacy` (1088-1108), `ZDICT_isError` (98-98), `ZDICT_getErrorName` (100-100)

tests/largeDictionary.c (2)
- tests/bigdict.c (2): `compress` (21-61), `main` (63-133)
- programs/datagen.c (1): `RDG_genBuffer` (144-152)

contrib/pzstd/Pzstd.h (1)
- contrib/pzstd/utils/Buffer.h (1): `pzstd` (17-99)

doc/educational_decoder/zstd_decompress.c (1)
- lib/legacy/zstd_v03.c (1): `ZSTD_decompress` (2905-2910)

contrib/recovery/recover_directory.c (3)
- programs/util.c (1): `UTIL_getFileSize` (545-558)
- lib/legacy/zstd_v03.c (3): `ZSTD_isError` (2373-2373), `ZSTD_createDCtx` (2981-2987), `ZSTD_freeDCtx` (2989-2993)
- lib/legacy/zstd_v04.c (3): `ZSTD_isError` (2429-2429), `ZSTD_createDCtx` (2469-2475), `ZSTD_freeDCtx` (2477-2481)

zlibWrapper/gzlib.c (1)
- zlibWrapper/gzread.c (9): `z_size_t` (292-366), `ZEXPORT` (369-398), `ZEXPORT` (401-425), `ZEXPORT` (445-470), `ZEXPORT` (472-474), `ZEXPORT` (477-531), `ZEXPORT` (534-591), `ZEXPORT` (594-609), `ZEXPORT` (612-637)

zlibWrapper/gzread.c (1)
- zlibWrapper/gzlib.c (16): `ZEXPORT` (265-267), `ZEXPORT` (270-272), `ZEXPORT` (275-289), `ZEXPORT` (293-295), `ZEXPORT` (299-320), `ZEXPORT` (323-341), `ZEXPORT` (344-414), `ZEXPORT` (417-422), `ZEXPORT` (425-437), `ZEXPORT` (440-445), `ZEXPORT` (448-466), `ZEXPORT` (469-474), `ZEXPORT` (477-489), `ZEXPORT` (492-507), `ZEXPORT` (510-526), `gzFile` (90-262)

lib/dictBuilder/fastcover.c (2)
- lib/dictBuilder/cover.c (13): `COVER_sum` (278-285), `COVER_computeEpochs` (732-747), `COVER_dictSelectionError` (1035-1037), `COVER_selectDict` (1047-1132), `COVER_dictSelectionIsError` (1039-1041), `COVER_best_finish` (980-1024), `COVER_dictSelectionFree` (1043-1045), `size_t` (807-862), `size_t` (1195-1331), `COVER_warnOnSmallCorpus` (716-730), `COVER_best_init` (922-931), `COVER_best_destroy` (950-960), `COVER_best_wait` (936-945)
- lib/dictBuilder/zdict.c (1): `ZDICT_finalizeDictionary` (862-941)

lib/dictBuilder/zdict.c (1)
- lib/legacy/zstd_v03.c (4): `ERR_isError` (525-525), `U32` (121-124), `U32` (161-164), `U32` (166-175)

contrib/largeNbDicts/largeNbDicts.c (3)
- programs/util.c (6): `UTIL_getFileSize` (545-558), `UTIL_isDirectory` (357-365), `UTIL_getTotalFileSize` (630-645), `UTIL_createExpandedFNT` (1379-1423), `UTIL_assembleFileNamesTable` (811-815), `UTIL_freeFileNamesTable` (817-823)
- lib/legacy/zstd_v03.c (3): `ZSTD_isError` (2373-2373), `ZSTD_createDCtx` (2981-2987), `ZSTD_freeDCtx` (2989-2993)
- programs/benchfn.c (6): `BMK_createTimedFnState` (160-166), `BMK_benchTimedFn` (211-256), `BMK_isSuccessful_runOutcome` (64-67), `BMK_extract_runTime` (71-75), `BMK_isCompleted_TimedFn` (200-203), `BMK_freeTimedFnState` (168-168)

contrib/diagnose_corruption/check_flipped_bits.c (2)
- lib/legacy/zstd_v03.c (3): `ZSTD_freeDCtx` (2989-2993), `ZSTD_createDCtx` (2981-2987), `ZSTD_isError` (2373-2373)
- contrib/recovery/recover_directory.c (3): `usage` (29-38), `readFile` (47-82), `main` (88-152)

lib/dictBuilder/cover.c (2)
- lib/legacy/zstd_v03.c (12): `U64` (126-129), `U64` (177-187), `U32` (121-124), `U32` (161-164), `U32` (166-175), `size_t` (190-196), `size_t` (325-368), `size_t` (369-373), `size_t` (377-381), `size_t` (388-393), `size_t` (397-402), `size_t` (1276-1343)
- lib/dictBuilder/zdict.c (2): `ZDICT_finalizeDictionary` (862-941), `ZDICT_isError` (98-98)

contrib/seekable_format/zstdseek_decompress.c (2)
- lib/legacy/zstd_v04.c (6): `ZSTD_isError` (2429-2429), `U64` (102-105), `U64` (154-164), `U32` (97-100), `U32` (137-140), `U32` (142-151)
- lib/legacy/zstd_v07.c (6): `U64` (283-286), `U64` (307-323), `U64` (358-364), `U32` (278-281), `U32` (293-305), `U32` (349-355)

contrib/seqBench/seqBench.c (2)
- contrib/externalSequenceProducer/main.c (1): `main` (29-107)
- lib/legacy/zstd_v03.c (2): `ZSTD_isError` (2373-2373), `ZSTD_decompress` (2905-2910)

contrib/seekable_format/zstdseek_compress.c (2)
- lib/legacy/zstd_v03.c (4): `ZSTD_isError` (2373-2373), `U32` (121-124), `U32` (161-164), `U32` (166-175)
- lib/legacy/zstd_v07.c (3): `U32` (278-281), `U32` (293-305), `U32` (349-355)

contrib/externalSequenceProducer/main.c (1)
- doc/educational_decoder/zstd_decompress.c (1): `ZSTD_decompress` (384-391)

doc/educational_decoder/harness.c (1)
- doc/educational_decoder/zstd_decompress.c (5): `ZSTD_get_decompressed_size` (1378-1401), `create_dictionary` (1405-1411), `parse_dictionary` (1435-1487), `ZSTD_decompress_with_dict` (393-409), `free_dictionary` (1414-1425)

programs/fileio_asyncio.c (1)
- contrib/seekable_format/zstdseek_decompress.c (1): `LONG_SEEK` (91-106)

programs/benchzstd.c (5)
- lib/legacy/zstd_v04.c (8): `ZSTD_isError` (2429-2429), `U32` (97-100), `U32` (137-140), `U32` (142-151), `U64` (102-105), `U64` (154-164), `ZSTD_createDCtx` (2469-2475), `ZSTD_freeDCtx` (2477-2481)
- lib/legacy/zstd_v07.c (6): `U32` (278-281), `U32` (293-305), `U32` (349-355), `U64` (283-286), `U64` (307-323), `U64` (358-364)
- programs/datagen.c (1): `RDG_genBuffer` (144-152)
- programs/benchfn.c (5): `BMK_benchTimedFn` (211-256), `BMK_isSuccessful_runOutcome` (64-67), `BMK_extract_runTime` (71-75), `BMK_createTimedFnState` (160-166), `BMK_freeTimedFnState` (168-168)
- programs/util.c (3): `UTIL_getFileSize` (545-558), `UTIL_isDirectory` (357-365), `UTIL_getTotalFileSize` (630-645)

zlibWrapper/zstd_zlibwrapper.c (2)
- lib/legacy/zstd_v03.c (2): `char` (527-532), `ZSTD_isError` (2373-2373)
- lib/legacy/zstd_v04.c (1): `ZSTD_isError` (2429-2429)
🪛 ast-grep (0.39.9)
zlibWrapper/gzlib.c
[warning] 239-239: This call makes a world-writable file which allows any user on a machine to write to the file. This may allow attackers to influence the behaviour of this process by writing to the file.
Context: 0666
Note: [CWE-732]: Incorrect Permission Assignment for Critical Resource [REFERENCES]
- https://wiki.sei.cmu.edu/confluence/display/c/FIO06-C.+Create+files+with+appropriate+access+permissions
(world-writable-file-c)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Cursor Bugbot
🔇 Additional comments (13)
contrib/pzstd/utils/Buffer.h (1)
58-62: The review comment is incorrect: bounds checking already exists in the underlying Range methods.

The `subpiece()` and `advance()` methods in Range.h (lines 121-127 and 107-112) both validate boundaries and throw `std::out_of_range` exceptions if called with out-of-bounds parameters. In `splitAt()`, the call to `advance(n)` will throw an exception if `n > size()`, preventing undefined behavior. No additional validation is needed at the entry point of `splitAt()`.

Likely an incorrect or invalid review comment.
programs/benchfn.c (3)
108-146: LGTM - Well-structured benchmarking logic.

The function properly handles edge cases (minimum loop count), warms buffers before timing, validates results only on the first iteration, and correctly averages timing across loops.

211-256: LGTM - Robust adaptive benchmarking with overflow protection.

The function correctly adjusts loop counts based on runtime measurements, includes overflow protection (line 238), and properly discards measurements with insufficient runtime to avoid timing noise.

27-27: No issues found: `SEC_TO_MICRO` is properly defined in timefn.h at line 57.

The constant is correctly defined as `((PTime)1000000)` and will be accessible via the included header file. The code at line 27 of benchfn.c is valid and will compile without errors.

Likely an incorrect or invalid review comment.
programs/benchzstd.c (2)
1054-1112: LGTM - Robust file loading with proper error handling.

The function correctly handles edge cases (directories, unknown file sizes, buffer overflow by truncation) and includes proper resource cleanup. Error paths use appropriate error codes.

891-918: LGTM - Clean wrapper functions for API convenience.

The wrapper functions `BMK_benchMem` and `BMK_benchFiles` provide simplified interfaces by using default parameters, following good API design practices.

Also applies to: 1253-1270
contrib/recovery/recover_directory.c (2)
147-151: Resource cleanup looks good.

All allocated resources are properly freed before program exit.

16-16: Include path is correctly configured; no issues found.

The `#include "util.h"` directive resolves correctly. The Makefile at contrib/recovery/Makefile defines `PROGRAMDIR ?= ../../programs` (line 12) and includes `-I$(PROGRAMDIR)` in CFLAGS (line 17), which correctly provides the compiler with the include path to the programs/ directory where util.h resides. The build system is properly configured.

doc/educational_decoder/zstd_decompress.c (1)
82-93: Let the fallback sizing clamp instead of aborting.

When a frame doesn't embed its decompressed size, we guess `MAX_COMPRESSION_RATIO * input.size`. For any payload ≥ 64 MiB, that estimate exceeds the 1 GiB cap and we exit immediately, even though plenty of such frames legitimately decompress to < 1 GiB. That makes the harness unusable for large-yet-valid streams (common with zstd streaming output). Please clamp the fallback capacity to `MAX_OUTPUT_SIZE` (after guarding against overflow) and only hard-fail when an explicit frame size is above the cap.

```diff
-    if (out_capacity == (size_t)-1) {
-        out_capacity = MAX_COMPRESSION_RATIO * input.size;
-        fprintf(stderr, "WARNING: Compressed data does not contain "
-                        "decompressed size, going to assume the compression "
-                        "ratio is at most %d (decompressed size of at most "
-                        "%u) \n",
-                MAX_COMPRESSION_RATIO, (unsigned)out_capacity);
-    }
-    if (out_capacity > MAX_OUTPUT_SIZE)
-        ERR_OUT("Required output size too large for this implementation \n");
+    if (out_capacity == (size_t)-1) {
+        if (input.size > SIZE_MAX / MAX_COMPRESSION_RATIO) {
+            ERR_OUT("compressed input too large to size output buffer safely \n");
+        }
+        size_t fallback = MAX_COMPRESSION_RATIO * input.size;
+        if (fallback > MAX_OUTPUT_SIZE) {
+            fprintf(stderr,
+                    "WARNING: Compressed data lacks a decompressed size; "
+                    "assuming ratio %d but capping output buffer at %zu bytes \n",
+                    MAX_COMPRESSION_RATIO, MAX_OUTPUT_SIZE);
+            out_capacity = MAX_OUTPUT_SIZE;
+        } else {
+            out_capacity = fallback;
+        }
+    } else if (out_capacity > MAX_OUTPUT_SIZE) {
+        ERR_OUT("Required output size too large for this implementation \n");
+    }
```

Likely an incorrect or invalid review comment.
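The clamping logic in the suggested diff boils down to an overflow-checked multiply followed by a cap. Here is the same idea as a standalone helper; `RATIO` and `CAP` are illustrative stand-ins for `MAX_COMPRESSION_RATIO` and `MAX_OUTPUT_SIZE`, not the harness's real constants:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RATIO 16384u              /* stand-in for MAX_COMPRESSION_RATIO */
#define CAP   ((size_t)1 << 30)   /* stand-in for a 1 GiB MAX_OUTPUT_SIZE */

/* "input * ratio, clamped to cap", with the multiply checked for overflow
 * first so the guess can never wrap around to a small value. */
static size_t fallback_capacity(size_t input_size) {
    if (input_size > SIZE_MAX / RATIO)   /* multiply would overflow */
        return CAP;
    size_t guess = input_size * RATIO;
    return guess > CAP ? CAP : guess;
}
```

The key ordering is checking `input_size > SIZE_MAX / RATIO` before multiplying; testing the product after the fact is too late, because the wrap-around has already happened.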
zlibWrapper/gzread.c (1)
88-173: gz_look(): gzip/zstd header handling and buffer setup look correct.

Header probing, buffer allocation, and `inflateInit2(..., 15 + 16)` setup are consistent with zlib's gzread semantics, and the added zstd magic check (0x28, 0xB5) integrates cleanly with the existing GZIP path. No changes needed here.

zlibWrapper/gzlib.c (1)
75-87: gz_reset() state initialization looks sound.

Resetting `x.have`, eof/past flags, `how`, `seek`, error state, position, and `strm.avail_in` is consistent and matches zlib's expectations. Good foundation for both read and write modes.

zlibWrapper/zstd_zlibwrapper.c (1)
172-199: CCtx creation/free logic is consistent aside from allocator issue.

`ZWRAP_createCCtx`/`ZWRAP_freeCCtx` correctly wire the custom allocators into zstd's `ZSTD_customMem`, and free both the zstd stream and the wrapper context:

```c
ZWRAP_CCtx* zwc = (ZWRAP_CCtx*)ZWRAP_customCalloc(sizeof(ZWRAP_CCtx), customMem);
...
ZSTD_freeCStream(zwc->zbc);
ZWRAP_customFree(zwc, zwc->customMem);
```

Once `ZWRAP_customCalloc`'s null handling is fixed (see previous comment), this lifecycle looks solid.

contrib/diagnose_corruption/check_flipped_bits.c (1)
331-376: Perturbation loops look correct and side-effect free on `stuff->input`.

Both `perturb_bits()` and `perturb_bytes()`:

- Snapshot `old_val` from `stuff->input[pos]`.
- Mutate `stuff->perturbed[pos]` only.
- Restore `stuff->perturbed[pos] = old_val;` after the inner loop.

So the original `input` buffer is never modified, and each perturbation starts from the same baseline. Progress logging every 1000 bytes also keeps output manageable for large inputs. No functional issues spotted in these loops.
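The snapshot/flip/restore loop described above can be sketched as a self-contained function. This is an illustrative reconstruction, not the tool's actual code: `try_payload` is a stand-in for the real decompression attempt, and the real tool records error codes rather than stopping at the first success.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for "does the perturbed payload now decompress?" */
static int try_payload(const char* buf, size_t size) {
    (void)buf; (void)size;
    return 0;   /* pretend every trial still fails */
}

/* Flip each bit of each byte in `perturbed`, one at a time, restoring the
 * byte afterward so every trial starts from the same baseline. `input` is
 * never written. Returns 1 if some single-bit flip fixed the payload. */
static int perturb_bits(const char* input, char* perturbed, size_t size) {
    memcpy(perturbed, input, size);
    for (size_t pos = 0; pos < size; pos++) {
        char const old_val = input[pos];           /* snapshot */
        for (int bit = 0; bit < 8; bit++) {
            perturbed[pos] = (char)(old_val ^ (1 << bit));  /* mutate */
            if (try_payload(perturbed, size)) return 1;
        }
        perturbed[pos] = old_val;                  /* restore baseline */
    }
    return 0;
}
```

The invariant worth testing is exactly the one the review verifies: after the loop, `perturbed` is byte-for-byte identical to `input` again.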
```c
static char* readFile(const char* filename, size_t* size) {
    struct stat statbuf;
    int ret;
    FILE* f;
    char *buf;
    size_t bytes_read;

    ret = stat(filename, &statbuf);
    if (ret != 0) {
        fprintf(stderr, "stat failed: %m\n");
        return NULL;
    }
    if ((statbuf.st_mode & S_IFREG) != S_IFREG) {
        fprintf(stderr, "Input must be regular file\n");
        return NULL;
    }

    *size = statbuf.st_size;

    f = fopen(filename, "r");
    if (f == NULL) {
        fprintf(stderr, "fopen failed: %m\n");
        return NULL;
    }

    buf = malloc(*size);
    if (buf == NULL) {
        fprintf(stderr, "malloc failed\n");
        fclose(f);
        return NULL;
    }

    bytes_read = fread(buf, 1, *size, f);
    if (bytes_read != *size) {
        fprintf(stderr, "failed to read whole file\n");
        fclose(f);
        free(buf);
        return NULL;
    }

    ret = fclose(f);
    if (ret != 0) {
        fprintf(stderr, "fclose failed: %m\n");
        free(buf);
        return NULL;
    }

    return buf;
}
```
🧩 Analysis chain
Open input/dictionary files in binary mode ("rb")
readFile() uses:

```c
f = fopen(filename, "r");
```

These files are zstd frames / dictionaries and should be treated as binary. Using "r" can cause newline translation on platforms that distinguish text vs binary mode, corrupting the data and breaking decompression or dict ID detection. The recovery tool snippet you provided (contrib/recovery/recover_directory.c, lines 46–81) already uses "rb" for this reason.

Recommend:

```diff
- f = fopen(filename, "r");
+ f = fopen(filename, "rb");
```

This keeps behavior consistent with other tools and avoids subtle cross-platform issues. Based on relevant code snippets.
I'll verify the review comment by examining the codebase. Let me gather the necessary information:
🏁 Script executed:
#!/bin/bash
# Check the readFile function in check_flipped_bits.c
echo "=== readFile function in check_flipped_bits.c ==="
cat contrib/diagnose_corruption/check_flipped_bits.c | head -150 | tail -80

Length of output: 1956
🏁 Script executed:
#!/bin/bash
# Check the recovery tool mentioned for comparison
echo "=== Recovery tool file opening pattern ==="
grep -n "fopen" contrib/recovery/recover_directory.c -B2 -A2Length of output: 487
🏁 Script executed:
#!/bin/bash
# Check how readFile is used and what files it processes
echo "=== Usage of readFile function ==="
grep -n "readFile" contrib/diagnose_corruption/check_flipped_bits.c -B2 -A2Length of output: 637
🏁 Script executed:
#!/bin/bash
# Search for zstd-related file handling patterns in the codebase
echo "=== Zstd file handling in codebase ==="
rg -n "fopen.*rb" --type=c -B2 -A2 | head -40Length of output: 1998
Fix file mode from text to binary for zstd data
The review comment is verified as accurate. The readFile() function at line 92 uses fopen(filename, "r") to read binary zstd data, which will be corrupted on Windows due to text-mode newline translation. The entire codebase consistently uses binary mode ("rb") for zstd file operations—confirmed across examples, tests, and programs directories. The recovery tool (contrib/recovery/recover_directory.c, line 54) already follows this pattern. Change line 92 from "r" to "rb".
🤖 Prompt for AI Agents
In contrib/diagnose_corruption/check_flipped_bits.c around lines 75 to 123, the
file is opened with fopen(filename, "r") which will perform text-mode
translations on Windows and corrupt binary zstd data; change the fopen call to
use binary mode ("rb") instead, keeping error handling and the rest of the
function intact.
```c
static ZSTD_DDict* readDict(const char* filename, char **buf, size_t* size, int32_t* dict_id) {
    ZSTD_DDict* ddict;
    *buf = readFile(filename, size);
    if (*buf == NULL) {
        fprintf(stderr, "Opening dictionary file '%s' failed\n", filename);
        return NULL;
    }

    ddict = ZSTD_createDDict_advanced(*buf, *size, ZSTD_dlm_byRef, ZSTD_dct_auto, ZSTD_defaultCMem);
    if (ddict == NULL) {
        fprintf(stderr, "Failed to create ddict.\n");
        return NULL;
    }
    if (dict_id != NULL) {
        *dict_id = ZSTD_getDictID_fromDDict(ddict);
    }
    return ddict;
}
```
🧩 Analysis chain
Avoid leaking dictionary buffers when ZSTD_createDDict_advanced() fails
In readDict():
```c
*buf = readFile(filename, size);
...
ddict = ZSTD_createDDict_advanced(*buf, *size, ...);
if (ddict == NULL) {
    fprintf(stderr, "Failed to create ddict.\n");
    return NULL;
}
```

If ZSTD_createDDict_advanced() fails, the buffer allocated by readFile() and stored in *buf is never freed. This affects:

- Cached dict load in init_stuff() via readDictByID(stuff, dict_id, &stuff->dict, &stuff->dict_size);
- Per-dict loads in test_decompress() via readDictByID(stuff, custom_dict_id, &custom_dict, &custom_dict_size);
test_decompress() only frees custom_dict when custom_ddict != NULL, so failed ddict creations leak per call and can accumulate if the error persists.
A targeted fix inside readDict() keeps callers simple:
```diff
 ddict = ZSTD_createDDict_advanced(*buf, *size, ZSTD_dlm_byRef, ZSTD_dct_auto, ZSTD_defaultCMem);
 if (ddict == NULL) {
     fprintf(stderr, "Failed to create ddict.\n");
+    free(*buf);
+    *buf = NULL;
+    if (size) {
+        *size = 0;
+    }
     return NULL;
 }
```

With this change, failure paths in readDictByID() and test_decompress() no longer leak buffers even if ddict construction fails repeatedly.
Also applies to: 144-189, 276-319
I'll verify the memory leak issue by examining the code structure and checking the callers.
🏁 Script executed:
#!/bin/bash
# First, check if the file exists and examine its structure
fd check_flipped_bits.c --type f

Length of output: 48
🏁 Script executed:
#!/bin/bash
# Get the full content of the file to understand context
cat contrib/diagnose_corruption/check_flipped_bits.c

Length of output: 10376
🏁 Script executed:
#!/bin/bash
# Search for readDictByID function definition
rg -n 'readDictByID' contrib/diagnose_corruption/check_flipped_bits.c -A 10

Length of output: 1368
🏁 Script executed:
#!/bin/bash
# Search for init_stuff function to understand its usage
rg -n 'init_stuff' contrib/diagnose_corruption/check_flipped_bits.c -B 2 -A 10

Length of output: 647
🏁 Script executed:
#!/bin/bash
# Search for test_decompress to see how it handles dictionaries
rg -n 'test_decompress' contrib/diagnose_corruption/check_flipped_bits.c -A 20

Length of output: 2425
🏁 Script executed:
#!/bin/bash
# Look at lines 144-189 to see if similar pattern exists
sed -n '144,189p' contrib/diagnose_corruption/check_flipped_bits.c

Length of output: 1468
🏁 Script executed:
#!/bin/bash
# Look at lines 276-319 to see if similar pattern exists
sed -n '276,319p' contrib/diagnose_corruption/check_flipped_bits.c

Length of output: 1419
🏁 Script executed:
#!/bin/bash
# Check how readFile is defined and if it allocates memory
rg -n 'readFile' contrib/diagnose_corruption/check_flipped_bits.c -B 2 -A 5

Length of output: 721
🏁 Script executed:
#!/bin/bash
# Look for free() calls in the file to see current deallocation patterns
rg -n 'free\(' contrib/diagnose_corruption/check_flipped_bits.c

Length of output: 264
🏁 Script executed:
#!/bin/bash
# Check current error handling in test_decompress
rg -n 'custom_ddict' contrib/diagnose_corruption/check_flipped_bits.c -B 2 -A 2

Length of output: 973
🏁 Script executed:
#!/bin/bash
# Look for where custom_dict is freed or not freed
rg -n 'custom_dict' contrib/diagnose_corruption/check_flipped_bits.c -n

Length of output: 483
🏁 Script executed:
#!/bin/bash
# Understand the full context of how dict_id and related vars are used
rg -n 'dict_id' contrib/diagnose_corruption/check_flipped_bits.c -B 1 -A 1

Length of output: 2068
🏁 Script executed:
#!/bin/bash
# Check for any existing free(*buf) patterns in readDict or similar functions
ast-grep --pattern $'static $_ readDict($$$) {
$$$
free($$$);
$$$
}'

Length of output: 0
🏁 Script executed:
#!/bin/bash
# Look for error handling patterns in similar dictionary functions
rg -n 'fprintf.*stderr' contrib/diagnose_corruption/check_flipped_bits.c -B 2 -A 3

Length of output: 5286
🏁 Script executed:
#!/bin/bash
# Check for memory cleanup in failure paths
rg -n 'return NULL' contrib/diagnose_corruption/check_flipped_bits.c -B 5 -A 1

Length of output: 1681
🏁 Script executed:
#!/bin/bash
# Verify readFile returns a dynamically allocated buffer
rg -n 'char\*.*readFile' contrib/diagnose_corruption/check_flipped_bits.c -A 20

Length of output: 515
🏁 Script executed:
#!/bin/bash
# Check file to understand if ZSTD_dlm_byRef means buffer ownership
rg -n 'ZSTD_dlm_byRef\|ZSTD_dlm' contrib/diagnose_corruption/check_flipped_bits.c -B 2 -A 2

Length of output: 0
🏁 Script executed:
#!/bin/bash
# Get exact lines 125-142 to verify snippet
sed -n '125,142p' contrib/diagnose_corruption/check_flipped_bits.c

Length of output: 555
🏁 Script executed:
#!/bin/bash
# Check all calls to readDictByID to see how failures are handled
rg -n 'readDictByID' contrib/diagnose_corruption/check_flipped_bits.c -B 2 -A 8

Length of output: 1461
🏁 Script executed:
#!/bin/bash
# Verify if custom_dict cleanup is conditional on custom_ddict != NULL
sed -n '1,400p' contrib/diagnose_corruption/check_flipped_bits.c | grep -n -A 30 'test_decompress'

Length of output: 2843
🏁 Script executed:
#!/bin/bash
# Search for where custom_dict variable is freed
sed -n '1,500p' contrib/diagnose_corruption/check_flipped_bits.c | grep -n 'custom_dict\|free'

Length of output: 941
🏁 Script executed:
#!/bin/bash
# Full view of function calls around the issue lines
sed -n '110,200p' contrib/diagnose_corruption/check_flipped_bits.c

Length of output: 2460
🏁 Script executed:
#!/bin/bash
# Check what happens after readDictByID in init_stuff and test_decompress
sed -n '1,600p' contrib/diagnose_corruption/check_flipped_bits.c | grep -n -B 5 -A 10 'readDictByID'

Length of output: 1895
🏁 Script executed:
#!/bin/bash
# Find all function definitions to understand scope
rg -n '^[a-zA-Z_].*\(' contrib/diagnose_corruption/check_flipped_bits.c | head -20

Length of output: 628
🏁 Script executed:
#!/bin/bash
# Check size_t* parameter in readDict - is size conditional
sed -n '125,145p' contrib/diagnose_corruption/check_flipped_bits.c

Length of output: 692
🏁 Script executed:
#!/bin/bash
# Understand if 'size' parameter can be NULL
rg -n 'readDict' contrib/diagnose_corruption/check_flipped_bits.c | grep -v Binary

Length of output: 551
🏁 Script executed:
#!/bin/bash
# Check the fix's suggestion to conditionally check size
# Look for similar NULL checks in the file
rg -n 'if.*size' contrib/diagnose_corruption/check_flipped_bits.c -B 1 -A 1

Length of output: 132
🏁 Script executed:
#!/bin/bash
# Verify parameter signature for readDict
sed -n '124,126p' contrib/diagnose_corruption/check_flipped_bits.c

Length of output: 117
🏁 Script executed:
#!/bin/bash
# Full file view to understand everything at once
wc -l contrib/diagnose_corruption/check_flipped_bits.c

Length of output: 52
🏁 Script executed:
#!/bin/bash
# Output entire file in segments (first half)
head -n 400 contrib/diagnose_corruption/check_flipped_bits.c

Length of output: 10376
🏁 Script executed:
#!/bin/bash
# Second half of file
tail -n +400 contrib/diagnose_corruption/check_flipped_bits.c

Length of output: 1
🏁 Script executed:
#!/bin/bash
# Find the exact function signatures and verify parameters
rg -n 'static.*readDict\|static.*readDictByID\|init_stuff\|test_decompress' contrib/diagnose_corruption/check_flipped_bits.c

Length of output: 0
🏁 Script executed:
#!/bin/bash
# Check if size parameter could be NULL (non-pointer parameter)
ast-grep --pattern $'size_t\* size'

Length of output: 0
🏁 Script executed:
#!/bin/bash
# Get exact function signature of readDict to verify size is size_t*
sed -n '125,126p' contrib/diagnose_corruption/check_flipped_bits.c && echo "---" && sed -n '130,140p' contrib/diagnose_corruption/check_flipped_bits.c

Length of output: 412
🏁 Script executed:
#!/bin/bash
# Check for calls to free() on size-related variables in the file
rg -n 'free.*size\|free.*buf' contrib/diagnose_corruption/check_flipped_bits.c

Length of output: 0
🏁 Script executed:
#!/bin/bash
# Ensure we see readFile implementation
rg -n 'char\* readFile' contrib/diagnose_corruption/check_flipped_bits.c -A 30

Length of output: 736
🏁 Script executed:
#!/bin/bash
# Check if ZSTD_dlm_byRef means we own the buffer (not ZSTD)
# Also check for ZSTD API documentation or comments
rg -n 'ZSTD_dlm\|byRef\|memory' contrib/diagnose_corruption/check_flipped_bits.c | head -20

Length of output: 0
🏁 Script executed:
#!/bin/bash
# Verify lines 144-189 and 276-319 are indeed similar functions
echo "=== Lines 144-189 ===" && sed -n '144,189p' contrib/diagnose_corruption/check_flipped_bits.c && echo "" && echo "=== Lines 276-319 ===" && sed -n '276,319p' contrib/diagnose_corruption/check_flipped_bits.cLength of output: 2933
🏁 Script executed:
#!/bin/bash
# For precise understanding, check function names at those line ranges
sed -n '144p;189p;276p;319p' contrib/diagnose_corruption/check_flipped_bits.c

Length of output: 145
🏁 Script executed:
#!/bin/bash
# Get function start/end for lines 144-189
sed -n '140,195p' contrib/diagnose_corruption/check_flipped_bits.c

Length of output: 1618
🏁 Script executed:
#!/bin/bash
# Get function start/end for lines 276-319
sed -n '270,325p' contrib/diagnose_corruption/check_flipped_bits.c

Length of output: 1622
🏁 Script executed:
#!/bin/bash
# Check how readDictByID uses readDict result and handles errors
sed -n '40,100p' contrib/diagnose_corruption/check_flipped_bits.c

Length of output: 1641
🏁 Script executed:
#!/bin/bash
# Find and display readDictByID complete function
rg -n 'readDictByID' contrib/diagnose_corruption/check_flipped_bits.c -B 5 -A 50

Length of output: 4960
🏁 Script executed:
#!/bin/bash
# Check init_stuff completely
sed -n '190,270p' contrib/diagnose_corruption/check_flipped_bits.c

Length of output: 2021
🏁 Script executed:
#!/bin/bash
# Check test_decompress completely
sed -n '320,400p' contrib/diagnose_corruption/check_flipped_bits.c

Length of output: 1898
🏁 Script executed:
#!/bin/bash
# Final comprehensive check - show lines around the issue plus the callers
echo "=== readDict ===" && sed -n '125,145p' contrib/diagnose_corruption/check_flipped_bits.c && echo "" && echo "=== readDictByID ===" && sed -n '70,90p' contrib/diagnose_corruption/check_flipped_bits.cLength of output: 1172
🏁 Script executed:
#!/bin/bash
# Show where custom_dict is freed in test_decompress
sed -n '320,450p' contrib/diagnose_corruption/check_flipped_bits.c | grep -n 'custom_dict\|custom_ddict\|free\|readDictByID'

Length of output: 117
🏁 Script executed:
#!/bin/bash
# Check if proposed fix has an issue - size is size_t* not NULL checkable
sed -n '125,126p' contrib/diagnose_corruption/check_flipped_bits.c

Length of output: 116
🏁 Script executed:
#!/bin/bash
# Verify the proposed fix's NULL check for size - this might be a bug in the fix
# If size is size_t* (pointer to size_t), it should never be NULL in this context
rg -n 'if.*size' contrib/diagnose_corruption/check_flipped_bits.c | grep -v Binary

Length of output: 32
🏁 Script executed:
#!/bin/bash
# Check actual function signature carefully
grep -n 'static ZSTD_DDict\* readDict' contrib/diagnose_corruption/check_flipped_bits.c -A 1

Length of output: 271
🏁 Script executed:
#!/bin/bash
# Sanity check: size_t* is always valid (it's passed by reference from callers)
# Verify how readDict is called with size parameter
rg -n 'readDict.*&' contrib/diagnose_corruption/check_flipped_bits.c

Length of output: 354
🏁 Script executed:
#!/bin/bash
# Final check on proposed fix - the 'if (size)' check seems wrong since size_t* is never NULL in this context
# Let's verify the signature one more time and see all calls
echo "=== Function signature ===" && sed -n '125,126p' contrib/diagnose_corruption/check_flipped_bits.c && echo "" && echo "=== Calls to readDict ===" && rg 'readDict\(' contrib/diagnose_corruption/check_flipped_bits.c -B 1 -A 1Length of output: 549
Fix memory leak in readDict() when ZSTD_createDDict_advanced() fails (lines 125-142, also 144-189, 276-319 affected)
The review is correct. The buffer allocated by readFile() and stored in *buf is indeed leaked when ZSTD_createDDict_advanced() fails at line 134. This occurs in:
- Direct impact: readDict() at lines 125-142 fails to clean up after ZSTD_createDDict_advanced() returns NULL
- Cascading leaks via readDictByID() (lines 144-189): when readDict() fails inside readDictByID(), the buffer is leaked; callers like init_stuff() (line 249) and test_decompress() (line 288) don't clean it up
- Loop accumulation in test_decompress() (lines 276-319): called repeatedly in perturb_bits() and perturb_bytes(), leaked buffers accumulate on each ddict creation failure
The proposed fix is sound: adding free(*buf); *buf = NULL; inside readDict() ensures cleanup happens regardless of caller context.
Minor note: The proposed if (size) check is unnecessary since size is a size_t* parameter always passed by reference (never NULL), but it's harmless defensive code.
🤖 Prompt for AI Agents
contrib/diagnose_corruption/check_flipped_bits.c around lines 125-142 (and
related callers 144-189, 276-319): readDict() currently leaks the buffer
returned by readFile() when ZSTD_createDDict_advanced() returns NULL; fix by
freeing the allocated buffer and nulling the pointer before returning on failure
(e.g., call free(*buf); *buf = NULL; immediately after detecting ddict == NULL),
and ensure callers do not assume ownership of buf on failure; apply the same
pattern to other failure paths/call sites to prevent repeated accumulation of
leaked buffers.
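The ownership rule the fix establishes ("on failure, the callee releases and nulls *buf") can be sketched with a stand-in for ZSTD_createDDict_advanced(); all names below are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for ZSTD_createDDict_advanced(); here it always "fails". */
static void* create_handle_byref(const void* buf, size_t size) {
    (void)buf; (void)size;
    return NULL;
}

/* Ownership-clear loader: on any failure, *buf is freed and reset, so
 * callers never have to guess whether they own a dangling buffer. */
static void* load_dict(char** buf, size_t* size) {
    void* handle;
    *buf = malloc(16);              /* plays the role of readFile() */
    if (*buf == NULL) return NULL;
    *size = 16;
    memset(*buf, 0, 16);
    handle = create_handle_byref(*buf, *size);
    if (handle == NULL) {
        free(*buf);                 /* the fix: release on failure */
        *buf = NULL;
        *size = 0;
        return NULL;
    }
    return handle;
}
```

With this contract, a caller that sees NULL can simply retry or bail out; repeated failures cannot accumulate leaked buffers.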
```c
#define CHECK(res)                                     \
do {                                                   \
    if (ZSTD_isError(res)) {                           \
        printf("ERROR: %s\n", ZSTD_getErrorName(res)); \
        return 1;                                      \
    }                                                  \
} while (0)
```
Resource leaks on error paths when CHECK fails.
The CHECK macro returns early on error but doesn't clean up previously allocated resources, causing memory leaks at Lines 48, 77, and 84.
Consider refactoring to use goto-based cleanup or refactor CHECK to support cleanup:
-#define CHECK(res) \
-do { \
- if (ZSTD_isError(res)) { \
- printf("ERROR: %s\n", ZSTD_getErrorName(res)); \
- return 1; \
- } \
-} while (0) \
int main(int argc, char *argv[]) {
+ ZSTD_CCtx* zc = NULL;
+ char* src = NULL;
+ char* dst = NULL;
+ char* val = NULL;
+ int ret = 0;
+
if (argc != 2) {
printf("Usage: externalSequenceProducer <file>\n");
return 1;
}
- ZSTD_CCtx* const zc = ZSTD_createCCtx();
+ zc = ZSTD_createCCtx();
+ if (!zc) {
+ printf("ERROR: Failed to create compression context\n");
+ return 1;
+ }
// ... rest of allocations with error checking ...
{
size_t const res = ZSTD_CCtx_setParameter(zc, ZSTD_c_enableSeqProducerFallback, 1);
- CHECK(res);
+ if (ZSTD_isError(res)) {
+ printf("ERROR: %s\n", ZSTD_getErrorName(res));
+ ret = 1;
+ goto cleanup;
+ }
}
// ... similar for other CHECK calls ...
+cleanup:
+ ZSTD_freeCCtx(zc);
+ free(src);
+ free(dst);
+ free(val);
+ return ret;
 }

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In contrib/externalSequenceProducer/main.c around lines 21 to 27, the CHECK
macro returns immediately on error and leaks resources allocated earlier
(notably at lines ~48, ~77 and ~84); change error handling to perform cleanup
before returning by either (A) refactoring CHECK to accept a label name and do a
goto cleanup on error, or (B) remove the macro and replace each CHECK invocation
with an if (ZSTD_isError(res)) { fprintf(...); goto cleanup; } pattern. Add a
single cleanup: label near function exit that frees any allocated memory, closes
handles and releases resources in reverse allocation order, set an appropriate
non-zero exit code, and ensure all paths (success and error) jump to that
cleanup label.
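Option (A) from the prompt, a label-taking CHECK variant plus a single cleanup block, can be sketched as follows (hypothetical names, simplified error condition):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Jumps to a cleanup label on failure instead of returning, assuming an
 * `int ret` variable is in scope, so earlier allocations are always freed. */
#define CHECK_GOTO(cond, label)         \
    do {                                \
        if (!(cond)) {                  \
            fprintf(stderr, "ERROR\n"); \
            ret = 1;                    \
            goto label;                 \
        }                               \
    } while (0)

/* Returns 0 on success, 1 on failure; never leaks src/dst on either path. */
static int run(int fail_mid_step)
{
    int ret = 0;
    char* src = NULL;
    char* dst = NULL;

    src = malloc(64);
    CHECK_GOTO(src != NULL, cleanup);
    memset(src, 0, 64);

    dst = malloc(64);
    CHECK_GOTO(dst != NULL, cleanup);

    CHECK_GOTO(!fail_mid_step, cleanup); /* stands in for a failing ZSTD_* call */
    memcpy(dst, src, 64);

cleanup:
    free(src);  /* free(NULL) is a no-op, so this is safe on every path */
    free(dst);
    return ret;
}
```

Initializing every resource to NULL up front is what makes the single cleanup label safe regardless of which step failed.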
```c
    return 1;
}

ZSTD_CCtx* const zc = ZSTD_createCCtx();
```
Add null check after context creation.
ZSTD_createCCtx() can return NULL on allocation failure. Using the context without validation can cause a segmentation fault.
Apply this diff to add a null check:
ZSTD_CCtx* const zc = ZSTD_createCCtx();
+if (!zc) {
+ printf("ERROR: Failed to create compression context\n");
+ return 1;
+}

🤖 Prompt for AI Agents
In contrib/externalSequenceProducer/main.c around line 35, the code calls
ZSTD_createCCtx() and assigns to ZSTD_CCtx* const zc without checking for NULL;
add a null check immediately after creation and handle allocation failure (e.g.,
log an error, free any previously allocated resources, and return/exit with an
error code) so the function never dereferences zc when it is NULL; ensure the
success path continues to use zc and that any early-return paths clean up other
state as needed.
```c
FILE *f = fopen(argv[1], "rb");
assert(f);
{
    int const ret = fseek(f, 0, SEEK_END);
    assert(ret == 0);
}
size_t const srcSize = ftell(f);
{
    int const ret = fseek(f, 0, SEEK_SET);
    assert(ret == 0);
}

char* const src = malloc(srcSize + 1);
assert(src);
{
    size_t const ret = fread(src, srcSize, 1, f);
    assert(ret == 1);
    int const ret2 = fclose(f);
    assert(ret2 == 0);
}
```
Replace assert() with proper error handling for file operations.
Using assert() for file I/O error checking is problematic:
- Assertions are disabled in release builds (-DNDEBUG), leaving no error checking in production
Additionally, Line 63 allocates srcSize + 1 bytes but only reads srcSize bytes, suggesting possible intent to null-terminate that isn't implemented.
Apply this diff to add proper error handling:
FILE *f = fopen(argv[1], "rb");
-assert(f);
+if (!f) {
+ printf("ERROR: Failed to open file %s\n", argv[1]);
+ ZSTD_freeCCtx(zc);
+ return 1;
+}
{
- int const ret = fseek(f, 0, SEEK_END);
- assert(ret == 0);
+ if (fseek(f, 0, SEEK_END) != 0) {
+ printf("ERROR: Failed to seek in file\n");
+ fclose(f);
+ ZSTD_freeCCtx(zc);
+ return 1;
+ }
}
size_t const srcSize = ftell(f);
{
- int const ret = fseek(f, 0, SEEK_SET);
- assert(ret == 0);
+ if (fseek(f, 0, SEEK_SET) != 0) {
+ printf("ERROR: Failed to seek in file\n");
+ fclose(f);
+ ZSTD_freeCCtx(zc);
+ return 1;
+ }
}
-char* const src = malloc(srcSize + 1);
-assert(src);
+char* const src = malloc(srcSize);
+if (!src) {
+ printf("ERROR: Failed to allocate memory\n");
+ fclose(f);
+ ZSTD_freeCCtx(zc);
+ return 1;
+}
{
- size_t const ret = fread(src, srcSize, 1, f);
- assert(ret == 1);
+ if (fread(src, srcSize, 1, f) != 1) {
+ printf("ERROR: Failed to read file\n");
+ free(src);
+ fclose(f);
+ ZSTD_freeCCtx(zc);
+ return 1;
+ }
- int const ret2 = fclose(f);
- assert(ret2 == 0);
+ fclose(f);
}

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```c
FILE *f = fopen(argv[1], "rb");
if (!f) {
    printf("ERROR: Failed to open file %s\n", argv[1]);
    ZSTD_freeCCtx(zc);
    return 1;
}
{
    if (fseek(f, 0, SEEK_END) != 0) {
        printf("ERROR: Failed to seek in file\n");
        fclose(f);
        ZSTD_freeCCtx(zc);
        return 1;
    }
}
size_t const srcSize = ftell(f);
{
    if (fseek(f, 0, SEEK_SET) != 0) {
        printf("ERROR: Failed to seek in file\n");
        fclose(f);
        ZSTD_freeCCtx(zc);
        return 1;
    }
}
char* const src = malloc(srcSize);
if (!src) {
    printf("ERROR: Failed to allocate memory\n");
    fclose(f);
    ZSTD_freeCCtx(zc);
    return 1;
}
{
    if (fread(src, srcSize, 1, f) != 1) {
        printf("ERROR: Failed to read file\n");
        free(src);
        fclose(f);
        ZSTD_freeCCtx(zc);
        return 1;
    }
    fclose(f);
}
```
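As a sketch of the same idea packaged as a reusable helper (a hypothetical function, not proposed verbatim for the example), every I/O step is checked with ordinary control flow that survives -DNDEBUG:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a whole file with explicit checks instead of assert(): every failure
 * returns NULL, even in release builds where assertions compile away. */
static char* read_entire_file(const char* path, size_t* out_size)
{
    FILE* f = fopen(path, "rb");
    char* buf = NULL;
    long sz;
    if (f == NULL) return NULL;
    if (fseek(f, 0, SEEK_END) != 0) goto fail;
    sz = ftell(f);
    if (sz < 0) goto fail;
    if (fseek(f, 0, SEEK_SET) != 0) goto fail;
    buf = malloc((size_t)sz + 1);  /* +1 so the caller gets a NUL terminator */
    if (buf == NULL) goto fail;
    if (sz > 0 && fread(buf, 1, (size_t)sz, f) != (size_t)sz) goto fail;
    buf[sz] = '\0';
    if (fclose(f) != 0) { free(buf); return NULL; }
    *out_size = (size_t)sz;
    return buf;
fail:
    free(buf);
    fclose(f);
    return NULL;
}
```

The explicit +1 and NUL-termination also resolve the ambiguity noted above, where srcSize + 1 bytes were allocated but never terminated.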
```c
/* Create an array of pointers to the lines in a buffer */
static const char** UTIL_createLinePointers(char* buffer, size_t numLines, size_t bufferSize)
{
    size_t lineIndex = 0;
    size_t pos = 0;
    void* const bufferPtrs = malloc(numLines * sizeof(const char**));
    const char** const linePointers = (const char**)bufferPtrs;
    if (bufferPtrs == NULL) return NULL;

    while (lineIndex < numLines && pos < bufferSize) {
        size_t len = 0;
        linePointers[lineIndex++] = buffer+pos;

        /* Find the next null terminator, being careful not to go past the buffer */
        while ((pos + len < bufferSize) && buffer[pos + len] != '\0') {
            len++;
        }

        /* Move past this string and its null terminator */
        pos += len;
        if (pos < bufferSize) pos++; /* Skip the null terminator if we're not at buffer end */
    }

    /* Verify we processed the expected number of lines */
    if (lineIndex != numLines) {
        /* Something went wrong - we didn't find as many lines as expected */
        free(bufferPtrs);
        return NULL;
    }

    return linePointers;
}
```
Fix incorrect sizeof in malloc.
Line 720 uses sizeof(const char**) when it should use sizeof(const char*). While these typically have the same value (pointer size), this is semantically incorrect and could cause issues on unusual architectures.
Apply this fix:
- void* const bufferPtrs = malloc(numLines * sizeof(const char**));
+ void* const bufferPtrs = malloc(numLines * sizeof(const char*));
  const char** const linePointers = (const char**)bufferPtrs;

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```c
/* Create an array of pointers to the lines in a buffer */
static const char** UTIL_createLinePointers(char* buffer, size_t numLines, size_t bufferSize)
{
    size_t lineIndex = 0;
    size_t pos = 0;
    void* const bufferPtrs = malloc(numLines * sizeof(const char*));
    const char** const linePointers = (const char**)bufferPtrs;
    if (bufferPtrs == NULL) return NULL;

    while (lineIndex < numLines && pos < bufferSize) {
        size_t len = 0;
        linePointers[lineIndex++] = buffer+pos;

        /* Find the next null terminator, being careful not to go past the buffer */
        while ((pos + len < bufferSize) && buffer[pos + len] != '\0') {
            len++;
        }

        /* Move past this string and its null terminator */
        pos += len;
        if (pos < bufferSize) pos++; /* Skip the null terminator if we're not at buffer end */
    }

    /* Verify we processed the expected number of lines */
    if (lineIndex != numLines) {
        /* Something went wrong - we didn't find as many lines as expected */
        free(bufferPtrs);
        return NULL;
    }

    return linePointers;
}
```
🤖 Prompt for AI Agents
In programs/util.c around lines 715 to 746, the malloc call allocates numLines *
sizeof(const char**) which is semantically wrong; change the allocation to use
sizeof(const char*) (i.e. malloc(numLines * sizeof(const char*))) so you
allocate an array of const char* pointers; keep the rest of the function the
same, check for NULL as already done, and free bufferPtrs on error as before.
```c
if (*bufStart + *pos + pathLength >= *bufEnd) {
    ptrdiff_t newListSize = (*bufEnd - *bufStart) + LIST_SIZE_INCREASE;
    assert(newListSize >= 0);
    *bufStart = (char*)UTIL_realloc(*bufStart, (size_t)newListSize);
    if (*bufStart != NULL) {
        *bufEnd = *bufStart + newListSize;
    } else {
        free(path); closedir(dir); return 0;
    }
}
```
Validate cast from ptrdiff_t to size_t before using.
The code asserts that newListSize >= 0 before casting to size_t, but assertions are compiled out in release builds. Additionally, the ptrdiff_t addition could theoretically overflow.
Use an explicit check instead:
if (*bufStart + *pos + pathLength >= *bufEnd) {
ptrdiff_t newListSize = (*bufEnd - *bufStart) + LIST_SIZE_INCREASE;
- assert(newListSize >= 0);
- *bufStart = (char*)UTIL_realloc(*bufStart, (size_t)newListSize);
- if (*bufStart != NULL) {
- *bufEnd = *bufStart + newListSize;
- } else {
- free(path); closedir(dir); return 0;
- }
+ if (newListSize <= 0) { /* a non-negative ptrdiff_t always fits in size_t; comparing a ptrdiff_t against PTRDIFF_MAX is always false */
+ free(path); closedir(dir); return 0;
+ }
+ *bufStart = (char*)UTIL_realloc(*bufStart, (size_t)newListSize);
+ if (*bufStart != NULL) {
+ *bufEnd = *bufStart + newListSize;
+ } else {
+ free(path); closedir(dir); return 0;
+ }
}🤖 Prompt for AI Agents
In programs/util.c around lines 1015 to 1024, the code computes newListSize as a
ptrdiff_t and only uses assert(newListSize >= 0) before casting to size_t, which
is unsafe in release builds and can overflow; change this by checking
newListSize is non-negative and does not exceed SIZE_MAX (or otherwise fits in
size_t) before casting, handle the failure path (free path/closedir and return
0) if the check fails, and then use the safe size_t value for UTIL_realloc and
pointer arithmetic; ensure you also guard the addition that produced newListSize
against overflow (e.g., check *bufEnd - *bufStart <= SIZE_MAX -
LIST_SIZE_INCREASE) before computing the size.
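The guarded-growth pattern the prompt describes can be sketched as a small helper (hypothetical names; the checks survive -DNDEBUG because they are ordinary branches, not assertions):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Grow a [start, end) buffer by `increase` bytes with an explicit overflow
 * check before the addition, instead of an assert after it. Returns 1 on
 * success, 0 on overflow or allocation failure (buffer left valid). */
static int grow_buffer(char** bufStart, char** bufEnd, size_t increase)
{
    size_t const oldSize = (size_t)(*bufEnd - *bufStart);
    size_t newSize;
    char* p;
    if (increase > SIZE_MAX - oldSize) return 0;  /* addition would overflow */
    newSize = oldSize + increase;
    p = realloc(*bufStart, newSize);
    if (p == NULL) return 0;                      /* original buffer still valid */
    *bufStart = p;
    *bufEnd = p + newSize;
    return 1;
}
```

Assigning realloc's result to a temporary before overwriting *bufStart also avoids losing the original pointer on failure, the same hazard the surrounding code guards against.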
```c
int UTIL_countCores(int logical)
{
    static int numCores = 0;
    if (numCores != 0) return numCores;

    {   LPFN_GLPI glpi;
        BOOL done = FALSE;
        PSYSTEM_LOGICAL_PROCESSOR_INFORMATION buffer = NULL;
        PSYSTEM_LOGICAL_PROCESSOR_INFORMATION ptr = NULL;
        DWORD returnLength = 0;
        size_t byteOffset = 0;

#if defined(_MSC_VER)
    /* Visual Studio does not like the following cast */
#   pragma warning( disable : 4054 )  /* conversion from function ptr to data ptr */
#   pragma warning( disable : 4055 )  /* conversion from data ptr to function ptr */
#endif
        glpi = (LPFN_GLPI)(void*)GetProcAddress(GetModuleHandle(TEXT("kernel32")),
                                                "GetLogicalProcessorInformation");

        if (glpi == NULL) {
            goto failed;
        }

        while(!done) {
            DWORD rc = glpi(buffer, &returnLength);
            if (FALSE == rc) {
                if (GetLastError() == ERROR_INSUFFICIENT_BUFFER) {
                    if (buffer)
                        free(buffer);
                    buffer = (PSYSTEM_LOGICAL_PROCESSOR_INFORMATION)malloc(returnLength);

                    if (buffer == NULL) {
                        perror("zstd");
                        exit(1);
                    }
                } else {
                    /* some other error */
                    goto failed;
                }
            } else {
                done = TRUE;
        }   }

        ptr = buffer;

        while (byteOffset + sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION) <= returnLength) {

            if (ptr->Relationship == RelationProcessorCore) {
                if (logical)
                    numCores += CountSetBits(ptr->ProcessorMask);
                else
                    numCores++;
            }

            ptr++;
            byteOffset += sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION);
        }

        free(buffer);

        return numCores;
    }

failed:
    /* try to fall back on GetSystemInfo */
    {   SYSTEM_INFO sysinfo;
        GetSystemInfo(&sysinfo);
        numCores = sysinfo.dwNumberOfProcessors;
        if (numCores == 0) numCores = 1; /* just in case */
    }
    return numCores;
}
```
Remove exit() calls from core counting utilities.
The core counting functions contain exit(1) calls on errors (lines 1503-1504, 1562-1563, 1670-1671). Library utilities should return error indicators (e.g., -1 or 1 as fallback) rather than terminating the process.
Replace exit calls with fallback values:
if (buffer == NULL) {
- perror("zstd");
- exit(1);
+ /* allocation failed, fall back to 1 core */
+ numCores = 1;
+ return numCores;
    }

Apply similar changes at lines 1562-1563 and 1670-1671.
Also applies to: 1550-1569, 1648-1686
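As a standalone illustration of the suggested pattern (hypothetical names, not the actual util.c code), a counting helper can degrade to a safe default instead of terminating the process:

```c
#include <assert.h>

/* Hypothetical sketch: `probe` simulates an OS query for the core count
 * and may fail by returning 0.  Instead of calling exit(1), the wrapper
 * degrades to a conservative default of one core. */
static int count_cores_or_fallback(int (*probe)(void))
{
    int const n = probe();
    if (n <= 0)
        return 1;   /* query or allocation failed: assume a single core */
    return n;
}

/* Stub probes used to exercise both paths. */
static int probe_ok(void)   { return 8; }
static int probe_fail(void) { return 0; }
```

A fallback of 1 is always a valid (if pessimistic) answer for a worker-count heuristic, which is why it is preferable to aborting inside a library utility.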
state.state->fd = fd > -1 ? fd : (
#ifdef WIDECHAR
    fd == -2 ? _wopen(path, oflag, 0666) :
#endif
    open((const char *)path, oflag, 0666));
open(..., 0666) may create overly permissive files
The gz_open path uses:
state.state->fd = fd > -1 ? fd : (
#ifdef WIDECHAR
fd == -2 ? _wopen(path, oflag, 0666) :
#endif
    open((const char *)path, oflag, 0666));

Using 0666 means "world-writable" before umask is applied. On systems with a permissive or zero umask this can create files writable by other users, which is a security concern (CWE-732).
If you don’t need strict compatibility with upstream zlib behavior, consider tightening this:
- fd == -2 ? _wopen(path, oflag, 0666) :
- open((const char *)path, oflag, 0666));
+ fd == -2 ? _wopen(path, oflag, 0600) :
+ open((const char *)path, oflag, 0600));

or gate the stricter mode behind a build-time option so consumers can choose.
🧰 Tools
🪛 ast-grep (0.39.9)
[warning] 239-239: This call makes a world-writable file which allows any user on a machine to write to the file. This may allow attackers to influence the behaviour of this process by writing to the file.
Context: 0666
Note: [CWE-732]: Incorrect Permission Assignment for Critical Resource [REFERENCES]
- https://wiki.sei.cmu.edu/confluence/display/c/FIO06-C.+Create+files+with+appropriate+access+permissions
(world-writable-file-c)
🤖 Prompt for AI Agents
In zlibWrapper/gzlib.c around lines 236-240, the code calls open/_wopen with
mode 0666 which can produce world-writable files on permissive umasks; change
the created file mode to a stricter default (for example 0644 or 0600 depending
on desired privacy) and/or make the mode configurable via a build-time option or
macro so consumers can opt into the original 0666 behavior; ensure both the open
and _wopen calls use the new mode value and document the chosen default in a
comment.
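To make the umask interaction concrete, here is a small sketch (illustrative only, not part of the wrapper): at file creation the kernel applies the requested mode with the process umask bits cleared, so 0666 is only safe when the umask already masks the group/other write bits.

```c
#include <assert.h>
#include <sys/types.h>

/* Effective permission bits of a newly created file:
 * the requested mode with the umask bits cleared (mode & ~umask). */
static mode_t effective_mode(mode_t requested, mode_t mask)
{
    return requested & (mode_t)~mask;
}
```

With the common umask 022, a request of 0666 yields 0644; with a zero umask it really does yield a world-writable 0666, which is the case the review comment warns about.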
static void* ZWRAP_customCalloc(size_t size, ZSTD_customMem customMem)
{
    if (customMem.customAlloc) {
        /* calloc implemented as malloc+memset;
         * not as efficient as calloc, but next best guess for custom malloc */
        void* const ptr = customMem.customAlloc(customMem.opaque, size);
        memset(ptr, 0, size);
        return ptr;
    }
    return calloc(1, size);
}
Potential null dereference in ZWRAP_customCalloc when custom allocator fails
Here:
if (customMem.customAlloc) {
/* calloc implemented as malloc+memset; ... */
void* const ptr = customMem.customAlloc(customMem.opaque, size);
memset(ptr, 0, size);
return ptr;
}
    return calloc(1, size);

If customAlloc returns NULL (e.g., out of memory), memset(ptr, 0, size); will dereference a null pointer and likely crash, instead of propagating the allocation failure.
Recommend guarding the memset:
static void* ZWRAP_customCalloc(size_t size, ZSTD_customMem customMem)
{
if (customMem.customAlloc) {
void* const ptr = customMem.customAlloc(customMem.opaque, size);
- memset(ptr, 0, size);
- return ptr;
+ if (ptr != NULL) {
+ memset(ptr, 0, size);
+ }
+ return ptr;
}
return calloc(1, size);
 }

Callers already check for a NULL return, so this fix preserves behavior while avoiding the crash.
🤖 Prompt for AI Agents
In zlibWrapper/zstd_zlibwrapper.c around lines 128 to 138, the customCalloc
implementation unconditionally calls memset on the pointer returned by
customMem.customAlloc which can be NULL; update the code to check if ptr is NULL
before calling memset and simply return NULL if allocation failed, otherwise
perform memset and return the pointer, preserving existing behavior for the
non-custom path that calls calloc.
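The guarded pattern can be exercised in isolation with a stub allocator (hypothetical names that mirror, but do not reuse, the ZSTD_customMem types):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for ZSTD_customMem's allocator hook. */
typedef void* (*custom_alloc_fn)(void* opaque, size_t size);

static void* custom_calloc(custom_alloc_fn alloc, void* opaque, size_t size)
{
    if (alloc != NULL) {
        void* const ptr = alloc(opaque, size);
        if (ptr != NULL)
            memset(ptr, 0, size);   /* only touch memory we actually got */
        return ptr;                 /* NULL propagates to the caller */
    }
    return calloc(1, size);         /* non-custom path */
}

/* An allocator that always fails, to exercise the NULL path. */
static void* failing_alloc(void* opaque, size_t size)
{
    (void)opaque; (void)size;
    return NULL;
}
```

The failing allocator returns NULL straight through instead of crashing in memset, while the default path still returns zeroed memory.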
Actionable comments posted: 20
♻️ Duplicate comments (1)
contrib/diagnose_corruption/check_flipped_bits.c (1)
44-50: Tighten memory management in helper and init paths (known leak issues)

There are a few related leaks / incomplete cleanups:
- free_stuff() (lines 44-50) never frees stuff->perturbed, even though it is allocated in init_stuff() as a long-lived buffer.
- readDict() (lines 125-142) may leak the buffer returned in *buf if ZSTD_createDDict_advanced() fails: it returns NULL without freeing *buf, and the caller only checks the return value.
- init_stuff() (lines 191-274) returns 0 on several allocation/initialization failures after some fields are already allocated/initialized; main() exits immediately on failure and never calls free_stuff(), so those partial allocations are leaked.

Minimal fixes:
 static void free_stuff(stuff_t* stuff) {
   free(stuff->input);
+  free(stuff->perturbed);
   free(stuff->output);
   ZSTD_freeDDict(stuff->ddict);
   free(stuff->dict);
   ZSTD_freeDCtx(stuff->dctx);
 }

 static ZSTD_DDict* readDict(const char* filename, char **buf, size_t* size, int32_t* dict_id) {
   ZSTD_DDict* ddict;
   *buf = readFile(filename, size);
   if (*buf == NULL) {
     fprintf(stderr, "Opening dictionary file '%s' failed\n", filename);
     return NULL;
   }
   ddict = ZSTD_createDDict_advanced(*buf, *size, ZSTD_dlm_byRef, ZSTD_dct_auto, ZSTD_defaultCMem);
   if (ddict == NULL) {
     fprintf(stderr, "Failed to create ddict.\n");
+    free(*buf);
+    *buf = NULL;
     return NULL;
   }
   ...
 }

 static int init_stuff(stuff_t* stuff, int argc, char *argv[]) {
   ...
   stuff->input = readFile(input_filename, &stuff->input_size);
   if (stuff->input == NULL) {
     fprintf(stderr, "Failed to read input file.\n");
-    return 0;
+    return 0;
   }
   stuff->perturbed = malloc(stuff->input_size);
   if (stuff->perturbed == NULL) {
     fprintf(stderr, "malloc failed.\n");
-    return 0;
+    free(stuff->input);
+    return 0;
   }
   ...
   stuff->output = malloc(stuff->output_size);
   if (stuff->output == NULL) {
     fprintf(stderr, "malloc failed.\n");
-    return 0;
+    free(stuff->perturbed);
+    free(stuff->input);
+    return 0;
   }
   ...
   if (stuff->dict_file_dir_name) {
     ...
     if (stuff->ddict == NULL) {
       fprintf(stderr, "Failed to create cached ddict.\n");
-      return 0;
+      free(stuff->output);
+      free(stuff->perturbed);
+      free(stuff->input);
+      return 0;
     }
   } else if (stuff->dict_file_name) {
     ...
     if (stuff->ddict == NULL) {
       fprintf(stderr, "Failed to create ddict from '%s'.\n", stuff->dict_file_name);
-      return 0;
+      free(stuff->output);
+      free(stuff->perturbed);
+      free(stuff->input);
+      return 0;
     }
   }
   stuff->dctx = ZSTD_createDCtx();
   if (stuff->dctx == NULL) {
-    return 0;
+    free(stuff->ddict);
+    free(stuff->output);
+    free(stuff->perturbed);
+    free(stuff->input);
+    return 0;
   }
   ...
 }

And in main(), consider calling free_stuff(&stuff); before returning on early failure paths to ensure consistent cleanup.

Also applies to: 125-142, 191-274
🧹 Nitpick comments (13)
contrib/pzstd/Pzstd.h (2)
19-25: Consider scoping ZSTD_DISABLE_DEPRECATE_WARNINGS to just zstd.h.

You #undef ZSTD_STATIC_LINKING_ONLY after including zstd.h but leave ZSTD_DISABLE_DEPRECATE_WARNINGS defined. If the intent is only to silence deprecation warnings from zstd itself, also #undef it (or wrap in a push/pop pattern) so it doesn't accidentally affect downstream headers in the same translation unit.
118-128: Fix parameter documentation for asyncDecompressFrames's executor.

The executor parameter is documented as "the thread pool to run compression jobs in", but this function is for decompression. Adjusting the wording to "decompression jobs" will avoid confusion when reading the header.

contrib/pzstd/utils/Buffer.h (1)
47-48: Buffer is move-only; confirm this is intentional or add copy operations.

By explicitly defaulting the move constructor and move assignment but not the copy operations, the compiler implicitly deletes the copy ctor/assignment in C++11+, making Buffer move-only even though it wraps a std::shared_ptr. If you intended cheap, copyable "view" semantics over shared storage, consider explicitly defaulting the copy constructor and copy assignment as well.

contrib/externalSequenceProducer/main.c (1)
76-105: Centralize cleanup to avoid leaks on early returns

On error paths (e.g., a CHECK(cSize) failure or a buffer mismatch) the function returns 1 without freeing src, dst, val, or the ZSTD_CCtx. While the OS will reclaim memory on process exit, it's easy to avoid leaks by using a single cleanup label. For example:
-    size_t const cSize = ZSTD_compress2(zc, dst, dstSize, src, srcSize);
-    CHECK(cSize);
+    size_t const cSize = ZSTD_compress2(zc, dst, dstSize, src, srcSize);
+    if (ZSTD_isError(cSize)) {
+        printf("ERROR: %s\n", ZSTD_getErrorName(cSize));
+        goto cleanup;
+    }
@@
-    return 1;
+    goto cleanup_error;
@@
-    ZSTD_freeCCtx(zc);
-    free(src);
-    free(dst);
-    free(val);
-    return 0;
+cleanup:
+    ZSTD_freeCCtx(zc);
+    free(src);
+    free(dst);
+    free(val);
+    return 0;
+
+cleanup_error:
+    ZSTD_freeCCtx(zc);
+    free(src);
+    free(dst);
+    free(val);
+    return 1;

Also consider using %zu (or an explicit cast) for srcSize/cSize in the printf() calls to be strictly portable with size_t.

contrib/seqBench/seqBench.c (2)
8-15: Avoid leaking ZSTD_CCtx on bad CLI usage

ZSTD_CCtx* zc = ZSTD_createCCtx(); is called before checking argc, so on a usage error you return without freeing the context. A simple fix is to create the context only after validating arguments, e.g.:
-int main(int argc, char *argv[]) {
-    ZSTD_CCtx* zc = ZSTD_createCCtx();
-
-    if (argc != 2) {
+int main(int argc, char *argv[]) {
+    if (argc != 2) {
         printf("Usage: seqBench <file>\n"); // TODO provide the block delim option here
         return 1;
     }
+
+    ZSTD_CCtx* zc = ZSTD_createCCtx();
+    if (zc == NULL) {
+        fprintf(stderr, "ERROR: failed to create ZSTD_CCtx\n");
+        return 1;
+    }
40-52: Add cleanup on exit to avoid leaks and ensure predictable teardown
inBuf, seqs, outBuf, validationBuf, and zc are never freed. Even though this is a one-shot CLI, it's straightforward to add a common cleanup section that frees these buffers and the context on both success and failure paths. For example:
-    if (memcmp(inBuf, validationBuf, inBufSize) == 0) {
-        printf("Compression and decompression were successful!\n");
-    } else {
-        printf("ERROR: input and validation buffers don't match!\n");
-        ...
-    }
-
-    return 0;
+    int exitCode = 0;
+    if (memcmp(inBuf, validationBuf, inBufSize) == 0) {
+        printf("Compression and decompression were successful!\n");
+    } else {
+        printf("ERROR: input and validation buffers don't match!\n");
+        ...
+        exitCode = 1;
+    }
+
+    ZSTD_freeCCtx(zc);
+    free(inBuf);
+    free(seqs);
+    free(outBuf);
+    free(validationBuf);
+    return exitCode;

contrib/largeNbDicts/largeNbDicts.c (1)
682-785: Free speedPerRound and check allocations in benchMem()
benchMem() allocates speedPerRound and a BMK_timedFnState_t but never frees speedPerRound and doesn't check for allocation failures:

    double *const speedPerRound = (double *)malloc(nbRounds * sizeof(double));
    BMK_timedFnState_t* const benchState =
        BMK_createTimedFnState(total_time_ms, ms_per_round);

For a long-running benchmark process or repeated invocations this accumulates a leak, and on OOM
benchState/speedPerRound being NULL will cause crashes. Consider:
-    double *const speedPerRound = (double *)malloc(nbRounds * sizeof(double));
-
-    BMK_timedFnState_t* const benchState =
-        BMK_createTimedFnState(total_time_ms, ms_per_round);
+    double *const speedPerRound = (double *)malloc(nbRounds * sizeof(double));
+    if (!speedPerRound) {
+        return 1;
+    }
+
+    BMK_timedFnState_t* const benchState =
+        BMK_createTimedFnState(total_time_ms, ms_per_round);
+    if (!benchState) {
+        free(speedPerRound);
+        return 1;
+    }
@@
-    freeDecompressInstructions(di);
-    freeCompressInstructions(ci);
-    BMK_freeTimedFnState(benchState);
-
-    return 0; /* success */
+    freeDecompressInstructions(di);
+    freeCompressInstructions(ci);
+    BMK_freeTimedFnState(benchState);
+    free(speedPerRound);
+
+    return 0; /* success */

programs/util.c (1)
1147-1155: Fix trailing separator stripping in convertPathnameToDirName.

The loop currently checks
pathname[len], so it never strips trailing separators and won't handle multiple trailing PATH_SEPs as intended. Suggested fix:
-    /* remove trailing '/' chars */
-    len = strlen(pathname);
-    assert(len > 0);
-    while (pathname[len] == PATH_SEP) {
-        pathname[len] = '\0';
-        len--;
-    }
-    if (len == 0) return;
+    /* remove trailing PATH_SEP chars */
+    len = strlen(pathname);
+    if (len == 0) return;
+    while (len > 0 && pathname[len - 1] == PATH_SEP) {
+        pathname[--len] = '\0';
+    }
+    if (len == 0) return;

tests/bigdict.c (1)
77-81: Route all failure paths through cleanup to avoid leaks in main.

After introducing
_exit_code and cleanup, several error paths still use return 1;, leaking buffer, out, roundtrip, cctx, and dctx on failure. Suggested fix (pattern applied to all error branches):
@@
-    if (!buffer || !out || !roundtrip || !cctx || !dctx) {
-        fprintf(stderr, "Allocation failure\n");
-        _exit_code = 1;
-        goto cleanup;
-    }
+    if (!buffer || !out || !roundtrip || !cctx || !dctx) {
+        fprintf(stderr, "Allocation failure\n");
+        _exit_code = 1;
+        goto cleanup;
+    }
@@
-    if (ZSTD_isError(ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, 31)))
-        return 1;
-    if (ZSTD_isError(ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 1)))
-        return 1;
-    if (ZSTD_isError(ZSTD_CCtx_setParameter(cctx, ZSTD_c_overlapLog, 9)))
-        return 1;
-    if (ZSTD_isError(ZSTD_CCtx_setParameter(cctx, ZSTD_c_checksumFlag, 1)))
-        return 1;
-    if (ZSTD_isError(ZSTD_CCtx_setParameter(cctx, ZSTD_c_strategy, ZSTD_btopt)))
-        return 1;
-    if (ZSTD_isError(ZSTD_CCtx_setParameter(cctx, ZSTD_c_targetLength, 7)))
-        return 1;
-    if (ZSTD_isError(ZSTD_CCtx_setParameter(cctx, ZSTD_c_minMatch, 7)))
-        return 1;
-    if (ZSTD_isError(ZSTD_CCtx_setParameter(cctx, ZSTD_c_searchLog, 1)))
-        return 1;
-    if (ZSTD_isError(ZSTD_CCtx_setParameter(cctx, ZSTD_c_hashLog, 10)))
-        return 1;
-    if (ZSTD_isError(ZSTD_CCtx_setParameter(cctx, ZSTD_c_chainLog, 10)))
-        return 1;
-
-    if (ZSTD_isError(ZSTD_DCtx_setParameter(dctx, ZSTD_d_windowLogMax, 31)))
-        return 1;
+    if (ZSTD_isError(ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, 31)) ||
+        ZSTD_isError(ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 1)) ||
+        ZSTD_isError(ZSTD_CCtx_setParameter(cctx, ZSTD_c_overlapLog, 9)) ||
+        ZSTD_isError(ZSTD_CCtx_setParameter(cctx, ZSTD_c_checksumFlag, 1)) ||
+        ZSTD_isError(ZSTD_CCtx_setParameter(cctx, ZSTD_c_strategy, ZSTD_btopt)) ||
+        ZSTD_isError(ZSTD_CCtx_setParameter(cctx, ZSTD_c_targetLength, 7)) ||
+        ZSTD_isError(ZSTD_CCtx_setParameter(cctx, ZSTD_c_minMatch, 7)) ||
+        ZSTD_isError(ZSTD_CCtx_setParameter(cctx, ZSTD_c_searchLog, 1)) ||
+        ZSTD_isError(ZSTD_CCtx_setParameter(cctx, ZSTD_c_hashLog, 10)) ||
+        ZSTD_isError(ZSTD_CCtx_setParameter(cctx, ZSTD_c_chainLog, 10)) ||
+        ZSTD_isError(ZSTD_DCtx_setParameter(dctx, ZSTD_d_windowLogMax, 31))) {
+        _exit_code = 1;
+        goto cleanup;
+    }
@@
-    fprintf(stderr, "Compressing 1 GB\n");
-    if (compress(cctx, dctx, out, outSize, buffer, dataSize, roundtrip, ZSTD_e_continue))
-        return 1;
+    fprintf(stderr, "Compressing 1 GB\n");
+    if (compress(cctx, dctx, out, outSize, buffer, dataSize, roundtrip, ZSTD_e_continue)) {
+        _exit_code = 1;
+        goto cleanup;
+    }
@@
-    fprintf(stderr, "Compressing 1 GB\n");
-    if (compress(cctx, dctx, out, outSize, buffer, dataSize, roundtrip, ZSTD_e_end))
-        return 1;
+    fprintf(stderr, "Compressing 1 GB\n");
+    if (compress(cctx, dctx, out, outSize, buffer, dataSize, roundtrip, ZSTD_e_end)) {
+        _exit_code = 1;
+        goto cleanup;
+    }
@@
-    fprintf(stderr, "Success!\n");
-
-    goto cleanup;
+    fprintf(stderr, "Success!\n");
+    _exit_code = 0;
+    goto cleanup;

Also applies to: 83-105, 104-105, 110-121
programs/dibio.c (1)
214-230: Harden DiB_findMaxMem() against pathological malloc failures
DiB_findMaxMem() assumes that at some (possibly small) size, malloc(requiredMem) will succeed. On implementations where malloc(0) may return NULL and very small allocations can also fail, the requiredMem -= step; loop can underflow and spin for a long time. You can guard against this without changing normal behavior:
 static size_t DiB_findMaxMem(unsigned long long requiredMem)
 {
     size_t const step = 8 MB;
     void* testmem = NULL;

     requiredMem = (((requiredMem >> 23) + 1) << 23);
     requiredMem += step;
     if (requiredMem > g_maxMemory) requiredMem = g_maxMemory;

-    while (!testmem) {
-        testmem = malloc((size_t)requiredMem);
-        requiredMem -= step;
-    }
+    while (!testmem && requiredMem >= step) {
+        testmem = malloc((size_t)requiredMem);
+        if (testmem) break;
+        requiredMem -= step;
+    }
+    if (!testmem) {
+        /* Fall back to minimal allocation failure handling */
+        EXM_THROW(12, "not enough memory for dictionary training");
+    }

     free(testmem);
     return (size_t)requiredMem;
 }

This preserves the probing behavior while avoiding potential underflow / infinite-loop edge cases.
tests/largeDictionary.c (1)
63-128: Unify error handling with a cleanup path to avoid leaks on failure
main() returns early on allocation or parameter errors (lines 76-103, 111-118) without freeing buffer, out, roundtrip, or the ZSTD contexts, unlike tests/bigdict.c which centralizes cleanup via a cleanup: label. While leaks are limited to this short-lived test process and only occur on failures, you can easily make it consistent and safer:
-int main(int argc, const char** argv)
-{
-    ZSTD_CCtx* cctx = ZSTD_createCCtx();
-    ZSTD_DCtx* dctx = ZSTD_createDCtx();
+int main(int argc, const char** argv)
+{
+    ZSTD_CCtx* cctx = ZSTD_createCCtx();
+    ZSTD_DCtx* dctx = ZSTD_createDCtx();
@@
-    if (!buffer || !out || !roundtrip || !cctx || !dctx) {
-        fprintf(stderr, "Allocation failure\n");
-        return 1;
-    }
-
-    if (ZSTD_isError(ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, 31)))
-        return 1;
-    if (ZSTD_isError(ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 1)))
-        return 1;
-    ...
-    if (ZSTD_isError(ZSTD_DCtx_setParameter(dctx, ZSTD_d_windowLogMax, 31)))
-        return 1;
+    if (!buffer || !out || !roundtrip || !cctx || !dctx) {
+        fprintf(stderr, "Allocation failure\n");
+        goto cleanup;
+    }
+
+    if (ZSTD_isError(ZSTD_CCtx_setParameter(cctx, ZSTD_c_windowLog, 31)))
+        goto cleanup;
+    if (ZSTD_isError(ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 1)))
+        goto cleanup;
+    ...
+    if (ZSTD_isError(ZSTD_DCtx_setParameter(dctx, ZSTD_d_windowLogMax, 31)))
+        goto cleanup;
@@
-    fprintf(stderr, "Success!\n");
-
-    free(roundtrip);
-    free(out);
-    free(buffer);
-    ZSTD_freeDCtx(dctx);
-    ZSTD_freeCCtx(cctx);
-    return 0;
+    fprintf(stderr, "Success!\n");
+
+cleanup:
+    free(roundtrip);
+    free(out);
+    free(buffer);
+    ZSTD_freeDCtx(dctx);
+    ZSTD_freeCCtx(cctx);
+    return 0;

This mirrors the existing pattern in tests/bigdict.c and keeps future changes safer.

lib/dictBuilder/fastcover.c (1)
313-358: Preserve displayLevel inside FASTCOVER_ctx_t
FASTCOVER_ctx_init() assigns ctx->displayLevel = displayLevel; and then immediately calls memset(ctx, 0, sizeof(*ctx));, wiping the value. Later, other functions read ctx->displayLevel to drive DISPLAYLEVEL logging and to propagate zParams.notificationLevel in the optimize path.
notificationLevel = 0from these contexts.A small reordering would fix this:
- ctx->displayLevel = displayLevel; - /* Checks */ - if (totalSamplesSize < MAX(d, sizeof(U64)) || - totalSamplesSize >= (size_t)FASTCOVER_MAX_SAMPLES_SIZE) { - ... - } - ... - /* Zero the context */ - memset(ctx, 0, sizeof(*ctx)); + /* Zero the context */ + memset(ctx, 0, sizeof(*ctx)); + ctx->displayLevel = displayLevel; + /* Checks */ + if (totalSamplesSize < MAX(d, sizeof(U64)) || + totalSamplesSize >= (size_t)FASTCOVER_MAX_SAMPLES_SIZE) { + ... + }If you care about logging here, grep for
ctx.displayLevelusage and confirm after the change that messages appear at the expected verbosity.programs/benchzstd.c (1)
783-888: Use BMK_benchMemAdvanced’s allocation guard to keep behavior robust
BMK_benchMemAdvanced()already has a comprehensiveallocationincompleteguard and cleans up all allocations on failure. With the fix above, that remains correct. Just ensure any changes to chunking logic don’t add new allocations insideBMK_benchMemAdvancedNoAlloc()without matching error handling and keeping this single exit point pattern.If you evolve this further (e.g., more modes), consider extracting a small helper struct for per-chunk metadata to keep the allocation/cleanup patterns centralized.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (27)
- contrib/diagnose_corruption/check_flipped_bits.c (1 hunks)
- contrib/externalSequenceProducer/main.c (1 hunks)
- contrib/largeNbDicts/largeNbDicts.c (1 hunks)
- contrib/pzstd/Pzstd.h (1 hunks)
- contrib/pzstd/utils/Buffer.h (1 hunks)
- contrib/recovery/recover_directory.c (1 hunks)
- contrib/seekable_format/zstdseek_compress.c (1 hunks)
- contrib/seekable_format/zstdseek_decompress.c (1 hunks)
- contrib/seqBench/seqBench.c (1 hunks)
- doc/educational_decoder/harness.c (1 hunks)
- doc/educational_decoder/zstd_decompress.c (1 hunks)
- lib/dictBuilder/cover.c (1 hunks)
- lib/dictBuilder/divsufsort.c (1 hunks)
- lib/dictBuilder/fastcover.c (1 hunks)
- lib/dictBuilder/zdict.c (1 hunks)
- lib/legacy/zstd_v01.c (1 hunks)
- programs/benchfn.c (1 hunks)
- programs/benchzstd.c (1 hunks)
- programs/datagen.c (1 hunks)
- programs/dibio.c (1 hunks)
- programs/fileio_asyncio.c (1 hunks)
- programs/util.c (1 hunks)
- tests/bigdict.c (1 hunks)
- tests/largeDictionary.c (1 hunks)
- zlibWrapper/gzlib.c (1 hunks)
- zlibWrapper/gzread.c (1 hunks)
- zlibWrapper/zstd_zlibwrapper.c (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (21)
contrib/diagnose_corruption/check_flipped_bits.c (2)
lib/legacy/zstd_v03.c (3)
ZSTD_freeDCtx (2989-2993), ZSTD_createDCtx (2981-2987), ZSTD_isError (2373-2373)

lib/legacy/zstd_v04.c (3)
ZSTD_freeDCtx (2477-2481), ZSTD_createDCtx (2469-2475), ZSTD_isError (2429-2429)
programs/dibio.c (2)
programs/util.c (1)
UTIL_getFileSize (545-558)

lib/dictBuilder/zdict.c (3)
ZDICT_trainFromBuffer_legacy (1088-1108), ZDICT_isError (98-98), ZDICT_getErrorName (100-100)
lib/dictBuilder/cover.c (1)
lib/legacy/zstd_v07.c (6)
U64 (283-286), U64 (307-323), U64 (358-364), U32 (278-281), U32 (293-305), U32 (349-355)
contrib/largeNbDicts/largeNbDicts.c (2)
programs/util.c (6)
UTIL_getFileSize (545-558), UTIL_isDirectory (357-365), UTIL_getTotalFileSize (630-645), UTIL_createExpandedFNT (1379-1423), UTIL_assembleFileNamesTable (811-815), UTIL_freeFileNamesTable (817-823)

programs/benchfn.c (6)
BMK_createTimedFnState (160-166), BMK_benchTimedFn (211-256), BMK_isSuccessful_runOutcome (64-67), BMK_extract_runTime (71-75), BMK_isCompleted_TimedFn (200-203), BMK_freeTimedFnState (168-168)
lib/dictBuilder/fastcover.c (2)
lib/dictBuilder/cover.c (14)
COVER_sum (278-285), COVER_computeEpochs (732-747), COVER_dictSelectionError (1035-1037), COVER_selectDict (1047-1132), COVER_dictSelectionIsError (1039-1041), COVER_best_finish (980-1024), COVER_dictSelectionFree (1043-1045), size_t (807-862), size_t (1195-1331), COVER_warnOnSmallCorpus (716-730), COVER_best_init (922-931), COVER_best_destroy (950-960), COVER_best_start (966-973), COVER_best_wait (936-945)

lib/dictBuilder/zdict.c (1)
ZDICT_finalizeDictionary(862-941)
zlibWrapper/gzlib.c (1)
zlibWrapper/gzread.c (9)
z_size_t (292-366), ZEXPORT (369-398), ZEXPORT (401-425), ZEXPORT (445-470), ZEXPORT (472-474), ZEXPORT (477-531), ZEXPORT (534-591), ZEXPORT (594-609), ZEXPORT (612-637)
contrib/pzstd/Pzstd.h (1)
contrib/pzstd/utils/Buffer.h (1)
pzstd(17-99)
contrib/seqBench/seqBench.c (2)
contrib/externalSequenceProducer/main.c (1)
main (29-107)

doc/educational_decoder/zstd_decompress.c (1)
ZSTD_decompress(384-391)
zlibWrapper/gzread.c (1)
zlibWrapper/gzlib.c (16)
ZEXPORT (265-267), ZEXPORT (270-272), ZEXPORT (275-289), ZEXPORT (293-295), ZEXPORT (299-320), ZEXPORT (323-341), ZEXPORT (344-414), ZEXPORT (417-422), ZEXPORT (425-437), ZEXPORT (440-445), ZEXPORT (448-466), ZEXPORT (469-474), ZEXPORT (477-489), ZEXPORT (492-507), ZEXPORT (510-526), gzFile (90-262)
doc/educational_decoder/zstd_decompress.c (1)
lib/legacy/zstd_v03.c (1)
ZSTD_decompress(2905-2910)
tests/largeDictionary.c (2)
tests/bigdict.c (2)
compress (21-61), main (63-133)

programs/datagen.c (1)
RDG_genBuffer(144-152)
contrib/recovery/recover_directory.c (2)
programs/util.c (1)
UTIL_getFileSize (545-558)

lib/legacy/zstd_v03.c (3)
ZSTD_isError (2373-2373), ZSTD_createDCtx (2981-2987), ZSTD_freeDCtx (2989-2993)
tests/bigdict.c (2)
tests/largeDictionary.c (2)
compress (21-61), main (63-128)

programs/datagen.c (1)
RDG_genBuffer(144-152)
doc/educational_decoder/harness.c (1)
doc/educational_decoder/zstd_decompress.c (5)
ZSTD_get_decompressed_size (1378-1401), create_dictionary (1405-1411), parse_dictionary (1435-1487), ZSTD_decompress_with_dict (393-409), free_dictionary (1414-1425)
programs/fileio_asyncio.c (1)
contrib/seekable_format/zstdseek_decompress.c (1)
LONG_SEEK(91-106)
contrib/externalSequenceProducer/main.c (1)
contrib/seqBench/seqBench.c (1)
main(8-53)
programs/benchzstd.c (5)
lib/legacy/zstd_v03.c (8)
ZSTD_isError (2373-2373), U32 (121-124), U32 (161-164), U32 (166-175), U64 (126-129), U64 (177-187), ZSTD_createDCtx (2981-2987), ZSTD_freeDCtx (2989-2993)

lib/legacy/zstd_v04.c (8)
ZSTD_isError (2429-2429), U32 (97-100), U32 (137-140), U32 (142-151), U64 (102-105), U64 (154-164), ZSTD_createDCtx (2469-2475), ZSTD_freeDCtx (2477-2481)

programs/datagen.c (1)
RDG_genBuffer (144-152)

programs/benchfn.c (6)
BMK_benchTimedFn (211-256), BMK_isSuccessful_runOutcome (64-67), BMK_extract_runTime (71-75), BMK_isCompleted_TimedFn (200-203), BMK_createTimedFnState (160-166), BMK_freeTimedFnState (168-168)

programs/util.c (3)
UTIL_getFileSize (545-558), UTIL_isDirectory (357-365), UTIL_getTotalFileSize (630-645)
contrib/seekable_format/zstdseek_compress.c (1)
lib/legacy/zstd_v04.c (6)
ZSTD_isError (2429-2429), U32 (97-100), U32 (137-140), U32 (142-151), BYTE (839-848), BYTE (850-859)
lib/dictBuilder/zdict.c (1)
lib/legacy/zstd_v03.c (4)
ERR_isError (525-525), U32 (121-124), U32 (161-164), U32 (166-175)
contrib/seekable_format/zstdseek_decompress.c (2)
lib/legacy/zstd_v03.c (4)
ZSTD_isError (2373-2373), U32 (121-124), U32 (161-164), U32 (166-175)

lib/legacy/zstd_v05.c (2)
U32 (124-127), U32 (170-178)
zlibWrapper/zstd_zlibwrapper.c (2)
lib/legacy/zstd_v03.c (8)
char (527-532), ZSTD_isError (2373-2373), void (131-134), void (147-159), void (383-386), void (651-658), void (2914-2918), ZSTD_decompress (2905-2910)

zlibWrapper/gzread.c (4)
int (32-54), int (63-86), int (97-173), int (180-224)
🪛 ast-grep (0.39.9)
zlibWrapper/gzlib.c
[warning] 239-239: This call makes a world-writable file which allows any user on a machine to write to the file. This may allow attackers to influence the behaviour of this process by writing to the file.
Context: 0666
Note: [CWE-732]: Incorrect Permission Assignment for Critical Resource [REFERENCES]
- https://wiki.sei.cmu.edu/confluence/display/c/FIO06-C.+Create+files+with+appropriate+access+permissions
(world-writable-file-c)
🔇 Additional comments (12)
contrib/pzstd/utils/Buffer.h (1)
50-62: Document the precondition and exception behavior forBuffer::splitAt.Verification reveals that the underlying
Range::subpiece()andRange::advance()methods already validate bounds and throwstd::out_of_rangeifn > size(). Therefore, unlike the original concern suggested, undefined behavior does not occur—exceptions are thrown instead.However, this exception behavior is not documented in the
splitAtmethod comment. Consider either:
- (a) Documenting the precondition
n <= size()and thatstd::out_of_rangeis thrown on violation, or- (b) Adding a defensive check/assert in
splitAtfor consistency with callers' expectations.zlibWrapper/gzread.c (2)
19-27: Union-basedgz_statepusage and core read path look consistentThe shift from pointer casting to a
gz_statepunion (state.filevsstate.state) is applied consistently across helpers and the publicgzread/gzfreadAPIs: every function that accepts agzFilesetsstate.file = filebefore dereferencingstate.state, and internal helpers only ever touchstate.state->.... The lazily-initialized buffers and stream state ingz_look/gz_readappear faithful to upstream zlib semantics, including theZ_OK/Z_BUF_ERRORhandling.No changes needed here from a correctness perspective.
Also applies to: 75-87, 291-398
136-149: Confirmzstdmagic handling matches your wrapper’sinflatesemanticsYou now treat the
0x28 0xB5prefix as a “zstd header” and route it through the same GZIP path (resetting theinflatestream and settinghow = GZIP). This assumes your zlibWrapper’sinflate*implementation knows how to handle zstd frames under the same entry points.Please double‑check that:
inflateInit2(strm, 15 + 16)is compatible with your zstd-backed inflate, and- mixed inputs (pure zstd, gzip, or trailing raw data) behave as expected.
If this code was copied from the upstream zstd zlibWrapper, it’s probably fine, but it’s worth re-verifying against current wrapper behavior and tests.
zlibWrapper/gzlib.c (3)
75-87: State initialization, reset, and union pattern align with upstream behavior
gz_reset,gz_open, and the buffer configuration (gzbuffer) correctly operate on the union-basedgz_statepby consistently usingstate.state->...after constructingstate.stateor settingstate.file. The initialization sequence (allocate state, parse mode, set defaults, buildpath, open fd, setstart, thengz_reset) matches canonical zlib logic, andgzbufferenforces the same pre-allocation and overflow constraints.This module’s core state lifecycle looks sound.
Also applies to: 89-262, 298-341
343-474: Seek/offset APIs and error propagation appear correct and consistentThe seek/tell/offset helpers (
gzseek64/gzseek,gztell64/gztell,gzoffset64/gzoffset) and error/EoF handling (gzeof,gzerror,gzclearerr,gz_error) mirror standard zlib behavior: they normalize offsets, manage pending skips, correctly adjust buffer cursors, and preserveZ_BUF_ERRORvs. fatal errors.gz_erroralso correctly owns and frees dynamically allocated messages, including the special handling forZ_MEM_ERROR.No changes required here.
Also applies to: 476-526, 528-569
235-241: Review is technically valid but contextual—verify usage and zstd's compatibility constraintsThe 0666 file mode is indeed less secure than the temporary-restrictive pattern used elsewhere in zstd (e.g.,
programs/fileio.cuses 0600 then fchmod). However, this is inherited from upstream zlib behavior and the wrapper is explicitly designed for backward compatibility.Key findings:
zlibWrapper/gzlib.chas "minimal changes required" from upstream zlib; gzip file access is a documented supported feature- The same codebase demonstrates better practices (temporary 0600 + post-write fchmod in fileio.c)
- No callers to gzip file creation functions found outside the wrapper, suggesting limited usage
- No existing security documentation or config for file permissions in the wrapper
Before deciding to implement the review's suggestions (document umask or add a config hook), verify:
- Is this code path actually used? Evidence suggests it may not be in typical zstd workflows.
- Are compatibility constraints strict? If the wrapper must maintain zlib API/behavior exactly, the current approach is intentional.
- Does zstd have explicit security requirements for this optional feature?
If file creation is used and security is a concern, the review's suggestions are reasonable and non-breaking (documentation + optional stricter default are additive). If rarely used, lower priority.
programs/datagen.c (1)
144-186: Deterministic data generator looks consistent and safe
RDG_genBuffer() and RDG_genStdout() are well-bounded, seedable, and avoid obvious overflow or aliasing issues. Memory is checked and freed correctly. No changes needed.
lib/dictBuilder/zdict.c (1)
862-941: Dictionary finalization layout looks consistent and safe
ZDICT_finalizeDictionary() correctly enforces capacity checks, clamps dictContentSize, pads up to the maximum repcode distance, and then lays out [header][padding][content] while handling potential overlap via memmove(). This matches the expectations of the COVER/FASTCOVER callers and ZSTD’s dictionary loader.
If you want to double-check behavior against your zstd version, you can diff this function with the vendor source to ensure no local drift.
contrib/seekable_format/zstdseek_compress.c (1)
268-351: Seek-table streaming logic appears correct
The ZSTD_seekable_seekTableSize(), ZSTD_stwrite32(), and ZSTD_seekable_writeSeekTable() trio correctly:
- Compute the per-frame entry size based on checksumFlag.
- Support partial writes by tracking seekTablePos/seekTableIndex and returning “bytes remaining” when the output buffer is too small.
- Emit the skippable header, per-frame entries, footer count, SFD byte, and trailer magic in a resumable way.
This matches the intended non-blocking API where the caller can keep calling until the return value reaches 0.
Consider adding a small unit test that repeatedly calls ZSTD_seekable_writeSeekTable() with tiny output buffers (e.g., 1–2 bytes) to validate the resumability contract end-to-end.
doc/educational_decoder/zstd_decompress.c (2)
384-409: Top-level decompression API is straightforward and consistent
ZSTD_decompress() and ZSTD_decompress_with_dict() correctly:
- Wrap the raw pointers in istream_t/ostream_t for bounds-checked IO.
- Handle an optional parsed dictionary.
- Decode exactly one frame via decode_frame() and return the number of bytes written.
For an educational decoder that intentionally exit(1)s on error, this is clean and easy to follow, and matches the documented single-frame assumption.
If you plan to expose this beyond educational use, consider swapping the ERROR() macro to propagate error codes instead of calling exit(1), and adding a small test to validate dictionary vs non-dictionary behavior.
1405-1580: Dictionary parsing & application follow the spec and look safe
The dictionary parsing logic:
- Differentiates between raw-content and formatted dictionaries via the magic (0xEC30A437).
- Validates minimum size, recent offsets < dict size, and decodes entropy tables using the same HUF/FSE primitives as the frame decoder.
- Applies dictionaries in frame_context_apply_dict() only when present and matching the requested dictionary_id, deep-copying tables so the dictionary object and frame context have independent lifetimes.
Given the educational scope, this is a solid and faithful implementation of the spec.
If you plan to fuzz this decoder, be sure to include both raw and formatted dictionaries with edge-case sizes to confirm the bounds checks and offset validations behave as expected.
contrib/seekable_format/zstdseek_decompress.c (1)
256-277: Good defensive checks when cloning the seek table
The added assert(zs != NULL) and if (zs->seekTable.entries == NULL) return NULL; guard against misuse and avoid dereferencing a null entries pointer before cloning. Copying the extra sentinel entry (tableLen + 1) keeps the cloned table consistent with the original sizing logic. No changes needed here.
| ZSTD_CCtx* const zc = ZSTD_createCCtx(); | ||
|
|
||
| int simpleSequenceProducerState = 0xdeadbeef; | ||
|
|
||
| // Here is the crucial bit of code! | ||
| ZSTD_registerSequenceProducer( | ||
| zc, | ||
| &simpleSequenceProducerState, | ||
| simpleSequenceProducer | ||
| ); | ||
|
|
||
| { | ||
| size_t const res = ZSTD_CCtx_setParameter(zc, ZSTD_c_enableSeqProducerFallback, 1); | ||
| CHECK(res); |
🧩 Analysis chain
Check CCtx creation and sequence‑producer registration results
ZSTD_createCCtx() is used without checking for NULL, and ZSTD_registerSequenceProducer()’s return value is ignored. On allocation failure or API error this will lead to later dereferences of an invalid context or silently continuing after a failed registration.
Consider validating both and bailing out (with cleanup) on error, e.g.:
- ZSTD_CCtx* const zc = ZSTD_createCCtx();
+ ZSTD_CCtx* const zc = ZSTD_createCCtx();
+ if (zc == NULL) {
+ fprintf(stderr, "ERROR: failed to create ZSTD_CCtx\n");
+ return 1;
+ }
@@
- ZSTD_registerSequenceProducer(
- zc,
- &simpleSequenceProducerState,
- simpleSequenceProducer
- );
+ {
+ size_t const res = ZSTD_registerSequenceProducer(
+ zc,
+ &simpleSequenceProducerState,
+ simpleSequenceProducer);
+ CHECK(res);
+ }
🏁 Script executed:
#!/bin/bash
# Read the file and extract the relevant section
cat contrib/externalSequenceProducer/main.c | head -60 | tail -35
Length of output: 895
🏁 Script executed:
#!/bin/bash
# Get more context: find the function containing these lines and see cleanup
cat contrib/externalSequenceProducer/main.c
Length of output: 2951
🏁 Script executed:
#!/bin/bash
# Check if ZSTD_registerSequenceProducer is used elsewhere in the file
rg -n "registerSequenceProducer" contrib/externalSequenceProducer/main.c
Length of output: 37
🏁 Script executed:
#!/bin/bash
# Check for CHECK macro definition and error handling patterns
rg -n -B2 -A2 "CHECK|ZSTD_createCCtx|ZSTD_registerSequenceProducer" contrib/externalSequenceProducer/main.c | head -80
Length of output: 922
🏁 Script executed:
#!/bin/bash
# Look for resource cleanup: search for free() and ZSTD_freeCCtx() calls
rg -n "free|ZSTD_free" contrib/externalSequenceProducer/main.c
Length of output: 83
🏁 Script executed:
#!/bin/bash
# Verify ZSTD API: check if there's any documentation or header hints
fd -e h "zstd" --search-path contrib/ --search-path "." 2>/dev/null | head -10
Length of output: 337
Add NULL check for ZSTD_createCCtx() and verify ZSTD_registerSequenceProducer() return value
Both issues are confirmed in the code:
- Line 35: ZSTD_createCCtx() is called without NULL checking. If allocation fails, the NULL pointer is dereferenced at line 40.
- Lines 40-44: ZSTD_registerSequenceProducer() return value is ignored. A registration failure is not detected and silently continues.
Additionally, note that cleanup at lines 102-105 is only reachable on the success path. If any CHECK fails before that point (e.g., line 48), zc will leak.
The proposed diff in the review is incomplete—it wraps registerSequenceProducer in CHECK but omits the NULL check for createCCtx(). Both should be added:
ZSTD_CCtx* const zc = ZSTD_createCCtx();
if (zc == NULL) {
fprintf(stderr, "ERROR: failed to create ZSTD_CCtx\n");
return 1;
}
// ... later ...
{
size_t const res = ZSTD_registerSequenceProducer(
zc,
&simpleSequenceProducerState,
simpleSequenceProducer);
CHECK(res);
}
🤖 Prompt for AI Agents
In contrib/externalSequenceProducer/main.c around lines 35–48 (and consider
cleanup at 102–105), ZSTD_createCCtx() is not NULL-checked and
ZSTD_registerSequenceProducer() return value is ignored; add an immediate NULL
check after ZSTD_createCCtx() that prints an error and returns non-zero if
allocation fails, wrap the ZSTD_registerSequenceProducer() call and capture its
return value then CHECK() it (or handle the error), and ensure any early returns
free the context (call ZSTD_freeCCtx or equivalent) to avoid leaking zc when
later CHECKs fail.
| FILE *f = fopen(argv[1], "rb"); | ||
| assert(f); | ||
| { | ||
| int const ret = fseek(f, 0, SEEK_END); | ||
| assert(ret == 0); | ||
| } | ||
| size_t const srcSize = ftell(f); | ||
| { | ||
| int const ret = fseek(f, 0, SEEK_SET); | ||
| assert(ret == 0); | ||
| } | ||
|
|
||
| char* const src = malloc(srcSize + 1); | ||
| assert(src); | ||
| { | ||
| size_t const ret = fread(src, srcSize, 1, f); | ||
| assert(ret == 1); | ||
| int const ret2 = fclose(f); | ||
| assert(ret2 == 0); | ||
| } | ||
|
|
🧩 Analysis chain
Don’t rely on assert() for runtime I/O/allocation failures
fopen(), fseek(), ftell(), malloc(), fread(), and fclose() are only guarded by assert(). In release builds with NDEBUG defined these asserts disappear, and failures (e.g., missing file, OOM, short read) will lead to undefined behavior (e.g., calling fseek() on a NULL FILE*, using uninitialized buffers, or mis‑sized allocations).
Prefer explicit checks and graceful exits, for example:
- FILE *f = fopen(argv[1], "rb");
- assert(f);
+ FILE *f = fopen(argv[1], "rb");
+ if (!f) {
+ perror("fopen");
+ ZSTD_freeCCtx(zc);
+ return 1;
+ }
@@
- size_t const srcSize = ftell(f);
+ long const tellRes = ftell(f);
+ if (tellRes < 0) {
+ perror("ftell");
+ fclose(f);
+ ZSTD_freeCCtx(zc);
+ return 1;
+ }
+ size_t const srcSize = (size_t)tellRes;
@@
- char* const src = malloc(srcSize + 1);
- assert(src);
+ char* const src = (char*)malloc(srcSize);
+ if (!src) {
+ fclose(f);
+ ZSTD_freeCCtx(zc);
+ return 1;
+ }
@@
- size_t const ret = fread(src, srcSize, 1, f);
- assert(ret == 1);
+ size_t const ret = fread(src, 1, srcSize, f);
+ if (ret != srcSize) {
+ fprintf(stderr, "ERROR: short read\n");
+ fclose(f);
+ free(src);
+ ZSTD_freeCCtx(zc);
+ return 1;
+ }
I'll verify the review comment by examining the actual code and context in the repository.
🏁 Script executed:
#!/bin/bash
# Find and examine the file
fd -t f "main.c" | grep -i "externalSequenceProducer"
Length of output: 39
🏁 Script executed:
#!/bin/bash
# Extract the full context around lines 51-71, plus surrounding code
if [ -f "contrib/externalSequenceProducer/main.c" ]; then
echo "=== Full file context (lines 1-100) ==="
head -n 100 "contrib/externalSequenceProducer/main.c" | cat -n
echo ""
echo "=== Checking for ZSTD_freeCCtx and zc variable ==="
grep -n "ZSTD_freeCCtx\|zc" "contrib/externalSequenceProducer/main.c" | head -20
fi
Length of output: 3919
🏁 Script executed:
#!/bin/bash
# Check the entire file for assert usage and similar error handling patterns
if [ -f "contrib/externalSequenceProducer/main.c" ]; then
echo "=== All assert() calls in file ==="
grep -n "assert" "contrib/externalSequenceProducer/main.c"
fi
Length of output: 252
I need to see the complete file to verify the full cleanup path and ensure the review's suggested fixes are complete.
🏁 Script executed:
#!/bin/bash
# Get the complete file content
wc -l "contrib/externalSequenceProducer/main.c"
echo ""
echo "=== Full file content ==="
cat "contrib/externalSequenceProducer/main.c"
Length of output: 3022
Replace assert()-based error checks with explicit error handling to prevent undefined behavior in release builds.
The code relies entirely on assert() for runtime error handling at lines 52, 55, 60, 64, 67, 69, 74, and 80. With NDEBUG defined (standard in release builds), these asserts vanish, causing undefined behavior:
- fopen() returning NULL leads to fseek() called on an invalid pointer
- malloc() failures go undetected, using NULL pointers
- fread() short reads go undetected
Additionally, there's a critical resource leak at line 99: the early return 1; on memcmp mismatch skips cleanup of allocated memory and the compression context, even in the current code.
Implement explicit checks with proper cleanup paths for all I/O and allocation operations (lines 51–70), and ensure all error paths properly free resources before returning.
| /* benched function */ | ||
| size_t compress(const void* src, size_t srcSize, void* dst, size_t dstCapacity, void* payload) | ||
| { | ||
| compressInstructions* const ci = (compressInstructions*) payload; | ||
| (void)dstCapacity; | ||
|
|
||
| ZSTD_CCtx_refCDict(ci->cctx, ci->dictionaries.cdicts[ci->dictNb]); | ||
| ZSTD_compress2(ci->cctx, | ||
| dst, srcSize, | ||
| src, srcSize); | ||
|
|
||
| ci->dictNb = ci->dictNb + 1; | ||
| if (ci->dictNb >= ci->nbDicts) ci->dictNb = 0; | ||
|
|
||
| return srcSize; | ||
| } |
🧩 Analysis chain
Fix compress() to honor dstCapacity and propagate ZSTD errors
compress() currently ignores the dstCapacity parameter and calls:
ZSTD_compress2(ci->cctx,
dst, srcSize,
src, srcSize);
...
return srcSize;
Issues:
- srcSize is used as the destination capacity, even though the caller computed a larger capacity (dstBlocks.capacities[...]), so you can get ZSTD_error_dstSize_tooSmall despite having reserved enough space.
- The return value from ZSTD_compress2() is ignored, so BMK_benchFunction’s errorFn (ZSTD_isError) never sees failures.
- Returning srcSize unconditionally hides compression errors.
A safe pattern that preserves the “bytes processed per run” semantics of BMK_runOutcome_t.sumOfReturn:
size_t compress(const void* src, size_t srcSize, void* dst, size_t dstCapacity, void* payload)
{
compressInstructions* const ci = (compressInstructions*) payload;
- (void)dstCapacity;
-
- ZSTD_CCtx_refCDict(ci->cctx, ci->dictionaries.cdicts[ci->dictNb]);
- ZSTD_compress2(ci->cctx,
- dst, srcSize,
- src, srcSize);
+ ZSTD_CCtx_refCDict(ci->cctx, ci->dictionaries.cdicts[ci->dictNb]);
+ {
+ size_t const cSize = ZSTD_compress2(ci->cctx,
+ dst, dstCapacity,
+ src, srcSize);
+ if (ZSTD_isError(cSize)) {
+ return cSize; /* let BMK_benchFunction + ZSTD_isError() surface the failure */
+ }
+ }
@@
- return srcSize;
+ return srcSize; /* bytes of input processed per run */
}
---
🏁 Script executed:
#!/bin/bash
# Verify the compress() function code at lines 621-636
echo "=== Checking compress() function ==="
sed -n '621,636p' contrib/largeNbDicts/largeNbDicts.c
Length of output: 511
🏁 Script executed:
#!/bin/bash
# Find how dstBlocks capacities are allocated/assigned
echo "=== Search for dstBlocks capacity setup ==="
rg -n -A5 -B5 'dstBlocks.*capacit' contrib/largeNbDicts/largeNbDicts.c
Length of output: 434
🏁 Script executed:
#!/bin/bash
# Find compressBlocks() to see how dst buffer capacity is determined
echo "=== Search for compressBlocks function ==="
rg -n -A20 'compressBlocks\s*\(' contrib/largeNbDicts/largeNbDicts.c | head -60
Length of output: 2964
🏁 Script executed:
#!/bin/bash
# Search for BMK_benchFunction definition/usage to understand errorFn
echo "=== Search for BMK_benchFunction calls ==="
rg -n -B3 -A3 'BMK_benchFunction' contrib/largeNbDicts/largeNbDicts.c
Length of output: 42
🏁 Script executed:
#!/bin/bash
# Check how compress() function signature relates to BMK_benchFunction requirements
echo "=== Search for compress function definition ==="
rg -n -B5 'size_t compress\(' contrib/largeNbDicts/largeNbDicts.c
Length of output: 222
🏁 Script executed:
#!/bin/bash
# Find all ZSTD_compress2 calls in the repo to see usage patterns
echo "=== ZSTD_compress2 usage patterns ==="
rg -n -B2 -A2 'ZSTD_compress2\(' --type=c | head -40
Length of output: 3306
---
🏁 Script executed:
#!/bin/bash
# Find where dstBlocks is created and how its capacities relate to srcSize
echo "=== Search for dstBlocks creation/initialization ==="
rg -n -B10 -A10 'dstBlocks.*=' contrib/largeNbDicts/largeNbDicts.c | grep -A15 -B15 'capacity\|allocate\|create' | head -80
Length of output: 831
🏁 Script executed:
#!/bin/bash
# Look for where buffers are allocated for compression/decompression benchmarking
echo "=== Search for buffer allocation in bench() function ==="
rg -n -A50 'static void bench\(' contrib/largeNbDicts/largeNbDicts.c | head -100
Length of output: 56
🏁 Script executed:
#!/bin/bash
# Check if there's a ZSTD_compress2 wrapper or if errors are supposed to be handled elsewhere
echo "=== Search for decompress function to compare patterns ==="
rg -n -A15 'size_t decompress\(' contrib/largeNbDicts/largeNbDicts.c | head -40
Length of output: 672
🏁 Script executed:
#!/bin/bash
# Search for how compress() is used/registered with the benchmark framework
echo "=== Search for compress function reference/usage ==="
rg -n 'compress' contrib/largeNbDicts/largeNbDicts.c | grep -i 'fn\|ptr\|func\|call'
Length of output: 117
Fix compress() to honor dstCapacity and propagate ZSTD errors
The compress() function at lines 621–636 has a critical defect: it ignores the dstCapacity parameter and does not capture the return value from ZSTD_compress2().
Compare with decompress() at lines 639–652, which correctly uses dstCapacity and captures the result. The benchmark framework (line 709) expects errors to be surfaced via errorFn = ZSTD_isError, but compress() always returns srcSize unconditionally, masking all compression failures.
Specifically:
- (void)dstCapacity; discards the caller-provided capacity.
- ZSTD_compress2(ci->cctx, dst, srcSize, src, srcSize); uses srcSize as the destination capacity instead, risking ZSTD_error_dstSize_tooSmall even when sufficient space was reserved.
- The return value is ignored, so errors never reach the benchmark framework.
Apply the fix from the review:
size_t compress(const void* src, size_t srcSize, void* dst, size_t dstCapacity, void* payload)
{
compressInstructions* const ci = (compressInstructions*) payload;
- (void)dstCapacity;
-
- ZSTD_CCtx_refCDict(ci->cctx, ci->dictionaries.cdicts[ci->dictNb]);
- ZSTD_compress2(ci->cctx,
- dst, srcSize,
- src, srcSize);
+ ZSTD_CCtx_refCDict(ci->cctx, ci->dictionaries.cdicts[ci->dictNb]);
+ {
+ size_t const cSize = ZSTD_compress2(ci->cctx,
+ dst, dstCapacity,
+ src, srcSize);
+ if (ZSTD_isError(cSize)) {
+ return cSize; /* let BMK_benchFunction + ZSTD_isError() surface the failure */
+ }
+ }
ci->dictNb = ci->dictNb + 1;
if (ci->dictNb >= ci->nbDicts) ci->dictNb = 0;
return srcSize; /* bytes of input processed per run */
}
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /* benched function */ | |
| size_t compress(const void* src, size_t srcSize, void* dst, size_t dstCapacity, void* payload) | |
| { | |
| compressInstructions* const ci = (compressInstructions*) payload; | |
| (void)dstCapacity; | |
| ZSTD_CCtx_refCDict(ci->cctx, ci->dictionaries.cdicts[ci->dictNb]); | |
| ZSTD_compress2(ci->cctx, | |
| dst, srcSize, | |
| src, srcSize); | |
| ci->dictNb = ci->dictNb + 1; | |
| if (ci->dictNb >= ci->nbDicts) ci->dictNb = 0; | |
| return srcSize; | |
| } | |
| /* benched function */ | |
| size_t compress(const void* src, size_t srcSize, void* dst, size_t dstCapacity, void* payload) | |
| { | |
| compressInstructions* const ci = (compressInstructions*) payload; | |
| ZSTD_CCtx_refCDict(ci->cctx, ci->dictionaries.cdicts[ci->dictNb]); | |
| { | |
| size_t const cSize = ZSTD_compress2(ci->cctx, | |
| dst, dstCapacity, | |
| src, srcSize); | |
| if (ZSTD_isError(cSize)) { | |
| return cSize; /* let BMK_benchFunction + ZSTD_isError() surface the failure */ | |
| } | |
| } | |
| ci->dictNb = ci->dictNb + 1; | |
| if (ci->dictNb >= ci->nbDicts) ci->dictNb = 0; | |
| return srcSize; /* bytes of input processed per run */ | |
| } |
🤖 Prompt for AI Agents
In contrib/largeNbDicts/largeNbDicts.c around lines 621–636, compress()
currently ignores dstCapacity and discards ZSTD_compress2()'s return value;
change the call to pass the actual destination capacity and capture its return
so errors propagate to the benchmark framework: remove the "(void)dstCapacity;"
line, call ZSTD_compress2(ci->cctx, dst, dstCapacity, src, srcSize) and store
the returned size_t (e.g., cSize), then return that captured cSize (leave the
dictNb increment logic as-is) so ZSTD_isError can detect and handle failures.
| #define ZSTD_STATIC_LINKING_ONLY | ||
| #define ZSTD_DISABLE_DEPRECATE_WARNINGS /* No deprecation warnings, pzstd itself is deprecated | ||
| * and uses deprecated functions | ||
| */ | ||
| #include "zstd.h" | ||
| #undef ZSTD_STATIC_LINKING_ONLY | ||
|
|
||
| #include <cstddef> | ||
| #include <cstdint> | ||
| #include <memory> | ||
|
|
Make the header self-contained by including <cstdio> for FILE*.
This header declares several APIs taking FILE* but doesn’t include <cstdio>/<stdio.h>. Including Pzstd.h in a TU that hasn’t already pulled in <cstdio> will fail to compile due to an unknown FILE typedef. Add the appropriate standard header alongside the other includes here.
🤖 Prompt for AI Agents
In contrib/pzstd/Pzstd.h around lines 19 to 29, the header uses FILE* in its API
but does not include <cstdio>/<stdio.h>, so including Pzstd.h in a translation
unit without that header breaks compilation; add an appropriate C stdio include
(e.g., include <cstdio>) alongside the existing includes so FILE is defined and
the header is self-contained.
| /** | ||
| * Runs pzstd with `options` and returns the number of bytes written. | ||
| * An error occurred if `errorHandler.hasError()`. | ||
| * | ||
| * @param options The pzstd options to use for (de)compression | ||
| * @returns 0 upon success and non-zero on failure. | ||
| */ | ||
| int pzstdMain(const Options& options); |
Clarify pzstdMain’s return-value contract.
The comment says this “returns the number of bytes written”, but the @returns line states “0 upon success and non-zero on failure”. These are contradictory. Please decide whether the function returns a byte count or an error code and update the documentation (and implementation, if needed) so callers can reliably interpret the result.
🤖 Prompt for AI Agents
In contrib/pzstd/Pzstd.h around lines 31 to 38, the documentation for pzstdMain
contradicts itself by saying it "returns the number of bytes written" but the
@returns says "0 upon success and non-zero on failure"; decide which contract we
want (either return a byte count with negative/error sentinel or return an int
error code and provide bytes-written via an out-parameter or errorHandler),
update the function documentation to precisely state the chosen contract, and
modify the implementation and all call sites to match: if choosing byte-count,
make return type convey bytes and return -1 or set errorHandler on failure; if
choosing error-code, change the doc to say returns 0 on success and non-zero
error code and move bytes-written into an out parameter or accessor. Ensure the
header comment, function signature/implementation, and callers are consistent.
| if (adv->mode == BMK_decodeOnly) { | ||
| /* benchmark only decompression : source must be already compressed */ | ||
| const char* srcPtr = (const char*)srcBuffer; | ||
| U64 totalDSize64 = 0; | ||
| U32 fileNb; | ||
| for (fileNb = 0; fileNb < nbFiles; fileNb++) { | ||
| U64 const fSize64 = | ||
| ZSTD_findDecompressedSize(srcPtr, fileSizes[fileNb]); | ||
| if (fSize64 == ZSTD_CONTENTSIZE_UNKNOWN) { | ||
| RETURN_ERROR( | ||
| 32, | ||
| BMK_benchOutcome_t, | ||
| "Decompressed size cannot be determined: cannot benchmark"); | ||
| } | ||
| if (fSize64 == ZSTD_CONTENTSIZE_ERROR) { | ||
| RETURN_ERROR( | ||
| 32, | ||
| BMK_benchOutcome_t, | ||
| "Error while trying to assess decompressed size: data may be invalid"); | ||
| } | ||
| totalDSize64 += fSize64; | ||
| srcPtr += fileSizes[fileNb]; | ||
| } | ||
| { | ||
| size_t const decodedSize = (size_t)totalDSize64; | ||
| assert((U64)decodedSize == totalDSize64); /* check overflow */ | ||
| free(*resultBufferPtr); | ||
| if (totalDSize64 > decodedSize) { /* size_t overflow */ | ||
| RETURN_ERROR( | ||
| 32, | ||
| BMK_benchOutcome_t, | ||
| "decompressed size is too large for local system"); | ||
| } | ||
| *resultBufferPtr = malloc(decodedSize); | ||
| if (!(*resultBufferPtr)) { | ||
| RETURN_ERROR( | ||
| 33, | ||
| BMK_benchOutcome_t, | ||
| "allocation error: not enough memory"); | ||
| } | ||
| cSize = srcSize; | ||
| srcSize = decodedSize; | ||
| ratio = (double)srcSize / (double)cSize; | ||
| } | ||
| } | ||
|
|
||
| /* Init data chunks */ | ||
| { | ||
| const char* srcPtr = (const char*)srcBuffer; | ||
| char* cPtr = (char*)compressedBuffer; | ||
| char* resPtr = (char*)(*resultBufferPtr); | ||
| U32 fileNb, chunkID; | ||
| for (chunkID = 0, fileNb = 0; fileNb < nbFiles; fileNb++) { | ||
| size_t remaining = fileSizes[fileNb]; | ||
| U32 const nbChunksforThisFile = (adv->mode == BMK_decodeOnly) | ||
| ? 1 | ||
| : (U32)((remaining + (chunkSizeMax - 1)) / chunkSizeMax); | ||
| U32 const chunkIdEnd = chunkID + nbChunksforThisFile; | ||
| for (; chunkID < chunkIdEnd; chunkID++) { | ||
| size_t const chunkSize = MIN(remaining, chunkSizeMax); | ||
| srcPtrs[chunkID] = srcPtr; | ||
| srcSizes[chunkID] = chunkSize; | ||
| cPtrs[chunkID] = cPtr; | ||
| cCapacities[chunkID] = (adv->mode == BMK_decodeOnly) | ||
| ? chunkSize | ||
| : ZSTD_compressBound(chunkSize); | ||
| resPtrs[chunkID] = resPtr; | ||
| resSizes[chunkID] = (adv->mode == BMK_decodeOnly) | ||
| ? (size_t)ZSTD_findDecompressedSize( | ||
| srcPtr, chunkSize) | ||
| : chunkSize; | ||
| srcPtr += chunkSize; | ||
| cPtr += cCapacities[chunkID]; | ||
| resPtr += chunkSize; | ||
| remaining -= chunkSize; | ||
| if (adv->mode == BMK_decodeOnly) { | ||
| cSizes[chunkID] = chunkSize; | ||
| benchResult.cSize = chunkSize; | ||
| } | ||
| } | ||
| } | ||
| nbChunks = chunkID; | ||
| } |
🧩 Analysis chain
Decode-only bench can write past resultBuffer
In BMK_benchMemAdvancedNoAlloc()’s BMK_decodeOnly path, you:
- Compute the total decompressed size (decodedSize) and allocate *resultBufferPtr accordingly.
- In the chunking loop you do:
resPtrs[chunkID] = resPtr;
resSizes[chunkID] = (size_t)ZSTD_findDecompressedSize(srcPtr, chunkSize);
...
resPtr += chunkSize; /* compressed size */
Later, local_defaultDecompress() is called with dstBuffers = resPtrs and dstCapacities = resSizes, so each decompression writes resSizes[chunkID] bytes starting at resPtr that was advanced by the compressed size, not the decompressed size.
If for any file decompressedSize > compressedSize (which is very possible for small or poorly compressible inputs), resPtrs[chunkID] + resSizes[chunkID] can exceed the end of resultBuffer, leading to heap overflow.
You already computed all decompressed sizes up-front, so the layout can be fixed by advancing resPtr by the decompressed size in decode-only mode:
- resPtrs[chunkID] = resPtr;
- resSizes[chunkID] = (adv->mode == BMK_decodeOnly)
- ? (size_t)ZSTD_findDecompressedSize(
- srcPtr, chunkSize)
- : chunkSize;
+ resPtrs[chunkID] = resPtr;
+ if (adv->mode == BMK_decodeOnly) {
+ size_t const decSize =
+ (size_t)ZSTD_findDecompressedSize(srcPtr, chunkSize);
+ resSizes[chunkID] = decSize;
+ resPtr += decSize;
+ } else {
+ resSizes[chunkID] = chunkSize;
+ resPtr += chunkSize;
+ }
@@
- srcPtr += chunkSize;
- cPtr += cCapacities[chunkID];
- resPtr += chunkSize;
+ srcPtr += chunkSize;
+ cPtr += cCapacities[chunkID];(Adjust to avoid double-incrementing resPtr in the non-decode path.)
This keeps each dstBuffer slice within the single resultBuffer of size decodedSize.
You can validate this by constructing a small corpus where compressed size > decompressed size (e.g., very short inputs) and running benchzstd in decode-only mode under ASan/Valgrind to confirm the overflow disappears after the change.
🏁 Script executed:
#!/bin/bash
# Extract the relevant code section and surrounding context
cat programs/benchzstd.c | head -600 | tail -200
Length of output: 7593
🏁 Script executed:
#!/bin/bash
# Search for where resPtrs and resSizes are used in decompression
rg -n "resPtrs\|resSizes" programs/benchzstd.c -A 2 -B 2
Length of output: 0
🏁 Script executed:
#!/bin/bash
# Search for decompression function calls that use these buffers
rg -n "ZSTD_decompress|dstBuffers|local_defaultDecompress" programs/benchzstd.c -C 3
Length of output: 1273
🏁 Script executed:
#!/bin/bash
# Find the function signature and overall structure
rg -n "BMK_benchMemAdvancedNoAlloc" programs/benchzstd.c -A 5
Length of output: 410
Decode-only bench has confirmed heap overflow from misaligned buffer advancement
The issue is real and critical. In BMK_benchMemAdvancedNoAlloc():
- resultBuffer is allocated to hold the total decompressed size of all files.
- In decode-only mode, each file gets exactly 1 chunk (nbChunksforThisFile = 1).
- For each chunk, you set resSizes[chunkID] to that file's decompressed size but advance resPtr by the file's compressed size.
When multiple files exist where decompressedSize > compressedSize, the buffer slices overlap:
Example: File1 (100B compressed → 300B decompressed), File2 (100B compressed → 200B decompressed)
- Total decompressed = 500B (buffer size)
- Layout should be: File1 [0–299], File2 [300–499]
- Actual result: File1 [0–299], resPtr advances by 100 → File2 starts at [100–299] ✗
When local_defaultDecompress() decompresses File2 at address resPtrs[1] with capacity resSizes[1], it writes past the intended boundary into File1's region, causing heap corruption.
The fix is to advance resPtr by decompressed size in decode-only mode:
- resSizes[chunkID] = (adv->mode == BMK_decodeOnly)
- ? (size_t)ZSTD_findDecompressedSize(
- srcPtr, chunkSize)
- : chunkSize;
- srcPtr += chunkSize;
- cPtr += cCapacities[chunkID];
- resPtr += chunkSize;
+ if (adv->mode == BMK_decodeOnly) {
+ size_t const decSize = (size_t)ZSTD_findDecompressedSize(srcPtr, chunkSize);
+ resSizes[chunkID] = decSize;
+ resPtr += decSize;
+ } else {
+ resSizes[chunkID] = chunkSize;
+ resPtr += chunkSize;
+ }
+ srcPtr += chunkSize;
+ cPtr += cCapacities[chunkID];
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (adv->mode == BMK_decodeOnly) { | |
| /* benchmark only decompression : source must be already compressed */ | |
| const char* srcPtr = (const char*)srcBuffer; | |
| U64 totalDSize64 = 0; | |
| U32 fileNb; | |
| for (fileNb = 0; fileNb < nbFiles; fileNb++) { | |
| U64 const fSize64 = | |
| ZSTD_findDecompressedSize(srcPtr, fileSizes[fileNb]); | |
```c
        if (fSize64 == ZSTD_CONTENTSIZE_UNKNOWN) {
            RETURN_ERROR(
                    32,
                    BMK_benchOutcome_t,
                    "Decompressed size cannot be determined: cannot benchmark");
        }
        if (fSize64 == ZSTD_CONTENTSIZE_ERROR) {
            RETURN_ERROR(
                    32,
                    BMK_benchOutcome_t,
                    "Error while trying to assess decompressed size: data may be invalid");
        }
        totalDSize64 += fSize64;
        srcPtr += fileSizes[fileNb];
    }
    {
        size_t const decodedSize = (size_t)totalDSize64;
        assert((U64)decodedSize == totalDSize64); /* check overflow */
        free(*resultBufferPtr);
        if (totalDSize64 > decodedSize) { /* size_t overflow */
            RETURN_ERROR(
                    32,
                    BMK_benchOutcome_t,
                    "decompressed size is too large for local system");
        }
        *resultBufferPtr = malloc(decodedSize);
        if (!(*resultBufferPtr)) {
            RETURN_ERROR(
                    33,
                    BMK_benchOutcome_t,
                    "allocation error: not enough memory");
        }
        cSize = srcSize;
        srcSize = decodedSize;
        ratio = (double)srcSize / (double)cSize;
    }
}
/* Init data chunks */
{
    const char* srcPtr = (const char*)srcBuffer;
    char* cPtr = (char*)compressedBuffer;
    char* resPtr = (char*)(*resultBufferPtr);
    U32 fileNb, chunkID;
    for (chunkID = 0, fileNb = 0; fileNb < nbFiles; fileNb++) {
        size_t remaining = fileSizes[fileNb];
        U32 const nbChunksforThisFile = (adv->mode == BMK_decodeOnly)
                ? 1
                : (U32)((remaining + (chunkSizeMax - 1)) / chunkSizeMax);
        U32 const chunkIdEnd = chunkID + nbChunksforThisFile;
        for (; chunkID < chunkIdEnd; chunkID++) {
            size_t const chunkSize = MIN(remaining, chunkSizeMax);
            srcPtrs[chunkID] = srcPtr;
            srcSizes[chunkID] = chunkSize;
            cPtrs[chunkID] = cPtr;
            cCapacities[chunkID] = (adv->mode == BMK_decodeOnly)
                    ? chunkSize
                    : ZSTD_compressBound(chunkSize);
            resPtrs[chunkID] = resPtr;
            resSizes[chunkID] = (adv->mode == BMK_decodeOnly)
                    ? (size_t)ZSTD_findDecompressedSize(srcPtr, chunkSize)
                    : chunkSize;
            srcPtr += chunkSize;
            cPtr += cCapacities[chunkID];
            resPtr += chunkSize;
            remaining -= chunkSize;
            if (adv->mode == BMK_decodeOnly) {
                cSizes[chunkID] = chunkSize;
                benchResult.cSize = chunkSize;
            }
        }
    }
    nbChunks = chunkID;
}
```
```c
if (adv->mode == BMK_decodeOnly) {
    /* benchmark only decompression : source must be already compressed */
    const char* srcPtr = (const char*)srcBuffer;
    U64 totalDSize64 = 0;
    U32 fileNb;
    for (fileNb = 0; fileNb < nbFiles; fileNb++) {
        U64 const fSize64 =
                ZSTD_findDecompressedSize(srcPtr, fileSizes[fileNb]);
        if (fSize64 == ZSTD_CONTENTSIZE_UNKNOWN) {
            RETURN_ERROR(
                    32,
                    BMK_benchOutcome_t,
                    "Decompressed size cannot be determined: cannot benchmark");
        }
        if (fSize64 == ZSTD_CONTENTSIZE_ERROR) {
            RETURN_ERROR(
                    32,
                    BMK_benchOutcome_t,
                    "Error while trying to assess decompressed size: data may be invalid");
        }
        totalDSize64 += fSize64;
        srcPtr += fileSizes[fileNb];
    }
    {
        size_t const decodedSize = (size_t)totalDSize64;
        assert((U64)decodedSize == totalDSize64); /* check overflow */
        free(*resultBufferPtr);
        if (totalDSize64 > decodedSize) { /* size_t overflow */
            RETURN_ERROR(
                    32,
                    BMK_benchOutcome_t,
                    "decompressed size is too large for local system");
        }
        *resultBufferPtr = malloc(decodedSize);
        if (!(*resultBufferPtr)) {
            RETURN_ERROR(
                    33,
                    BMK_benchOutcome_t,
                    "allocation error: not enough memory");
        }
        cSize = srcSize;
        srcSize = decodedSize;
        ratio = (double)srcSize / (double)cSize;
    }
}
/* Init data chunks */
{
    const char* srcPtr = (const char*)srcBuffer;
    char* cPtr = (char*)compressedBuffer;
    char* resPtr = (char*)(*resultBufferPtr);
    U32 fileNb, chunkID;
    for (chunkID = 0, fileNb = 0; fileNb < nbFiles; fileNb++) {
        size_t remaining = fileSizes[fileNb];
        U32 const nbChunksforThisFile = (adv->mode == BMK_decodeOnly)
                ? 1
                : (U32)((remaining + (chunkSizeMax - 1)) / chunkSizeMax);
        U32 const chunkIdEnd = chunkID + nbChunksforThisFile;
        for (; chunkID < chunkIdEnd; chunkID++) {
            size_t const chunkSize = MIN(remaining, chunkSizeMax);
            srcPtrs[chunkID] = srcPtr;
            srcSizes[chunkID] = chunkSize;
            cPtrs[chunkID] = cPtr;
            cCapacities[chunkID] = (adv->mode == BMK_decodeOnly)
                    ? chunkSize
                    : ZSTD_compressBound(chunkSize);
            resPtrs[chunkID] = resPtr;
            if (adv->mode == BMK_decodeOnly) {
                size_t const decSize = (size_t)ZSTD_findDecompressedSize(srcPtr, chunkSize);
                resSizes[chunkID] = decSize;
                resPtr += decSize;
            } else {
                resSizes[chunkID] = chunkSize;
                resPtr += chunkSize;
            }
            srcPtr += chunkSize;
            cPtr += cCapacities[chunkID];
            remaining -= chunkSize;
            if (adv->mode == BMK_decodeOnly) {
                cSizes[chunkID] = chunkSize;
                benchResult.cSize = chunkSize;
            }
        }
    }
    nbChunks = chunkID;
}
```
🤖 Prompt for AI Agents
In programs/benchzstd.c around lines 447 to 529, resPtr is advanced by the
compressed size (chunkSize) in decode-only mode, causing overlapping result
buffers when decompressed sizes > compressed sizes; change the advancement so
resPtr moves by the decompressed size for decode-only chunks (i.e. use
resSizes[chunkID] or the computed decompressed size) instead of chunkSize when
adv->mode == BMK_decodeOnly, keeping the existing logic for other modes and
ensuring resSizes is computed before advancing resPtr.
```c
    /* avoid int overflow */
    if (storedSkips > 1 GB) {
        if (LONG_SEEK(file, 1 GB, SEEK_CUR) != 0)
            EXM_THROW(91, "1 GB skip error (sparse file support)");
        storedSkips -= 1 GB;
    }

    while (ptrT < bufferTEnd) {
        size_t nb0T;

        /* adjust last segment if < 32 KB */
        size_t seg0SizeT = segmentSizeT;
        if (seg0SizeT > bufferSizeT) seg0SizeT = bufferSizeT;
        bufferSizeT -= seg0SizeT;

        /* count leading zeroes */
        for (nb0T=0; (nb0T < seg0SizeT) && (ptrT[nb0T] == 0); nb0T++) ;
        storedSkips += (unsigned)(nb0T * sizeof(size_t));

        if (nb0T != seg0SizeT) {   /* not all 0s */
            size_t const nbNon0ST = seg0SizeT - nb0T;
            /* skip leading zeros */
            if (LONG_SEEK(file, storedSkips, SEEK_CUR) != 0)
                EXM_THROW(92, "Sparse skip error ; try --no-sparse");
            storedSkips = 0;
            /* write the rest */
            if (fwrite(ptrT + nb0T, sizeof(size_t), nbNon0ST, file) != nbNon0ST)
                EXM_THROW(93, "Write error : cannot write block : %s",
                          strerror(errno));
        }
        ptrT += seg0SizeT;
    }

    {   static size_t const maskT = sizeof(size_t)-1;
        if (bufferSize & maskT) {
            /* size not multiple of sizeof(size_t) : implies end of block */
            const char* const restStart = (const char*)bufferTEnd;
            const char* restPtr = restStart;
            const char* const restEnd = (const char*)buffer + bufferSize;
            assert(restEnd > restStart && restEnd < restStart + sizeof(size_t));
            for ( ; (restPtr < restEnd) && (*restPtr == 0); restPtr++) ;
            storedSkips += (unsigned) (restPtr - restStart);
            if (restPtr != restEnd) {
                /* not all remaining bytes are 0 */
                size_t const restSize = (size_t)(restEnd - restPtr);
                if (LONG_SEEK(file, storedSkips, SEEK_CUR) != 0)
                    EXM_THROW(92, "Sparse skip error ; try --no-sparse");
                if (fwrite(restPtr, 1, restSize, file) != restSize)
                    EXM_THROW(95, "Write error : cannot write end of decoded block : %s",
                              strerror(errno));
                storedSkips = 0;
    }   }   }
```
Prevent sparse skip accumulator overflow
Line 55 only trims a single 1 GB chunk, so one buffer containing ≥4 GB of zeros drives the unsigned storedSkips past UINT32_MAX before we ever hit Line 79 or Line 95. The counter wraps, the subsequent seek/write targets the wrong offset, and the sparse file is silently corrupted. Please flush in 1 GB chunks every time we cross the threshold—both on entry and after each increment—so storedSkips never exceeds 1 GB.
```diff
-    if (storedSkips > 1 GB) {
-        if (LONG_SEEK(file, 1 GB, SEEK_CUR) != 0)
-            EXM_THROW(91, "1 GB skip error (sparse file support)");
-        storedSkips -= 1 GB;
-    }
+    while (storedSkips > 1 GB) {
+        if (LONG_SEEK(file, 1 GB, SEEK_CUR) != 0)
+            EXM_THROW(91, "1 GB skip error (sparse file support)");
+        storedSkips -= 1 GB;
+    }
@@
-        storedSkips += (unsigned)(nb0T * sizeof(size_t));
+        storedSkips += (unsigned)(nb0T * sizeof(size_t));
+        while (storedSkips > 1 GB) {
+            if (LONG_SEEK(file, 1 GB, SEEK_CUR) != 0)
+                EXM_THROW(91, "1 GB skip error (sparse file support)");
+            storedSkips -= 1 GB;
+        }
@@
-            storedSkips += (unsigned) (restPtr - restStart);
+            storedSkips += (unsigned) (restPtr - restStart);
+            while (storedSkips > 1 GB) {
+                if (LONG_SEEK(file, 1 GB, SEEK_CUR) != 0)
+                    EXM_THROW(91, "1 GB skip error (sparse file support)");
+                storedSkips -= 1 GB;
+            }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```c
    /* avoid int overflow */
    while (storedSkips > 1 GB) {
        if (LONG_SEEK(file, 1 GB, SEEK_CUR) != 0)
            EXM_THROW(91, "1 GB skip error (sparse file support)");
        storedSkips -= 1 GB;
    }

    while (ptrT < bufferTEnd) {
        size_t nb0T;

        /* adjust last segment if < 32 KB */
        size_t seg0SizeT = segmentSizeT;
        if (seg0SizeT > bufferSizeT) seg0SizeT = bufferSizeT;
        bufferSizeT -= seg0SizeT;

        /* count leading zeroes */
        for (nb0T=0; (nb0T < seg0SizeT) && (ptrT[nb0T] == 0); nb0T++) ;
        storedSkips += (unsigned)(nb0T * sizeof(size_t));
        while (storedSkips > 1 GB) {
            if (LONG_SEEK(file, 1 GB, SEEK_CUR) != 0)
                EXM_THROW(91, "1 GB skip error (sparse file support)");
            storedSkips -= 1 GB;
        }

        if (nb0T != seg0SizeT) {   /* not all 0s */
            size_t const nbNon0ST = seg0SizeT - nb0T;
            /* skip leading zeros */
            if (LONG_SEEK(file, storedSkips, SEEK_CUR) != 0)
                EXM_THROW(92, "Sparse skip error ; try --no-sparse");
            storedSkips = 0;
            /* write the rest */
            if (fwrite(ptrT + nb0T, sizeof(size_t), nbNon0ST, file) != nbNon0ST)
                EXM_THROW(93, "Write error : cannot write block : %s",
                          strerror(errno));
        }
        ptrT += seg0SizeT;
    }

    {   static size_t const maskT = sizeof(size_t)-1;
        if (bufferSize & maskT) {
            /* size not multiple of sizeof(size_t) : implies end of block */
            const char* const restStart = (const char*)bufferTEnd;
            const char* restPtr = restStart;
            const char* const restEnd = (const char*)buffer + bufferSize;
            assert(restEnd > restStart && restEnd < restStart + sizeof(size_t));
            for ( ; (restPtr < restEnd) && (*restPtr == 0); restPtr++) ;
            storedSkips += (unsigned) (restPtr - restStart);
            while (storedSkips > 1 GB) {
                if (LONG_SEEK(file, 1 GB, SEEK_CUR) != 0)
                    EXM_THROW(91, "1 GB skip error (sparse file support)");
                storedSkips -= 1 GB;
            }
            if (restPtr != restEnd) {
                /* not all remaining bytes are 0 */
                size_t const restSize = (size_t)(restEnd - restPtr);
                if (LONG_SEEK(file, storedSkips, SEEK_CUR) != 0)
                    EXM_THROW(92, "Sparse skip error ; try --no-sparse");
                if (fwrite(restPtr, 1, restSize, file) != restSize)
                    EXM_THROW(95, "Write error : cannot write end of decoded block : %s",
                              strerror(errno));
                storedSkips = 0;
    }   }   }
```
```c
#ifdef _WIN32
static int UTIL_prepareFileList(const char* dirName,
                                char** bufStart, size_t* pos,
                                char** bufEnd, int followLinks)
{
    char* path;
    size_t dirLength, pathLength;
    int nbFiles = 0;
    WIN32_FIND_DATAA cFile;
    HANDLE hFile;

    dirLength = strlen(dirName);
    path = (char*) malloc(dirLength + 3);
    if (!path) return 0;

    memcpy(path, dirName, dirLength);
    path[dirLength] = '\\';
    path[dirLength+1] = '*';
    path[dirLength+2] = 0;

    hFile = FindFirstFileA(path, &cFile);
    if (hFile == INVALID_HANDLE_VALUE) {
        UTIL_DISPLAYLEVEL(1, "Cannot open directory '%s'\n", dirName);
        return 0;
    }
    free(path);
```
Free path on FindFirstFileA failure in UTIL_prepareFileList (Windows leak).
When FindFirstFileA() fails, the function returns 0 without freeing path, leaking a small heap buffer on the error path.
Suggested fix:
```diff
-    hFile=FindFirstFileA(path, &cFile);
-    if (hFile == INVALID_HANDLE_VALUE) {
-        UTIL_DISPLAYLEVEL(1, "Cannot open directory '%s'\n", dirName);
-        return 0;
-    }
-    free(path);
+    hFile = FindFirstFileA(path, &cFile);
+    if (hFile == INVALID_HANDLE_VALUE) {
+        UTIL_DISPLAYLEVEL(1, "Cannot open directory '%s'\n", dirName);
+        free(path);
+        return 0;
+    }
+    free(path);
```

🤖 Prompt for AI Agents
+ free(path);🤖 Prompt for AI Agents
In programs/util.c around lines 908 to 933, the code allocates 'path' then calls
FindFirstFileA and returns 0 on failure without freeing 'path', leaking memory;
modify the failure branch so you free(path) before returning (i.e., call
free(path) immediately after detecting hFile == INVALID_HANDLE_VALUE, then log
and return 0) to ensure the allocated buffer is released on the error path.
```c
static void* ZWRAP_customCalloc(size_t size, ZSTD_customMem customMem)
{
    if (customMem.customAlloc) {
        /* calloc implemented as malloc+memset;
         * not as efficient as calloc, but next best guess for custom malloc */
        void* const ptr = customMem.customAlloc(customMem.opaque, size);
        memset(ptr, 0, size);
        return ptr;
    }
    return calloc(1, size);
}
```
Guard against NULL from custom allocator in ZWRAP_customCalloc.
If customMem.customAlloc returns NULL (e.g., OOM), memset(ptr, 0, size) dereferences NULL and triggers undefined behavior instead of propagating a clean failure.
Suggested fix:
```diff
 static void* ZWRAP_customCalloc(size_t size, ZSTD_customMem customMem)
 {
     if (customMem.customAlloc) {
         /* calloc implemented as malloc+memset;
          * not as efficient as calloc, but next best guess for custom malloc */
-        void* const ptr = customMem.customAlloc(customMem.opaque, size);
-        memset(ptr, 0, size);
-        return ptr;
+        void* const ptr = customMem.customAlloc(customMem.opaque, size);
+        if (ptr == NULL) {
+            return NULL;
+        }
+        memset(ptr, 0, size);
+        return ptr;
     }
     return calloc(1, size);
 }
```

🤖 Prompt for AI Agents
In zlibWrapper/zstd_zlibwrapper.c around lines 128 to 138, the custom calloc
uses customMem.customAlloc but does not check its return value before calling
memset, which can dereference NULL; change the logic so that after calling
customMem.customAlloc you check if ptr is NULL and if so return NULL
immediately, otherwise call memset(ptr, 0, size) and return ptr; keep the
fallback to return calloc(1, size) when customAlloc is not provided.
```c
    {   int const result = (flush == Z_INFLATE_SYNC) ?
                           inflateSync(strm) :
                           inflate(strm, flush);
        LOG_WRAPPERD("- inflate3 flush=%d avail_in=%d avail_out=%d total_in=%d total_out=%d res=%d\n",
                     (int)flush, (int)strm->avail_in, (int)strm->avail_out, (int)strm->total_in, (int)strm->total_out, res);
        return result;
    }   }
```
Fix undeclared variable res in z_inflate logging (compilation error).
Both LOG_WRAPPERD calls use res which is not defined; the local variable is result. This will fail to compile.
Suggested fix:
```diff
-    { int const result = (flush == Z_INFLATE_SYNC) ?
-                         inflateSync(strm) :
-                         inflate(strm, flush);
-      LOG_WRAPPERD("- inflate3 flush=%d avail_in=%d avail_out=%d total_in=%d total_out=%d res=%d\n",
-                   (int)flush, (int)strm->avail_in, (int)strm->avail_out, (int)strm->total_in, (int)strm->total_out, res);
-      return result;
-    } }
+    { int const result = (flush == Z_INFLATE_SYNC) ?
+                         inflateSync(strm) :
+                         inflate(strm, flush);
+      LOG_WRAPPERD("- inflate3 flush=%d avail_in=%d avail_out=%d total_in=%d total_out=%d res=%d\n",
+                   (int)flush, (int)strm->avail_in, (int)strm->avail_out,
+                   (int)strm->total_in, (int)strm->total_out, result);
+      return result;
+    } }
```

```diff
-    { int const result = (flush == Z_INFLATE_SYNC) ?
-                         inflateSync(strm) :
-                         inflate(strm, flush);
-      LOG_WRAPPERD("- inflate2 flush=%d avail_in=%d avail_out=%d total_in=%d total_out=%d res=%d\n",
-                   (int)flush, (int)strm->avail_in, (int)strm->avail_out, (int)strm->total_in, (int)strm->total_out, res);
-      return result;
-    } } } /* if ! (zwd->totalInBytes == 0 && strm->avail_in >= ZLIB_HEADERSIZE) */
+    { int const result = (flush == Z_INFLATE_SYNC) ?
+                         inflateSync(strm) :
+                         inflate(strm, flush);
+      LOG_WRAPPERD("- inflate2 flush=%d avail_in=%d avail_out=%d total_in=%d total_out=%d res=%d\n",
+                   (int)flush, (int)strm->avail_in, (int)strm->avail_out,
+                   (int)strm->total_in, (int)strm->total_out, result);
+      return result;
+    } } } /* if ! (zwd->totalInBytes == 0 && strm->avail_in >= ZLIB_HEADERSIZE) */
```

Also applies to: 820-825
🤖 Prompt for AI Agents
In zlibWrapper/zstd_zlibwrapper.c around lines 769-775 (and also change the
analogous LOG_WRAPPERD at 820-825), the debug log calls reference an undeclared
variable `res` while the actual local variable is named `result`; update the
LOG_WRAPPERD invocations to pass `result` (with the same casts used for other
numeric args, e.g., (int)result) instead of `res` so the code compiles and logs
the correct return value.
Note
Imports legacy ZSTD v0.6/v0.7 code, adds I/O and utility modules (incl. async file I/O), benchmarking tools, a zlib wrapper using ZSTD, and tests for large dictionaries.
- lib/legacy/zstd_v06.c and lib/legacy/zstd_v07.c with full v0.6/v0.7 compression/decompression implementations.
- zlibWrapper/ (gzlib.c, gzread.c, zstd_zlibwrapper.c) to expose ZSTD via zlib-compatible APIs.
- programs/benchfn.c, benchzstd.c, datagen.c, dibio.c, fileio.c, fileio_asyncio.c, util.c.
- tests/bigdict.c, tests/largeDictionary.c.

Written by Cursor Bugbot for commit 14d58f4.
Summary by CodeRabbit
Release Notes
New Features
Benchmarking & Tools