Conversation
Walkthrough

Adds comprehensive implementations of core ZSTD compression components: FSE and Huffman entropy encoding/decoding, multi-threaded compression pipeline, long-distance matching, block pre-splitting, workspace management, dictionary training (COVER/FASTCOVER), suffix sorting, and benchmarking utilities, spanning the library core, compression engine, dictionary builders, and tools.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60–90 minutes
Pre-merge checks: ❌ Failed checks (2 warnings) | ✅ Passed checks (1 passed)
@cursor review
```c
        totalCSize += cBlockSize;
    }
    return totalCSize;
}
```

```c
    if (ci->dictNb >= ci->nbDicts) ci->dictNb = 0;

    return srcSize;
}
```
Bug: Buffer size misuse causes silent compression errors
The `compress` function passes `srcSize` as the destination capacity to `ZSTD_compress2` instead of the provided `dstCapacity` parameter, and ignores the return value entirely. This defeats buffer-size checking and silently discards compression errors.
```c
size_t nbSrcBlocks = 0;
for (size_t ssnb=0; ssnb < srcSlices.nbSlices; ssnb++) {
    size_t pos = 0;
    while (pos <= srcSlices.capacities[ssnb]) {
```
```c
unsigned const ms_per_round = RUN_TIME_DEFAULT_MS;
unsigned const total_time_ms = nbRounds * ms_per_round;

double *const speedPerRound = (double *)malloc(nbRounds * sizeof(double));
```

```c
size_t ssnb = 0;
for (size_t sliceNb=0; sliceNb < nbSlices; ) {
    ssnb = (ssnb + 1) % srcSlices.nbSlices;
```
Actionable comments posted: 5
🧹 Nitpick comments (3)
lib/compress/zstd_compress_superblock.c (2)
48-49: Consider defining named constants for header size bounds. The magic number `200` and the `1 KB` and `16 KB` threshold values reduce readability. Consider defining these as named constants with explanatory comments.
356-356: Consider documenting the fallback multiplier. The `nbSeq * 10` fallback estimate uses a magic multiplier. A brief comment explaining why 10 is chosen as the worst-case bytes-per-sequence estimate would improve clarity.

programs/benchfn.c (1)

192-193: Consider clarifying the initialization comment. The comment "hopefully large enough" for the initial `fastestRun.nanoSecPerRun` value is vague. Consider something like "initialized to a very large value (2000 seconds) to ensure any actual measurement will be smaller".
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
- contrib/largeNbDicts/largeNbDicts.c (1 hunks)
- contrib/pzstd/Pzstd.cpp (1 hunks)
- lib/common/fse_decompress.c (1 hunks)
- lib/common/pool.c (1 hunks)
- lib/compress/fse_compress.c (1 hunks)
- lib/compress/huf_compress.c (1 hunks)
- lib/compress/zstd_compress_internal.h (1 hunks)
- lib/compress/zstd_compress_sequences.c (1 hunks)
- lib/compress/zstd_compress_superblock.c (1 hunks)
- lib/compress/zstd_cwksp.h (1 hunks)
- lib/compress/zstd_ldm.c (1 hunks)
- lib/compress/zstd_preSplit.c (1 hunks)
- lib/compress/zstdmt_compress.c (1 hunks)
- lib/dictBuilder/cover.c (1 hunks)
- lib/dictBuilder/divsufsort.c (1 hunks)
- lib/dictBuilder/fastcover.c (1 hunks)
- lib/dictBuilder/zdict.c (1 hunks)
- programs/benchfn.c (1 hunks)
- programs/benchzstd.c (1 hunks)
- programs/dibio.c (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (14)
programs/dibio.c (2)
- programs/util.c (1): UTIL_getFileSize (545-558)
- lib/dictBuilder/zdict.c (3): ZDICT_trainFromBuffer_legacy (1088-1108), ZDICT_isError (98-98), ZDICT_getErrorName (100-100)

lib/dictBuilder/fastcover.c (4)
- lib/compress/zstd_compress_internal.h (17): ZSTD_hash6Ptr (915-915), ZSTD_hash8Ptr (925-925), U32 (584-596, 601-613, 1050-1055, 1061-1064, 1097-1125, 1132-1146, 1395-1407, 1412-1423), size_t (649-659, 661-670, 677-683, 854-873, 879-892, 900-900, 901-901)
- lib/dictBuilder/cover.c (12): COVER_sum (278-285), COVER_computeEpochs (732-747), COVER_dictSelectionError (1035-1037), COVER_selectDict (1047-1132), COVER_dictSelectionIsError (1039-1041), COVER_best_finish (980-1024), COVER_dictSelectionFree (1043-1045), COVER_warnOnSmallCorpus (716-730), COVER_best_init (922-931), COVER_best_destroy (950-960), COVER_best_start (966-973), COVER_best_wait (936-945)
- lib/dictBuilder/zdict.c (1): ZDICT_finalizeDictionary (862-941)
- lib/common/pool.c (6): POOL_create (111-113, 326-328), POOL_free (180-189, 339-342), POOL_add (286-296, 354-357)

lib/compress/zstd_compress_sequences.c (2)
- lib/compress/fse_compress.c (5): FSE_optimalTableLog (371-374), FSE_normalizeCount (465-525), FSE_writeNCount (330-340), FSE_buildCTable_rle (528-548), FSE_buildCTable_wksp (68-214)
- lib/legacy/zstd_v03.c (3): ZSTD_isError (2373-2373), ERR_isError (525-525), MEM_32bits (107-107)

contrib/largeNbDicts/largeNbDicts.c (3)
- programs/util.c (6): UTIL_getFileSize (545-558), UTIL_isDirectory (357-365), UTIL_getTotalFileSize (630-645), UTIL_createExpandedFNT (1379-1423), UTIL_assembleFileNamesTable (811-815), UTIL_freeFileNamesTable (817-823)
- lib/compress/zstd_compress.c (13): ZSTD_freeCDict (5740-5751), ZSTD_createCDict_advanced2 (5668-5716), ZSTD_createCCtx (97-100), ZSTD_compressCCtx (5495-5503), ZSTD_compress_usingCDict (5920-5927), ZSTD_CCtx_setParametersUsingCCtxParams (1186-1198), ZSTD_freeCCtx (181-192), ZSTD_CCtx_refCDict (1336-1344), ZSTD_compress2 (6576-6605), ZSTD_sizeof_CDict (5550-5557), ZSTD_createCCtxParams (349-352), ZSTD_CCtxParams_init (366-372), ZSTD_CCtxParams_setParameter (778-1029)
- programs/benchfn.c (6): BMK_createTimedFnState (160-166), BMK_benchTimedFn (211-256), BMK_isSuccessful_runOutcome (64-67), BMK_extract_runTime (71-75), BMK_isCompleted_TimedFn (200-203), BMK_freeTimedFnState (168-168)

lib/compress/zstd_ldm.c (3)
- lib/compress/zstd_compress_internal.h (18): BYTE (630-644), U32 (584-596, 601-613, 1050-1055, 1061-1064, 1097-1125, 1132-1146, 1395-1407, 1412-1423), size_t (649-659, 661-670, 677-683, 854-873, 879-892, 900-900, 901-901), ZSTD_window_correctOverflow (1159-1230), ZSTD_storeSeq (775-811)
- lib/compress/zstd_cwksp.h (3): BYTE (394-397), size_t (206-210, 224-232)
- lib/compress/zstd_compress.c (1): ZSTD_selectBlockCompressor (3077-3160)

lib/compress/zstdmt_compress.c (4)
- lib/compress/zstd_compress_internal.h (9): U32 (584-596, 601-613, 1050-1055, 1061-1064, 1097-1125, 1132-1146, 1395-1407, 1412-1423), ZSTD_window_update (1354-1390)
- lib/compress/zstd_compress.c (9): ZSTD_freeCCtx (181-192), ZSTD_createCCtx_advanced (114-124), ZSTD_referenceExternalSequences (4788-4797), ZSTD_compressBegin_advanced_internal (5293-5309), ZSTD_compressContinue_public (4861-4867), ZSTD_compressEnd_public (5415-5439), ZSTD_checkCParams (1396-1406), ZSTD_compressBound (70-74), ZSTD_writeLastEmptyBlock (4778-4786)
- lib/compress/zstd_ldm.c (3): ZSTD_ldm_adjustParameters (135-167), ZSTD_ldm_getMaxNbSeq (179-182), ZSTD_ldm_generateSequences (526-601)
- lib/common/pool.c (10): POOL_create_advanced (115-160, 330-337), POOL_free (180-189, 339-342), POOL_sizeof (206-211, 365-369), POOL_resize (244-253, 349-352), POOL_tryAdd (299-310, 359-363)

programs/benchzstd.c (4)
- lib/compress/zstd_compress.c (9): ZSTD_CCtx_reset (1375-1390), ZSTD_CCtx_setParameter (716-776), ZSTD_CCtx_loadDictionary (1329-1333), ZSTD_compress2 (6576-6605), ZSTD_compressBound (70-74), ZSTD_sizeof_CCtx (206-214), ZSTD_createCCtx (97-100), ZSTD_freeCCtx (181-192), ZSTD_maxCLevel (7681-7681)
- lib/legacy/zstd_v03.c (3): ZSTD_isError (2373-2373), U64 (126-129, 177-187)
- programs/benchfn.c (6): BMK_benchTimedFn (211-256), BMK_isSuccessful_runOutcome (64-67), BMK_extract_runTime (71-75), BMK_isCompleted_TimedFn (200-203), BMK_createTimedFnState (160-166), BMK_freeTimedFnState (168-168)
- programs/util.c (3): UTIL_getFileSize (545-558), UTIL_isDirectory (357-365), UTIL_getTotalFileSize (630-645)

lib/compress/huf_compress.c (2)
- lib/legacy/zstd_v03.c (9): HUF_isError (1451-1451), U32 (121-124, 161-164, 166-175), BYTE (660-669, 671-680), HUF_readStats (1477-1553), MEM_32bits (107-107), ERR_isError (525-525)
- lib/compress/fse_compress.c (6): FSE_optimalTableLog (371-374), FSE_normalizeCount (465-525), FSE_writeNCount (330-340), FSE_buildCTable_wksp (68-214), FSE_compress_usingCTable (610-620), FSE_optimalTableLog_internal (357-369)

lib/dictBuilder/cover.c (4)
- lib/legacy/zstd_v03.c (5): U64 (126-129, 177-187), U32 (121-124, 161-164, 166-175)
- lib/dictBuilder/zdict.c (2): ZDICT_finalizeDictionary (862-941), ZDICT_isError (98-98)
- lib/compress/zstd_compress.c (6): ZSTD_compressBound (70-74), ZSTD_createCCtx (97-100), ZSTD_createCDict (5718-5727), ZSTD_compress_usingCDict (5920-5927), ZSTD_freeCCtx (181-192), ZSTD_freeCDict (5740-5751)
- lib/common/pool.c (6): POOL_create (111-113, 326-328), POOL_free (180-189, 339-342), POOL_add (286-296, 354-357)

lib/compress/zstd_compress_internal.h (2)
- lib/legacy/zstd_v03.c (10): U32 (121-124, 161-164, 166-175), ZSTD_isError (2373-2373), BYTE (660-669, 671-680), ZSTD_wildcopy (2331-2337), MEM_64bits (108-108), U64 (126-129, 177-187)
- lib/compress/zstd_compress.c (9): ZSTD_getSeqStore (222-222), ZSTD_seqToCodes (2701-2727), ZSTD_selectBlockCompressor (3077-3160), ZSTD_cParam_getBounds (427-645), ZSTD_loadCEntropy (5069-5158), ZSTD_reset_compressedBlockState (1926-1935), ZSTD_convertBlockSequences (7326-7400), ZSTD_get1BlockSummary (7404-7452, 7456-7481)

lib/compress/zstd_compress_superblock.c (4)
- lib/compress/huf_compress.c (3): HUF_compress1X_usingCTable (1163-1166), HUF_compress4X_usingCTable (1218-1221), HUF_estimateCompressedSize (794-803)
- lib/legacy/zstd_v03.c (3): ERR_isError (525-525), BYTE (660-669, 671-680)
- lib/compress/zstd_compress_sequences.c (3): ZSTD_encodeSequences (419-442), ZSTD_crossEntropyCost (139-154), ZSTD_fseBitCost (104-132)
- lib/compress/zstd_compress.c (1): ZSTD_buildBlockEntropyStats (3806-3834)

lib/dictBuilder/zdict.c (5)
- lib/legacy/zstd_v03.c (6): ERR_isError (525-525), U32 (121-124, 161-164, 166-175), HUF_isError (1451-1451), FSE_isError (1106-1106)
- lib/compress/zstd_compress.c (10): ZSTD_reset_compressedBlockState (1926-1935), ZSTD_loadCEntropy (5069-5158), ZSTD_compressBegin_usingCDict_deprecated (5881-5885), ZSTD_compressBlock_deprecated (4891-4898), ZSTD_getSeqStore (222-222), ZSTD_seqToCodes (2701-2727), ZSTD_createCDict_advanced (5650-5666), ZSTD_createCCtx (97-100), ZSTD_freeCDict (5740-5751), ZSTD_freeCCtx (181-192)
- lib/dictBuilder/divsufsort.c (1): divsufsort (1846-1873)
- lib/compress/huf_compress.c (2): HUF_buildCTable_wksp (756-792), HUF_writeCTable_wksp (248-289)
- lib/compress/fse_compress.c (2): FSE_normalizeCount (465-525), FSE_writeNCount (330-340)

lib/compress/fse_compress.c (3)
- lib/legacy/zstd_v03.c (3): FSE_isError (1106-1106), U16 (116-119, 136-145)
- lib/legacy/zstd_v04.c (3): FSE_isError (1089-1089), U16 (92-95, 112-121)
- lib/legacy/zstd_v05.c (2): U16 (119-122, 149-157)

lib/common/fse_decompress.c (1)
- lib/legacy/zstd_v04.c (4): BYTE (839-848, 850-859), U16 (92-95, 112-121)
🔇 Additional comments (17)
lib/compress/zstd_compress_superblock.c (3)
220-236: LGTM: Well-documented backward compatibility handling. The workaround for the zstd decoder bug (versions ≤ 1.3.4) is properly documented with a reference to the fix. The condition is appropriately guarded by `FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION` to avoid issues during fuzzing.

665-688: LGTM: Clean public API with proper error handling. The function properly builds block entropy statistics and delegates to the multi-block compressor with appropriate error propagation.

649-657: Repcode regeneration logic is correct; no issues found. The `ZSTD_updateRep` function correctly handles the `ll0` parameter (whether litLength == 0), using this boolean to adjust the repcode calculation on line 825: `repCode = OFFBASE_TO_REPCODE(offBase) - 1 + ll0`. When litLength is 0, the repcode index shifts by 1, which matches ZSTD semantics. The superblock code at line 654 properly passes this condition and correctly regenerates repcode state for skipped sequences.

programs/benchfn.c (2)

64-81: LGTM: Accessor functions enforce API contract. The functions correctly access internal fields marked with `never_ever_use_directly`, which enforces that users must go through these accessor functions. The `CONTROL` macro provides appropriate assertions to catch misuse.

232-240: LGTM: Adaptive loop adjustment with overflow protection. The logic dynamically adjusts `nbLoops` based on run duration to maintain consistent timing, with proper overflow checking. The 10x multiplier for short runs helps quickly reach the target duration.

contrib/pzstd/Pzstd.cpp (3)
344-349: LGTM: Frame size calculation with documented constraints. The assertion ensures `windowLog ≤ 23` to respect the 4 GB skippable frame size limit, which is well documented in the comment.

540-545: LGTM: Adaptive queue sizing for memory management. When a non-skippable frame is encountered, the code appropriately limits queue sizes to prevent excessive memory usage during serial decompression fallback.

599-610: LGTM: Skippable frame size validation. The code correctly validates that the compressed size fits in a 32-bit skippable frame before writing, with an appropriate assertion for the `uint64_t` cast.
lib/dictBuilder/fastcover.c (4)
149-219: LGTM: Well-structured segment selection algorithm. The sliding window approach with frequency tracking correctly implements the scoring function described in the header comments. The cleanup loops ensure state consistency between calls.
490-491: Related to frequency table size validation. This memcpy can copy up to 8 GB if `f == 31`. This reinforces the need to validate `f` bounds before context initialization (see previous comment on lines 377-382).

566-570: LGTM: Parameter validation before expensive operations. The code validates parameters (including `f` bounds via `FASTCOVER_checkParameters`) before allocating large frequency tables, which addresses the memory concerns raised in earlier comments.

377-382: The `f` parameter is already validated before reaching the allocation. The validation occurs in `FASTCOVER_checkParameters()` at lines 241-243, which ensures `0 < f <= FASTCOVER_MAX_F` (where `FASTCOVER_MAX_F == 31`). It is called at lines 566 and 726, before `FASTCOVER_ctx_init()` is invoked, protecting the subsequent allocation at line 377. No changes are needed.

Likely an incorrect or invalid review comment.
lib/compress/zstd_ldm.c (5)
96-133: LGTM: Efficient rolling hash with batch processing. The gear hash implementation uses macros for loop unrolling and appropriate branch prediction hints. The batch processing pattern (filling the `splits` array) is well-structured.
493-505: LGTM: Optimization for repeating patterns. The logic to detect and skip overlapping matches (e.g., repeated bytes) is a significant performance optimization. The hash reset and loop continuation correctly handle this edge case.

526-601: LGTM: Robust chunked sequence generation. The function properly handles large inputs through chunking, with appropriate overflow correction and distance constraint enforcement. The leftover literal accumulation correctly maintains state across chunks.

616-624: LGTM: Correct handling of shortened matches. When skipping makes a match too short (below minMatch), the code correctly adds the remaining match bytes as literals to the next sequence before advancing position.

697-704: LGTM: Strategy-appropriate LDM integration. For optimal parsing strategies (btopt and above), LDM sequences are correctly used as candidates rather than being directly accepted, allowing the optimal parser to make the final decision.
```c
static size_t compressBlocks(size_t* cSizes,  /* optional (can be NULL). If present, must contain at least nbBlocks fields */
                             slice_collection_t dstBlockBuffers,
                             slice_collection_t srcBlockBuffers,
                             ZSTD_CDict* cdict, int cLevel)
{
    size_t const nbBlocks = srcBlockBuffers.nbSlices;
    assert(dstBlockBuffers.nbSlices == srcBlockBuffers.nbSlices);

    ZSTD_CCtx* const cctx = ZSTD_createCCtx();
    assert(cctx != NULL);

    size_t totalCSize = 0;
    for (size_t blockNb=0; blockNb < nbBlocks; blockNb++) {
        size_t cBlockSize;
        if (cdict == NULL) {
            cBlockSize = ZSTD_compressCCtx(cctx,
                            dstBlockBuffers.slicePtrs[blockNb], dstBlockBuffers.capacities[blockNb],
                            srcBlockBuffers.slicePtrs[blockNb], srcBlockBuffers.capacities[blockNb],
                            cLevel);
        } else {
            cBlockSize = ZSTD_compress_usingCDict(cctx,
                            dstBlockBuffers.slicePtrs[blockNb], dstBlockBuffers.capacities[blockNb],
                            srcBlockBuffers.slicePtrs[blockNb], srcBlockBuffers.capacities[blockNb],
                            cdict);
        }
        CONTROL(!ZSTD_isError(cBlockSize));
        if (cSizes) cSizes[blockNb] = cBlockSize;
        totalCSize += cBlockSize;
    }
    return totalCSize;
}
```
Free the temporary CCtx before returning.
compressBlocks() calls ZSTD_createCCtx() on every invocation but never releases it. In steady state this leaks one compression context per call, which adds up quickly when benchmarking large datasets. Please free the context before returning so the function stays memory-neutral.
```diff
 static size_t compressBlocks(...)
 {
     ZSTD_CCtx* const cctx = ZSTD_createCCtx();
     assert(cctx != NULL);
     ...
-    return totalCSize;
+    ZSTD_freeCCtx(cctx);
+    return totalCSize;
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```c
static size_t compressBlocks(size_t* cSizes,  /* optional (can be NULL). If present, must contain at least nbBlocks fields */
                             slice_collection_t dstBlockBuffers,
                             slice_collection_t srcBlockBuffers,
                             ZSTD_CDict* cdict, int cLevel)
{
    size_t const nbBlocks = srcBlockBuffers.nbSlices;
    assert(dstBlockBuffers.nbSlices == srcBlockBuffers.nbSlices);
    ZSTD_CCtx* const cctx = ZSTD_createCCtx();
    assert(cctx != NULL);
    size_t totalCSize = 0;
    for (size_t blockNb=0; blockNb < nbBlocks; blockNb++) {
        size_t cBlockSize;
        if (cdict == NULL) {
            cBlockSize = ZSTD_compressCCtx(cctx,
                            dstBlockBuffers.slicePtrs[blockNb], dstBlockBuffers.capacities[blockNb],
                            srcBlockBuffers.slicePtrs[blockNb], srcBlockBuffers.capacities[blockNb],
                            cLevel);
        } else {
            cBlockSize = ZSTD_compress_usingCDict(cctx,
                            dstBlockBuffers.slicePtrs[blockNb], dstBlockBuffers.capacities[blockNb],
                            srcBlockBuffers.slicePtrs[blockNb], srcBlockBuffers.capacities[blockNb],
                            cdict);
        }
        CONTROL(!ZSTD_isError(cBlockSize));
        if (cSizes) cSizes[blockNb] = cBlockSize;
        totalCSize += cBlockSize;
    }
    ZSTD_freeCCtx(cctx);
    return totalCSize;
}
```
```c
    ZSTD_CCtx_refCDict(ci->cctx, ci->dictionaries.cdicts[ci->dictNb]);
    ZSTD_compress2(ci->cctx,
                   dst, srcSize,
                   src, srcSize);

    ci->dictNb = ci->dictNb + 1;
    if (ci->dictNb >= ci->nbDicts) ci->dictNb = 0;

    return srcSize;
}
```
Propagate ZSTD_compress2() errors and use the provided capacity.
compress() currently hard-codes dstCapacity to srcSize when calling ZSTD_compress2() and ignores the return value. With incompressible input the function silently reports success while leaving the output buffer untouched, so later consumers misinterpret garbage. Please pass the actual dstCapacity and propagate any compression error before reporting success (while still returning srcSize for throughput accounting).
```diff
 size_t compress(..., void* dst, size_t dstCapacity, void* payload)
 {
     compressInstructions* const ci = (compressInstructions*) payload;
-    (void)dstCapacity;
     ZSTD_CCtx_refCDict(ci->cctx, ci->dictionaries.cdicts[ci->dictNb]);
-    ZSTD_compress2(ci->cctx,
-                   dst, srcSize,
-                   src, srcSize);
+    {   size_t const cSize = ZSTD_compress2(ci->cctx,
+                                            dst, dstCapacity,
+                                            src, srcSize);
+        if (ZSTD_isError(cSize)) {
+            return cSize;
+        }
+    }
     ci->dictNb = ci->dictNb + 1;
     if (ci->dictNb >= ci->nbDicts) ci->dictNb = 0;
     return srcSize;
 }
```

📝 Committable suggestion
```c
    ZSTD_CCtx_refCDict(ci->cctx, ci->dictionaries.cdicts[ci->dictNb]);
    {   size_t const cSize = ZSTD_compress2(ci->cctx,
                                            dst, dstCapacity,
                                            src, srcSize);
        if (ZSTD_isError(cSize)) {
            return cSize;
        }
    }
    ci->dictNb = ci->dictNb + 1;
    if (ci->dictNb >= ci->nbDicts) ci->dictNb = 0;
    return srcSize;
}
```
```cpp
{
  std::error_code ec;
  if (is_directory(inputFile, ec)) {
    errorHolder.setError("Output file is a directory -- ignored");
    return nullptr;
  }
```
Fix incorrect error message.
The error message says "Output file is a directory" but the function is checking if the input file is a directory.
Apply this diff:

```diff
-    errorHolder.setError("Output file is a directory -- ignored");
+    errorHolder.setError("Input file is a directory -- ignored");
```

📝 Committable suggestion
```cpp
{
  std::error_code ec;
  if (is_directory(inputFile, ec)) {
    errorHolder.setError("Input file is a directory -- ignored");
    return nullptr;
  }
```
```c
#define SEGMENT_SIZE 512
    FPStats* const fpstats = (FPStats*)workspace;
    Fingerprint* middleEvents = (Fingerprint*)(void*)((char*)workspace + 512 * sizeof(unsigned));
    assert(blockSize == (128 << 10));
    assert(workspace != NULL);
    assert((size_t)workspace % ZSTD_ALIGNOF(FPStats) == 0);
    ZSTD_STATIC_ASSERT(ZSTD_SLIPBLOCK_WORKSPACESIZE >= sizeof(FPStats));
    assert(wkspSize >= sizeof(FPStats)); (void)wkspSize;

    initStats(fpstats);
    HIST_add(fpstats->pastEvents.events, blockStart, SEGMENT_SIZE);
    HIST_add(fpstats->newEvents.events, (const char*)blockStart + blockSize - SEGMENT_SIZE, SEGMENT_SIZE);
    fpstats->pastEvents.nbEvents = fpstats->newEvents.nbEvents = SEGMENT_SIZE;
    if (!compareFingerprints(&fpstats->pastEvents, &fpstats->newEvents, 0, 8))
        return blockSize;

    HIST_add(middleEvents->events, (const char*)blockStart + blockSize/2 - SEGMENT_SIZE/2, SEGMENT_SIZE);
    middleEvents->nbEvents = SEGMENT_SIZE;
    {   U64 const distFromBegin = fpDistance(&fpstats->pastEvents, middleEvents, 8);
        U64 const distFromEnd = fpDistance(&fpstats->newEvents, middleEvents, 8);
        U64 const minDistance = SEGMENT_SIZE * SEGMENT_SIZE / 3;
        if (abs64((S64)distFromBegin - (S64)distFromEnd) < minDistance)
            return 64 KB;
        return (distFromBegin > distFromEnd) ? 32 KB : 96 KB;
    }
}
```
Avoid aliasing middleEvents with the existing workspace.
`middleEvents` is derived from `(char*)workspace + 512 * sizeof(unsigned)`, which lands inside `fpstats->pastEvents.events`. Filling `middleEvents` therefore corrupts `pastEvents` and the comparison logic that follows. Allocate an independent buffer (e.g. a stack `Fingerprint middleEvents; ZSTD_memset(&middleEvents, 0, sizeof(middleEvents));`) or carve space after `FPStats` once the workspace size guarantee includes it. That keeps all three fingerprints disjoint so `compareFingerprints()` works as intended.
```c
    while (!testmem) {
        testmem = malloc((size_t)requiredMem);
        requiredMem -= step;
    }

    free(testmem);
    return (size_t)requiredMem;
```
Fix `DiB_findMaxMem` return value before dividing by `memMult`
When the first `malloc()` succeeds, we immediately subtract `step`, so the function returns one step too small. For small training sets (e.g. < ~0.75 MB) the subsequent division by `memMult` drops to zero, `loadedSize` becomes 0, and we load no samples, hitting the "nb of samples too low" exit path. Returning the actual allocated size fixes the regression.
```diff
-    free(testmem);
-    return (size_t)requiredMem;
+    free(testmem);
+    return (size_t)(requiredMem + step);
```
+ return (size_t)(requiredMem + step);🤖 Prompt for AI Agents
In programs/dibio.c around lines 223 to 229, the loop subtracts step immediately
after a successful malloc so the function returns one step too small; change the
logic so the returned size reflects the actual allocated memory (either subtract
step only when malloc fails, or add step back to requiredMem before returning)
so DiB_findMaxMem returns the true allocated size before the later division by
memMult.