Problem
compress-dict ratio for the Fast strategy (negative levels) trails C on small structured inputs.
Current numbers (i9, REPORT_DICT, small-4k-log-lines, input 4096 B, trained dict 437 B):
| level |
rust with-dict |
ffi with-dict |
gap |
| level_-7_fast |
61 |
55 |
1.11× |
| level_-5_fast |
61 |
48 |
1.27× |
| level_-1_fast |
61 |
48 |
1.27× |
| level_2_fast |
64 |
48 |
1.33× |
| level_8_lazy |
49 |
48 |
parity |
The original 2× gap (rust 91 vs ffi 44) has narrowed to ~1.1–1.3× through the dict-path work that landed since (prepared-dict reuse, HUF pre-build gate, entropy seeding fixes), but rust_bytes > ffi_bytes still holds across the whole fast band, so a real ratio defect remains. Lazy/greedy/BT are at parity.
Investigation history
Refuted (2026-06-02): separate immutable dict hash table + dual-probe (reference dict-match-state mirror) — implemented and measured, 0 fallback hits across all scenarios, zero ratio change. The dictionary is fully primed, fully retained, fully matchable; the live single-slot table already surfaces every dict candidate.
Confirmed root cause: our Fast kernel SELECTS worse matches than the reference given the same candidate set. Same general Fast match-find weakness as #220 (reference finds short-distance matches we skip), amplified by the dictionary.
Next steps
Estimate: 1d (sequence diff investigation), fix size unknown until diff.
Problem
compress-dictratio for the Fast strategy (negative levels) trails C on small structured inputs.Current numbers (i9, REPORT_DICT, small-4k-log-lines, input 4096 B, trained dict 437 B):
The original 2× gap (rust 91 vs ffi 44) has narrowed to ~1.1–1.3× through the dict-path work that landed since (prepared-dict reuse, HUF pre-build gate, entropy seeding fixes), but
rust_bytes > ffi_bytesstill holds across the whole fast band, so a real ratio defect remains. Lazy/greedy/BT are at parity.Investigation history
Refuted (2026-06-02): separate immutable dict hash table + dual-probe (reference dict-match-state mirror) — implemented and measured, 0 fallback hits across all scenarios, zero ratio change. The dictionary is fully primed, fully retained, fully matchable; the live single-slot table already surfaces every dict candidate.
Confirmed root cause: our Fast kernel SELECTS worse matches than the reference given the same candidate set. Same general Fast match-find weakness as #220 (reference finds short-distance matches we skip), amplified by the dictionary.
Next steps
rust 61 > ffi 48). Comparator is dict-aware viaSTRUCTURED_ZSTD_BENCH_DICT.Estimate: 1d (sequence diff investigation), fix size unknown until diff.