Skip to content

Fast-strategy dictionary matching ~2x weaker than C (compress-dict ratio gap) #323

Description

@polaz

Problem

compress-dict ratio for the Fast strategy (negative levels) trails C on small structured inputs.

Current numbers (i9, REPORT_DICT, small-4k-log-lines, input 4096 B, trained dict 437 B):

level rust with-dict ffi with-dict gap
level_-7_fast 61 55 1.11×
level_-5_fast 61 48 1.27×
level_-1_fast 61 48 1.27×
level_2_fast 64 48 1.33×
level_8_lazy 49 48 parity

The original 2× gap (rust 91 vs ffi 44) has narrowed to ~1.1–1.3× through the dict-path work that landed since (prepared-dict reuse, HUF pre-build gate, entropy seeding fixes), but rust_bytes > ffi_bytes still holds across the whole fast band, so a real ratio defect remains. Lazy/greedy/BT are at parity.

Investigation history

Refuted (2026-06-02): separate immutable dict hash table + dual-probe (reference dict-match-state mirror) — implemented and measured, 0 fallback hits across all scenarios, zero ratio change. The dictionary is fully primed, fully retained, fully matchable; the live single-slot table already surfaces every dict candidate.

Confirmed root cause: our Fast kernel SELECTS worse matches than the reference given the same candidate set. Same general Fast match-find weakness as #220 (reference finds short-distance matches we skip), amplified by the dictionary.

Next steps

Estimate: 1d (sequence diff investigation), fix size unknown until diff.

Metadata

Metadata

Assignees

No one assigned

    Labels

    P1-highHigh priority — core functionalityperformancePerformance optimization

    Type

    No type

    Fields

    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions