perf(decode + encode-greedy): close 3-5× donor gap on negative-level decompress; share SIMD primitives + add dedicated greedy strategy #178

Description

@polaz

Status (re-measured)

The original 3-5× negative-level decode gap has narrowed to 1.3-1.4×, and the previously missing dedicated greedy strategy has landed. This issue now tracks only the residual negative-level decode parity work.

Current numbers (i9, decodecorpus-z000033, level_-1_fast):

Stream source   Rust decode            C decode               Gap (was)
c_stream        907 µs (1.05 GiB/s)    647 µs (1.47 GiB/s)    1.40× (was 3.69×)
rust_stream     1.18 ms (824 MiB/s)    895 µs (1.06 GiB/s)    1.32× (was 5.35×)
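
For reference, the gap column appears to be the Rust decode time divided by the C decode time for the same stream. A minimal sketch of that arithmetic follows; `decode_gap` is a hypothetical helper name used only for illustration, not something from the codebase.

```rust
/// Ratio of Rust decode time to C decode time (hypothetical helper).
/// Both arguments must use the same unit; microseconds are used here.
pub fn decode_gap(rust_us: f64, c_us: f64) -> f64 {
    rust_us / c_us
}

/// The two rows from the table above, as (rust_us, c_us) pairs.
pub const C_STREAM: (f64, f64) = (907.0, 647.0);
pub const RUST_STREAM: (f64, f64) = (1180.0, 895.0); // 1.18 ms = 1180 µs
```

Plugging in the table rows gives roughly 1.40 for c_stream and 1.32 for rust_stream, which matches the reported gaps.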

Landed since the original report

  • Greedy strategy: dedicated StrategyTag::Greedy (L5) with lazy_depth = 0 on the Row finder — the reference's own greedy shape (its greedy/lazy share the row-search template with depth 0). The L4 dfast outlier was separately closed by the donor greedy double-fast port.
  • XXH64 frame checksum: gated on the frame's checksum flag (−61% on flag-off frames) and hashed per-block while cache-hot on the direct path; no longer a post-decode cold walk.
  • Per-kernel decoders: four ISA tiers (Scalar/BMI2/AVX2/VBMI2) with monolithic per-tier sequence loops, BMI2 bit-reader specialization, per-tier match-copy chains.
  • SIMD wildcopy / overshooting copies: donor-shape overshoot-tolerant copies with bounded tails per kernel.
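
To illustrate the overshoot-tolerant copy idea from the last item, here is a minimal safe-Rust sketch. It is not the project's kernel: the name `wildcopy`, the 16-byte `CHUNK` width, and the slice-based signature are all assumptions made for illustration, and it handles only non-overlapping literal copies, not match copies.

```rust
/// Width of one copy step; 16 bytes stands in for one SIMD register (assumption).
const CHUNK: usize = 16;

/// Copy `len` bytes from `src` to the front of `dst` (hypothetical sketch).
///
/// Fast path: copy whole 16-byte chunks, writing up to CHUNK - 1 bytes past
/// `len` when both buffers have enough slack. Otherwise fall back to a
/// bounded tail that writes exactly `len` bytes.
/// Returns the number of bytes actually written, which is always >= `len`.
pub fn wildcopy(dst: &mut [u8], src: &[u8], len: usize) -> usize {
    assert!(len <= src.len() && len <= dst.len());
    let rounded = (len + CHUNK - 1) / CHUNK * CHUNK;
    if rounded <= dst.len() && rounded <= src.len() {
        // Overshooting path: no per-byte tail handling at all.
        for (d, s) in dst[..rounded]
            .chunks_exact_mut(CHUNK)
            .zip(src[..rounded].chunks_exact(CHUNK))
        {
            d.copy_from_slice(s);
        }
        rounded
    } else {
        // Bounded tail: whole chunks first, then the exact remainder.
        let whole = len / CHUNK * CHUNK;
        for (d, s) in dst[..whole]
            .chunks_exact_mut(CHUNK)
            .zip(src[..whole].chunks_exact(CHUNK))
        {
            d.copy_from_slice(s);
        }
        dst[whole..len].copy_from_slice(&src[whole..len]);
        len
    }
}
```

The caller is expected to treat bytes past `len` as scratch, which is why the overshooting path is only taken when the slack is known to exist.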

Residual (the 1.3-1.4×)

The remaining gap is the known sequence-decode body delta vs the reference's single bmi2.constprop monolith (HUF 4-stream burst on literal-heavy frames being the biggest single item on weak-compression fixtures). Tracked levers, in ROI order:

  1. HUF burst decompress port (one inlined monolith vs our 3-fn split) — order-of-magnitude self-time delta on literal-heavy frames.
  2. Sequence-loop body instruction diff vs the reference per-tier.

Kill-switch criteria

Stop pulling individual levers when both stream sources sit within ~1.1× of the reference on the negative-level corpus, or when a lever returns <2% twice in a row (record the negative result and move on).
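
The stopping rule above can be written as a small predicate. This is a sketch under stated assumptions: `should_stop`, its argument layout, and treating "~1.1×" as a hard 1.1 threshold are illustrative choices, not part of the project.

```rust
/// Target gap vs the reference (the "~1.1×" from the criteria, taken literally).
const TARGET_GAP: f64 = 1.1;
/// Minimum improvement a lever must return to count as worthwhile (2%).
const MIN_GAIN: f64 = 0.02;

/// Hypothetical kill-switch check.
/// `gaps`: current Rust/C decode-time ratio for each stream source.
/// `lever_gains`: fractional improvement returned by each lever, oldest first.
pub fn should_stop(gaps: &[f64], lever_gains: &[f64]) -> bool {
    // Condition 1: every stream source is within the target gap.
    let at_parity = !gaps.is_empty() && gaps.iter().all(|&g| g <= TARGET_GAP);
    // Condition 2: the two most recent levers both returned less than 2%.
    let n = lever_gains.len();
    let stalled = n >= 2 && lever_gains[n - 2..].iter().all(|&g| g < MIN_GAIN);
    at_parity || stalled
}
```

With the current numbers (1.40 and 1.32) and no two consecutive sub-2% levers, the predicate returns false, so work continues.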

Metadata

Labels: P1-high (High priority, core functionality), enhancement (New feature or request), performance (Performance optimization)
