feat(decoding): #28 Phase 6.6 — legacy frame format decoders v0.5-v0.7 #131

Description

@polaz

Summary

Phase 6.6 of #28: complete the legacy frame decoder set with v0.5, v0.6, v0.7 support. These three formats are closer to the final v1.0 spec than v0.1-v0.4, so the dispatcher infra + shared bitstream/Huffman/FSE primitives from 6.5 (#65) carry over with relatively small additions per version.

After this issue lands, libzstd.so.1 decodes every published zstd frame format from 2014 (v0.1) through current v1.5.7 — matching upstream's full ZSTD_LEGACY_SUPPORT=1 configuration.

Format versions covered by this issue

| Version | Year | Magic | Notable differences from v1.0 |
|---------|------|-------|-------------------------------|
| v0.5 | 2015 | 0xFD2FB525 | Block format converged toward v1.0 layout. Dictionary improvements. Skippable frames introduced. |
| v0.6 | 2016 | 0xFD2FB526 | Frame Content Size encoding refined. Single-segment flag. |
| v0.7 | 2016 | 0xFD2FB527 | Last format before v1.0 stabilization. Almost-identical block + sequence layouts; primary differences in frame header reserved bits and checksum placement. |
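
The routing by magic number can be sketched as a small lookup. This is a minimal illustration, not the dispatcher from #65: `LegacyVersion` and `detect_legacy_version` are hypothetical names, and the sketch assumes the magic number is stored little-endian in the first four bytes of the frame, as in current zstd frames.

```rust
/// Hypothetical enum covering only the three formats in this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LegacyVersion {
    V05,
    V06,
    V07,
}

/// Returns the legacy version if `src` starts with a v0.5-v0.7 magic number.
/// Returns `None` for short input or any other magic value.
pub fn detect_legacy_version(src: &[u8]) -> Option<LegacyVersion> {
    if src.len() < 4 {
        return None;
    }
    // Assumption: the magic is a little-endian u32 at offset 0.
    let magic = u32::from_le_bytes([src[0], src[1], src[2], src[3]]);
    match magic {
        0xFD2F_B525 => Some(LegacyVersion::V05),
        0xFD2F_B526 => Some(LegacyVersion::V06),
        0xFD2F_B527 => Some(LegacyVersion::V07),
        _ => None,
    }
}
```

A real dispatcher would also cover the v0.1-v0.4 magics from #65 and fall through to the v1.0 decoder when no legacy magic matches.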

By v0.5 the format had converged enough that the decoders from v0.5 through v1.0 share a substantial fraction of their bitstream / Huffman / FSE entropy reader implementation. Expectation: v0.5 is the largest of the three (~700 LoC); v0.6 and v0.7 are smaller deltas (~500-600 LoC each) on top of v0.5's primitives.

Deliverables

1. Extend decoding/legacy/

decoding/legacy/
├── mod.rs        — dispatcher updated to route v0.5/v0.6/v0.7 magic
├── v01..v04.rs   — from #65
├── v05.rs        — v0.5 decoder
├── v06.rs        — v0.6 decoder (delta on v05)
└── v07.rs        — v0.7 decoder (delta on v06)

Where possible, share primitives between v0.5/v0.6/v0.7 via legacy/shared/ (FSE table reader, Huffman tree decoder if the layout stabilized by v0.5, sequence section parser). Don't try to share with v0.1-v0.4 — those formats diverged enough that the shared layer would be a tangle of match version { ... } arms.

2. C FFI surface (no new symbols)

The dispatcher in 6.5 (#65) already exports ZSTD_isLegacy / ZSTD_decompressLegacy. This issue extends the dispatcher's per-version arm but doesn't add new FFI symbols.

3. Tests

  • decoding/legacy/tests/v05_corpus.rs — vendored v0.5 archive corpus.
  • decoding/legacy/tests/v06_corpus.rs — vendored v0.6 archive corpus.
  • decoding/legacy/tests/v07_corpus.rs — vendored v0.7 archive corpus.
  • Cross-version dispatch: a single test that feeds one archive containing concatenated frames of every version and asserts each segment decodes correctly.
  • cli/tests/legacy_dispatch.rs extended to cover v0.5/v0.6/v0.7.

4. Coverage of ZSTD_LEGACY_SUPPORT=N levels

Match upstream's ZSTD_LEGACY_SUPPORT macro values:

Cargo features:

  • legacy (default: on) — currently all versions.
  • After this issue: granular legacy-v1, legacy-v2, ..., legacy-v7 features, so a build can drop older formats the way upstream's ZSTD_LEGACY_SUPPORT=N setting does. The default-on legacy feature still bundles them all.
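
As a rough illustration of how the granular features could line up with upstream's macro, the sketch below maps a ZSTD_LEGACY_SUPPORT level to the feature names listed above. It assumes upstream's semantics are "N enables v0.N and newer, 0 disables legacy decoding", and that levels above 7 enable nothing; `features_for_legacy_support` is a hypothetical helper, not part of this crate.

```rust
/// Cargo feature names that would reproduce `ZSTD_LEGACY_SUPPORT=level`.
/// Assumption: level N (1..=7) means "decode v0.N and newer";
/// 0 and anything above 7 enable no legacy decoder.
pub fn features_for_legacy_support(level: u8) -> Vec<String> {
    if level == 0 || level > 7 {
        return Vec::new();
    }
    (level..=7).map(|v| format!("legacy-v{}", v)).collect()
}

fn main() {
    // Print the feature list for one example level.
    let features = features_for_legacy_support(5);
    println!("--no-default-features --features {}", features.join(","));
}
```

Under these assumptions, a level-5 build would pass only legacy-v5, legacy-v6 and legacy-v7, which is a natural extra row for the CI matrix.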

Out of scope

  • Legacy encoding — upstream doesn't expose legacy encoders. We don't either.
  • v0.8 / v1.0-beta — never published as stable, no archives exist in the wild.

Acceptance criteria

  • v0.5, v0.6, v0.7 decoders produce byte-exact output on vendored corpora (validated against upstream reference).
  • Per-version Cargo features (legacy-v1 ... legacy-v7) compose correctly: each feature flag includes/excludes the right module.
  • ZSTD_decompress dispatches correctly for all seven legacy magic numbers (combined with the #65 corpus).
  • CLI handles a single archive containing mixed-version frames concatenated together (per upstream's documented "any frame can follow any frame" rule).
  • CI matrix runs at least --features legacy (all on) and --no-default-features (all off).

Estimate

~10-12 working days (~1700 LoC including shared primitives + cross-version tests). Smaller than #65 because v0.5-v0.7 are closer to v1.0 and share more infra.
