feat(decoding): #28 Phase 6.5 — legacy frame format decoders v0.1-v0.4 #130

Description

@polaz

Summary

Phase 6.5 of #28: implement legacy frame format decoders for zstd versions v0.1, v0.2, v0.3, v0.4 so our libzstd.so.1 can read archives produced by any of those format versions. Upstream's lib/legacy/ directory ships dedicated decoders per version; Alpine's zstd package includes them by default, and removing them breaks any consumer that holds onto pre-v1.0 archives.

Sibling 6.6 (#66) covers v0.5/v0.6/v0.7 — split for issue sizing (each version is its own format spec with its own block layout / entropy tables / dictionary mode).

Why legacy decoders matter for drop-in

  • Alpine zstd-1.5.7-r2 builds with legacy support enabled (ZSTD_LEGACY_SUPPORT=4, i.e. decode v0.4 and later; upstream treats the value N as the oldest v0.N format decoded, so lower values cover more versions). Any binary consumer linking libzstd.so.1 and feeding it a legacy archive in the enabled range expects it to work.
  • Without legacy decoders, ZSTD_isFrame does not recognize legacy magic numbers, ZSTD_getFrameContentSize reports an error for them, and ZSTD_decompress fails instead of dispatching to the right decoder.
  • Reverse-dep impact: ZFS-on-Linux, btrfs userspace, some package managers, and historical archives.
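
The support-level semantics above can be sketched as a small predicate. This is a minimal illustration, assuming upstream's documented rule that ZSTD_LEGACY_SUPPORT=N enables formats v0.N and later and that 0 disables legacy decoding; the function name is hypothetical and not part of this crate.

```rust
/// Returns true when a build configured with `ZSTD_LEGACY_SUPPORT = level`
/// is expected to decode legacy format v0.`version`.
/// Hypothetical helper for illustration only.
/// Assumption: level 0 means legacy support is compiled out, and level N
/// enables v0.N through v0.7.
pub fn legacy_version_enabled(level: u8, version: u8) -> bool {
    // Only v0.1 through v0.7 are legacy formats; v0.8 became v1.0.
    let is_legacy_version = (1..=7).contains(&version);
    is_legacy_version && level != 0 && version >= level
}
```

Under this rule, `legacy_version_enabled(4, 3)` is false while `legacy_version_enabled(1, 3)` is true, which is why a drop-in build that must read v0.1 to v0.3 archives needs the lowest level.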

Format versions covered by this issue

| Version | Year | Magic      | Notable differences from v1.0                                  |
|---------|------|------------|----------------------------------------------------------------|
| v0.1    | 2014 | 0xFD2FB51E | Initial format. Fixed block size. No FSE; Huffman-only literals. |
| v0.2    | 2015 | 0xFD2FB522 | FSE introduced (sequence-coding). Block size up to 128 KiB.    |
| v0.3    | 2015 | 0xFD2FB523 | Improved entropy stage. Frame header refactor.                 |
| v0.4    | 2015 | 0xFD2FB524 | Dictionary IDs introduced. Multi-checksum field.               |

Each version has its own:

  • Frame header layout (different field sizes, different reserved bits)
  • Block header semantics (the Block_Maximum_Size evolved)
  • Literal section encoder (Huffman variants, raw/RLE/repeat modes added incrementally)
  • Sequence section encoder (FSE table layout differs)
  • Repeat-offset state machine

Donor parity reference: lib/legacy/zstd_v01.{c,h} ... zstd_v04.{c,h} in upstream zstd v1.5.7.

Deliverables

1. New crate module decoding/legacy/

decoding/
└── legacy/
    ├── mod.rs           — dispatcher: magic → version → decoder
    ├── v01.rs           — v0.1 decoder (~500 LoC)
    ├── v02.rs           — v0.2 decoder (~550 LoC)
    ├── v03.rs           — v0.3 decoder (~500 LoC)
    └── v04.rs           — v0.4 decoder (~550 LoC)

2. Magic-number dispatcher

peek_frame_magic in decoding/frame_decoder.rs detects the v0.1-v0.7 magic numbers and routes each frame to the matching decoder. The existing v1.0+ path is unchanged when the magic is the current one (0xFD2FB528).
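
A minimal sketch of the classification step, under stated assumptions. `FrameKind` and `classify_frame_magic` are hypothetical names, not the crate's actual `peek_frame_magic` signature. The v0.5 to v0.7 magics (0xFD2FB525 to 0xFD2FB527) are taken from upstream and belong to the sibling issue. One detail worth checking against upstream's zstd_v01.h: the v0.1 magic appears to be stored big-endian, unlike every later version, so a little-endian read sees 0x1EB52FFD.

```rust
/// Current (v1.0+) frame magic, stored little-endian.
const MAGIC_CURRENT: u32 = 0xFD2F_B528;
/// v0.1 magic 0xFD2FB51E as seen by a little-endian read.
/// Assumption: upstream stores this one magic big-endian.
const MAGIC_V01_AS_LE: u32 = 0x1EB5_2FFD;

/// Hypothetical classification result for the first four bytes of a frame.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FrameKind {
    /// Regular v1.0+ frame; the existing decode path handles it.
    Current,
    /// Legacy frame; the payload is the minor version (1..=7).
    Legacy(u8),
    /// Not a recognized zstd magic (skippable frames are ignored here).
    Unknown,
}

/// Reads the 4-byte magic and classifies it.
/// Returns `None` when fewer than four bytes are available.
pub fn classify_frame_magic(src: &[u8]) -> Option<FrameKind> {
    let b = src.get(..4)?;
    let magic = u32::from_le_bytes([b[0], b[1], b[2], b[3]]);
    Some(match magic {
        MAGIC_CURRENT => FrameKind::Current,
        MAGIC_V01_AS_LE => FrameKind::Legacy(1),
        // v0.2 through v0.7 differ only in the low nibble.
        m @ 0xFD2F_B522..=0xFD2F_B527 => FrameKind::Legacy((m - 0xFD2F_B520) as u8),
        _ => FrameKind::Unknown,
    })
}
```

A dispatcher built on this would match on `FrameKind::Legacy(1..=4)` for the decoders in this issue and leave versions 5 to 7 to the sibling issue.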

3. C FFI surface

  • ZSTD_isFrame(buf, len) recognizes legacy magic.
  • ZSTD_getFrameContentSize(buf, len) returns the value the legacy header carries (or ZSTD_CONTENTSIZE_UNKNOWN for versions that don't store it).
  • ZSTD_decompress(...) auto-dispatches to legacy decoder via the magic check.
  • ZSTD_decompressDCtx(...) and ZSTD_decompressStream likewise.
  • New (from upstream): ZSTD_isLegacy(buf, len), ZSTD_getDecompressedSize_legacy(...), ZSTD_decompressLegacy(...) — exported so consumers that opt in via the legacy macros still link.
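
The FFI contract for two of these entry points might look roughly like the sketch below. It assumes upstream's behavior: `ZSTD_isLegacy` returns the legacy minor version or 0, and `ZSTD_getFrameContentSize` maps a stored legacy size of 0 to `ZSTD_CONTENTSIZE_UNKNOWN`. Function names are hypothetical placeholders; a real export would use the C names and a `no_mangle` attribute.

```rust
use std::os::raw::{c_uint, c_void};

/// Sentinel defined in upstream zstd.h as `0ULL - 1`.
pub const ZSTD_CONTENTSIZE_UNKNOWN: u64 = u64::MAX;

/// Sketch of the `ZSTD_isLegacy(buf, len)` contract: returns the legacy
/// minor version (1..=7), or 0 when the buffer is not a legacy frame.
/// Hypothetical name; not the crate's actual export.
///
/// # Safety
/// `src` must be null or point to at least `src_size` readable bytes.
pub unsafe extern "C" fn zstd_is_legacy_sketch(src: *const c_void, src_size: usize) -> c_uint {
    if src.is_null() || src_size < 4 {
        return 0;
    }
    // SAFETY: checked non-null above; the caller guarantees `src_size >= 4`
    // readable bytes behind the pointer.
    let b = unsafe { std::slice::from_raw_parts(src as *const u8, 4) };
    match u32::from_le_bytes([b[0], b[1], b[2], b[3]]) {
        // Assumption: v0.1 magic 0xFD2FB51E is stored big-endian upstream.
        0x1EB5_2FFD => 1,
        m @ 0xFD2F_B522..=0xFD2F_B527 => (m - 0xFD2F_B520) as c_uint,
        _ => 0,
    }
}

/// Maps the size a legacy header carries to the public return value.
/// Assumption: a stored size of 0 means "not recorded" for legacy frames.
pub fn legacy_frame_content_size(stored: u64) -> u64 {
    if stored == 0 {
        ZSTD_CONTENTSIZE_UNKNOWN
    } else {
        stored
    }
}
```

Keeping the pointer handling in one thin wrapper and the format logic in safe functions makes the ABI layer easy to snapshot-test separately from the decoders.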

4. Tests

  • decoding/legacy/tests/v01_corpus.rs — decode a corpus of v0.1 frames produced by upstream zstd_v01_compress (vendored test fixtures). Assert byte-exact output vs upstream's reference decompression.
  • Same for v0.2, v0.3, v0.4 (one fixture file per version).
  • cli/tests/legacy_dispatch.rs: zstd -d FILE.zst, where FILE.zst is a v0.3 archive, succeeds and produces the expected output.
  • ABI test: ZSTD_isLegacy symbol exported in c-api/tests/symbols.rs.

5. Build-time gate

Match upstream's ZSTD_LEGACY_SUPPORT macro:

  • Cargo feature legacy (default: enabled) wraps the entire decoding/legacy/ module.
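  • When disabled, the magic-number dispatcher rejects legacy frames with a "legacy support not compiled in" error that maps to the code upstream returns for such frames when built with ZSTD_LEGACY_SUPPORT=0 (ZSTD_error_prefix_unknown; upstream has no dedicated legacy error code).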
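
One way the feature gate could be wired, as a sketch only. It assumes a Cargo feature named `legacy` that is listed in the crate's default features; `DecodeError`, `dispatch_legacy`, and the stub decoder body are hypothetical placeholders, not the crate's real API.

```rust
/// Hypothetical error type for illustration.
#[derive(Debug, PartialEq, Eq)]
pub enum DecodeError {
    /// Returned for legacy magic when the `legacy` feature is disabled.
    LegacyNotCompiledIn,
    /// Placeholder result from the stub decoder below.
    StubDecoder,
}

/// The whole module disappears from the build without the feature.
#[cfg(feature = "legacy")]
mod legacy {
    use super::DecodeError;

    /// Stub standing in for the per-version decoders (v01.rs .. v04.rs).
    pub fn decode(_version: u8, _src: &[u8]) -> Result<Vec<u8>, DecodeError> {
        Err(DecodeError::StubDecoder)
    }
}

/// Feature enabled: forward to the legacy module.
#[cfg(feature = "legacy")]
pub fn dispatch_legacy(version: u8, src: &[u8]) -> Result<Vec<u8>, DecodeError> {
    legacy::decode(version, src)
}

/// Feature disabled: same signature, fixed error, no decoder code linked.
#[cfg(not(feature = "legacy"))]
pub fn dispatch_legacy(_version: u8, _src: &[u8]) -> Result<Vec<u8>, DecodeError> {
    Err(DecodeError::LegacyNotCompiledIn)
}
```

Defining the function twice behind opposite `cfg` attributes keeps call sites free of conditional compilation, so the dispatcher and FFI layer compile unchanged in both CI configurations.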

Out of scope

Acceptance criteria

  • Each version decoder produces byte-exact output on the vendored corpus (verified against upstream's reference legacy decompressor).
  • ZSTD_isFrame returns true for valid legacy magic + a complete frame header.
  • ZSTD_decompress transparently dispatches to legacy decoder on legacy magic.
  • ZSTD_isLegacy and ZSTD_decompressLegacy symbols exported; FFI snapshot test (symbols.rs) updated.
  • zstd -d CLI handles legacy archives without explicit flags.
  • cargo build --no-default-features (without legacy feature) excludes the legacy module and FFI returns "not supported" error code on legacy magic.
  • CI matrix covers both --features legacy and --no-default-features.

Estimate

~14-16 working days (~2000 LoC across 4 version decoders + dispatcher + FFI + fixture-based tests). Each version is its own format spec; the dispatcher is the only piece shared.

Blocked by

References

Metadata

    Labels: P2-medium (Medium priority: important improvement), P3-low (Low priority: nice to have), enhancement (New feature or request)
