Summary
Phase 6.5 of #28: implement legacy frame format decoders for zstd versions v0.1, v0.2, v0.3, v0.4 so our libzstd.so.1 can read archives produced by any of those format versions. Upstream's lib/legacy/ directory ships dedicated decoders per version; Alpine's zstd package includes them by default, and removing them breaks any consumer that holds onto pre-v1.0 archives.
Sibling 6.6 (#66) covers v0.5/v0.6/v0.7 — split for issue sizing (each version is its own format spec with its own block layout / entropy tables / dictionary mode).
Why legacy decoders matter for drop-in
- Alpine zstd-1.5.7-r2 builds with ZSTD_LEGACY_SUPPORT=4 (decode v0.4+) or higher; any binary consumer linking libzstd.so.1 and feeding it a v0.1-v0.7 archive expects it to work.
- Without legacy decoders, ZSTD_isFrame / ZSTD_getFrameContentSize return wrong values on legacy magic numbers, and ZSTD_decompress errors out instead of dispatching to the right decoder.
- Reverse-dep impact: ZFS-on-Linux, btrfs userspace, some package managers, and historical archives.
Format versions covered by this issue
| Version | Year | Magic | Notable differences from v1.0 |
|---|---|---|---|
| v0.1 | 2014 | 0xFD2FB51E | Initial format. Fixed block size. No FSE; Huffman-only literals. |
| v0.2 | 2015 | 0xFD2FB522 | FSE introduced (sequence-coding). Block size up to 128 KiB. |
| v0.3 | 2015 | 0xFD2FB523 | Improved entropy stage. Frame header refactor. |
| v0.4 | 2015 | 0xFD2FB524 | Dictionary IDs introduced. Multi-checksum field. |
Each version has its own:
- Frame header layout (different field sizes, different reserved bits)
- Block header semantics (the Block_Maximum_Size evolved)
- Literal section encoder (Huffman variants, raw/RLE/repeat modes added incrementally)
- Sequence section encoder (FSE table layout differs)
- Repeat-offset state machine
Donor parity reference: lib/legacy/zstd_v01.{c,h} ... zstd_v04.{c,h} in upstream zstd v1.5.7.
Deliverables
1. New crate module decoding/legacy/
    decoding/
    └── legacy/
        ├── mod.rs   dispatcher: magic → version → decoder
        ├── v01.rs   v0.1 decoder (~500 LoC)
        ├── v02.rs   v0.2 decoder (~550 LoC)
        ├── v03.rs   v0.3 decoder (~500 LoC)
        └── v04.rs   v0.4 decoder (~550 LoC)
2. Magic-number dispatcher
In decoding/frame_decoder.rs, peek_frame_magic detects v0.1-v0.7 magic numbers and routes each frame to the matching decoder. The existing v1.0+ path is unchanged when the magic is the current one (0xFD2FB528).
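The routing described above can be sketched roughly as follows. This is only an illustration: `FrameKind`, `classify_magic`, and `peek_frame_kind` are hypothetical names, not the crate's actual API, and the sketch covers only the four magic numbers listed in this issue's table (v0.5-v0.7 belong to 6.6). It also assumes the magic is read little-endian, as for current frames. Upstream's zstd_v01.h defines a byte-swapped form of the v0.1 magic as well, so the byte order for that version should be checked against the donor code.

```rust
/// Magic numbers taken from the version table in this issue.
const MAGIC_V01: u32 = 0xFD2F_B51E;
const MAGIC_V02: u32 = 0xFD2F_B522;
const MAGIC_V03: u32 = 0xFD2F_B523;
const MAGIC_V04: u32 = 0xFD2F_B524;
const MAGIC_CURRENT: u32 = 0xFD2F_B528;

/// Hypothetical routing result; the real dispatcher may return something else.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FrameKind {
    /// Legacy frame; the payload is the minor version (1 for v0.1, and so on).
    Legacy(u8),
    /// Current v1.0+ frame, handled by the existing decoder.
    Current,
    /// Not a magic number this sketch recognizes.
    Unknown,
}

/// Classify an already-read 32-bit magic value.
pub fn classify_magic(magic: u32) -> FrameKind {
    match magic {
        MAGIC_V01 => FrameKind::Legacy(1),
        MAGIC_V02 => FrameKind::Legacy(2),
        MAGIC_V03 => FrameKind::Legacy(3),
        MAGIC_V04 => FrameKind::Legacy(4),
        MAGIC_CURRENT => FrameKind::Current,
        _ => FrameKind::Unknown,
    }
}

/// Read the first four bytes as a little-endian u32 and classify them.
/// Returns `None` when fewer than four bytes are available.
/// Assumption: little-endian byte order for every version (see note above).
pub fn peek_frame_kind(src: &[u8]) -> Option<FrameKind> {
    let head = src.get(..4)?;
    let magic = u32::from_le_bytes([head[0], head[1], head[2], head[3]]);
    Some(classify_magic(magic))
}
```

Returning `Option` for short input keeps "not enough bytes yet" distinct from "unknown magic", which matters for the streaming entry points.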
3. C FFI surface
- ZSTD_isFrame(buf, len) recognizes legacy magic.
- ZSTD_getFrameContentSize(buf, len) returns the value the legacy header carries (or ZSTD_CONTENTSIZE_UNKNOWN for versions that don't store it).
- ZSTD_decompress(...) auto-dispatches to the legacy decoder via the magic check.
- ZSTD_decompressDCtx(...) and ZSTD_decompressStream likewise.
- New (from upstream): ZSTD_isLegacy(buf, len), ZSTD_getDecompressedSize_legacy(...), ZSTD_decompressLegacy(...), exported so consumers that opt in via the legacy macros still link.
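As a rough sketch of what one of these exports could look like on the Rust side, here is a possible shape for ZSTD_isLegacy. The C signature (`unsigned ZSTD_isLegacy(const void* src, size_t srcSize)`) and the convention of returning the legacy minor version, or 0 for non-legacy input, are assumptions to verify against upstream's lib/legacy/zstd_legacy.h. Only the four versions in this issue are handled, and the little-endian read is likewise an assumption.

```rust
use std::os::raw::{c_uint, c_void};

/// Sketch of the exported symbol. In the real c-api crate this function would
/// also need a no_mangle attribute so the symbol name survives linking; it is
/// left out here so the sketch compiles unchanged on any edition.
///
/// Assumed return convention: legacy minor version (1..=4 here), 0 otherwise.
///
/// # Safety
/// `src` must be null or valid for reads of `src_size` bytes.
#[allow(non_snake_case)]
pub unsafe extern "C" fn ZSTD_isLegacy(src: *const c_void, src_size: usize) -> c_uint {
    if src.is_null() || src_size < 4 {
        return 0;
    }
    // SAFETY: the caller guarantees `src_size` readable bytes, and we
    // checked above that at least four are available.
    let head = unsafe { std::slice::from_raw_parts(src as *const u8, 4) };
    let magic = u32::from_le_bytes([head[0], head[1], head[2], head[3]]);
    match magic {
        0xFD2F_B51E => 1,
        0xFD2F_B522 => 2,
        0xFD2F_B523 => 3,
        0xFD2F_B524 => 4,
        _ => 0,
    }
}
```

Checking for null and short input before touching the buffer means a C caller passing an empty or truncated buffer gets 0 instead of an out-of-bounds read.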
4. Tests
- decoding/legacy/tests/v01_corpus.rs: decode a corpus of v0.1 frames produced by upstream zstd_v01_compress (vendored test fixtures). Assert byte-exact output vs upstream's reference decompression.
- Same for v0.2, v0.3, v0.4 (one fixture file per version).
- cli/tests/legacy_dispatch.rs: zstd -d FILE.zst, where FILE is a v0.3 archive, succeeds and produces the expected output.
- ABI test: ZSTD_isLegacy symbol exported in c-api/tests/symbols.rs.
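The byte-exact corpus check could be driven by a small helper along these lines. Everything here is illustrative: `Fixture`, `first_mismatch`, and `check_corpus` are hypothetical names, and the decoder is passed in as a closure because the per-version decoders do not exist yet. Reporting the first differing offset rather than a bare pass/fail is a choice made for this sketch to make fixture failures easier to triage.

```rust
/// One vendored fixture: a legacy frame and the bytes upstream decodes it to.
pub struct Fixture<'a> {
    pub name: &'a str,
    pub compressed: &'a [u8],
    pub expected: &'a [u8],
}

/// Index of the first differing byte, or `None` if the slices are identical.
/// A length difference counts as a mismatch at the end of the shorter slice.
pub fn first_mismatch(expected: &[u8], actual: &[u8]) -> Option<usize> {
    let common = expected.len().min(actual.len());
    (0..common)
        .find(|&i| expected[i] != actual[i])
        .or(if expected.len() != actual.len() { Some(common) } else { None })
}

/// Run `decode` over every fixture and collect one message per failure.
pub fn check_corpus<F>(fixtures: &[Fixture], decode: F) -> Vec<String>
where
    F: Fn(&[u8]) -> Result<Vec<u8>, String>,
{
    let mut failures = Vec::new();
    for f in fixtures {
        match decode(f.compressed) {
            Err(e) => failures.push(format!("{}: decode error: {}", f.name, e)),
            Ok(out) => {
                if let Some(at) = first_mismatch(f.expected, &out) {
                    failures.push(format!("{}: first mismatch at byte {}", f.name, at));
                }
            }
        }
    }
    failures
}
```

A corpus test would then assert that `check_corpus` returns an empty list, printing the collected messages otherwise.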
5. Build-time gate
Match upstream's ZSTD_LEGACY_SUPPORT macro:
- Cargo feature legacy (default: enabled) wraps the entire decoding/legacy/ module.
- When disabled, the magic-number dispatcher returns a "legacy support not compiled in" error matching upstream's ZSTD_LEGACY_NOT_SUPPORTED error code.
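One way the feature gate could be wired is shown below, as a sketch only. `LegacyError`, `decode_legacy`, and the inner `legacy` module are hypothetical stand-ins for whatever the crate ends up with, and the stub decoder just returns an error. The point is that the dispatcher always has an entry point to call, and only the body changes with the feature.

```rust
/// Hypothetical error type for this sketch; the crate's real error enum may differ.
#[derive(Debug, PartialEq, Eq)]
pub enum LegacyError {
    /// The frame is legacy but the crate was built without the `legacy` feature.
    NotCompiledIn,
    /// Placeholder for the real decoders' failure modes.
    Corrupt,
}

/// Compiled only when the `legacy` feature is enabled.
#[cfg(feature = "legacy")]
mod legacy {
    use super::LegacyError;

    /// Stub standing in for the per-version decoders (v01.rs ... v04.rs).
    pub fn decode(_version: u8, _src: &[u8]) -> Result<Vec<u8>, LegacyError> {
        Err(LegacyError::Corrupt)
    }
}

/// Entry point the magic-number dispatcher would call for a legacy frame.
#[cfg(feature = "legacy")]
pub fn decode_legacy(version: u8, src: &[u8]) -> Result<Vec<u8>, LegacyError> {
    legacy::decode(version, src)
}

/// Same signature without the feature: always reports "not compiled in".
#[cfg(not(feature = "legacy"))]
pub fn decode_legacy(_version: u8, _src: &[u8]) -> Result<Vec<u8>, LegacyError> {
    Err(LegacyError::NotCompiledIn)
}
```

Keeping the signature identical in both configurations means the dispatcher and FFI layer need no `cfg` attributes of their own.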
Out of scope
Acceptance criteria
- ZSTD_isFrame returns true for valid legacy magic + a complete frame header.
- ZSTD_decompress transparently dispatches to the legacy decoder on legacy magic.
- ZSTD_isLegacy and ZSTD_decompressLegacy symbols exported; FFI snapshot test (symbols.rs) updated.
- zstd -d CLI handles legacy archives without explicit flags.
- cargo build --no-default-features (without the legacy feature) excludes the legacy module, and the FFI returns the "not supported" error code on legacy magic.
- … --features legacy and --no-default-features.
Estimate
~14-16 working days (~2000 LoC across 4 version decoders + dispatcher + FFI + fixture-based tests). Each version is its own format spec; the dispatcher is the only piece shared.
Blocked by
- … ZSTD_isFrame, ZSTD_isLegacy).
- … ZSTD_decompress which is in 6.1.
References
- lib/legacy/zstd_v0{1..4}.{c,h} v1.5.7