refactor: pluggable wire encoders and remove Pipeline::Lzr by ChrisLundquist · Pull Request #118 · ChrisLundquist/libpz

ChrisLundquist · 2026-03-10T09:03:37Z

Summary

Pluggable wire encoders: New TokenEncoder trait with 3 implementations (Lz77Encoder, LzSeqEncoder, LzssEncoder). Match finding is decoupled from wire encoding via a universal LzToken type and tokenize() entry point.
Remove Pipeline::Lzr: After the wire encoder refactor, Lzr (LZ77 + rANS) was identical to LzSeqR. Pipeline ID 3 is reserved with a tombstone comment to prevent reuse.
Upgrade Lzf wire encoding: Lzf switched from LzDemuxer::Lz77 (3 streams, ~41% ratio) to LzDemuxer::LzSeq (6 streams, ~32% ratio).
Simplify SortLz: Replaced hand-rolled FSE stream management with LzSeqEncoder, removing ~164 lines of duplicate encode/decode logic. Wire format v2.
max_match_len propagation: Non-Deflate LZ pipelines auto-default to u16::MAX via adjusted_options(), threaded through SeqConfig → HashChainFinder.
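
To illustrate the decoupling described above, here is a minimal sketch of how a token type and an encoder trait could fit together. Only the names `LzToken`, `EncodedStreams`, and `TokenEncoder` come from this PR; their fields and signatures below are assumptions, and `ToyEncoder` and `encode_with` are hypothetical helpers that are not among the three real encoders.

```rust
/// One LZ parse step. Hypothetical shape; the real `LzToken` may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LzToken {
    Literal(u8),
    Match { offset: u32, length: u32 },
}

/// Independent byte streams handed to the entropy coder (assumed layout).
pub type EncodedStreams = Vec<Vec<u8>>;

/// Wire encoders turn a token sequence into byte streams (assumed signature).
pub trait TokenEncoder {
    fn encode(&self, tokens: &[LzToken]) -> EncodedStreams;
}

/// Toy encoder for illustration only: flags, literals, offsets, lengths.
pub struct ToyEncoder;

impl TokenEncoder for ToyEncoder {
    fn encode(&self, tokens: &[LzToken]) -> EncodedStreams {
        let mut flags: Vec<u8> = Vec::new();
        let mut literals: Vec<u8> = Vec::new();
        let mut offsets: Vec<u8> = Vec::new();
        let mut lengths: Vec<u8> = Vec::new();
        for &token in tokens {
            match token {
                LzToken::Literal(byte) => {
                    flags.push(0);
                    literals.push(byte);
                }
                LzToken::Match { offset, length } => {
                    flags.push(1);
                    // Fixed-width little-endian fields keep the sketch simple.
                    offsets.extend_from_slice(&offset.to_le_bytes());
                    lengths.extend_from_slice(&length.to_le_bytes());
                }
            }
        }
        vec![flags, literals, offsets, lengths]
    }
}

/// Any match finder output can be paired with any encoder.
pub fn encode_with<E: TokenEncoder>(encoder: &E, tokens: &[LzToken]) -> EncodedStreams {
    encoder.encode(tokens)
}
```

Because the match finder only has to produce tokens, swapping the wire format means passing a different `TokenEncoder` implementation rather than touching the match-finding code.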

Key files

| File | Change |
| --- | --- |
| `src/lz_token.rs` | NEW: `LzToken`, `EncodedStreams`, `TokenEncoder` trait, 3 encoder impls |
| `src/lzseq/mod.rs` | `encode_from_tokens()`, `max_match_len` in `SeqConfig` |
| `src/pipeline/demux.rs` | `encoder_for_demuxer()` dispatch, `demux_lz77_matches` returns `PzResult` |
| `src/pipeline/mod.rs` | `tokenize()` replaces `lz77_matches_with_backend()`, Lzr removed |
| `src/sortlz.rs` | Uses `LzSeqEncoder` for wire encoding (-164 lines) |
| 27 more files | Lzr removal from tests, CLI, benchmarks, scripts, examples, fuzz |
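
The `max_match_len` defaulting mentioned in the summary could look roughly like the sketch below. Only `adjusted_options` and `max_match_len` are names from this PR; the `Pipeline` variants shown, the `Options` struct, and the function signature are assumptions. The 258-byte cap is the DEFLATE format's own maximum match length, and the sketch assumes the Deflate pipeline keeps it.

```rust
/// Hypothetical subset of pipelines, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Pipeline {
    Deflate,
    Lzf,
    LzSeqR,
}

/// Hypothetical options struct; `None` means "caller did not choose".
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Options {
    pub max_match_len: Option<u16>,
}

/// DEFLATE cannot represent matches longer than 258 bytes.
const DEFLATE_MAX_MATCH: u16 = 258;

/// Fill in a per-pipeline default while leaving explicit choices alone.
pub fn adjusted_options(pipeline: Pipeline, opts: Options) -> Options {
    let default = match pipeline {
        Pipeline::Deflate => DEFLATE_MAX_MATCH,
        // Non-Deflate LZ pipelines get the widest limit a u16 can hold.
        _ => u16::MAX,
    };
    Options {
        max_match_len: Some(opts.max_match_len.unwrap_or(default)),
    }
}
```

Resolving the default in one place means the value only has to be passed down unchanged (in this PR, through `SeqConfig` to `HashChainFinder`).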

Test plan

cargo clippy --all-targets — zero warnings
cargo test — 706 tests pass, 0 failures
Pre-commit hooks pass (fmt, clippy, test)
Benchmarked: Lzr/Lzf compression ratio improved from ~41% to ~32% (lower is better), throughput unchanged
Verify no regressions on CI

🤖 Generated with Claude Code

Two interleaved changes:

1. Pluggable wire encoders: New `src/lz_token.rs` with universal `LzToken` type, `TokenEncoder` trait, and three encoder implementations:
   - `Lz77Encoder`: DEFLATE-compatible 3-stream format
   - `LzSeqEncoder`: log2-coded 6-stream format (best ratio)
   - `LzssEncoder`: flag-based 4-stream format

   Match finders now produce `Vec<LzToken>` via `tokenize()`, and encoders convert token streams to independent byte streams for entropy coding. This decouples match finding from wire encoding.

2. Remove Pipeline::Lzr: After the wire encoder refactor, Lzr became identical to LzSeqR (same demuxer, match finder, wire encoder, and entropy coder). Removed from enum, dispatch tables, CLI, tests, benchmarks, examples, scripts, and fuzz targets. Pipeline ID 3 reserved with tombstone comment.

Additionally, Lzf's demuxer switches from Lz77 to LzSeq, upgrading its compression ratio from ~41% to ~32% on typical data.

Wire format break (pre-1.0): SortLz now uses LzSeq-encoded streams + FSE instead of hand-rolled flag/offset/length FSE streams.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
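
A reserved pipeline ID with a tombstone comment might be handled along the lines of this sketch. The only fact taken from the PR is that ID 3 belonged to Lzr and must not be reused; the other numeric IDs, the variant list, and the `TryFrom` decoding are made up for illustration.

```rust
use std::convert::TryFrom;

/// Hypothetical subset of pipelines. IDs 2 and 4 are illustrative only.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Pipeline {
    Lzf = 2,
    // 3 = RESERVED. Formerly Lzr (removed). Do not reuse this ID,
    // so that old streams are rejected instead of misdecoded.
    LzSeqR = 4,
}

impl TryFrom<u8> for Pipeline {
    type Error = String;

    fn try_from(id: u8) -> Result<Self, Self::Error> {
        match id {
            2 => Ok(Pipeline::Lzf),
            3 => Err("pipeline ID 3 (Lzr) was removed".to_string()),
            4 => Ok(Pipeline::LzSeqR),
            other => Err(format!("unknown pipeline ID {}", other)),
        }
    }
}
```

Keeping the gap in the numbering means a stream written by an older build fails with a clear error rather than being handed to an unrelated decoder.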

- Add debug_assert for input_pos bounds in Lz77Encoder::encode
- Wire SeqConfig.max_match_len through encode_from_tokens
- Return PzResult from demux_lz77_matches instead of panicking

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
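
The two hardening patterns in this commit, returning an error for malformed input and using `debug_assert!` for internal invariants, can be sketched as follows. The `PzError` variant, the 5-byte record layout, and both function names are assumptions made for the example and are not the library's actual definitions.

```rust
/// Hypothetical error type standing in for the library's own.
#[derive(Debug, PartialEq, Eq)]
pub enum PzError {
    InvalidInput,
}

pub type PzResult<T> = Result<T, PzError>;

/// Assumed record layout: offset (2 bytes), length (2 bytes), literal (1 byte).
const RECORD_SIZE: usize = 5;

/// Split fixed-size match records into three streams.
/// Malformed input is reported to the caller instead of panicking.
pub fn demux_records(buf: &[u8]) -> PzResult<[Vec<u8>; 3]> {
    if buf.len() % RECORD_SIZE != 0 {
        return Err(PzError::InvalidInput);
    }
    let mut offsets: Vec<u8> = Vec::new();
    let mut lengths: Vec<u8> = Vec::new();
    let mut literals: Vec<u8> = Vec::new();
    for record in buf.chunks_exact(RECORD_SIZE) {
        offsets.extend_from_slice(&record[0..2]);
        lengths.extend_from_slice(&record[2..4]);
        literals.push(record[4]);
    }
    Ok([offsets, lengths, literals])
}

/// Internal invariant: checked in debug builds, compiled out in release.
pub fn literal_at(input: &[u8], input_pos: usize) -> u8 {
    debug_assert!(input_pos < input.len(), "input_pos out of bounds");
    input[input_pos]
}
```

The split is deliberate: data that can arrive from outside gets a recoverable `Result`, while a position the encoder computes itself is treated as a programming invariant.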

Add architecture section documenting the unified token pipeline (PR #118), active/removed pipelines table, and Silesia corpus benchmark data. Update project layout to reflect lz_token.rs and removed modules. Update dead ends with streaming path bottleneck finding and LzSeqR routing bug (PR #120). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* bench: enable parallel, large, and webgpu benchmarks for Lzfi

  Lzfi was only benchmarked on the small Canterbury corpus with no parallel, large-file, or WebGPU variants. Enable all modes to match the LzSeqR and Lzf benchmark coverage.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: update CLAUDE.md with architecture overview and Silesia benchmarks

  Add architecture section documenting the unified token pipeline (PR #118), active/removed pipelines table, and Silesia corpus benchmark data. Update project layout to reflect lz_token.rs and removed modules. Update dead ends with streaming path bottleneck finding and LzSeqR routing bug (PR #120).

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

ChrisLundquist and others added 2 commits March 10, 2026 01:52

ChrisLundquist merged commit d6736d6 into master Mar 10, 2026
4 checks passed

ChrisLundquist mentioned this pull request Mar 10, 2026

refactor: post-wire-encoder cleanup and documentation #119

Merged

4 tasks

ChrisLundquist mentioned this pull request Mar 12, 2026

docs: update CLAUDE.md and enable Lzfi benchmarks #123

Merged

3 tasks

ChrisLundquist merged 2 commits into master from claude/recursing-vaughan
