feat(pubdata): deflate-wrap the v2 pubdata body #666
Draft
0xVolosnikov wants to merge 6 commits into
Wires the vendored `miniz_nostd_compression` deflate into the rig as
post-STF instrumentation: captures the v2 pubdata blob each block emits
and reports its deflated size at compression levels 1, 6, 9 and with the
zlib wrapper. No STF/protocol change — the emitted bytes are unchanged.
Use is twofold:
- Three new `tests/instances/transactions` tests print per-block
compression ratios under `cargo test -- --nocapture` for quick local
evaluation across small/mixed/storage-heavy workloads.
- `tests/instances/eth_runner/single_run.rs` appends
`pubdata_bytes_deflate{1,9}: N` next to the existing
`pubdata_bytes: N` in the cycle-marker bench file, so live
mainnet-replay fixtures get measured automatically the next time
benches run.
Motivation: evaluate Tier-S #1 ("generic compression on the whole
pubdata blob") from the optimization notes before committing to an
in-circuit envelope.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps `PUBDATA_ENCODING_VERSION` to 3 and wraps the v2 body in a raw
DEFLATE stream emitted by `write_pubdata`:
v3: [VERSION:1=3][BLOCK_HASH:32][TIMESTAMP:8][DEFLATE(BODY)]
`BODY` is the byte stream the v2 emitter already produces (storage
diffs + logs + messages). Header stays uncompressed so consumers can
read block_hash and timestamp without inflating.
Implementation:
- `basic_bootloader::bootloader::block_flow::zk::post_tx_op::pubdata_compression`:
new module wrapping `miniz_nostd_compression`. Allocates the three
large miniz scratch buffers (`HuffmanOxide`, `LocalBuf`, `HashBuffers`,
~250 KB combined) via `Box::new_zeroed_in` on the system allocator to
avoid the stack copies that `Box::new_in(T::default(), …)` would incur.
Output buffer sized to the zlib `deflateBound` upper bound.
- `write_pubdata` collects the v2 body into a single `Vec<u8, A>` via a
`WriteBytes` adapter, then deflates it. The body emitters'
result_keeper mirror is suppressed via `NopResultKeeper` during
collection; the compressed bytes are mirrored to both real sinks after
deflate.
- Compression level fixed at 1 (fastest). On the measurement workloads,
level-1 lost only ~3 pp of ratio vs level 9 (48.7% vs 45.0% on a
deploy-heavy 3.7 KB block), and the cycle delta in RISC-V is expected
to favour the fast path heavily. Level can be tuned later if cycle
budgets allow.
Tests:
- `tests/instances/transactions/src/pubdata_compression_experiment.rs`
gains two `inflate_roundtrip_*` tests that inflate the v3 body on the
host with `miniz_oxide` and assert the inflated bytes have a valid v2
diffs header. The existing measure_* tests now also act as a sanity
check that the STF body is already at deflate's information limit (a
second pass adds ~5 B overhead).
- `test_check_pubdata_encoding_version` and
`test_check_pubdata_has_timestamp` pass unchanged: header layout
(offsets 0, 1..33, 33..41) is preserved.
Forward / proving alignment: miniz deflate is fully deterministic for
fixed flags; the two paths emit identical bytes for identical inputs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
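The scratch-buffer allocation described in this commit uses `Box::new_zeroed_in`, which needs the nightly allocator API. A minimal sketch of the same idea on stable Rust with the global allocator follows. `Scratch` and `boxed_zeroed` are hypothetical names for illustration, not the PR's types, and the sketch assumes the all-zero bit pattern is a valid value of the boxed type.

```rust
use std::alloc::{alloc_zeroed, handle_alloc_error, Layout};

/// Hypothetical stand-in for a large compressor scratch struct (192 KiB).
/// Every field is a plain integer array, so all-zero bytes are a valid value.
#[repr(C)]
pub struct Scratch {
    pub hash: [u16; 32 * 1024],
    pub window: [u8; 128 * 1024],
}

/// Allocates a zero-filled `T` directly on the heap, so no `T`-sized
/// temporary is ever built on the stack and then moved into the box.
///
/// # Safety
/// The all-zero bit pattern must be a valid value of `T`.
pub unsafe fn boxed_zeroed<T>() -> Box<T> {
    let layout = Layout::new::<T>();
    assert!(layout.size() != 0, "zero-sized types are out of scope for this sketch");
    // SAFETY: the layout has a non-zero size (checked above).
    let ptr = unsafe { alloc_zeroed(layout) } as *mut T;
    if ptr.is_null() {
        handle_alloc_error(layout);
    }
    // SAFETY: `ptr` came from the global allocator with `T`'s layout, and the
    // caller guarantees that zeroed memory is a valid `T`.
    unsafe { Box::from_raw(ptr) }
}
```

By contrast, `Box::new(Scratch { .. })` may first materialise the value on the stack, which is the copy this commit avoids.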
`A::default()` constructs a fresh allocator handle whose backing memory
is undefined in the proving environment: the RISC-V bump allocator's
state lives in `io.allocator`, not in `A::default()`. Using the
default-constructed handle let bench head trip an invalid operation
when v3 deflate scratch buffers were allocated against it.
Cloning `io.allocator` once up front, before any of the field borrows
that follow, keeps every allocation rooted in the system's actual heap.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
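To illustrate why cloning the live handle differs from default-constructing one, here is a toy model, not the real allocator. `BumpHandle` and `BumpState` are hypothetical, and a handle with no backing heap is modelled as `None`. The source only says the default handle's backing memory is undefined in proving.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Shared bump-allocator state: a capacity and a cursor.
struct BumpState {
    capacity: usize,
    used: Cell<usize>,
}

/// Hypothetical allocator handle. Clones share one `BumpState`;
/// `Default` yields a handle that points at no heap at all.
#[derive(Clone, Default)]
pub struct BumpHandle {
    state: Option<Rc<BumpState>>,
}

impl BumpHandle {
    /// Creates a handle backed by a heap of `capacity` bytes.
    pub fn with_heap(capacity: usize) -> Self {
        let state = BumpState { capacity, used: Cell::new(0) };
        BumpHandle { state: Some(Rc::new(state)) }
    }

    /// Reserves `size` bytes and returns the offset of the reservation.
    pub fn alloc(&self, size: usize) -> Result<usize, &'static str> {
        let state = self.state.as_ref().ok_or("handle has no backing heap")?;
        let start = state.used.get();
        let end = start.checked_add(size).ok_or("size overflow")?;
        if end > state.capacity {
            return Err("out of memory");
        }
        state.used.set(end);
        Ok(start)
    }

    /// Bytes reserved so far through any clone of this handle.
    pub fn used(&self) -> usize {
        self.state.as_ref().map_or(0, |s| s.used.get())
    }
}
```

In this model, an allocation through a clone advances the same cursor as the original, while `BumpHandle::default().alloc(..)` fails immediately.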
… forbids grow)
`proof_running_system::ProxyAllocator::grow()` explicitly panics: the
proving heap allocator does not allow Vec reallocation. The body buffer
was created via `Vec::new_in(allocator)` (capacity 0) and grew via
repeated `extend_from_slice` calls in the body emitters, which tripped
grow and crashed bench head with "invalid operation".
Pre-allocate to `BUFFER_CAPACITY` (the same per-batch upper bound the v2
path already enforced via
`BlobCommitmentGenerator.buffer: ArrayVec<u8, BUFFER_CAPACITY>`). Any v2
body that fit before necessarily fits after deflate, so the cap is not
a regression.
`pubdata_compression::deflate_pubdata_body` already pre-sizes its output
via `Vec::with_capacity_in(deflate_output_cap(input.len()), allocator)`
and only calls `resize` / `truncate` within that capacity, so no other
site needed adjusting.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
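The pre-sizing discipline above can be sketched on stable Rust as a buffer that is sized once and rejects writes past its limit instead of reallocating. `NoGrowBuf` is a hypothetical helper. `deflate_output_cap` here uses the `input + input/8 + 64` figure quoted elsewhere in this PR; whether the real function computes exactly that is an assumption.

```rust
/// Upper bound on deflate output size, using the `input + input/8 + 64`
/// figure quoted in this PR (assumed, not copied from the real helper).
pub fn deflate_output_cap(input_len: usize) -> usize {
    input_len + input_len / 8 + 64
}

/// Byte buffer that allocates once up front and never reallocates.
pub struct NoGrowBuf {
    inner: Vec<u8>,
    limit: usize,
}

impl NoGrowBuf {
    /// Reserves `limit` bytes in a single allocation.
    pub fn with_capacity(limit: usize) -> Self {
        NoGrowBuf { inner: Vec::with_capacity(limit), limit }
    }

    /// Appends `bytes` if they fit; otherwise returns how many bytes
    /// would overflow the limit and leaves the buffer untouched.
    pub fn try_extend(&mut self, bytes: &[u8]) -> Result<(), usize> {
        let free = self.limit - self.inner.len();
        if bytes.len() > free {
            return Err(bytes.len() - free);
        }
        // The length stays within the reserved capacity, so this cannot
        // trigger a reallocation.
        self.inner.extend_from_slice(bytes);
        Ok(())
    }

    /// Bytes written so far.
    pub fn as_slice(&self) -> &[u8] {
        &self.inner
    }
}
```

Returning an error rather than growing makes the overflow case explicit, which matters when the allocator's `grow` would panic.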
#665's v2 is still a draft format on `draft-0.4.0`, so the deflate
envelope is absorbed into v2 rather than introducing a separate v3.
`PUBDATA_ENCODING_VERSION` stays at 2 and the layout doc on the constant
now reads:
Version 2: tree-index storage diffs + DEFLATE-wrapped body. Header
(version + block_hash + timestamp) uncompressed.
No on-wire layout change: the bytes a recipient sees are identical to
the previous commit's emission, just with `pubdata[0] == 2` instead of
`3`. The two roundtrip tests and the existing version / timestamp tests
stay green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the buffer-then-deflate pattern in `write_pubdata` with a
streaming `DeflateSink`: the existing body emitters
(`apply_storage_diffs_pubdata`, `LogsStorage::apply_pubdata`) call
`DeflateSink::write(buf)` directly, miniz consumes the bytes
incrementally, and the emitted compressed bytes are forwarded to both
`pubdata_dst` (DA commitment) and `result_keeper` on the fly.
Peak per-block allocations drop from
body Vec (BUFFER_CAPACITY = 1.1 MB)
+ output Vec (input + input/8 + 64 ≈ 1.2 MB worst case)
+ scratch (HuffmanOxide + LocalBuf + HashBuffers ≈ 254 KB)
= ~2.55 MB
to
out_buf (16 KB fixed) + scratch (~254 KB) = ~270 KB.
A ~9x reduction in peak compressor-stage memory. The 1.1 MB body buffer
was a pre-allocation forced by `proof_running_system::ProxyAllocator`'s
`grow`-panics policy; removing the resizable Vec also removes a class of
"what if the body exceeds the cap" worries.
Cycle impact is small but slightly positive on larger blocks: we skip
the body-bytes memcpy into the body Vec and the second memcpy of the
deflate output into pubdata_dst + result_keeper, at the cost of more
`compress()` invocations (one per body-emitter `write` call plus drains
every 16 KB).
On the host workloads the emitted bytes are byte-identical to the
previous buffer-then-deflate path, so the forward / proving alignment
property holds.
Implementation notes:
- `DeflateSink<'a, A, DST, RK>` borrows the compressor (which itself
borrows three scratch boxes the caller owns) and the two destination
refs (DA sink + result keeper). `finish()` flushes the deflate stream
end marker.
- Scratch boxes are allocated via `Box::new_zeroed_in` to avoid a 250 KB
stack copy. This is the same pattern as before, now exposed as a
reusable `boxed_zeroed_in` helper.
- Match arms in the compress-status handling are exhaustive (`Done` /
`Okay` / `BadParam` / `PutBufFailed`) per project convention.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
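The buffering and forwarding shape of such a sink can be sketched without a real compressor. In the sketch below the "compressor" is an identity copy, so it shows only the fixed output buffer, the drain to two destinations, and the exhaustive status match. `ByteSink`, `StreamingSink`, and the local `Status` enum are hypothetical stand-ins, not the PR's `DeflateSink` or miniz types.

```rust
/// Hypothetical byte sink, standing in for the DA sink and the result keeper.
pub trait ByteSink {
    fn put(&mut self, bytes: &[u8]);
}

impl ByteSink for Vec<u8> {
    fn put(&mut self, bytes: &[u8]) {
        self.extend_from_slice(bytes);
    }
}

/// Local mirror of the four status names listed in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Okay,
    Done,
    BadParam,
    PutBufFailed,
}

/// Streams bytes through a fixed `N`-byte buffer into two destinations.
pub struct StreamingSink<'a, D: ByteSink, R: ByteSink, const N: usize> {
    out_buf: [u8; N],
    filled: usize,
    dst: &'a mut D,
    keeper: &'a mut R,
}

impl<'a, D: ByteSink, R: ByteSink, const N: usize> StreamingSink<'a, D, R, N> {
    pub fn new(dst: &'a mut D, keeper: &'a mut R) -> Self {
        assert!(N > 0, "output buffer must not be empty");
        StreamingSink { out_buf: [0u8; N], filled: 0, dst, keeper }
    }

    /// Identity stand-in for one compressor step: copies as much input
    /// as fits into `out_buf` and reports how many bytes it consumed.
    fn step(&mut self, input: &[u8], finish: bool) -> (Status, usize) {
        let start = self.filled;
        let n = input.len().min(N - start);
        self.out_buf[start..start + n].copy_from_slice(&input[..n]);
        self.filled = start + n;
        let done = finish && n == input.len();
        (if done { Status::Done } else { Status::Okay }, n)
    }

    /// Forwards the buffered bytes to both destinations and resets the buffer.
    fn drain(&mut self) {
        let chunk = &self.out_buf[..self.filled];
        self.dst.put(chunk);
        self.keeper.put(chunk);
        self.filled = 0;
    }

    /// Feeds `input` through the buffer, draining whenever it fills up.
    pub fn write(&mut self, mut input: &[u8]) -> Result<(), Status> {
        while !input.is_empty() {
            let (status, consumed) = self.step(input, false);
            input = &input[consumed..];
            match status {
                Status::Okay => {
                    if self.filled == N {
                        self.drain();
                    }
                }
                // The stream must not end before `finish()` is called.
                Status::Done => return Err(Status::Done),
                Status::BadParam | Status::PutBufFailed => return Err(status),
            }
        }
        Ok(())
    }

    /// Ends the stream and flushes whatever is still buffered.
    pub fn finish(mut self) -> Result<(), Status> {
        match self.step(&[], true).0 {
            Status::Done => {
                self.drain();
                Ok(())
            }
            other @ (Status::Okay | Status::BadParam | Status::PutBufFailed) => Err(other),
        }
    }
}
```

With two `Vec<u8>` destinations and `N = 16`, writing 100 bytes in two calls and then calling `finish()` leaves both vectors holding the same 100 bytes, while the sink itself never holds more than 16.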
8631195 to 28c0f11
[Benchmark comment; tables not captured. Sections: Block-level effective cycles; Block-level sub-phases; Precompiles test-crate bench (synthetic workload, all labels); Pubdata bytes; Per-precompile per-execution ratios (head)]
What ❔
Stacked on #665. Folds a DEFLATE envelope into the still-draft v2
pubdata format: `PUBDATA_ENCODING_VERSION` stays at 2; only the body
encoding changes.
Layout after this PR:
`[VERSION:1=2][BLOCK_HASH:32][TIMESTAMP:8][DEFLATE(BODY)]`
`BODY` is the byte stream the storage-diff + logs + messages emitters
already produce. The 41-byte header stays uncompressed so consumers
can read block_hash / timestamp without inflating.
Commits in this PR:
- Host-side deflate measurement: adds `rig::pubdata_compression`
wrapping the vendored `supporting_crates/nostd_compression` deflate.
Adds three measurement tests; the `eth_runner` bench marker now emits
`pubdata_bytes_deflate{1,9}`.
- Wire deflate into STF: `write_pubdata` buffers the body into an
allocator-owned `Vec<u8, A>` via a small `WriteBytes` adapter,
deflates it, and mirrors the compressed bytes to both `pubdata_dst`
and `result_keeper.pubdata(...)`. The three miniz scratch structs
(`HuffmanOxide`, `LocalBuf`, `HashBuffers`, ~250 KB combined) are
heap-allocated via `Box::new_zeroed_in` on the system allocator.
Compression level fixed at 1 (fastest); level-1 lost only ~3 pp of
ratio vs level 9 in the measurement.
- Live-allocator fix: clone `io.allocator` instead of using
`A::default()`; the latter constructs a fresh handle whose backing
memory is undefined in proving.
- Pre-size body Vec: `proof_running_system::ProxyAllocator` panics on
`grow()`. Body buffer pre-allocates `BUFFER_CAPACITY` (the same
per-batch cap the v2 path already enforced via
`BlobCommitmentGenerator.buffer: ArrayVec<u8, BUFFER_CAPACITY>`).
- Drop v3 bump: folds deflate into draft v2 instead of introducing a
separate v3 version byte.
Why ❔
Tier-S #1 from the pubdata optimization notes — generic deflate over
the whole pubdata blob.
Host measurement results (uncompressed v2 baseline at deflate-9):
Confirmed end-to-end after STF wiring: emitted blob for the mixed
deploy block is 1826 B vs 3763 B raw v2 body → −51% in STF at
level 1; re-deflating the emitted blob adds only ~5 B overhead,
confirming the body is at deflate's information limit.
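As a quick check of the figure above, the saving follows directly from the two sizes quoted (3763 B raw, 1826 B emitted). The helper names below are hypothetical.

```rust
/// Compressed size as a percentage of the raw size.
pub fn ratio_pct(raw: usize, compressed: usize) -> f64 {
    assert!(raw > 0, "raw size must be non-zero");
    compressed as f64 * 100.0 / raw as f64
}

/// Percentage of bytes saved relative to the raw size.
pub fn saving_pct(raw: usize, compressed: usize) -> f64 {
    100.0 - ratio_pct(raw, compressed)
}
```

`saving_pct(3763, 1826)` evaluates to roughly 51.5, which matches the −51% quoted for the mixed deploy block.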
Forward / proving alignment
miniz deflate is fully deterministic for fixed flags. The forward and
proving paths share the same `write_pubdata` and compression
parameters, so they emit identical bytes for identical inputs.
Is this a breaking change?
The on-wire body after the fixed 41-byte header changes from raw v2
bytes to DEFLATE(v2 bytes). External pubdata consumers (L1 verifier,
sequencer-side decoder) need to inflate the trailing body. Header
offsets ([0], [1..33], [33..41]) are unchanged. No version byte bump
— absorbed into still-draft v2.
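For a consumer, the change amounts to splitting off the fixed header and inflating the remainder. A minimal sketch of the split, using the header offsets stated above, follows. `PubdataHeader` and `split_pubdata` are hypothetical names. The timestamp is left as raw bytes because this PR does not state its endianness, and inflating `deflated_body` is left to whichever raw-DEFLATE decoder the consumer uses.

```rust
use std::convert::TryFrom;

/// Fixed header length: 1 (version) + 32 (block hash) + 8 (timestamp).
pub const HEADER_LEN: usize = 41;

/// Borrowed view over an emitted pubdata blob.
pub struct PubdataHeader<'a> {
    pub version: u8,
    pub block_hash: &'a [u8; 32],
    /// Raw timestamp bytes; endianness is not specified in this PR.
    pub timestamp_bytes: &'a [u8; 8],
    /// Everything after the header: the DEFLATE-compressed body.
    pub deflated_body: &'a [u8],
}

/// Splits a blob at offsets [0], [1..33], [33..41]; returns `None` if the
/// blob is shorter than the header.
pub fn split_pubdata<'a>(pubdata: &'a [u8]) -> Option<PubdataHeader<'a>> {
    if pubdata.len() < HEADER_LEN {
        return None;
    }
    let (header, body) = pubdata.split_at(HEADER_LEN);
    Some(PubdataHeader {
        version: header[0],
        block_hash: <&[u8; 32]>::try_from(&header[1..33]).ok()?,
        timestamp_bytes: <&[u8; 8]>::try_from(&header[33..41]).ok()?,
        deflated_body: body,
    })
}
```

A caller would typically check `version == 2` before inflating, since the version byte no longer distinguishes raw from deflated bodies within draft v2.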
Test coverage
- `inflate_roundtrip_mixed_block` / `inflate_roundtrip_minimal_block`:
host-side `miniz_oxide` inflate of captured pubdata, assert v2 body
shape (`total_diffs >= nb_account_initial + nb_slot_initial`,
`index_len in 1..=8`).
- `measure_*`: print emitted blob sizes + a re-deflate pass to confirm
the body is already saturated (~5 B added).
- `test_check_pubdata_encoding_version` /
`test_check_pubdata_has_timestamp`: pass unchanged; header layout is
preserved.
(`test_block_of_erc20`, `test_gas_price_zero_fee_{zero,one}`) are the
same randomized-tree / index-encoding panic that exists on
`vv/pubdata-compression-port` already.
Checklist
🤖 Generated with Claude Code