
feat(pubdata): deflate-wrap the v2 pubdata body#666

Draft
0xVolosnikov wants to merge 6 commits into vv/pubdata-compression-port from vv/pubdata-deflate-measurement


@0xVolosnikov 0xVolosnikov commented May 18, 2026

What ❔

Stacked on #665. Folds a DEFLATE envelope into the still-draft v2
pubdata format — PUBDATA_ENCODING_VERSION stays at 2; only the body
encoding changes.

Layout after this PR:

v2: [VERSION:1=2][BLOCK_HASH:32][TIMESTAMP:8][DEFLATE(BODY)]

BODY is the byte stream the storage-diff + logs + messages emitters
already produce. The 41-byte header stays uncompressed so consumers
can read block_hash / timestamp without inflating.
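
A minimal consumer-side sketch of the header split described above. The offsets come from this PR; the `PubdataView` type, the `split_pubdata` name, and the big-endian timestamp are assumptions made for illustration and are not taken from the codebase. The trailing bytes are returned still deflated.

```rust
use std::convert::TryInto;

/// Fixed header size: 1 (version) + 32 (block hash) + 8 (timestamp).
const HEADER_LEN: usize = 1 + 32 + 8;

/// Borrowed view over a pubdata blob (hypothetical type, not from the codebase).
#[derive(Debug, PartialEq)]
struct PubdataView<'a> {
    version: u8,
    block_hash: &'a [u8; 32],
    timestamp: u64,
    /// Still compressed; a consumer has to inflate this separately.
    deflated_body: &'a [u8],
}

/// Splits a blob into header fields and the compressed body without inflating.
fn split_pubdata<'a>(pubdata: &'a [u8]) -> Option<PubdataView<'a>> {
    if pubdata.len() < HEADER_LEN {
        return None;
    }
    let (header, deflated_body) = pubdata.split_at(HEADER_LEN);
    let block_hash: &[u8; 32] = header[1..33].try_into().ok()?;
    // ASSUMPTION: big-endian timestamp; the PR text does not state the byte order.
    let timestamp = u64::from_be_bytes(header[33..41].try_into().ok()?);
    Some(PubdataView {
        version: header[0],
        block_hash,
        timestamp,
        deflated_body,
    })
}

fn main() {
    // Synthetic blob: version 2, dummy hash, dummy timestamp, 3 placeholder body bytes.
    let mut blob = vec![2u8];
    blob.extend_from_slice(&[0xAB; 32]);
    blob.extend_from_slice(&1_700_000_000u64.to_be_bytes());
    blob.extend_from_slice(&[0x01, 0x02, 0x03]);

    let view = split_pubdata(&blob).expect("blob shorter than the 41-byte header");
    println!(
        "version={} timestamp={} compressed_len={}",
        view.version,
        view.timestamp,
        view.deflated_body.len()
    );
}
```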

Commits in this PR:

  1. Host-side deflate measurement — adds rig::pubdata_compression
    wrapping the vendored supporting_crates/nostd_compression
    deflate. Three measurement tests + eth_runner bench-marker now
    emits pubdata_bytes_deflate{1,9}.

  2. Wire deflate into STF: write_pubdata buffers the body into
    an allocator-owned Vec<u8, A> via a small WriteBytes adapter,
    deflates it, and mirrors the compressed bytes to both pubdata_dst
    and result_keeper.pubdata(...).

    The three miniz scratch structs (HuffmanOxide, LocalBuf,
    HashBuffers, ~250 KB combined) are heap-allocated via
    Box::new_zeroed_in on the system allocator. Compression level
    fixed at 1 (fastest); level-1 lost only ~3 pp of ratio vs
    level 9 in the measurement.

  3. Live-allocator fix — clone io.allocator instead of using
    A::default(); the latter constructs a fresh handle whose backing
    memory is undefined in proving.

  4. Pre-size body Vec: proof_running_system::ProxyAllocator
    panics on grow(). The body buffer pre-allocates BUFFER_CAPACITY
    (the same per-batch cap the v2 path already enforced via
    BlobCommitmentGenerator.buffer: ArrayVec<u8, BUFFER_CAPACITY>).

  5. Drop v3 bump — folds deflate into draft v2 instead of
    introducing a separate v3 version byte.

Why ❔

Tier-S #1 from the pubdata optimization notes — generic deflate over
the whole pubdata blob.

Host measurement results (sizes in bytes; percentages are relative to the uncompressed v2 baseline):

| Workload | Raw | Deflate-1 | Deflate-9 |
|---|---|---|---|
| mint + transfer | 208 B | 213 (+2.4%) | 213 (+2.4%) |
| mint + transfer + deploy + eoa + mint2 | 3763 B | 1834 (-51%) | 1694 (-55%) |
| 20× ERC-20 transfers | 1208 B | 916 (-24%) | 912 (-24%) |

Confirmed end-to-end after STF wiring: the emitted blob for the mixed
deploy block is 1826 B vs 3763 B raw v2 body → −51% in STF at
level 1; re-deflating the emitted blob adds only ~5 B of overhead,
confirming the body is at deflate's information limit.
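
For reference, the percentages in the measurement table can be reproduced with the arithmetic below. This is a small sketch; `size_delta_pct` is a hypothetical helper name, and the byte counts are the ones reported in this PR.

```rust
/// Signed size change in percent; negative means the output shrank.
fn size_delta_pct(raw: u64, compressed: u64) -> f64 {
    (compressed as f64 - raw as f64) / raw as f64 * 100.0
}

fn main() {
    // (workload, raw bytes, deflate-1 bytes, deflate-9 bytes) as reported in this PR.
    let rows: [(&str, u64, u64, u64); 3] = [
        ("mint + transfer", 208, 213, 213),
        ("mint + transfer + deploy + eoa + mint2", 3763, 1834, 1694),
        ("20x ERC-20 transfers", 1208, 916, 912),
    ];
    for &(name, raw, d1, d9) in rows.iter() {
        println!(
            "{}: deflate-1 {:+.1}%, deflate-9 {:+.1}%",
            name,
            size_delta_pct(raw, d1),
            size_delta_pct(raw, d9)
        );
    }
}
```

The tiny mint + transfer block grows slightly because the DEFLATE framing costs more than the redundancy it removes at that size.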

Forward / proving alignment

miniz deflate is fully deterministic for fixed flags. The forward and
proving paths share the same write_pubdata and compression
parameters, so they emit identical bytes for identical inputs.

Is this a breaking change?

  • Yes
  • No

The on-wire body after the fixed 41-byte header changes from raw v2
bytes to DEFLATE(v2 bytes). External pubdata consumers (L1 verifier,
sequencer-side decoder) need to inflate the trailing body. Header
offsets ([0], [1..33], [33..41]) are unchanged. No version byte bump
— absorbed into still-draft v2.

Test coverage

  • inflate_roundtrip_mixed_block / inflate_roundtrip_minimal_block:
    host-side miniz_oxide inflate of captured pubdata, assert v2
    body shape (total_diffs >= nb_account_initial + nb_slot_initial,
    index_len in 1..=8).
  • measure_*: print emitted blob sizes + a re-deflate pass to confirm
    the body is already saturated (~5 B added).
  • test_check_pubdata_encoding_version / test_check_pubdata_has_timestamp:
    pass unchanged — header layout is preserved.
  • 69 transactions tests pass; 3 pre-existing failures
    (test_block_of_erc20, test_gas_price_zero_fee_{zero,one})
    are the same randomized-tree / index-encoding panic that exists on
    vv/pubdata-compression-port already.

Checklist

  • PR title corresponds to the body of PR.
  • Tests for the changes have been added (roundtrip + sanity).
  • Documentation comments added.
  • Code formatted.

🤖 Generated with Claude Code

@0xVolosnikov 0xVolosnikov changed the title feat(pubdata): host-side deflate measurement for v2 blob feat(pubdata): deflate-wrapped v3 pubdata blob + host measurement May 18, 2026
@0xVolosnikov 0xVolosnikov changed the title feat(pubdata): deflate-wrapped v3 pubdata blob + host measurement feat(pubdata): deflate-wrap the v2 pubdata body May 18, 2026
0xVolosnikov and others added 6 commits May 18, 2026 16:07
Wires the vendored `miniz_nostd_compression` deflate into the rig as
post-STF instrumentation: captures the v2 pubdata blob each block emits
and reports its deflated size at compression levels 1, 6, 9 and with the
zlib wrapper. No STF/protocol change — the emitted bytes are unchanged.

Use is twofold:
- Three new `tests/instances/transactions` tests print per-block
  compression ratios under `cargo test -- --nocapture` for quick local
  evaluation across small/mixed/storage-heavy workloads.
- `tests/instances/eth_runner/single_run.rs` appends
  `pubdata_bytes_deflate{1,9}: N` next to the existing
  `pubdata_bytes: N` in the cycle-marker bench file, so live
  mainnet-replay fixtures get measured automatically the next time
  benches run.

Motivation: evaluate Tier-S #1 ("generic compression on the whole
pubdata blob") from the optimization notes before committing to an
in-circuit envelope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Bumps `PUBDATA_ENCODING_VERSION` to 3 and wraps the v2 body in a raw
DEFLATE stream emitted by `write_pubdata`:

  v3: [VERSION:1=3][BLOCK_HASH:32][TIMESTAMP:8][DEFLATE(BODY)]

`BODY` is the byte stream the v2 emitter already produces (storage
diffs + logs + messages). Header stays uncompressed so consumers can
read block_hash and timestamp without inflating.

Implementation:

- `basic_bootloader::bootloader::block_flow::zk::post_tx_op::pubdata_compression`
  — new module wrapping `miniz_nostd_compression`. Allocates the three
  large miniz scratch buffers (`HuffmanOxide`, `LocalBuf`,
  `HashBuffers`, ~250 KB combined) via `Box::new_zeroed_in` on the
  system allocator to avoid the stack copies that
  `Box::new_in(T::default(), …)` would incur. Output buffer sized to
  the zlib `deflateBound` upper bound.

- `write_pubdata` collects the v2 body into a single `Vec<u8, A>` via
  a `WriteBytes` adapter, then deflates it. The body emitters'
  result_keeper mirror is suppressed via `NopResultKeeper` during
  collection; the compressed bytes are mirrored to both real sinks
  after deflate.

- Compression level fixed at 1 (fastest). On the measurement
  workloads, level 1 lost only ~3.7 pp of ratio vs level 9 (48.7% vs
  45.0% on a deploy-heavy 3.7 KB block), and the cycle delta in
  RISC-V is expected to favour the fast path heavily. Level can be
  tuned later if cycle budgets allow.

Tests:

- `tests/instances/transactions/src/pubdata_compression_experiment.rs`
  gains two `inflate_roundtrip_*` tests that host-side `miniz_oxide`
  inflate the v3 body and assert the inflated bytes have a valid v2
  diffs header. The existing measure_* tests now also act as a sanity
  check that the STF body is already at deflate's information limit
  (a second pass adds ~5 B overhead).

- `test_check_pubdata_encoding_version` and
  `test_check_pubdata_has_timestamp` pass unchanged: header layout
  (offsets 0, 1..33, 33..41) is preserved.

Forward / proving alignment: miniz deflate is fully deterministic for
fixed flags; the two paths emit identical bytes for identical inputs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`A::default()` constructs a fresh allocator handle whose backing memory
is undefined in the proving environment — the RISC-V bump allocator's
state lives in `io.allocator`, not in `A::default()`. Using the
default-constructed handle let bench head trip an invalid operation
when v3 deflate scratch buffers were allocated against it.

Cloning `io.allocator` once up front, before any of the field borrows
that follow, keeps every allocation rooted in the system's actual heap.
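
The difference between the two handles can be illustrated with a toy model. This is a sketch only: `BumpHandle` and `ArenaState` are hypothetical stand-ins, no real memory is managed, and the "undefined backing memory" of the default handle is simplified to an empty arena.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Shared bump-arena bookkeeping (hypothetical; tracks offsets only).
#[derive(Default)]
struct ArenaState {
    capacity: Cell<usize>,
    used: Cell<usize>,
}

/// Cheap-to-clone handle; every clone points at the same `ArenaState`.
#[derive(Clone, Default)]
struct BumpHandle {
    state: Rc<ArenaState>,
}

impl BumpHandle {
    fn with_capacity(capacity: usize) -> Self {
        let state = ArenaState {
            capacity: Cell::new(capacity),
            used: Cell::new(0),
        };
        BumpHandle { state: Rc::new(state) }
    }

    /// Reserves `len` bytes and returns the start offset, or `None` if the arena is full.
    fn alloc(&self, len: usize) -> Option<usize> {
        let start = self.state.used.get();
        let end = start.checked_add(len)?;
        if end > self.state.capacity.get() {
            return None;
        }
        self.state.used.set(end);
        Some(start)
    }
}

fn main() {
    let live = BumpHandle::with_capacity(1024);

    // A clone shares the live arena, so allocations continue where the original left off.
    let cloned = live.clone();
    assert_eq!(cloned.alloc(256), Some(0));
    assert_eq!(live.alloc(256), Some(256));

    // A default-constructed handle has its own, empty state and cannot serve requests.
    let fresh = BumpHandle::default();
    assert_eq!(fresh.alloc(1), None);
}
```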

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… forbids grow)

`proof_running_system::ProxyAllocator::grow()` explicitly panics — the
proving heap allocator does not allow Vec reallocation. The body
buffer was created via `Vec::new_in(allocator)` (capacity 0) and grew
via repeated `extend_from_slice` calls in the body emitters, which
tripped grow and crashed bench head with "invalid operation".

Pre-allocate to `BUFFER_CAPACITY` (the same per-batch upper bound the
v2 path already enforced via `BlobCommitmentGenerator.buffer:
ArrayVec<u8, BUFFER_CAPACITY>`). Any v2 body that fit before
necessarily fits after deflate, so the cap is not a regression.

`pubdata_compression::deflate_pubdata_body` already pre-sizes its
output via `Vec::with_capacity_in(deflate_output_cap(input.len()),
allocator)` and only `resize` / `truncate` within that capacity, so
no other site needed adjusting.
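
A sketch of the pre-sizing idea on stable Rust, assuming a plain `Vec<u8>` in place of the allocator-parameterised `Vec<u8, A>`. `FixedCapBuf` and `CapacityExceeded` are hypothetical names; the point is that writes are rejected before the `Vec` would ever need to reallocate.

```rust
/// Byte buffer whose backing `Vec` is sized once and never reallocates.
struct FixedCapBuf {
    buf: Vec<u8>,
    cap: usize,
}

/// Returned when a write would exceed the pre-allocated capacity.
#[derive(Debug, PartialEq)]
struct CapacityExceeded;

impl FixedCapBuf {
    fn with_capacity(cap: usize) -> Self {
        FixedCapBuf {
            buf: Vec::with_capacity(cap),
            cap,
        }
    }

    /// Appends `bytes` only if they fit in the remaining space.
    fn write(&mut self, bytes: &[u8]) -> Result<(), CapacityExceeded> {
        if bytes.len() > self.cap - self.buf.len() {
            return Err(CapacityExceeded);
        }
        // Stays within the reserved capacity, so no reallocation is triggered.
        self.buf.extend_from_slice(bytes);
        Ok(())
    }
}

fn main() {
    let mut body = FixedCapBuf::with_capacity(8);
    let before = body.buf.as_ptr();

    assert_eq!(body.write(b"diffs"), Ok(()));
    assert_eq!(body.write(b"logs"), Err(CapacityExceeded));
    assert_eq!(body.write(b"log"), Ok(()));

    // The backing storage did not move, i.e. the Vec never grew.
    assert_eq!(before, body.buf.as_ptr());
    println!("{} bytes buffered", body.buf.len());
}
```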

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

#665's v2 is still a draft format on `draft-0.4.0`, so the deflate
envelope is absorbed into v2 rather than introducing a separate v3.
`PUBDATA_ENCODING_VERSION` stays at 2 and the layout doc on the
constant now reads:

  Version 2: tree-index storage diffs + DEFLATE-wrapped body.
             Header (version + block_hash + timestamp) uncompressed.

No on-wire layout change — the bytes a recipient sees are identical
to the previous commit's emission, just with `pubdata[0] == 2`
instead of `3`. The two roundtrip tests and the existing version /
timestamp tests stay green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace the buffer-then-deflate pattern in `write_pubdata` with a
streaming `DeflateSink`: the existing body emitters
(`apply_storage_diffs_pubdata`, `LogsStorage::apply_pubdata`) call
`DeflateSink::write(buf)` directly, miniz consumes the bytes
incrementally, and the emitted compressed bytes are forwarded to
both `pubdata_dst` (DA commitment) and `result_keeper` on the fly.

Peak per-block allocations drop from

  body Vec (BUFFER_CAPACITY = 1.1 MB)
  + output Vec (input + input/8 + 64 ≈ 1.2 MB worst case)
  + scratch (HuffmanOxide + LocalBuf + HashBuffers ≈ 254 KB)
  = ~2.55 MB

to

  out_buf (16 KB fixed)
  + scratch (~254 KB)
  = ~270 KB.

A ~9x reduction in peak compressor-stage memory. The 1.1 MB body
buffer was a pre-allocation forced by `proof_running_system::ProxyAllocator`'s
`grow`-panics policy — removing the resizable Vec also removes a
class of "what if the body exceeds the cap" worries.
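
The before/after totals can be rechecked with the arithmetic below. This is a sketch under stated assumptions: `ASSUMED_BUFFER_CAPACITY` is a placeholder for the "1.1 MB" figure (the exact constant is not quoted here), and `deflate_output_cap` is reconstructed from the `input + input/8 + 64` bound in the breakdown above.

```rust
const KB: usize = 1024;

/// ASSUMPTION: placeholder for BUFFER_CAPACITY; the text only says "1.1 MB".
const ASSUMED_BUFFER_CAPACITY: usize = 1_100 * KB;
/// Scratch structs (HuffmanOxide + LocalBuf + HashBuffers), approximately 254 KB.
const SCRATCH_BYTES: usize = 254 * KB;
/// Fixed streaming output buffer.
const STREAM_OUT_BUF: usize = 16 * KB;

/// Worst-case output size, following the `input + input/8 + 64` bound.
fn deflate_output_cap(input_len: usize) -> usize {
    input_len + input_len / 8 + 64
}

/// Peak allocation of the buffer-then-deflate path.
fn peak_buffered(body_cap: usize) -> usize {
    body_cap + deflate_output_cap(body_cap) + SCRATCH_BYTES
}

/// Peak allocation of the streaming path.
fn peak_streaming() -> usize {
    STREAM_OUT_BUF + SCRATCH_BYTES
}

fn main() {
    let before = peak_buffered(ASSUMED_BUFFER_CAPACITY);
    let after = peak_streaming();
    println!("buffered:  {} KB", before / KB);
    println!("streaming: {} KB", after / KB);
    println!("reduction: {:.1}x", before as f64 / after as f64);
}
```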

Cycle impact is small but slightly positive on larger blocks: we
skip the body-bytes memcpy into the body Vec and the second memcpy
of the deflate output into pubdata_dst + result_keeper, at the cost
of more `compress()` invocations (one per body-emitter `write` call
plus drains every 16 KB). On the host workloads the emitted bytes
are byte-identical to the previous buffer-then-deflate path, so the
forward / proving alignment property holds.

Implementation notes:

- `DeflateSink<'a, A, DST, RK>` borrows the compressor (which itself
  borrows three scratch boxes the caller owns) and the two
  destination refs (DA sink + result keeper). `finish()` flushes
  the deflate stream end marker.
- Scratch boxes are allocated via `Box::new_zeroed_in` to avoid a
  250 KB stack copy — same pattern as before, just exposed as a
  reusable `boxed_zeroed_in` helper.
- Match arms in the compress-status handling are exhaustive
  (`Done` / `Okay` / `BadParam` / `PutBufFailed`) per project
  convention.
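
The data flow of the streaming sink can be sketched without the compressor. In the sketch below the deflate step is replaced by a plain copy, so it only shows the fixed staging buffer and the fan-out to two destinations; `TeeSink`, `ByteSink`, and `push_bytes` are hypothetical names, not the types from this PR.

```rust
/// Stand-in for the two destinations (DA commitment sink and result keeper).
trait ByteSink {
    fn push_bytes(&mut self, bytes: &[u8]);
}

impl ByteSink for Vec<u8> {
    fn push_bytes(&mut self, bytes: &[u8]) {
        self.extend_from_slice(bytes);
    }
}

/// Stages bytes in a fixed-size buffer and forwards each full buffer to both sinks.
struct TeeSink<'a, D: ByteSink, R: ByteSink> {
    out_buf: Vec<u8>,
    cap: usize,
    dst: &'a mut D,
    keeper: &'a mut R,
}

impl<'a, D: ByteSink, R: ByteSink> TeeSink<'a, D, R> {
    fn new(cap: usize, dst: &'a mut D, keeper: &'a mut R) -> Self {
        assert!(cap > 0, "staging buffer must hold at least one byte");
        TeeSink { out_buf: Vec::with_capacity(cap), cap, dst, keeper }
    }

    /// Accepts a chunk from a body emitter, draining whenever the buffer fills up.
    fn write(&mut self, mut bytes: &[u8]) {
        while !bytes.is_empty() {
            let room = self.cap - self.out_buf.len();
            let take = room.min(bytes.len());
            // A real sink would feed the compressor here; this sketch copies bytes through.
            self.out_buf.extend_from_slice(&bytes[..take]);
            bytes = &bytes[take..];
            if self.out_buf.len() == self.cap {
                self.drain();
            }
        }
    }

    /// Forwards the staged bytes to both destinations and resets the buffer.
    fn drain(&mut self) {
        self.dst.push_bytes(&self.out_buf);
        self.keeper.push_bytes(&self.out_buf);
        self.out_buf.clear();
    }

    /// Flushes whatever is still staged; a real sink would also emit the end marker.
    fn finish(mut self) {
        if !self.out_buf.is_empty() {
            self.drain();
        }
    }
}

fn main() {
    let mut da: Vec<u8> = Vec::new();
    let mut keeper: Vec<u8> = Vec::new();
    {
        let mut sink = TeeSink::new(4, &mut da, &mut keeper);
        sink.write(b"storage-diffs");
        sink.write(b"logs");
        sink.finish();
    }
    assert_eq!(da, b"storage-diffslogs".to_vec());
    assert_eq!(da, keeper);
}
```

Because the staging buffer is allocated once at a fixed size, nothing in this path needs to grow, which is the property the proving allocator requires.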

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@0xVolosnikov 0xVolosnikov force-pushed the vv/pubdata-deflate-measurement branch from 8631195 to 28c0f11 Compare May 18, 2026 21:10
@github-actions

Block-level effective cycles

| Benchmark | Symbol | Base Eff | Head Eff (%) | Base Raw | Head Raw (%) | Base Blake | Head Blake (%) | Base Bigint | Head Bigint (%) | Base Keccak | Head Keccak (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| block_19299001 (keccak DA) | process_block | 209,876,212 | 211,170,520 (+0.62%) | 160,094,520 | 161,435,556 (+0.84%) | 410,630 | 410,630 (+0.00%) | 7,681,862 | 7,681,862 (+0.00%) | 3,121,041 | 3,109,359 (-0.37%) |
| block_19299001 (blobs DA) | process_block | 259,393,669 | 260,230,702 (+0.32%) | 197,618,037 | 198,604,422 (+0.50%) | 412,000 | 411,610 (-0.09%) | 10,716,403 | 10,680,625 (-0.33%) | 3,079,505 | 3,079,505 (+0.00%) |
| block_22244135 (keccak DA) | process_block | 135,517,145 | 136,251,396 (+0.54%) | 107,207,881 | 107,973,284 (+0.71%) | 172,040 | 172,040 (+0.00%) | 5,054,163 | 5,054,163 (+0.00%) | 1,334,993 | 1,327,205 (-0.58%) |
| block_22244135 (blobs DA) | process_block | 184,770,837 | 185,591,406 (+0.44%) | 144,487,437 | 145,333,730 (+0.59%) | 172,740 | 172,480 (-0.15%) | 8,066,314 | 8,060,923 (-0.07%) | 1,313,576 | 1,313,576 (+0.00%) |

Block-level sub-phases

| Benchmark | Symbol | Base Eff | Head Eff (%) | Base Raw | Head Raw (%) | Base Blake | Head Blake (%) | Base Bigint | Head Bigint (%) | Base Keccak | Head Keccak (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| block_19299001 (blobs DA) | blob_versioned_hash | 49,889,801 | 49,234,857 (-1.31%) | 37,729,717 | 37,224,125 (-1.34%) | 1,370 | 980 (-28.47%) | 3,034,541 | 2,998,763 (-1.18%) | 0 | 0 (+0.00%) |
| block_22244135 (blobs DA) | blob_versioned_hash | 49,461,437 | 49,418,958 (-0.09%) | 37,401,633 | 37,384,878 (-0.04%) | 700 | 440 (-37.14%) | 3,012,151 | 3,006,760 (-0.18%) | 0 | 0 (+0.00%) |
| block_19299001 (blobs DA) | da_commitment | 3,572,429 | 5,059,581 (+41.63%) | 3,483,469 | 4,970,621 (+42.69%) | 5,560 | 5,560 (+0.00%) | 0 | 0 (+0.00%) | 0 | 0 (+0.00%) |
| block_19299001 (keccak DA) | da_commitment | 3,940,621 | 5,230,105 (+32.72%) | 3,688,113 | 5,024,325 (+36.23%) | 5,560 | 5,560 (+0.00%) | 0 | 0 (+0.00%) | 40,887 | 29,205 (-28.57%) |
| block_22244135 (blobs DA) | da_commitment | 2,157,838 | 3,019,073 (+39.91%) | 2,106,318 | 2,967,553 (+40.89%) | 3,220 | 3,220 (+0.00%) | 0 | 0 (+0.00%) | 0 | 0 (+0.00%) |
| block_22244135 (keccak DA) | da_commitment | 2,362,632 | 3,095,071 (+31.00%) | 2,228,040 | 2,991,631 (+34.27%) | 3,220 | 3,220 (+0.00%) | 0 | 0 (+0.00%) | 20,768 | 12,980 (-37.50%) |
| block_19299001 (keccak DA) | run_tx_loop | 189,799,175 | 189,799,175 (+0.00%) | 141,712,823 | 141,712,823 (+0.00%) | 316,840 | 316,840 (+0.00%) | 7,681,862 | 7,681,862 (+0.00%) | 3,072,366 | 3,072,366 (+0.00%) |
| block_22244135 (keccak DA) | run_tx_loop | 123,371,083 | 123,371,083 (+0.00%) | 96,083,883 | 96,083,883 (+0.00%) | 115,300 | 115,300 (+0.00%) | 5,054,163 | 5,054,163 (+0.00%) | 1,306,437 | 1,306,437 (+0.00%) |
| block_19299001 (blobs DA) | state_commitment_update | 12,969,877 | 12,981,026 (+0.09%) | 11,844,917 | 11,856,066 (+0.09%) | 70,310 | 70,310 (+0.00%) | 0 | 0 (+0.00%) | 0 | 0 (+0.00%) |
| block_19299001 (keccak DA) | state_commitment_update | 12,969,829 | 12,980,978 (+0.09%) | 11,844,869 | 11,856,018 (+0.09%) | 70,310 | 70,310 (+0.00%) | 0 | 0 (+0.00%) | 0 | 0 (+0.00%) |
| block_22244135 (blobs DA) | state_commitment_update | 7,454,392 | 7,459,117 (+0.06%) | 6,801,272 | 6,805,997 (+0.07%) | 40,820 | 40,820 (+0.00%) | 0 | 0 (+0.00%) | 0 | 0 (+0.00%) |
| block_22244135 (keccak DA) | state_commitment_update | 7,453,030 | 7,457,755 (+0.06%) | 6,799,910 | 6,804,635 (+0.07%) | 40,820 | 40,820 (+0.00%) | 0 | 0 (+0.00%) | 0 | 0 (+0.00%) |
| block_19299001 (keccak DA) | system_init | 36,738 | 36,738 (+0.00%) | 36,738 | 36,738 (+0.00%) | 0 | 0 (+0.00%) | 0 | 0 (+0.00%) | 0 | 0 (+0.00%) |
| block_22244135 (keccak DA) | system_init | 36,738 | 36,738 (+0.00%) | 36,738 | 36,738 (+0.00%) | 0 | 0 (+0.00%) | 0 | 0 (+0.00%) | 0 | 0 (+0.00%) |

Precompiles test-crate bench (synthetic workload, all labels)

| Benchmark | Symbol | Base Eff | Head Eff (%) | Base Raw | Head Raw (%) | Base Blake | Head Blake (%) | Base Bigint | Head Bigint (%) | Base Keccak | Head Keccak (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| precompiles | bn254_ecadd | 53,315 | 53,315 (+0.00%) | 47,863 | 47,863 (+0.00%) | 0 | 0 (+0.00%) | 1,363 | 1,363 (+0.00%) | 0 | 0 (+0.00%) |
| precompiles | bn254_ecmul | 731,892 | 731,892 (+0.00%) | 567,704 | 567,704 (+0.00%) | 0 | 0 (+0.00%) | 41,047 | 41,047 (+0.00%) | 0 | 0 (+0.00%) |
| precompiles | bn254_pairing | 71,468,694 | 71,468,694 (+0.00%) | 56,940,550 | 56,940,550 (+0.00%) | 0 | 0 (+0.00%) | 3,632,036 | 3,632,036 (+0.00%) | 0 | 0 (+0.00%) |
| precompiles | da_commitment | 27,584 | 247,864 (+798.58%) | 24,668 | 244,948 (+892.98%) | 20 | 20 (+0.00%) | 0 | 0 (+0.00%) | 649 | 649 (+0.00%) |
| precompiles | ecrecover | 370,906 | 368,725 (-0.59%) | 242,054 | 240,541 (-0.63%) | 0 | 0 (+0.00%) | 31,564 | 31,397 (-0.53%) | 649 | 649 (+0.00%) |
| precompiles | id | 925 | 925 (+0.00%) | 925 | 925 (+0.00%) | 0 | 0 (+0.00%) | 0 | 0 (+0.00%) | 0 | 0 (+0.00%) |
| precompiles | keccak | 31,670 | 31,670 (+0.00%) | 10,898 | 10,898 (+0.00%) | 0 | 0 (+0.00%) | 1 | 1 (+0.00%) | 5,192 | 5,192 (+0.00%) |
| precompiles | modexp | 31,888,536 | 31,888,536 (+0.00%) | 21,230,716 | 21,230,716 (+0.00%) | 0 | 0 (+0.00%) | 2,664,455 | 2,664,455 (+0.00%) | 0 | 0 (+0.00%) |
| precompiles | p256_verify | 747,278 | 747,278 (+0.00%) | 468,586 | 468,586 (+0.00%) | 0 | 0 (+0.00%) | 69,673 | 69,673 (+0.00%) | 0 | 0 (+0.00%) |
| precompiles | process_block | 144,548,985 | 144,784,470 (+0.16%) | 114,954,241 | 115,184,418 (+0.20%) | 5,360 | 5,360 (+0.00%) | 7,325,326 | 7,326,653 (+0.02%) | 51,920 | 51,920 (+0.00%) |
| precompiles | process_transaction | 72,054,804 | 72,063,705 (+0.01%) | 57,305,732 | 57,317,993 (+0.02%) | 160 | 160 (+0.00%) | 3,664,562 | 3,663,722 (-0.02%) | 22,066 | 22,066 (+0.00%) |
| precompiles | ripemd | 8,010 | 8,010 (+0.00%) | 8,010 | 8,010 (+0.00%) | 0 | 0 (+0.00%) | 0 | 0 (+0.00%) | 0 | 0 (+0.00%) |
| precompiles | run_tx_loop | 144,070,092 | 144,079,047 (+0.01%) | 114,576,716 | 114,595,623 (+0.02%) | 180 | 180 (+0.00%) | 7,329,141 | 7,326,653 (-0.03%) | 43,483 | 43,483 (+0.00%) |
| precompiles | sha256 | 13,315 | 13,315 (+0.00%) | 13,315 | 13,315 (+0.00%) | 0 | 0 (+0.00%) | 0 | 0 (+0.00%) | 0 | 0 (+0.00%) |
| precompiles | state_commitment_update | 188,044 | 187,941 (-0.05%) | 148,364 | 148,261 (-0.07%) | 2,480 | 2,480 (+0.00%) | 0 | 0 (+0.00%) | 0 | 0 (+0.00%) |
| precompiles | system_init | 41,465 | 41,465 (+0.00%) | 41,465 | 41,465 (+0.00%) | 0 | 0 (+0.00%) | 0 | 0 (+0.00%) | 0 | 0 (+0.00%) |

Pubdata bytes

| Benchmark | Base | Head (Δ%) |
|---|---|---|
| block_19299001 | 8,673 | 6,175 (-28.80%) |
| block_22244135 | 4,357 | 2,729 (-37.37%) |

Per-precompile

Per-precompile per-execution ratios (head)
cycles = effective (raw + Blake×16 + BigInt×4 + Keccak×4)
precompile                count    med c/g    p95 c/g    p99 c/g    max c/g    med n/g    p95 n/g    p99 n/g    max n/g
------------------------------------------------------------------------------------------------------------------------
modexp                      105       70.6      713.4     2846.8     2847.5      300.0     1200.3     4814.0     4814.0
point_eval                    2     1025.1     1025.1     1025.1     1025.1     1262.1     1262.1     1262.1     1262.1
blake2f                       2      803.7      803.7      803.7      803.7        0.0        0.0        0.0        0.0
ecadd                        57      335.9      358.4      360.0      360.0      350.7      350.7      350.7      350.7
bls12_pairing_check           2      217.2      217.2      217.2      217.2        0.0        0.0        0.0        0.0
ecpairing                    31      168.4      185.6      185.6      185.6      398.2      428.6      428.6      428.6
keccak                     2497      111.7      126.6      139.3      150.6      478.8      558.6      626.8      684.2
ecmul                        37      119.0      124.1      126.5      126.5      127.3      127.3      127.3      127.3
ecrecover                    59      119.1      122.3      123.5      123.5      174.0      174.0      174.0      174.0
sha256                        4       68.4      123.3      123.3      123.3       80.6      131.5      131.5      131.5
p256_verify                  16      107.3      108.3      108.3      108.3      113.6      113.6      113.6      113.6
bls12_g1msm                   2      100.3      100.3      100.3      100.3        0.0        0.0        0.0        0.0
bls12_g2msm                   2       88.1       88.1       88.1       88.1        0.0        0.0        0.0        0.0
bls12_g2add                   2       45.0       45.0       45.0       45.0        0.0        0.0        0.0        0.0
identity                      5       22.7       34.3       34.3       34.3       31.4       48.1       48.1       48.1
bls12_g1add                   2       28.1       28.1       28.1       28.1        0.0        0.0        0.0        0.0
ripemd160                     4        4.4        7.4        7.4        7.4        8.1       13.1       13.1       13.1
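
The effective-cycle weighting quoted above (raw + Blake×16 + BigInt×4 + Keccak×4) can be checked against the block-level rows of this report. The sketch below uses a hypothetical `CycleCounts` struct; the input numbers are the base and head columns of the `block_19299001 (keccak DA)` `process_block` row.

```rust
/// Per-symbol counters as they appear in the benchmark columns.
struct CycleCounts {
    raw: u64,
    blake: u64,
    bigint: u64,
    keccak: u64,
}

/// Effective cycles = raw + Blake*16 + BigInt*4 + Keccak*4.
fn effective_cycles(c: &CycleCounts) -> u64 {
    c.raw + c.blake * 16 + c.bigint * 4 + c.keccak * 4
}

fn main() {
    // Base columns of block_19299001 (keccak DA) process_block.
    let base = CycleCounts {
        raw: 160_094_520,
        blake: 410_630,
        bigint: 7_681_862,
        keccak: 3_121_041,
    };
    // Head columns of the same row.
    let head = CycleCounts {
        raw: 161_435_556,
        blake: 410_630,
        bigint: 7_681_862,
        keccak: 3_109_359,
    };

    // Both values match the "Base Eff" and "Head Eff" columns of that row.
    assert_eq!(effective_cycles(&base), 209_876_212);
    assert_eq!(effective_cycles(&head), 211_170_520);
    println!("base={} head={}", effective_cycles(&base), effective_cycles(&head));
}
```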
