Skip to content

LongMeng-Crypto/Brakedown

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Shockwave Polynomial Commitment

This workspace implements the square-matrix t = 2 multilinear polynomial commitment from Brakedown, instantiated with coset Reed-Solomon encoding as in the Shockwave variant.

The implementation uses:

  • ICICLE v4.0.0 BN254 ScalarField as the only field representation.
  • ICICLE coset INTT/NTT for Reed-Solomon interpolation and evaluation.
  • ICICLE Keccak-256 for Merkle leaves.
  • ICICLE Blake2s for Merkle internal nodes.
  • CPU Keccak-256 for the small Fiat-Shamir transcript.
  • One complete encoded matrix column per Merkle leaf.
  • Independent testing and evaluation query sets.
  • One Merkle opening per unique column in the union of both query sets.
  • CPU execution or a CUDA path for NTT, Merkle hashing, large Prove matrix products, and Verify RS/consistency checks.

See ICICLE_INTEGRATION.md for the concrete ICICLE Rust APIs, memory layout, backend setup, and official source locations.

Code Structure

crates/pcs/src/
  field.rs          ICICLE BN254 field helpers and canonical challenge reduction
  polynomial.rs     multilinear evaluations, eq vectors, and q_row^T U q_col
  reed_solomon.rs   CPU/CUDA batched coset INTT/NTT Reed-Solomon encoding
  params.rs         validated Shockwave dimensions and query count
  encoding.rs       column-major encoded matrix and retained CUDA buffers
  serialization.rs  canonical fixed-width field serialization
  commitment.rs     batched ICICLE Keccak/Blake2s Merkle commitment
  transcript.rs     CPU Keccak-256 Fiat-Shamir transcript
  shockwave.rs      Setup, Commit, Open, ProveEvaluation, VerifyEvaluation

crates/cli/src/main.rs
  demo and small/medium/large/xlarge benchmark commands

Protocol Workflow

For ell multilinear variables:

n = 2^ell
m = 2^(ell/2)
U is the m x m reshape of the polynomial evaluation vector
encoded_cols = m * rho_inv

The current implementation requires even ell, a square t = 2 matrix, and a power-of-two rho_inv.

Setup

Entry point:

ShockwavePcs::setup(num_vars, rho_inv, num_queries, backend)

Call flow:

ShockwavePcs::setup (crates/pcs/src/shockwave.rs:127)
  -> ShockwaveParams::new (crates/pcs/src/params.rs:35)
       validates ell, rho_inv, and num_queries
       derives n, rows, cols, and encoded_cols
  -> initialize_icicle (crates/pcs/src/commitment.rs:385)
       loads CUDA once when requested
       selects the active CPU or CUDA device
  -> initialize_ntt_domain (crates/pcs/src/reed_solomon.rs:40)
       creates or enlarges the per-device NTT domain
       reuses a larger initialized domain for smaller transforms

CUDA setup currently initializes matching domains for both CUDA and the CPU fallback. Fiat-Shamir and Merkle path verification use direct Rust CPU hashing, while CUDA Verify uses the CUDA NTT and matrix-operation paths.

Commit

Entry point:

pcs.commit(&poly)

Call flow:

ShockwavePcs::commit (crates/pcs/src/shockwave.rs:155)
  -> encode_matrix (crates/pcs/src/encoding.rs:128)
  -> rs_encode_matrix (crates/pcs/src/reed_solomon.rs:68)
       CPU -> encode_on_cpu (crates/pcs/src/reed_solomon.rs:124)
       CUDA -> encode_on_cuda (crates/pcs/src/reed_solomon.rs:166)
  -> build_column_commitment (crates/pcs/src/commitment.rs:420)
  -> ColumnMerkleTree::root (crates/pcs/src/commitment.rs:143)

The Reed-Solomon transform is:

each row U[i]
  -> coset INTT over H, offset 5
  -> zero-pad coefficients from m to encoded_cols
  -> coset NTT over L, offset 7
  -> encoded row U_hat[i]

CPU Commit executes all rows as one ICICLE batch and then converts the encoded matrix to column-major host storage.

CUDA Commit keeps the large data path on the GPU:

upload row-major U once
  -> batch CUDA coset INTT
  -> CUDA transpose-based zero-padding layout
  -> batch CUDA coset NTT
  -> CUDA transpose to column-major U_hat
  -> batched CUDA Keccak leaf hashing
  -> batched CUDA Blake2s Merkle layers
  -> 32-byte root

The column-major encoded matrix is copied to host once for future openings. CUDA mode also retains:

  • the original row-major polynomial matrix for large Prove matrix products;
  • the encoded column-major matrix for direct Merkle leaf hashing.

For polynomial evaluations u:

u in F^n
U in F^(m x m)
U_hat[i] = RS(U[i])
U_hat in F^(m x encoded_cols)
commitment = MerkleRoot(columns(U_hat))

Full Polynomial Open

Entry point:

pcs.open_polynomial(&commitment, &poly)

Call flow:

ShockwavePcs::open_polynomial (crates/pcs/src/shockwave.rs:168)
  -> validate_commitment
  -> ShockwavePcs::commit
  -> compare the recomputed root with the supplied commitment

This is the complete polynomial opening. It is separate from the succinct evaluation proof below.

Evaluation Opening

Entry point:

pcs.prove_evaluation(&poly, &state, &point)

Call flow:

ShockwavePcs::prove_evaluation (crates/pcs/src/shockwave.rs:179)
  -> prove_evaluation_with_backend (crates/pcs/src/shockwave.rs:189)
  -> q1_q2_from_r (crates/pcs/src/polynomial.rs:100)
       derives q_row and q_col
  -> ShockwavePcs::combine_rows (crates/pcs/src/shockwave.rs:335)
       evaluation_response = q_row^T U
  -> inner_product (crates/pcs/src/shockwave.rs:493)
       y = <evaluation_response, q_col>
  -> Transcript::challenge_fields (crates/pcs/src/transcript.rs:67)
       derives testing vector alpha
  -> ShockwavePcs::combine_rows
       testing_response = alpha^T U
  -> Transcript::challenge_indices (crates/pcs/src/transcript.rs:82)
       derives independent testing and evaluation query sets
  -> unique_query_indices (crates/pcs/src/shockwave.rs:509)
       computes the sorted union of both query sets
  -> ColumnMerkleTree::open_columns (crates/pcs/src/commitment.rs:157)
       opens every unique queried column exactly once

Backend policy:

  • CPU mode uses direct CPU field arithmetic.
  • CUDA small/medium use CPU row combinations because GPU launch overhead is larger than the work.
  • CUDA large/xlarge use ICICLE CUDA matmul against the polynomial matrix retained on the GPU during Commit.

The owned evaluation opening contains:

claimed value y
testing response
evaluation response
one column value and Merkle path for each unique queried column

The two Fiat-Shamir query sets remain independent. If a column appears in both sets, the proof carries that column and authentication path once and both consistency phases reference the same authenticated opening.

Verify Evaluation

Entry point:

pcs.verify_evaluation(&commitment, &point, claimed_value, &proof)

Call flow:

ShockwavePcs::verify_evaluation (crates/pcs/src/shockwave.rs:239)
  -> verify_evaluation_with_backend (crates/pcs/src/shockwave.rs:250)
  -> validate point, commitment, and proof shape
  -> reconstruct the Fiat-Shamir transcript
  -> derive alpha and both expected query sets
  -> recompute the sorted query union
  -> reject unless proof openings exactly match the unique union
  -> ColumnMerkleTree::verify_columns_on_cpu
       (crates/pcs/src/commitment.rs:184)
       hashes opened leaves with CPU Keccak-256
       reconstructs paths with CPU Blake2s
  -> rs_encode_rows (crates/pcs/src/reed_solomon.rs:108)
       encodes testing_response and evaluation_response as one two-row batch
  -> ShockwavePcs::verify_consistency_openings
       (crates/pcs/src/shockwave.rs:363)
       checks testing and evaluation as two independent phases
       each phase reads its columns from the shared unique opening list
  -> inner_product (crates/pcs/src/shockwave.rs:493)
       checks y = <evaluation_response, q_col>

Backend policy:

  • CPU mode uses CPU batch NTT and direct CPU consistency arithmetic.
  • CUDA mode performs the two response encodings with CUDA batch INTT/NTT.
  • CUDA mode checks each query phase with one CUDA matmul; overlapping query columns reuse the same authenticated opening.
  • Merkle path reconstruction and Fiat-Shamir remain on CPU because their messages are small.

The verifier never receives the full polynomial or full encoded matrix.

Performance Optimizations

  • Single ICICLE field representation: the complete project uses icicle_bn254::ScalarField; there are no Arkworks/ICICLE conversions in the protocol path.
  • Batch all RS rows: Commit performs one batched INTT and one batched NTT instead of launching transforms row by row.
  • Keep CUDA encoding on device: zero-padding and both matrix transposes run on CUDA between INTT, NTT, and Merkle hashing.
  • Hash device fields directly: CUDA Merkle leaf hashing reinterprets the canonical ICICLE field buffer as bytes without rebuilding or reuploading a serialized leaf array.
  • Retain the polynomial matrix on CUDA: large/xlarge Prove computes both row combinations with CUDA matrix multiplication.
  • Use a workload threshold: small/medium Prove stays on CPU to avoid GPU launch overhead; large/xlarge uses CUDA.
  • Batch Verify response encoding: testing and evaluation responses are encoded together as one two-row INTT/NTT batch.
  • Batch CUDA consistency equations: every queried column in one proof phase is checked by one matrix multiplication.
  • Deduplicate queried columns: the independent testing and evaluation query sets are merged only for proof storage. Each unique column and Merkle path is carried and verified once, then reused by both consistency phases.
  • Retain Merkle hash layers: opening reads authentication paths without rehashing leaves or invoking one-proof APIs repeatedly.
  • Verify Merkle paths on CPU: the queried batches are small, so CPU Keccak/Blake2s avoids unnecessary GPU transfer and synchronization.
  • Keep Fiat-Shamir on CPU: small transcript hashes avoid CUDA launch cost while preserving identical bytes and challenges.
  • Cache backend and NTT setup: CUDA libraries load once, devices switch without rescanning libraries, and initialized NTT domains are reused.

These optimizations preserve the field, cosets, Reed-Solomon code, Merkle hash definitions, transcript, independent query sets, verification equations, and security assumptions. Query deduplication changes only the proof's storage representation by omitting duplicate copies of an already authenticated column.

Benchmark Parameters

These are development benchmark profiles, not final production security recommendations.

All profiles use:

rho_inv = 2
RS rate rho = 1/2
encoded_cols = 2 * matrix_cols
two independent query sets
Profile ell n Matrix Encoded matrix Queries/phase Commit iters Prove iters Verify iters
small 8 256 16 x 16 16 x 32 8 20 20 100
medium 12 4,096 64 x 64 64 x 128 16 10 10 50
large 16 65,536 256 x 256 256 x 512 32 10 10 10
xlarge 20 1,048,576 1,024 x 1,024 1,024 x 2,048 64 10 10 10

benchmark all runs all four profiles.

Benchmark Results

Measured with release builds. These values are from the second confirming run after retaining query deduplication and restoring independent consistency checks plus direct Rust Merkle verification.

CPU:

CPU: Intel Core i9-14900, 32 logical CPUs
OS: Linux under WSL2
Backend: ICICLE CPU
Profile Commit Evaluation proof Verify Commitment Opening
small 10.918 ms 0.117 ms 3.632 ms 32 B 10,576 B
medium 40.414 ms 0.808 ms 4.539 ms 32 B 70,248 B
large 171.328 ms 7.875 ms 10.668 ms 32 B 559,648 B
xlarge 1,196.924 ms 95.812 ms 56.798 ms 32 B 4,239,696 B

CUDA:

GPU: NVIDIA RTX 2000 Ada Generation, 16 GB
Driver: NVIDIA 581.42
CUDA Toolkit: 12.6
Backend: ICICLE v4.0.0 ubuntu22-cuda122
Profile Commit Evaluation proof Verify Commitment Opening
small 0.881 ms 0.115 ms 0.667 ms 32 B 10,576 B
medium 0.978 ms 0.804 ms 1.445 ms 32 B 70,248 B
large 5.519 ms 5.176 ms 8.111 ms 32 B 559,648 B
xlarge 60.239 ms 30.687 ms 49.068 ms 32 B 4,239,696 B

For xlarge, the complete measured workflow with CUDA has about a 10.1x total speedup. Commit is about 20.7x faster and Prove about 3.3x faster.

Commitment size is the 32-byte root. Opening size includes the claimed value, both response vectors, unique opened columns, indices, and Merkle paths, but excludes external container overhead. The size reduction depends on how many columns overlap between the two deterministic Fiat-Shamir query sets; the large profile has no overlap for this benchmark input.

The CPU measurements use ICICLE's CPU NTT backend and should not be compared as an Arkworks FFT benchmark. The CUDA measurements show the intended acceleration path for the unified ICICLE field and NTT implementation.

Running Commands

Run all correctness tests:

cargo test --workspace

Run formatting and Clippy:

cargo fmt --all -- --check
cargo clippy --workspace --all-targets -- -D warnings

Run an end-to-end demo:

cargo run --release -p cli -- demo small
cargo run --release -p cli -- demo xlarge

Run all four CPU benchmarks:

cargo run --release -p cli -- benchmark all

Run all four CUDA benchmarks:

ICICLE_BACKEND_INSTALL_DIR=/tmp/icicle-v4-cuda/icicle/lib/backend cargo run --release -p cli -- benchmark all --backend cuda --device-id 0

Build the workspace:

cargo build --release --workspace

Current Scope

The implementation includes the complete square-matrix t = 2 Shockwave PCS workflow: ICICLE BN254 arithmetic, coset RS encoding, full polynomial opening, non-interactive evaluation proofs, CPU/CUDA NTT, batched Merkle commitments, CPU Fiat-Shamir, workload-aware CPU/CUDA Prove, CUDA Verify RS/consistency checks, unique-query proof storage, owned Rust proofs, benchmarks, compatibility tests, and negative tests.

The following remain future work:

  • Rectangular matrix proof-size optimization.
  • Merkle multiproofs that also deduplicate shared internal path nodes.
  • Automatic query-count derivation from target soundness bits.
  • A standardized wire encoding for the owned proof structs.
  • Systematic Reed-Solomon encoding.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages