feat: wave-based batch uploads with configurable concurrency #6
mickvandijke wants to merge 4 commits into main
Conversation
Replace per-chunk sequential EVM payments with batched wave-based uploads. Chunks are grouped into waves (up to 64), quotes are collected concurrently, then paid in a single `wallet.pay_for_quotes()` call. Stores from wave N are pipelined with quote collection for wave N+1. For a 3-chunk (4 MB) upload this reduces payment from 3 separate EVM transactions (~1 s with nonce contention) to 1 transaction (~10 ms).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
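The wave arithmetic described above can be sketched in a few lines. This is an illustrative std-only model, not the PR's code: `WAVE_SIZE` matches the "up to 64" grouping, and `payments_for` is a hypothetical helper showing how chunk count maps to the number of EVM payment transactions (one `wallet.pay_for_quotes()` call per wave).

```rust
const WAVE_SIZE: usize = 64;

// Number of EVM payment transactions for a given chunk count:
// one wallet.pay_for_quotes() call per wave of up to WAVE_SIZE chunks.
fn payments_for(chunk_count: usize) -> usize {
    (0..chunk_count)
        .collect::<Vec<_>>()
        .chunks(WAVE_SIZE)
        .count()
}

fn main() {
    assert_eq!(payments_for(3), 1); // the 3-chunk (4 MB) example: 1 tx instead of 3
    assert_eq!(payments_for(64), 1);
    assert_eq!(payments_for(65), 2);
    println!("{}", payments_for(3));
}
```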
Uploads of large files to small networks (e.g. local devnets) were failing because the client blasted all chunks to all peers simultaneously, overwhelming QUIC connections and causing cascading disconnects. Three changes:

- Only replicate chunks to `CLOSE_GROUP_MAJORITY` peers instead of all `CLOSE_GROUP_SIZE`, reducing per-chunk network load
- Throttle both quote collection and chunk storage with `buffer_unordered(chunk_concurrency)` instead of unbounded parallelism
- Make `chunk_concurrency` configurable via `ClientConfig` and a `--chunk-concurrency` CLI flag, defaulting to half the available CPU threads

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
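The throttling idea can be sketched with std threads alone. Note this is a coarser approximation than the `buffer_unordered(chunk_concurrency)` the PR actually uses (which refills slots as futures complete, rather than waiting for a whole batch); `store_chunk` and `store_throttled` are hypothetical stand-ins.

```rust
use std::thread;

// Stand-in for a network store; returns the chunk's address.
fn store_chunk(chunk: u32) -> u32 {
    chunk
}

// At most `chunk_concurrency` stores run at once. Unlike
// buffer_unordered, the next batch starts only after the previous
// one finishes entirely.
fn store_throttled(chunks: &[u32], chunk_concurrency: usize) -> Vec<u32> {
    let mut stored = Vec::new();
    for batch in chunks.chunks(chunk_concurrency) {
        let results: Vec<u32> = thread::scope(|s| {
            let handles: Vec<_> = batch
                .iter()
                .map(|&c| s.spawn(move || store_chunk(c)))
                .collect();
            handles.into_iter().map(|h| h.join().unwrap()).collect()
        });
        stored.extend(results);
    }
    stored
}

fn main() {
    let chunks: Vec<u32> = (0..10).collect();
    let stored = store_throttled(&chunks, 4); // never more than 4 in flight
    assert_eq!(stored.len(), 10);
    println!("{}", stored.len());
}
```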
jacderida
left a comment
Review of wave-based batch uploads
Nice work overall — the wave-based pipelining design is clean and the code reads well. A few concerns before merging:
1. chunk_put_to_close_group now has zero fault tolerance (chunk.rs:73-74)
This is the main concern. Previously, the function sent PUTs to all peers (typically CLOSE_GROUP_SIZE = 5) and returned success as soon as CLOSE_GROUP_MAJORITY (3) responded. This gave tolerance for up to 2 peer failures.
The PR truncates the target list to exactly CLOSE_GROUP_MAJORITY peers:

```rust
let target_peers = &peers[..peers.len().min(CLOSE_GROUP_MAJORITY)];
```

Now if any single peer is slow or unavailable, the entire chunk store fails with InsufficientPeers. On a small testnet or under network churn, this could make uploads significantly more fragile.
This also affects the existing chunk_put code path (not just the new batch path), since chunk_put_to_close_group is shared.
Suggestion: Keep sending to all peers but return early once CLOSE_GROUP_MAJORITY confirm — i.e., revert this change. The network cost of 2 extra concurrent PUTs is marginal compared to the resilience gain.
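The suggested shape — fan out to all peers, return as soon as a majority confirm — can be sketched with std threads and a channel. Everything here is illustrative (`put_chunk` is a fake peer PUT, peer IDs are plain integers); only the quorum pattern is the point.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

const CLOSE_GROUP_MAJORITY: usize = 3;

// Fake peer PUT: one peer is slow and fails, the rest succeed.
fn put_chunk(peer: usize) -> Result<usize, String> {
    if peer == 4 {
        thread::sleep(Duration::from_millis(50));
        return Err(format!("peer {peer} unavailable"));
    }
    Ok(peer)
}

// Send to ALL peers concurrently, but return Ok as soon as a
// majority confirm — tolerating up to (len - MAJORITY) failures.
fn chunk_put_to_close_group(peers: &[usize]) -> Result<Vec<usize>, String> {
    let (tx, rx) = mpsc::channel();
    for &peer in peers {
        let tx = tx.clone();
        thread::spawn(move || {
            let _ = tx.send(put_chunk(peer));
        });
    }
    drop(tx); // so rx ends once every peer has responded

    let mut confirmed = Vec::new();
    for result in rx {
        if let Ok(peer) = result {
            confirmed.push(peer);
            if confirmed.len() >= CLOSE_GROUP_MAJORITY {
                return Ok(confirmed); // early return on quorum
            }
        }
    }
    Err("InsufficientPeers".into())
}

fn main() {
    let peers = [0, 1, 2, 3, 4]; // CLOSE_GROUP_SIZE = 5
    let stored = chunk_put_to_close_group(&peers).unwrap();
    assert_eq!(stored.len(), CLOSE_GROUP_MAJORITY);
    println!("stored on {} peers", stored.len());
}
```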
2. Duplicate chunks are assumed stored without verification (batch.rs:222-226)
In batch_upload_chunks, when a content-address collision is found locally:
```rust
if seen_addresses.insert(address) {
    unique_chunks.push(chunk);
} else {
    debug!("Skipping duplicate chunk {}", hex::encode(address));
    all_addresses.push(address);
}
```

The duplicate's address is added to the result immediately, but the first copy of that chunk hasn't been uploaded yet and might still fail later; the address is returned regardless. This is probably fine in practice since the caller just counts `addresses.len()`, but it could be misleading if the return value is ever used to verify what was actually stored.
3. Wave error semantics: one chunk failure aborts the whole upload
If any single chunk in a wave fails during store_paid_chunks, the ? in the collect propagates the error and discards all successfully stored chunks from that wave. The previous per-chunk buffer_unordered approach had the same fail-fast behavior, so this isn't a regression, but worth noting — partial progress from the current wave is lost.
For a future iteration, it might be worth returning a partial result (stored addresses + errors) so callers can retry only failed chunks.
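A sketch of that partial-result shape, with hypothetical names throughout (`store_chunk` is a stand-in that fails for one chunk; the real error type would not be a `String`):

```rust
// Stand-in store: chunk 2 fails, the rest succeed.
fn store_chunk(chunk: u32) -> Result<u32, String> {
    if chunk == 2 {
        Err(format!("peer timeout for chunk {chunk}"))
    } else {
        Ok(chunk) // content address, stand-in
    }
}

// Instead of `?` discarding the whole wave on the first error,
// return stored addresses AND per-chunk failures so callers can
// retry only what failed.
fn store_wave(wave: &[u32]) -> (Vec<u32>, Vec<(u32, String)>) {
    let mut stored = Vec::new();
    let mut failed = Vec::new();
    for &chunk in wave {
        match store_chunk(chunk) {
            Ok(addr) => stored.push(addr),
            Err(e) => failed.push((chunk, e)),
        }
    }
    (stored, failed)
}

fn main() {
    let (stored, failed) = store_wave(&[1, 2, 3]);
    assert_eq!(stored, vec![1, 3]); // partial progress is kept
    assert_eq!(failed.len(), 1);
    println!("stored={} failed={}", stored.len(), failed.len());
}
```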
4. Minor: payment_mode_used is always PaymentMode::Single for batch path
The batch path reports PaymentMode::Single even though it's doing batched EVM transactions. This is technically accurate (not using merkle proofs) but could be confusing when debugging. Consider a PaymentMode::Batch variant, or at least a comment explaining the naming.
5. Minor: doc comment on chunk_put_to_close_group is now stale
The existing doc comment (line 55-58 on main) says "Sends the PUT concurrently to all peers". The PR updates it to say "first CLOSE_GROUP_MAJORITY peers", but per point 1 above, if you revert the truncation change, the original doc should be restored.
Summary: The batch upload design and pipelining are solid. The main blocker is point 1 — reducing fault tolerance in chunk_put_to_close_group affects all upload paths and could cause reliability issues. I'd suggest reverting that change and keeping the rest.
Summary
- Throttle quote collection and chunk storage with `chunk_concurrency` (defaults to half the available CPU threads), preventing overwhelmed peer connections on small networks
- Replicate chunks to `CLOSE_GROUP_MAJORITY` (3) peers instead of all `CLOSE_GROUP_SIZE` (5), reducing per-chunk network load
- Add a `--chunk-concurrency <N>` flag to override the auto-detected default

Test plan
- `cargo clippy --all-targets --all-features -- -D warnings` passes
- `cargo fmt --all -- --check` passes

🤖 Generated with Claude Code