
Refactor store mappings from per-validator to per-attestation-data keying#253

Open
pablodeymo wants to merge 2 commits into main from store-mapping-refactor

Conversation

@pablodeymo
Collaborator

Motivation

The storage layer used SignatureKey = (validator_id, data_root) as the primary key for both gossip signatures and aggregated payloads. This meant that a single aggregated proof covering 9 validators produced 9 duplicated entries (one per validator) — each storing a clone of the same ~3KB proof. Additionally, reconstructing per-validator attestation maps required a reverse-lookup table (AttestationDataByRoot) to go from data_root → AttestationData.

The leanSpec Python reference uses AttestationData directly as dict keys, groups proofs per attestation message, and uses proof.participants bitfields to determine which validators are covered. This PR aligns ethlambda with the leanSpec architecture.

Description

Three fundamental shifts

1. Key inversion: per-validator → per-attestation-data

BEFORE: N entries for 1 proof covering N validators
  (vid_0, data_root) → proof
  (vid_1, data_root) → proof   ← same proof duplicated!
  (vid_2, data_root) → proof

AFTER: 1 entry containing the proof (participants bitfield tells you who's covered)
  data_root → (AttestationData, [proof])

The PayloadBuffer is redesigned from a flat VecDeque<(SignatureKey, StoredAggregatedPayload)> to a HashMap<H256, PayloadEntry> + VecDeque<H256> structure that groups proofs by attestation data while preserving FIFO eviction order. Each distinct attestation message stores the full AttestationData plus all AggregatedSignatureProofs covering that message.
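As a rough sketch of that shape — with simplified stand-ins for H256, AttestationData, and AggregatedSignatureProof, and an illustrative cap (not the actual implementation) — the grouped buffer with FIFO eviction could look like:

```rust
use std::collections::{HashMap, VecDeque};

// Simplified stand-ins for the real ethlambda types.
type H256 = u64;
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
struct AttestationData { slot: u64 }
#[derive(Clone, Debug)]
struct AggregatedSignatureProof { participants: Vec<bool> }

struct PayloadEntry {
    data: AttestationData,
    proofs: Vec<AggregatedSignatureProof>,
}

/// Groups proofs per attestation message while keeping FIFO eviction order
/// over distinct messages.
struct PayloadBuffer {
    data: HashMap<H256, PayloadEntry>,
    order: VecDeque<H256>,
    cap: usize,
}

impl PayloadBuffer {
    fn new(cap: usize) -> Self {
        Self { data: HashMap::new(), order: VecDeque::new(), cap }
    }

    /// Appends a proof to an existing entry, or creates a new entry and
    /// evicts the oldest distinct message once the cap is reached.
    fn push(&mut self, root: H256, data: AttestationData, proof: AggregatedSignatureProof) {
        if let Some(entry) = self.data.get_mut(&root) {
            entry.proofs.push(proof);
            return;
        }
        if self.order.len() == self.cap {
            if let Some(oldest) = self.order.pop_front() {
                self.data.remove(&oldest);
            }
        }
        self.order.push_back(root);
        self.data.insert(root, PayloadEntry { data, proofs: vec![proof] });
    }
}
```

Note that the `order` deque only tracks distinct roots, which is why the cap now bounds distinct messages rather than per-validator entries.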

2. AttestationDataByRoot table eliminated

The old scheme needed AttestationDataByRoot (RocksDB table mapping H256 → AttestationData) because SignatureKey only stored the hash. The new scheme stores AttestationData directly alongside proofs, so the reverse-lookup is no longer needed.

3. Proof selection uses participation bits

BEFORE (build_block):
  sig_key = (validator_id, data_root)
  if aggregated_payloads.contains_key(&sig_key) → include

AFTER (build_block):
  if data_root in aggregated_payloads:
    for proof in payloads[data_root]:
      if proof.participants[validator_id] → include

extract_latest_attestations now iterates proofs' participation bits directly instead of relying on key membership, matching the leanSpec extract_attestations_from_aggregated_payloads method.
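The AFTER pseudocode above can be expressed concretely. This sketch uses simplified stand-in types (the real ones live in the ethlambda crates) and a hypothetical helper name:

```rust
use std::collections::HashMap;

// Simplified stand-ins for illustration.
type H256 = u64;
#[derive(Clone)]
struct AggregatedSignatureProof { participants: Vec<bool> }

/// Does any stored proof for this attestation data cover `validator_id`?
/// Membership is decided by the participation bitfield, not by key lookup.
fn has_proof_for(
    payloads: &HashMap<H256, Vec<AggregatedSignatureProof>>,
    data_root: H256,
    validator_id: usize,
) -> bool {
    payloads
        .get(&data_root)
        .map(|proofs| {
            proofs
                .iter()
                .any(|p| p.participants.get(validator_id).copied().unwrap_or(false))
        })
        .unwrap_or(false)
}
```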

Gossip signatures moved to in-memory

Gossip signatures are transient — collected via P2P and consumed at interval 2 aggregation (~every 4 seconds). Persisting them to RocksDB was unnecessary overhead. The new in-memory structure matches the leanSpec:

# leanSpec
gossip_signatures: dict[AttestationData, set[GossipSignatureEntry]]
// ethlambda (new)
gossip_signatures: HashMap<H256, GossipDataEntry>
// where GossipDataEntry = { data: AttestationData, signatures: Vec<GossipSignatureEntry> }
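A minimal sketch of the grouped in-memory map, with the set-like deduplication the leanSpec dict-of-sets implies (stand-in types; dedup-by-validator is an assumption about the real implementation, not confirmed by the PR text):

```rust
use std::collections::HashMap;

// Simplified stand-ins for illustration.
type H256 = u64;
#[derive(Clone, PartialEq, Eq, Hash)]
struct AttestationData { slot: u64 }
type ValidatorSignature = Vec<u8>;

struct GossipSignatureEntry { validator_id: u64, signature: ValidatorSignature }
struct GossipDataEntry { data: AttestationData, signatures: Vec<GossipSignatureEntry> }
type GossipSignatureMap = HashMap<H256, GossipDataEntry>;

/// Insert a gossip signature, grouping by attestation data and skipping
/// duplicates from the same validator (mirroring leanSpec's set semantics).
fn insert_gossip_signature(
    map: &mut GossipSignatureMap,
    data_root: H256,
    data: AttestationData,
    validator_id: u64,
    signature: ValidatorSignature,
) {
    let entry = map.entry(data_root).or_insert_with(|| GossipDataEntry {
        data,
        signatures: Vec::new(),
    });
    if entry.signatures.iter().all(|e| e.validator_id != validator_id) {
        entry.signatures.push(GossipSignatureEntry { validator_id, signature });
    }
}
```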

RocksDB tables reduced from 8 to 6

| Removed table | Reason |
| --- | --- |
| GossipSignatures | Moved to in-memory (transient data) |
| AttestationDataByRoot | No longer needed (data stored alongside proofs) |

Types removed

| Type | Reason |
| --- | --- |
| SignatureKey = (u64, H256) | Replaced by H256 (data_root) keying |
| StoredAggregatedPayload | Proofs stored directly as AggregatedSignatureProof |
| StoredSignature | Gossip signatures now in-memory as GossipSignatureEntry |
| crates/storage/src/types.rs | Deleted entirely (both types removed) |

Trait derives added

  • AttestationData: added PartialEq, Eq, Hash (needed for use as map key in leanSpec-aligned API)
  • Checkpoint: added Hash (nested in AttestationData)

Changes by file

crates/storage/src/store.rs (core of the refactor)

New types:

  • PayloadEntry { data: AttestationData, proofs: Vec<AggregatedSignatureProof> } — grouped proof storage
  • GossipSignatureEntry { validator_id: u64, signature: ValidatorSignature } — individual gossip sig
  • GossipDataEntry { data: AttestationData, signatures: Vec<GossipSignatureEntry> } — grouped gossip storage
  • GossipSignatureMap = HashMap<H256, GossipDataEntry> — type alias for the gossip map
  • GossipSignatureSnapshot — public type alias for the snapshot returned by iter_gossip_signatures

PayloadBuffer redesign:

  • data: HashMap<H256, PayloadEntry> — grouped proofs by data_root
  • order: VecDeque<H256> — insertion order for FIFO eviction
  • push() — appends proof to existing entry or creates new entry (with FIFO eviction)
  • drain() — flattens entries for promotion to known buffer
  • extract_latest_attestations() — derives per-validator attestation map from participation bits (replaces the old key-membership + AttestationDataByRoot lookup)
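The participation-bit derivation can be sketched as follows, with stand-in types and one plausible "latest" rule (highest slot wins) chosen purely for illustration:

```rust
use std::collections::HashMap;

// Simplified stand-ins for illustration.
#[derive(Clone, Debug)]
struct AttestationData { slot: u64 }
struct AggregatedSignatureProof { participants: Vec<bool> }
struct PayloadEntry { data: AttestationData, proofs: Vec<AggregatedSignatureProof> }

/// Derive a per-validator "latest attestation" map from participation bits:
/// every set bit in a proof attributes that attestation to that validator,
/// keeping the highest slot seen per validator.
fn extract_latest_attestations(entries: &[PayloadEntry]) -> HashMap<u64, AttestationData> {
    let mut latest: HashMap<u64, AttestationData> = HashMap::new();
    for pe in entries {
        for proof in &pe.proofs {
            for (vid, covered) in proof.participants.iter().enumerate() {
                if !*covered {
                    continue;
                }
                latest
                    .entry(vid as u64)
                    .and_modify(|d| {
                        if pe.data.slot > d.slot {
                            *d = pe.data.clone();
                        }
                    })
                    .or_insert_with(|| pe.data.clone());
            }
        }
    }
    latest
}
```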

Store API changes:

  • Removed: insert_attestation_data_by_root, insert_attestation_data_by_root_batch, get_attestation_data_by_root
  • Removed: extract_latest_attestations(keys: Iterator<Item = SignatureKey>)
  • Removed: iter_known_aggregated_payloads, iter_known_aggregated_payload_keys, iter_new_aggregated_payloads, iter_new_aggregated_payload_keys
  • Removed: encode_signature_key, decode_signature_key, count_gossip_signatures, prune_by_slot, prune_attestation_data_by_root
  • Added: extract_latest_known_attestations(), extract_latest_new_attestations(), extract_latest_all_attestations()
  • Added: known_aggregated_payloads() — returns HashMap<H256, (AttestationData, Vec<AggregatedSignatureProof>)>
  • Changed: all insert_*_aggregated_payload* methods now take (data_root, att_data, proof) instead of (SignatureKey, StoredAggregatedPayload)
  • Changed: insert_gossip_signature now takes AttestationData directly instead of slot
  • Changed: delete_gossip_signatures now takes &[(u64, H256)] instead of &[SignatureKey]
  • Changed: iter_gossip_signatures returns GossipSignatureSnapshot (grouped by attestation data)
  • Changed: prune_gossip_signatures uses HashMap::retain (in-memory) instead of RocksDB iteration
  • Changed: gossip_signatures_count counts in-memory entries instead of using AtomicUsize

Capacity constants adjusted:

  • AGGREGATED_PAYLOAD_CAP: 4096 → 512 (now counts distinct messages, not per-validator entries)
  • NEW_PAYLOAD_CAP: 512 → 64 (same reason)

crates/blockchain/src/store.rs (consumer updates)

  • update_safe_target: replaced key-set merge + extract_latest_attestations with single extract_latest_all_attestations() call
  • aggregate_committee_signatures: iterates pre-grouped gossip signatures (no more manual grouping by data_root + AttestationDataByRoot lookup); stores one proof per attestation data instead of N per validator
  • on_gossip_attestation: passes AttestationData to insert_gossip_signature (no more insert_attestation_data_by_root)
  • on_gossip_aggregated_attestation: stores one proof entry instead of N per-validator entries
  • on_block_core: stores one proof per attestation in known buffer; counts participating validators for metrics
  • produce_block_with_signatures: derives attestation map from already-cloned payloads (avoids redundant lock + iteration); uses new known_aggregated_payloads() API
  • build_block: pre-computes tree_hash_root per attestation outside fixed-point loop; checks proof.participants[validator_id] instead of HashMap::contains_key
  • select_aggregated_proofs: looks up proofs by data_root with greedy set-cover over all candidates (instead of per-validator lookup)
  • aggregate_attestations_by_data: now reuses aggregation_bits_from_validator_indices instead of duplicating bitfield construction
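The greedy set-cover mentioned for select_aggregated_proofs can be sketched like this (stand-in Proof type; tie-breaking and candidate ordering are illustrative, not taken from the actual code):

```rust
use std::collections::HashSet;

// Simplified stand-in for illustration.
#[derive(Clone, Debug, PartialEq)]
struct Proof { participants: Vec<bool> }

/// Greedy set-cover: repeatedly pick the candidate proof covering the most
/// still-uncovered validators, stopping when everyone is covered or no
/// remaining candidate helps (the "covered.is_empty() break" case).
fn select_proofs(candidates: &[Proof], mut remaining: HashSet<usize>) -> Vec<Proof> {
    let mut selected = Vec::new();
    let mut available: Vec<&Proof> = candidates.iter().collect();
    while !remaining.is_empty() {
        // Index of the candidate with the largest marginal coverage.
        let best = available
            .iter()
            .enumerate()
            .max_by_key(|(_, p)| {
                remaining
                    .iter()
                    .filter(|&&v| p.participants.get(v) == Some(&true))
                    .count()
            })
            .map(|(i, _)| i);
        let Some(i) = best else { break };
        let proof = available.remove(i);
        let covered: Vec<usize> = remaining
            .iter()
            .copied()
            .filter(|&v| proof.participants.get(v) == Some(&true))
            .collect();
        if covered.is_empty() {
            break; // no candidate covers anyone still remaining
        }
        for v in covered {
            remaining.remove(&v);
        }
        selected.push(proof.clone());
    }
    selected
}
```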

crates/storage/src/api/tables.rs

  • Removed GossipSignatures and AttestationDataByRoot variants
  • ALL_TABLES array: 8 → 6 entries

crates/storage/src/backend/rocksdb.rs

  • Removed column family mappings for the 2 deleted tables

crates/storage/src/lib.rs

  • Removed exports: SignatureKey, StoredAggregatedPayload, StoredSignature
  • Removed mod types declaration

crates/common/types/src/attestation.rs

  • Added PartialEq, Eq, Hash derives to AttestationData

crates/common/types/src/checkpoint.rs

  • Added Hash derive to Checkpoint

crates/blockchain/tests/forkchoice_spectests.rs

  • Updated 3 call sites to use new extract_latest_known_attestations() / extract_latest_new_attestations() API

Storage efficiency improvement

With 9 validators all attesting to the same data:

| Metric | Before | After |
| --- | --- | --- |
| Proof entries per attestation | 9 (one per validator) | 1 (shared) |
| AttestationData copies | 9 (one per SignatureKey lookup) + 1 (AttestationDataByRoot) | 1 (stored with proofs) |
| RocksDB tables | 8 | 6 |
| RocksDB writes per gossip attestation | 2 (GossipSignatures + AttestationDataByRoot) | 0 (in-memory) |

How to Test

make fmt    # ✅ passes
make lint   # ✅ passes (zero warnings)
make test   # ✅ passes (16 storage + 27 forkchoice spec tests)

Note on existing RocksDB data

Nodes upgrading from a previous version will have orphaned gossip_signatures and attestation_data_by_root column families in their RocksDB database. These are inert (never read or written) and can be cleaned up in a future migration, or will be ignored by RocksDB if the column families are not opened. No data loss or startup failure is expected.


Aligns the storage architecture with the leanSpec Python reference implementation.
Three fundamental shifts:

1. Key inversion: SignatureKey = (validator_id, data_root) removed. PayloadBuffer
   now groups proofs by attestation data (via data_root), storing one proof entry
   that covers all participants instead of N duplicated entries per validator.

2. AttestationDataByRoot table eliminated. AttestationData is stored directly
   alongside proofs in the in-memory PayloadBuffer and gossip signature map.

3. Proof selection uses participation bits (proof.participants[validator_id])
   instead of HashMap key membership.

Additional changes:
- Gossip signatures moved from RocksDB to in-memory (transient data consumed
  at interval 2 aggregation)
- RocksDB tables reduced from 8 to 6 (removed GossipSignatures and
  AttestationDataByRoot)
- StoredSignature, StoredAggregatedPayload, and SignatureKey types removed
- Added Hash derive to AttestationData and Checkpoint for use as map keys
@github-actions

🤖 Kimi Code Review

Overall Assessment: This is a well-structured refactoring that moves from per-validator key storage to content-addressed attestation data grouping, reducing storage overhead and simplifying the fork choice data flow. The in-memory gossip signature buffer is appropriate for transient data.

Issues Found:

1. Missing validation of data_root consistency (Security/Code Correctness)

  • File: crates/storage/src/store.rs
  • Lines: 1020-1035 (insert_gossip_signature), 921-935 (insert_new_aggregated_payload),

Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions

🤖 Claude Code Review



PR Review: Refactor store mappings from per-validator to per-attestation-data keying

The motivation is sound and the alignment with leanSpec is a real improvement. The core changes are well-structured. A few findings below.


Correctness

extract_latest_all_attestations holds two separate Mutex locks sequentially

crates/storage/src/store.rs (around line 932–955 in the new code):

let mut result = self.known_payloads.lock().unwrap().extract_latest_attestations();
// lock released here ← window opens
for (vid, data) in self.new_payloads.lock().unwrap().extract_latest_attestations() {

Between the two acquisitions, promote_new_aggregated_payloads could run on another thread: an attestation drains from new_payloads and lands in known_payloads after known_payloads was already read but before new_payloads is read — making it appear in neither snapshot. In a single-threaded actor this race can't currently occur, but the non-atomicity is a silent correctness assumption. A brief comment like // Actor-local: promotion only happens from the same tick loop, so non-atomic read is safe would make the intent explicit.
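If atomicity were ever needed (say, promotion moved off the actor loop), acquiring both locks before either read closes the window. A sketch with plain std Mutex and hypothetical simplified buffers (u64 maps stand in for the real payload buffers):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical simplified store for illustration.
struct Store {
    known_payloads: Mutex<HashMap<u64, u64>>,
    new_payloads: Mutex<HashMap<u64, u64>>,
}

impl Store {
    /// Take both locks up front so promotion cannot run between the two
    /// reads. (Acquire in the same order everywhere to avoid deadlock.)
    fn extract_all_atomic(&self) -> HashMap<u64, u64> {
        let known = self.known_payloads.lock().unwrap();
        let newer = self.new_payloads.lock().unwrap();
        let mut result = known.clone();
        for (k, v) in newer.iter() {
            // Known entries take precedence, matching a merge-with-priority.
            result.entry(*k).or_insert(*v);
        }
        result
    }
}
```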


PayloadBuffer::drain() loses insertion order

crates/storage/src/store.rs — the drain method:

fn drain(&mut self) -> Vec<(H256, AttestationData, AggregatedSignatureProof)> {
    self.order.clear();
    self.data.drain()
        .flat_map(|(root, entry)| {
            // HashMap::drain() is unordered

HashMap::drain() yields entries in an arbitrary order. The old VecDeque::drain() was strictly FIFO. When promote_new_aggregated_payloads pushes these into the known buffer, the order VecDeque in the known buffer records the promoted entries in an undefined order — which determines which entries are evicted first if the known buffer later fills up. In practice this rarely matters (known buffer capacity is 512 distinct messages), but the FIFO guarantee advertised in the doc comment no longer holds end-to-end. Either update the doc comment to say "FIFO within each buffer; promotion order is unspecified" or drain by following the order deque:

fn drain(&mut self) -> Vec<...> {
    self.order.drain(..)
        .filter_map(|root| self.data.remove(&root).map(|entry| (root, entry)))
        .flat_map(|(root, entry)| entry.proofs.into_iter().map(move |proof| (root, entry.data.clone(), proof)))
        .collect()
}

delete_gossip_signatures removes an entry while iterating the outer map

crates/storage/src/store.rs:

for &(vid, data_root) in keys {
    if let Some(entry) = gossip.get_mut(&data_root) {
        entry.signatures.retain(|e| e.validator_id != vid);
        if entry.signatures.is_empty() {
            gossip.remove(&data_root);   // ← safe under NLL, but only if 'entry' borrow ends
        }
    }
}

This compiles under NLL because entry is last used at entry.signatures.is_empty(). However, if the same data_root appears multiple times in keys (e.g. two validators attesting the same data), the second iteration will correctly get None from get_mut after the remove. This is fine semantically, but worth a comment since the double-removal case is silent rather than explicit.


Performance

known_aggregated_payloads() deep-clones all proofs on every call to produce_block_with_signatures

crates/storage/src/store.rs:

buf.data
    .iter()
    .map(|(root, entry)| (*root, (entry.data.clone(), entry.proofs.clone())))
    .collect()

AggregatedSignatureProof is ~3 KB. With 512 entries each holding multiple proofs, this is a large allocation on every block production. The old code had the same shape, so it's not a regression, but now that the structure is cleaner it may be worth wrapping PayloadEntry in Arc so the snapshot is an Arc::clone per entry rather than a deep copy. Not blocking, just flagging.
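The Arc-based snapshot would look roughly like this (stand-in entry type; the real one holds AttestationData plus proofs). Each entry then costs a pointer bump instead of a multi-kilobyte copy:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Simplified stand-in entry for illustration.
struct PayloadEntry { proofs: Vec<Vec<u8>> }

/// Snapshot via Arc::clone: O(1) reference-count bump per entry instead of
/// deep-copying every proof on each call.
fn snapshot(data: &HashMap<u64, Arc<PayloadEntry>>) -> HashMap<u64, Arc<PayloadEntry>> {
    data.iter().map(|(k, v)| (*k, Arc::clone(v))).collect()
}
```

The trade-off is that mutation of an entry then requires either Arc::make_mut (clone-on-write) or replacing the Arc wholesale.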


Minor issues

Redundant bounds check in build_block (crates/blockchain/src/store.rs ~line 340):

vid < p.participants.len() && p.participants.get(vid).unwrap_or(false)

The explicit vid < p.participants.len() guard makes the unwrap_or unreachable. Either check alone is sufficient; drop one of them.


Metric semantic change not reflected in name

metrics::update_latest_new_aggregated_payloads(store.new_aggregated_payloads_count()): new_aggregated_payloads_count now counts distinct attestation messages, not per-validator payload entries (which was the previous meaning). If any Prometheus alert or dashboard was tuned to the old semantics (e.g. "alert if count > 9×N validators"), it will fire differently. The PR description notes this change only for storage tables; it's worth calling out in the metric description or a comment.


iter_gossip_signatures returns a Vec, not an iterator

The name iter_* conventionally implies lazy iteration. The return type GossipSignatureSnapshot (alias for Vec<...>) is a snapshot. Renaming to gossip_signature_snapshot() or take_gossip_signatures_snapshot() would align with the naming in the PR description and avoid confusion. Low priority.


What's good

  • Eliminating AttestationDataByRoot removes an entire consistency hazard: previously a GossipSignatures entry could exist without a matching AttestationDataByRoot entry, causing silent drops in aggregate_committee_signatures via the let Some(data) = store.get_attestation_data_by_root(...) else { continue } guard. That class of bug is gone.
  • The greedy set-cover in select_aggregated_proofs is cleaner and handles the "candidates present but none cover remaining" case properly with the covered.is_empty() break.
  • aggregate_committee_signatures is simplified significantly and no longer requires a separate grouping step.
  • Test coverage is updated correctly throughout.

Automated review by Claude (Anthropic) · sonnet · custom prompt

@github-actions

🤖 Codex Code Review

  1. Existing RocksDB data directories will stop opening after this change. tables.rs removes the GossipSignatures and AttestationDataByRoot column families from ALL_TABLES, while rocksdb.rs still opens the DB with exactly the CF list in ALL_TABLES. RocksDB requires opening all existing CFs, so restarting a node on a pre-PR database is likely to fail immediately at main.rs. This needs an explicit migration or a list_cf-based open path.

  2. The “FIFO” eviction policy is no longer actually FIFO after promotion from new to known. store.rs drains new_payloads through HashMap::drain(), which is unordered, and store.rs re-inserts in that arbitrary order. Since eviction is defined by insertion order in store.rs, once known_payloads fills up you can evict newer attestations before older ones. That can skew fork-choice weight and block construction under load.

  3. The new per-AttestationData grouping introduces an unbounded memory growth path for proofs. In store.rs, every additional proof for the same data_root is appended to entry.proofs, but the capacity limit only applies to distinct roots. A noisy or malicious peer can keep one attestation root hot and grow proofs without ever hitting AGGREGATED_PAYLOAD_CAP/NEW_PAYLOAD_CAP. In consensus ingress code, that is a real DoS risk; this wants proof deduplication and/or a per-root cap.

  4. The gossip-signature metric update moved from O(1) to O(n) under the mutex. store.rs now recomputes the count by scanning the whole map, and it is called on every gossip attestation in store.rs and again after each aggregation cycle in store.rs. Under heavy attestation traffic this becomes avoidable hot-path contention.
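A sketch of the per-root cap plus participant-set dedup suggested in point 3 (the names, cap value, and rejection semantics are illustrative, not a definitive fix):

```rust
// Illustrative bound on proofs stored under one attestation root.
const MAX_PROOFS_PER_ROOT: usize = 16;

// Simplified stand-ins for illustration.
#[derive(Clone, PartialEq)]
struct Proof { participants: Vec<bool> }

struct Entry { proofs: Vec<Proof> }

/// Returns false (rejecting the proof) if it duplicates an existing
/// participant set or the per-root cap has been reached, so a single hot
/// root cannot grow without bound.
fn try_push(entry: &mut Entry, proof: Proof) -> bool {
    if entry.proofs.iter().any(|p| p.participants == proof.participants) {
        return false; // duplicate participant set
    }
    if entry.proofs.len() >= MAX_PROOFS_PER_ROOT {
        return false; // per-root cap reached
    }
    entry.proofs.push(proof);
    true
}
```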

I did not run the test suite successfully here: cargo needed to fetch the lean-multisig git dependency, which is blocked in this sandbox.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@greptile-apps
Contributor

greptile-apps bot commented Mar 30, 2026

Greptile Summary

This PR is a well-scoped storage refactor that eliminates the N-per-validator duplication of aggregated proofs in favour of a per-attestation-data model, aligning the codebase with the leanSpec reference. Two RocksDB tables (GossipSignatures, AttestationDataByRoot) are removed, gossip signatures move to an in-memory HashMap, and PayloadBuffer is redesigned as a HashMap<H256, PayloadEntry> + VecDeque<H256> structure. The changes are logically consistent, all test suites pass, and the refactor correctly handles the key correctness-sensitive pieces (greedy set-cover for proof selection, FIFO eviction bounds, participation-bit checks in build_block).

Key changes:

  • PayloadBuffer restructured: proofs grouped by data_root instead of flat (validator_id, data_root) entries, reducing storage from O(N validators) to O(1) per attestation message.
  • GossipSignatureMap moved fully in-memory; prune_gossip_signatures now uses HashMap::retain instead of an RocksDB scan.
  • extract_latest_attestations now derives the per-validator map directly from proof participation bits rather than requiring a separate AttestationDataByRoot reverse-lookup table.
  • select_aggregated_proofs improved: the old approach broke out of the greedy loop if target_id had no proof entry, silently excluding other covered validators. The new approach loads all proofs for the data_root at once and does a correct greedy set-cover.
  • build_block pre-computes tree_hash_root for each attestation outside the fixed-point loop and checks proof.participants[validator_id] instead of a HashMap key lookup.
  • Three P2 style suggestions are noted: drain() loses insertion order due to HashMap::drain() being unordered; duplicate proofs are not deduplicated on push; the dual (data_root, att_data) parameter pattern introduces a silent invariant that callers must maintain.

Confidence Score: 5/5

Safe to merge — no P0/P1 issues found; all remaining findings are P2 style/improvement suggestions

The refactor is logically correct: FIFO eviction still bounds memory, the greedy set-cover cannot select the same proof twice (proven by the remaining tracking), gossip deduplication is consistent with leanSpec set semantics, and all lock acquisitions are sequential with no deadlock risk. Three P2 suggestions (drain ordering, proof deduplication, redundant data_root parameter) do not affect functional correctness or the tests.

No files require special attention; crates/storage/src/store.rs contains the three P2 suggestions but no blocking issues.

Important Files Changed

| Filename | Overview |
| --- | --- |
| crates/storage/src/store.rs | Core refactor: PayloadBuffer redesigned to HashMap+VecDeque, gossip signatures moved in-memory, all insertion/extraction APIs updated to data_root-keyed model |
| crates/blockchain/src/store.rs | Consumer updates: build_block now checks proof.participants bitfields, select_aggregated_proofs uses proper greedy set-cover over all candidates, aggregate_committee_signatures stores one proof per data not per validator |
| crates/storage/src/types.rs | Deleted entirely: StoredAggregatedPayload and StoredSignature types removed as proofs are now stored directly and gossip is in-memory |
| crates/storage/src/api/tables.rs | Removed GossipSignatures and AttestationDataByRoot table variants; ALL_TABLES array reduced from 8 to 6 |
| crates/common/types/src/attestation.rs | Added PartialEq, Eq, Hash derives to AttestationData to support use as HashMap key; derives are consistent with each other |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Gossip Attestation\non_gossip_attestation] -->|insert_gossip_signature\ndata_root + att_data + sig| G[(GossipSignatureMap\nin-memory\nHashMap H256 → GossipDataEntry)]

    B[Gossip Aggregated\non_gossip_aggregated_attestation] -->|insert_new_aggregated_payload\ndata_root + att_data + proof| NP[(new_payloads\nPayloadBuffer\nHashMap H256 → PayloadEntry)]

    C[Block Received\non_block_core] -->|insert_known_aggregated_payloads_batch\ndata_root + att_data + proof| KP[(known_payloads\nPayloadBuffer\nHashMap H256 → PayloadEntry)]

    G -->|iter_gossip_signatures snapshot| AGG[aggregate_committee_signatures\nInterval 2]
    AGG -->|1 proof per data_root| NP
    AGG -->|delete processed keys| G

    NP -->|promote_new_aggregated_payloads\nInterval 4| KP

    KP -->|known_aggregated_payloads| PB[produce_block_with_signatures]
    KP -->|extract_latest_known_attestations| PB
    NP -->|extract_latest_new_attestations| US[update_safe_target\nInterval 3]
    KP -->|extract_latest_known_attestations| US

    PB -->|build_block\ncheck proof.participants bitfield| BB[Block with Attestations\n+ AggregatedSignatureProofs]

Comments Outside Diff (2)

  1. crates/storage/src/store.rs, line 688-699 (link)

    P2 drain() loses insertion order on promotion

    HashMap::drain() iterates in an arbitrary (non-deterministic) order. When promote_new_aggregated_payloads drains new_payloads and then pushes each entry into known_payloads via push_batch, the order in which entries land in known_payloads.order (the FIFO eviction deque) is determined by the HashMap's internal iteration rather than the original insertion sequence. This means eviction from known_payloads might remove entries that were actually inserted after some of the newly promoted ones.

    For correctness this is acceptable — the cap still bounds memory — but it is a silent deviation from the true FIFO guarantee described in the doc-comment. If strict FIFO is required across promotions, the order deque could be drained and iterated in order before clearing data:

    fn drain(&mut self) -> Vec<(H256, AttestationData, AggregatedSignatureProof)> {
        // Drain in insertion order by following `order`
        self.order
            .drain(..)
            .filter_map(|root| self.data.remove(&root).map(|entry| (root, entry)))
            .flat_map(|(root, entry)| {
                let data = entry.data;
                entry.proofs.into_iter().map(move |proof| (root, data.clone(), proof))
            })
            .collect()
    }

    (The above sketch captures the intent; adapt as needed.)

  2. crates/storage/src/store.rs, line 643-661 (link)

    P2 No deduplication of identical proofs on push

    PayloadBuffer::push for an already-known data_root unconditionally appends the proof without checking whether the same AggregatedSignatureProof (same participants + proof bytes) was already stored. If the same gossip aggregated attestation is relayed by multiple peers (a common P2P scenario), each call to on_gossip_aggregated_attestation → insert_new_aggregated_payload pushes an identical proof object, growing the proofs vector with duplicates.

    The greedy set-cover in select_aggregated_proofs and the has_proof check in build_block are both correct in the presence of duplicates, but unnecessary work is done iterating redundant entries on each block production cycle.

    Consider a lightweight guard — checking that no existing proof already covers the same participant set before pushing:

    if let Some(entry) = self.data.get_mut(&data_root) {
        // Skip if an identical proof is already stored
        if !entry.proofs.iter().any(|p| p.participants == proof.participants) {
            entry.proofs.push(proof);
        }
    } else {
        ...
    }
Additional comment: crates/storage/src/store.rs, lines 993-1017

    P2 Redundant data_root parameter — silent invariant required by callers

    insert_gossip_signature (and the analogous insert_new_aggregated_payload / insert_known_aggregated_payload) accept both data_root: H256 and att_data: AttestationData as separate parameters, implicitly requiring callers to ensure data_root == att_data.tree_hash_root(). All current call sites correctly pre-compute data_root for reuse, but the API itself does not enforce this invariant.

    A caller passing a mismatched pair would silently store att_data under the wrong key, corrupting later lookups. Since the data_root is always computable from att_data, accepting only att_data (and computing the root internally) would eliminate the risk:

    pub fn insert_gossip_signature(
        &mut self,
        att_data: AttestationData,
        validator_id: u64,
        signature: ValidatorSignature,
    ) {
        let data_root = att_data.tree_hash_root();
        ...
    }

    The same pattern applies to insert_new_aggregated_payload and insert_known_aggregated_payload. Callers that already hold a pre-computed root could pass it through PayloadBuffer::push if the root computation is a measured hot path, but given that tree_hash_root is deterministic and cheap for AttestationData, the simplification is likely worth it.

Reviews (1): Last reviewed commit: "Refactor store mappings from per-validat..."

Comment on lines 993 to 1017
```rust
/// Stores a gossip signature for later aggregation.
pub fn insert_gossip_signature(
    &mut self,
    data_root: H256,
    slot: u64,
    att_data: AttestationData,
    validator_id: u64,
    signature: ValidatorSignature,
) {
    let key = (validator_id, data_root);
    let encoded_key = encode_signature_key(&key);

    // Check if key already exists to avoid inflating the counter on upsert
    let is_new = self
        .backend
        .begin_read()
        .expect("read view")
        .get(Table::GossipSignatures, &encoded_key)
        .expect("get")
        .is_none();

    let stored = StoredSignature::new(slot, signature);
    let mut batch = self.backend.begin_write().expect("write batch");
    let entries = vec![(encoded_key, stored.as_ssz_bytes())];
    batch
        .put_batch(Table::GossipSignatures, entries)
        .expect("put signature");
    batch.commit().expect("commit");

    if is_new {
        self.gossip_signatures_count.fetch_add(1, Ordering::Relaxed);
        let mut gossip = self.gossip_signatures.lock().unwrap();
        let entry = gossip.entry(data_root).or_insert_with(|| GossipDataEntry {
            data: att_data,
            signatures: Vec::new(),
        });
        // Avoid duplicates for same validator
        if !entry
            .signatures
            .iter()
            .any(|e| e.validator_id == validator_id)
        {
            entry.signatures.push(GossipSignatureEntry {
                validator_id,
                signature,
            });
        }
    }
```

RocksDB requires all existing column families to be listed when opening
a database. Since this branch removed GossipSignatures and
AttestationDataByRoot tables, nodes upgrading from a previous version
would crash on startup with "You have to open all column families".

Open with deprecated CFs included in descriptors, then drop them
immediately after opening.
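A minimal sketch of that open-then-drop migration, assuming the `rust-rocksdb` crate (the `DEPRECATED_CFS` list and the `open_with_migration` helper are illustrative names, not the crate's or this repo's actual code):

```rust
use rocksdb::{Options, DB};

// Column families removed by this branch; names taken from the PR description.
const DEPRECATED_CFS: &[&str] = &["GossipSignatures", "AttestationDataByRoot"];

fn open_with_migration(path: &str, live_cfs: &[&str]) -> Result<DB, rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    opts.create_missing_column_families(true);

    // RocksDB refuses to open unless every existing CF is listed, so include
    // the deprecated ones alongside the live set. On a fresh database this
    // briefly creates them; they are dropped again immediately below.
    let all_cfs: Vec<&str> = live_cfs.iter().chain(DEPRECATED_CFS).copied().collect();
    let mut db = DB::open_cf(&opts, path, all_cfs)?;

    // Drop the deprecated CFs right after opening; ignore errors so a
    // database that never had them still opens cleanly.
    for cf in DEPRECATED_CFS {
        let _ = db.drop_cf(cf);
    }
    Ok(db)
}
```

After one successful open, the deprecated CFs no longer exist on disk, so subsequent opens only need the live set.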
