
Refactor store mappings from per-validator to per-attestation-data keying#253

Open
pablodeymo wants to merge 2 commits into main from store-mapping-refactor

Conversation

@pablodeymo
Collaborator

Motivation

The storage layer used SignatureKey = (validator_id, data_root) as the primary key for both gossip signatures and aggregated payloads. This meant that a single aggregated proof covering 9 validators produced 9 duplicated entries (one per validator) — each storing a clone of the same ~3KB proof. Additionally, reconstructing per-validator attestation maps required a reverse-lookup table (AttestationDataByRoot) to go from data_root → AttestationData.

The leanSpec Python reference uses AttestationData directly as dict keys, groups proofs per attestation message, and uses proof.participants bitfields to determine which validators are covered. This PR aligns ethlambda with the leanSpec architecture.

Description

Three fundamental shifts

1. Key inversion: per-validator → per-attestation-data

BEFORE: N entries for 1 proof covering N validators
  (vid_0, data_root) → proof
  (vid_1, data_root) → proof   ← same proof duplicated!
  (vid_2, data_root) → proof

AFTER: 1 entry containing the proof (participants bitfield tells you who's covered)
  data_root → (AttestationData, [proof])

The PayloadBuffer is redesigned from a flat VecDeque<(SignatureKey, StoredAggregatedPayload)> to a HashMap<H256, PayloadEntry> + VecDeque<H256> structure that groups proofs by attestation data while preserving FIFO eviction order. Each distinct attestation message stores the full AttestationData plus all AggregatedSignatureProofs covering that message.
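As a rough sketch of that shape — with simplified stand-ins for H256, AttestationData, and AggregatedSignatureProof, and an illustrative cap (not the actual implementation) — the grouped buffer with FIFO eviction could look like:

```rust
use std::collections::{HashMap, VecDeque};

// Simplified stand-ins for the real ethlambda types.
type H256 = u64;
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
struct AttestationData { slot: u64 }
#[derive(Clone, Debug)]
struct AggregatedSignatureProof { participants: Vec<bool> }

struct PayloadEntry {
    data: AttestationData,
    proofs: Vec<AggregatedSignatureProof>,
}

/// Groups proofs per attestation message while keeping FIFO eviction order
/// over distinct messages.
struct PayloadBuffer {
    data: HashMap<H256, PayloadEntry>,
    order: VecDeque<H256>,
    cap: usize,
}

impl PayloadBuffer {
    fn new(cap: usize) -> Self {
        Self { data: HashMap::new(), order: VecDeque::new(), cap }
    }

    /// Appends a proof to an existing entry, or creates a new entry and
    /// evicts the oldest distinct message once the cap is reached.
    fn push(&mut self, root: H256, data: AttestationData, proof: AggregatedSignatureProof) {
        if let Some(entry) = self.data.get_mut(&root) {
            entry.proofs.push(proof);
            return;
        }
        if self.order.len() == self.cap {
            if let Some(oldest) = self.order.pop_front() {
                self.data.remove(&oldest);
            }
        }
        self.order.push_back(root);
        self.data.insert(root, PayloadEntry { data, proofs: vec![proof] });
    }
}
```

Note that the `order` deque only tracks distinct roots, which is why the cap now bounds distinct messages rather than per-validator entries.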

2. AttestationDataByRoot table eliminated

The old scheme needed AttestationDataByRoot (RocksDB table mapping H256 → AttestationData) because SignatureKey only stored the hash. The new scheme stores AttestationData directly alongside proofs, so the reverse-lookup is no longer needed.

3. Proof selection uses participation bits

BEFORE (build_block):
  sig_key = (validator_id, data_root)
  if aggregated_payloads.contains_key(&sig_key) → include

AFTER (build_block):
  if data_root in aggregated_payloads:
    for proof in payloads[data_root]:
      if proof.participants[validator_id] → include

extract_latest_attestations now iterates proofs' participation bits directly instead of relying on key membership, matching the leanSpec extract_attestations_from_aggregated_payloads method.
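The AFTER pseudocode above can be expressed concretely. This sketch uses simplified stand-in types (the real ones live in the ethlambda crates) and a hypothetical helper name:

```rust
use std::collections::HashMap;

// Simplified stand-ins for illustration.
type H256 = u64;
#[derive(Clone)]
struct AggregatedSignatureProof { participants: Vec<bool> }

/// Does any stored proof for this attestation data cover `validator_id`?
/// Membership is decided by the participation bitfield, not by key lookup.
fn has_proof_for(
    payloads: &HashMap<H256, Vec<AggregatedSignatureProof>>,
    data_root: H256,
    validator_id: usize,
) -> bool {
    payloads
        .get(&data_root)
        .map(|proofs| {
            proofs
                .iter()
                .any(|p| p.participants.get(validator_id).copied().unwrap_or(false))
        })
        .unwrap_or(false)
}
```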

Gossip signatures moved to in-memory

Gossip signatures are transient — collected via P2P and consumed at interval 2 aggregation (~every 4 seconds). Persisting them to RocksDB was unnecessary overhead. The new in-memory structure matches the leanSpec:

# leanSpec
gossip_signatures: dict[AttestationData, set[GossipSignatureEntry]]
// ethlambda (new)
gossip_signatures: HashMap<H256, GossipDataEntry>
// where GossipDataEntry = { data: AttestationData, signatures: Vec<GossipSignatureEntry> }
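A minimal sketch of the grouped in-memory map, with the set-like deduplication the leanSpec dict-of-sets implies (stand-in types; dedup-by-validator is an assumption about the real implementation, not confirmed by the PR text):

```rust
use std::collections::HashMap;

// Simplified stand-ins for illustration.
type H256 = u64;
#[derive(Clone, PartialEq, Eq, Hash)]
struct AttestationData { slot: u64 }
type ValidatorSignature = Vec<u8>;

struct GossipSignatureEntry { validator_id: u64, signature: ValidatorSignature }
struct GossipDataEntry { data: AttestationData, signatures: Vec<GossipSignatureEntry> }
type GossipSignatureMap = HashMap<H256, GossipDataEntry>;

/// Insert a gossip signature, grouping by attestation data and skipping
/// duplicates from the same validator (mirroring leanSpec's set semantics).
fn insert_gossip_signature(
    map: &mut GossipSignatureMap,
    data_root: H256,
    data: AttestationData,
    validator_id: u64,
    signature: ValidatorSignature,
) {
    let entry = map.entry(data_root).or_insert_with(|| GossipDataEntry {
        data,
        signatures: Vec::new(),
    });
    if entry.signatures.iter().all(|e| e.validator_id != validator_id) {
        entry.signatures.push(GossipSignatureEntry { validator_id, signature });
    }
}
```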

RocksDB tables reduced from 8 to 6

| Removed table | Reason |
| --- | --- |
| GossipSignatures | Moved to in-memory (transient data) |
| AttestationDataByRoot | No longer needed (data stored alongside proofs) |

Types removed

| Type | Reason |
| --- | --- |
| SignatureKey = (u64, H256) | Replaced by H256 (data_root) keying |
| StoredAggregatedPayload | Proofs stored directly as AggregatedSignatureProof |
| StoredSignature | Gossip signatures now in-memory as GossipSignatureEntry |
| crates/storage/src/types.rs | Deleted entirely (both types removed) |

Trait derives added

  • AttestationData: added PartialEq, Eq, Hash (needed for use as map key in leanSpec-aligned API)
  • Checkpoint: added Hash (nested in AttestationData)

Changes by file

crates/storage/src/store.rs (core of the refactor)

New types:

  • PayloadEntry { data: AttestationData, proofs: Vec<AggregatedSignatureProof> } — grouped proof storage
  • GossipSignatureEntry { validator_id: u64, signature: ValidatorSignature } — individual gossip sig
  • GossipDataEntry { data: AttestationData, signatures: Vec<GossipSignatureEntry> } — grouped gossip storage
  • GossipSignatureMap = HashMap<H256, GossipDataEntry> — type alias for the gossip map
  • GossipSignatureSnapshot — public type alias for the snapshot returned by iter_gossip_signatures

PayloadBuffer redesign:

  • data: HashMap<H256, PayloadEntry> — grouped proofs by data_root
  • order: VecDeque<H256> — insertion order for FIFO eviction
  • push() — appends proof to existing entry or creates new entry (with FIFO eviction)
  • drain() — flattens entries for promotion to known buffer
  • extract_latest_attestations() — derives per-validator attestation map from participation bits (replaces the old key-membership + AttestationDataByRoot lookup)
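The participation-bit derivation can be sketched as follows, with stand-in types and one plausible "latest" rule (highest slot wins) chosen purely for illustration:

```rust
use std::collections::HashMap;

// Simplified stand-ins for illustration.
#[derive(Clone, Debug)]
struct AttestationData { slot: u64 }
struct AggregatedSignatureProof { participants: Vec<bool> }
struct PayloadEntry { data: AttestationData, proofs: Vec<AggregatedSignatureProof> }

/// Derive a per-validator "latest attestation" map from participation bits:
/// every set bit in a proof attributes that attestation to that validator,
/// keeping the highest slot seen per validator.
fn extract_latest_attestations(entries: &[PayloadEntry]) -> HashMap<u64, AttestationData> {
    let mut latest: HashMap<u64, AttestationData> = HashMap::new();
    for pe in entries {
        for proof in &pe.proofs {
            for (vid, covered) in proof.participants.iter().enumerate() {
                if !*covered {
                    continue;
                }
                latest
                    .entry(vid as u64)
                    .and_modify(|d| {
                        if pe.data.slot > d.slot {
                            *d = pe.data.clone();
                        }
                    })
                    .or_insert_with(|| pe.data.clone());
            }
        }
    }
    latest
}
```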

Store API changes:

  • Removed: insert_attestation_data_by_root, insert_attestation_data_by_root_batch, get_attestation_data_by_root
  • Removed: extract_latest_attestations(keys: Iterator<Item = SignatureKey>)
  • Removed: iter_known_aggregated_payloads, iter_known_aggregated_payload_keys, iter_new_aggregated_payloads, iter_new_aggregated_payload_keys
  • Removed: encode_signature_key, decode_signature_key, count_gossip_signatures, prune_by_slot, prune_attestation_data_by_root
  • Added: extract_latest_known_attestations(), extract_latest_new_attestations(), extract_latest_all_attestations()
  • Added: known_aggregated_payloads() — returns HashMap<H256, (AttestationData, Vec<AggregatedSignatureProof>)>
  • Changed: all insert_*_aggregated_payload* methods now take (data_root, att_data, proof) instead of (SignatureKey, StoredAggregatedPayload)
  • Changed: insert_gossip_signature now takes AttestationData directly instead of slot
  • Changed: delete_gossip_signatures now takes &[(u64, H256)] instead of &[SignatureKey]
  • Changed: iter_gossip_signatures returns GossipSignatureSnapshot (grouped by attestation data)
  • Changed: prune_gossip_signatures uses HashMap::retain (in-memory) instead of RocksDB iteration
  • Changed: gossip_signatures_count counts in-memory entries instead of using AtomicUsize

Capacity constants adjusted:

  • AGGREGATED_PAYLOAD_CAP: 4096 → 512 (now counts distinct messages, not per-validator entries)
  • NEW_PAYLOAD_CAP: 512 → 64 (same reason)

crates/blockchain/src/store.rs (consumer updates)

  • update_safe_target: replaced key-set merge + extract_latest_attestations with single extract_latest_all_attestations() call
  • aggregate_committee_signatures: iterates pre-grouped gossip signatures (no more manual grouping by data_root + AttestationDataByRoot lookup); stores one proof per attestation data instead of N per validator
  • on_gossip_attestation: passes AttestationData to insert_gossip_signature (no more insert_attestation_data_by_root)
  • on_gossip_aggregated_attestation: stores one proof entry instead of N per-validator entries
  • on_block_core: stores one proof per attestation in known buffer; counts participating validators for metrics
  • produce_block_with_signatures: derives attestation map from already-cloned payloads (avoids redundant lock + iteration); uses new known_aggregated_payloads() API
  • build_block: pre-computes tree_hash_root per attestation outside fixed-point loop; checks proof.participants[validator_id] instead of HashMap::contains_key
  • select_aggregated_proofs: looks up proofs by data_root with greedy set-cover over all candidates (instead of per-validator lookup)
  • aggregate_attestations_by_data: now reuses aggregation_bits_from_validator_indices instead of duplicating bitfield construction
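The greedy set-cover mentioned for select_aggregated_proofs can be sketched like this (stand-in Proof type; tie-breaking and candidate ordering are illustrative, not taken from the actual code):

```rust
use std::collections::HashSet;

// Simplified stand-in for illustration.
#[derive(Clone, Debug, PartialEq)]
struct Proof { participants: Vec<bool> }

/// Greedy set-cover: repeatedly pick the candidate proof covering the most
/// still-uncovered validators, stopping when everyone is covered or no
/// remaining candidate helps (the "covered.is_empty() break" case).
fn select_proofs(candidates: &[Proof], mut remaining: HashSet<usize>) -> Vec<Proof> {
    let mut selected = Vec::new();
    let mut available: Vec<&Proof> = candidates.iter().collect();
    while !remaining.is_empty() {
        // Index of the candidate with the largest marginal coverage.
        let best = available
            .iter()
            .enumerate()
            .max_by_key(|(_, p)| {
                remaining
                    .iter()
                    .filter(|&&v| p.participants.get(v) == Some(&true))
                    .count()
            })
            .map(|(i, _)| i);
        let Some(i) = best else { break };
        let proof = available.remove(i);
        let covered: Vec<usize> = remaining
            .iter()
            .copied()
            .filter(|&v| proof.participants.get(v) == Some(&true))
            .collect();
        if covered.is_empty() {
            break; // no candidate covers anyone still remaining
        }
        for v in covered {
            remaining.remove(&v);
        }
        selected.push(proof.clone());
    }
    selected
}
```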

crates/storage/src/api/tables.rs

  • Removed GossipSignatures and AttestationDataByRoot variants
  • ALL_TABLES array: 8 → 6 entries

crates/storage/src/backend/rocksdb.rs

  • Removed column family mappings for the 2 deleted tables

crates/storage/src/lib.rs

  • Removed exports: SignatureKey, StoredAggregatedPayload, StoredSignature
  • Removed mod types declaration

crates/common/types/src/attestation.rs

  • Added PartialEq, Eq, Hash derives to AttestationData

crates/common/types/src/checkpoint.rs

  • Added Hash derive to Checkpoint

crates/blockchain/tests/forkchoice_spectests.rs

  • Updated 3 call sites to use new extract_latest_known_attestations() / extract_latest_new_attestations() API

Storage efficiency improvement

With 9 validators all attesting to the same data:

| Metric | Before | After |
| --- | --- | --- |
| Proof entries per attestation | 9 (one per validator) | 1 (shared) |
| AttestationData copies | 9 (one per SignatureKey lookup) + 1 (AttestationDataByRoot) | 1 (stored with proofs) |
| RocksDB tables | 8 | 6 |
| RocksDB writes per gossip attestation | 2 (GossipSignatures + AttestationDataByRoot) | 0 (in-memory) |

How to Test

make fmt    # ✅ passes
make lint   # ✅ passes (zero warnings)
make test   # ✅ passes (16 storage + 27 forkchoice spec tests)

Note on existing RocksDB data

Nodes upgrading from a previous version will have orphaned gossip_signatures and attestation_data_by_root column families in their RocksDB database. These are inert (never read or written) and can be cleaned up in a future migration, or will be ignored by RocksDB if the column families are not opened. No data loss or startup failure is expected.


Aligns the storage architecture with the leanSpec Python reference implementation.
Three fundamental shifts:

1. Key inversion: SignatureKey = (validator_id, data_root) removed. PayloadBuffer
   now groups proofs by attestation data (via data_root), storing one proof entry
   that covers all participants instead of N duplicated entries per validator.

2. AttestationDataByRoot table eliminated. AttestationData is stored directly
   alongside proofs in the in-memory PayloadBuffer and gossip signature map.

3. Proof selection uses participation bits (proof.participants[validator_id])
   instead of HashMap key membership.

Additional changes:
- Gossip signatures moved from RocksDB to in-memory (transient data consumed
  at interval 2 aggregation)
- RocksDB tables reduced from 8 to 6 (removed GossipSignatures and
  AttestationDataByRoot)
- StoredSignature, StoredAggregatedPayload, and SignatureKey types removed
- Added Hash derive to AttestationData and Checkpoint for use as map keys
@github-actions

🤖 Kimi Code Review

Overall Assessment: This is a well-structured refactoring that moves from per-validator key storage to content-addressed attestation data grouping, reducing storage overhead and simplifying the fork choice data flow. The in-memory gossip signature buffer is appropriate for transient data.

Issues Found:

1. Missing validation of data_root consistency (Security/Code Correctness)

  • File: crates/storage/src/store.rs
  • Lines: 1020-1035 (insert_gossip_signature), 921-935 (insert_new_aggregated_payload),

Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt

@github-actions

🤖 Claude Code Review



PR Review: Refactor store mappings from per-validator to per-attestation-data keying

The motivation is sound and the alignment with leanSpec is a real improvement. The core changes are well-structured. A few findings below.


Correctness

extract_latest_all_attestations holds two separate Mutex locks sequentially

crates/storage/src/store.rs (around line 932–955 in the new code):

let mut result = self.known_payloads.lock().unwrap().extract_latest_attestations();
// lock released here ← window opens
for (vid, data) in self.new_payloads.lock().unwrap().extract_latest_attestations() {

Between the two acquisitions, promote_new_aggregated_payloads could run on another thread: an attestation drains from new_payloads and lands in known_payloads after known_payloads was already read but before new_payloads is read — making it appear in neither snapshot. In a single-threaded actor this race can't currently occur, but the non-atomicity is a silent correctness assumption. A brief comment like // Actor-local: promotion only happens from the same tick loop, so non-atomic read is safe would make the intent explicit.
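If atomicity were ever needed (say, promotion moved off the actor loop), acquiring both locks before either read closes the window. A sketch with plain std Mutex and hypothetical simplified buffers (u64 maps stand in for the real payload buffers):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical simplified store for illustration.
struct Store {
    known_payloads: Mutex<HashMap<u64, u64>>,
    new_payloads: Mutex<HashMap<u64, u64>>,
}

impl Store {
    /// Take both locks up front so promotion cannot run between the two
    /// reads. (Acquire in the same order everywhere to avoid deadlock.)
    fn extract_all_atomic(&self) -> HashMap<u64, u64> {
        let known = self.known_payloads.lock().unwrap();
        let newer = self.new_payloads.lock().unwrap();
        let mut result = known.clone();
        for (k, v) in newer.iter() {
            // Known entries take precedence, matching a merge-with-priority.
            result.entry(*k).or_insert(*v);
        }
        result
    }
}
```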


PayloadBuffer::drain() loses insertion order

crates/storage/src/store.rs — the drain method:

fn drain(&mut self) -> Vec<(H256, AttestationData, AggregatedSignatureProof)> {
    self.order.clear();
    self.data.drain()
        .flat_map(|(root, entry)| {
            // HashMap::drain() is unordered

HashMap::drain() yields entries in an arbitrary order. The old VecDeque::drain() was strictly FIFO. When promote_new_aggregated_payloads pushes these into the known buffer, the order VecDeque in the known buffer records the promoted entries in an undefined order — which determines which entries are evicted first if the known buffer later fills up. In practice this rarely matters (known buffer capacity is 512 distinct messages), but the FIFO guarantee advertised in the doc comment no longer holds end-to-end. Either update the doc comment to say "FIFO within each buffer; promotion order is unspecified" or drain by following the order deque:

fn drain(&mut self) -> Vec<...> {
    self.order.drain(..)
        .filter_map(|root| self.data.remove(&root).map(|entry| (root, entry)))
        .flat_map(|(root, entry)| entry.proofs.into_iter().map(move |proof| (root, entry.data.clone(), proof)))
        .collect()
}

delete_gossip_signatures removes an entry while iterating the outer map

crates/storage/src/store.rs:

for &(vid, data_root) in keys {
    if let Some(entry) = gossip.get_mut(&data_root) {
        entry.signatures.retain(|e| e.validator_id != vid);
        if entry.signatures.is_empty() {
            gossip.remove(&data_root);   // ← safe under NLL, but only if 'entry' borrow ends
        }
    }
}

This compiles under NLL because entry is last used at entry.signatures.is_empty(). However, if the same data_root appears multiple times in keys (e.g. two validators attesting the same data), the second iteration will correctly get None from get_mut after the remove. This is fine semantically, but worth a comment since the double-removal case is silent rather than explicit.


Performance

known_aggregated_payloads() deep-clones all proofs on every call to produce_block_with_signatures

crates/storage/src/store.rs:

buf.data
    .iter()
    .map(|(root, entry)| (*root, (entry.data.clone(), entry.proofs.clone())))
    .collect()

AggregatedSignatureProof is ~3 KB. With 512 entries each holding multiple proofs, this is a large allocation on every block production. The old code had the same shape, so it's not a regression, but now that the structure is cleaner it may be worth wrapping PayloadEntry in Arc so the snapshot is an Arc::clone per entry rather than a deep copy. Not blocking, just flagging.
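The Arc-based snapshot would look roughly like this (stand-in entry type; the real one holds AttestationData plus proofs). Each entry then costs a pointer bump instead of a multi-kilobyte copy:

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Simplified stand-in entry for illustration.
struct PayloadEntry { proofs: Vec<Vec<u8>> }

/// Snapshot via Arc::clone: O(1) reference-count bump per entry instead of
/// deep-copying every proof on each call.
fn snapshot(data: &HashMap<u64, Arc<PayloadEntry>>) -> HashMap<u64, Arc<PayloadEntry>> {
    data.iter().map(|(k, v)| (*k, Arc::clone(v))).collect()
}
```

The trade-off is that mutation of an entry then requires either Arc::make_mut (clone-on-write) or replacing the Arc wholesale.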


Minor issues

Redundant bounds check in build_block (crates/blockchain/src/store.rs ~line 340):

vid < p.participants.len() && p.participants.get(vid).unwrap_or(false)

The explicit vid < p.participants.len() guard makes the unwrap_or unreachable. Either check alone is sufficient; drop one of them.


Metric semantic change not reflected in name

metrics::update_latest_new_aggregated_payloads(store.new_aggregated_payloads_count()): new_aggregated_payloads_count now counts distinct attestation messages, not per-validator payload entries (which was the previous meaning). If any Prometheus alert or dashboard was tuned to the old semantics (e.g. "alert if count > 9×N validators"), it will fire differently. The PR description notes this change only for storage tables; it's worth calling out in the metric description or a comment.


iter_gossip_signatures returns a Vec, not an iterator

The name iter_* conventionally implies lazy iteration. The return type GossipSignatureSnapshot (alias for Vec<...>) is a snapshot. Renaming to gossip_signature_snapshot() or take_gossip_signatures_snapshot() would align with the naming in the PR description and avoid confusion. Low priority.


What's good

  • Eliminating AttestationDataByRoot removes an entire consistency hazard: previously a GossipSignatures entry could exist without a matching AttestationDataByRoot entry, causing silent drops in aggregate_committee_signatures via the let Some(data) = store.get_attestation_data_by_root(...) else { continue } guard. That class of bug is gone.
  • The greedy set-cover in select_aggregated_proofs is cleaner and handles the "candidates present but none cover remaining" case properly with the covered.is_empty() break.
  • aggregate_committee_signatures is simplified significantly and no longer requires a separate grouping step.
  • Test coverage is updated correctly throughout.

Automated review by Claude (Anthropic) · sonnet · custom prompt

@github-actions

🤖 Codex Code Review

  1. Existing RocksDB data directories will stop opening after this change. tables.rs removes the GossipSignatures and AttestationDataByRoot column families from ALL_TABLES, while rocksdb.rs still opens the DB with exactly the CF list in ALL_TABLES. RocksDB requires opening all existing CFs, so restarting a node on a pre-PR database is likely to fail immediately at main.rs. This needs an explicit migration or a list_cf-based open path.

  2. The “FIFO” eviction policy is no longer actually FIFO after promotion from new to known. store.rs drains new_payloads through HashMap::drain(), which is unordered, and store.rs re-inserts in that arbitrary order. Since eviction is defined by insertion order in store.rs, once known_payloads fills up you can evict newer attestations before older ones. That can skew fork-choice weight and block construction under load.

  3. The new per-AttestationData grouping introduces an unbounded memory growth path for proofs. In store.rs, every additional proof for the same data_root is appended to entry.proofs, but the capacity limit only applies to distinct roots. A noisy or malicious peer can keep one attestation root hot and grow proofs without ever hitting AGGREGATED_PAYLOAD_CAP/NEW_PAYLOAD_CAP. In consensus ingress code, that is a real DoS risk; this wants proof deduplication and/or a per-root cap.

  4. The gossip-signature metric update moved from O(1) to O(n) under the mutex. store.rs now recomputes the count by scanning the whole map, and it is called on every gossip attestation in store.rs and again after each aggregation cycle in store.rs. Under heavy attestation traffic this becomes avoidable hot-path contention.
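A sketch of the per-root cap plus participant-set dedup suggested in point 3 (the names, cap value, and rejection semantics are illustrative, not a definitive fix):

```rust
// Illustrative bound on proofs stored under one attestation root.
const MAX_PROOFS_PER_ROOT: usize = 16;

// Simplified stand-ins for illustration.
#[derive(Clone, PartialEq)]
struct Proof { participants: Vec<bool> }

struct Entry { proofs: Vec<Proof> }

/// Returns false (rejecting the proof) if it duplicates an existing
/// participant set or the per-root cap has been reached, so a single hot
/// root cannot grow without bound.
fn try_push(entry: &mut Entry, proof: Proof) -> bool {
    if entry.proofs.iter().any(|p| p.participants == proof.participants) {
        return false; // duplicate participant set
    }
    if entry.proofs.len() >= MAX_PROOFS_PER_ROOT {
        return false; // per-root cap reached
    }
    entry.proofs.push(proof);
    true
}
```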

I did not run the test suite successfully here: cargo needed to fetch the lean-multisig git dependency, which is blocked in this sandbox.


Automated review by OpenAI Codex · gpt-5.4 · custom prompt

@greptile-apps
Contributor

greptile-apps bot commented Mar 30, 2026

Greptile Summary

This PR is a well-scoped storage refactor that eliminates the N-per-validator duplication of aggregated proofs in favour of a per-attestation-data model, aligning the codebase with the leanSpec reference. Two RocksDB tables (GossipSignatures, AttestationDataByRoot) are removed, gossip signatures move to an in-memory HashMap, and PayloadBuffer is redesigned as a HashMap<H256, PayloadEntry> + VecDeque<H256> structure. The changes are logically consistent, all test suites pass, and the refactor correctly handles the key correctness-sensitive pieces (greedy set-cover for proof selection, FIFO eviction bounds, participation-bit checks in build_block).

Key changes:

  • PayloadBuffer restructured: proofs grouped by data_root instead of flat (validator_id, data_root) entries, reducing storage from O(N validators) to O(1) per attestation message.
  • GossipSignatureMap moved fully in-memory; prune_gossip_signatures now uses HashMap::retain instead of an RocksDB scan.
  • extract_latest_attestations now derives the per-validator map directly from proof participation bits rather than requiring a separate AttestationDataByRoot reverse-lookup table.
  • select_aggregated_proofs improved: the old approach broke out of the greedy loop if target_id had no proof entry, silently excluding other covered validators. The new approach loads all proofs for the data_root at once and does a correct greedy set-cover.
  • build_block pre-computes tree_hash_root for each attestation outside the fixed-point loop and checks proof.participants[validator_id] instead of a HashMap key lookup.
  • Three P2 style suggestions are noted: drain() loses insertion order due to HashMap::drain() being unordered; duplicate proofs are not deduplicated on push; the dual (data_root, att_data) parameter pattern introduces a silent invariant that callers must maintain.

Confidence Score: 5/5

Safe to merge — no P0/P1 issues found; all remaining findings are P2 style/improvement suggestions

The refactor is logically correct: FIFO eviction still bounds memory, the greedy set-cover cannot select the same proof twice (proven by the remaining tracking), gossip deduplication is consistent with leanSpec set semantics, and all lock acquisitions are sequential with no deadlock risk. Three P2 suggestions (drain ordering, proof deduplication, redundant data_root parameter) do not affect functional correctness or the tests.

No files require special attention; crates/storage/src/store.rs contains the three P2 suggestions but no blocking issues.

Important Files Changed

| Filename | Overview |
| --- | --- |
| crates/storage/src/store.rs | Core refactor: PayloadBuffer redesigned to HashMap+VecDeque, gossip signatures moved in-memory, all insertion/extraction APIs updated to data_root-keyed model |
| crates/blockchain/src/store.rs | Consumer updates: build_block now checks proof.participants bitfields, select_aggregated_proofs uses proper greedy set-cover over all candidates, aggregate_committee_signatures stores one proof per data not per validator |
| crates/storage/src/types.rs | Deleted entirely: StoredAggregatedPayload and StoredSignature types removed as proofs are now stored directly and gossip is in-memory |
| crates/storage/src/api/tables.rs | Removed GossipSignatures and AttestationDataByRoot table variants; ALL_TABLES array reduced from 8 to 6 |
| crates/common/types/src/attestation.rs | Added PartialEq, Eq, Hash derives to AttestationData to support use as HashMap key; derives are consistent with each other |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Gossip Attestation\non_gossip_attestation] -->|insert_gossip_signature\ndata_root + att_data + sig| G[(GossipSignatureMap\nin-memory\nHashMap H256 → GossipDataEntry)]

    B[Gossip Aggregated\non_gossip_aggregated_attestation] -->|insert_new_aggregated_payload\ndata_root + att_data + proof| NP[(new_payloads\nPayloadBuffer\nHashMap H256 → PayloadEntry)]

    C[Block Received\non_block_core] -->|insert_known_aggregated_payloads_batch\ndata_root + att_data + proof| KP[(known_payloads\nPayloadBuffer\nHashMap H256 → PayloadEntry)]

    G -->|iter_gossip_signatures snapshot| AGG[aggregate_committee_signatures\nInterval 2]
    AGG -->|1 proof per data_root| NP
    AGG -->|delete processed keys| G

    NP -->|promote_new_aggregated_payloads\nInterval 4| KP

    KP -->|known_aggregated_payloads| PB[produce_block_with_signatures]
    KP -->|extract_latest_known_attestations| PB
    NP -->|extract_latest_new_attestations| US[update_safe_target\nInterval 3]
    KP -->|extract_latest_known_attestations| US

    PB -->|build_block\ncheck proof.participants bitfield| BB[Block with Attestations\n+ AggregatedSignatureProofs]

Comments Outside Diff (2)

  1. crates/storage/src/store.rs, line 688-699 (link)

    P2 drain() loses insertion order on promotion

    HashMap::drain() iterates in an arbitrary (non-deterministic) order. When promote_new_aggregated_payloads drains new_payloads and then pushes each entry into known_payloads via push_batch, the order in which entries land in known_payloads.order (the FIFO eviction deque) is determined by the HashMap's internal iteration rather than the original insertion sequence. This means eviction from known_payloads might remove entries that were actually inserted after some of the newly promoted ones.

    For correctness this is acceptable — the cap still bounds memory — but it is a silent deviation from the true FIFO guarantee described in the doc-comment. If strict FIFO is required across promotions, the order deque could be drained and iterated in order before clearing data:

    fn drain(&mut self) -> Vec<(H256, AttestationData, AggregatedSignatureProof)> {
        // Drain in insertion order by following `order`
        self.order
            .drain(..)
            .filter_map(|root| self.data.remove(&root).map(|entry| (root, entry)))
            .flat_map(|(root, entry)| {
                let data = entry.data;
                entry.proofs.into_iter().map(move |proof| (root, data.clone(), proof))
            })
            .collect()
    }

    (The above sketch captures the intent; adapt as needed.)

  2. crates/storage/src/store.rs, line 643-661 (link)

    P2 No deduplication of identical proofs on push

    PayloadBuffer::push for an already-known data_root unconditionally appends the proof without checking whether the same AggregatedSignatureProof (same participants + proof bytes) was already stored. If the same gossip aggregated attestation is relayed by multiple peers (a common P2P scenario), each call to on_gossip_aggregated_attestation → insert_new_aggregated_payload pushes an identical proof object, growing the proofs vector with duplicates.

    The greedy set-cover in select_aggregated_proofs and the has_proof check in build_block are both correct in the presence of duplicates, but unnecessary work is done iterating redundant entries on each block production cycle.

    Consider a lightweight guard — checking that no existing proof already covers the same participant set before pushing:

    if let Some(entry) = self.data.get_mut(&data_root) {
        // Skip if an identical proof is already stored
        if !entry.proofs.iter().any(|p| p.participants == proof.participants) {
            entry.proofs.push(proof);
        }
    } else {
        ...
    }
Additional comment: crates/storage/src/store.rs, lines 993-1017

    P2 Redundant data_root parameter — silent invariant required by callers

    insert_gossip_signature (and the analogous insert_new_aggregated_payload / insert_known_aggregated_payload) accept both data_root: H256 and att_data: AttestationData as separate parameters, implicitly requiring callers to ensure data_root == att_data.tree_hash_root(). All current call sites correctly pre-compute data_root for reuse, but the API itself does not enforce this invariant.

    A caller passing a mismatched pair would silently store att_data under the wrong key, corrupting later lookups. Since the data_root is always computable from att_data, accepting only att_data (and computing the root internally) would eliminate the risk:

    pub fn insert_gossip_signature(
        &mut self,
        att_data: AttestationData,
        validator_id: u64,
        signature: ValidatorSignature,
    ) {
        let data_root = att_data.tree_hash_root();
        ...
    }

    The same pattern applies to insert_new_aggregated_payload and insert_known_aggregated_payload. Callers that already hold a pre-computed root could pass it through PayloadBuffer::push if the root computation is a measured hot path, but given that tree_hash_root is deterministic and cheap for AttestationData, the simplification is likely worth it.

Reviews (1): Last reviewed commit: "Refactor store mappings from per-validat..."

Comment on lines 993 to 1017
```rust
/// Stores a gossip signature for later aggregation.
pub fn insert_gossip_signature(
    &mut self,
    data_root: H256,
    slot: u64,
    att_data: AttestationData,
    validator_id: u64,
    signature: ValidatorSignature,
) {
    let key = (validator_id, data_root);
    let encoded_key = encode_signature_key(&key);

    // Check if key already exists to avoid inflating the counter on upsert
    let is_new = self
        .backend
        .begin_read()
        .expect("read view")
        .get(Table::GossipSignatures, &encoded_key)
        .expect("get")
        .is_none();

    let stored = StoredSignature::new(slot, signature);
    let mut batch = self.backend.begin_write().expect("write batch");
    let entries = vec![(encoded_key, stored.as_ssz_bytes())];
    batch
        .put_batch(Table::GossipSignatures, entries)
        .expect("put signature");
    batch.commit().expect("commit");

    if is_new {
        self.gossip_signatures_count.fetch_add(1, Ordering::Relaxed);
        let mut gossip = self.gossip_signatures.lock().unwrap();
        let entry = gossip.entry(data_root).or_insert_with(|| GossipDataEntry {
            data: att_data,
            signatures: Vec::new(),
        });
        // Avoid duplicates for same validator
        if !entry
            .signatures
            .iter()
            .any(|e| e.validator_id == validator_id)
        {
            entry.signatures.push(GossipSignatureEntry {
                validator_id,
                signature,
            });
        }
    }
```

RocksDB requires all existing column families to be listed when opening
a database. Since this branch removed GossipSignatures and
AttestationDataByRoot tables, nodes upgrading from a previous version
would crash on startup with "You have to open all column families".

Open with deprecated CFs included in descriptors, then drop them
immediately after opening.
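A minimal sketch of that open-then-drop migration, assuming the `rust-rocksdb` crate (the `DEPRECATED_CFS` list and the `open_with_migration` helper are illustrative names, not the crate's or this repo's actual code):

```rust
use rocksdb::{Options, DB};

// Column families removed by this branch; names taken from the PR description.
const DEPRECATED_CFS: &[&str] = &["GossipSignatures", "AttestationDataByRoot"];

fn open_with_migration(path: &str, live_cfs: &[&str]) -> Result<DB, rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    opts.create_missing_column_families(true);

    // RocksDB refuses to open unless every existing CF is listed, so include
    // the deprecated ones alongside the live set. On a fresh database this
    // briefly creates them; they are dropped again immediately below.
    let all_cfs: Vec<&str> = live_cfs.iter().chain(DEPRECATED_CFS).copied().collect();
    let mut db = DB::open_cf(&opts, path, all_cfs)?;

    // Drop the deprecated CFs right after opening; ignore errors so a
    // database that never had them still opens cleanly.
    for cf in DEPRECATED_CFS {
        let _ = db.drop_cf(cf);
    }
    Ok(db)
}
```

After one successful open, the deprecated CFs no longer exist on disk, so subsequent opens only need the live set.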
