fix(consensus): accept justified peer block on #1e by default (no env)#801
Conversation
Replaces the per-node SENTRIX_OBSERVER_TOLERANT_STATE_ROOT env gate (#795) with a uniform, default behavior for ALL nodes: on a #1e state_root mismatch for a Peer block, ACCEPT it (stamp the proposer's root) iff the block carries a justification — which means it already passed the 2/3 justification gate in add_block_impl before pass-2 (an absent/insufficient justification is rejected there). A no-justification Peer block still hits the strict #1e reject. Why: consensus is on block_hash (2/3 justification), not state_root. The chain's state-commitment is imperfect (recurring state_root), so a node that falls behind and catches up via sync recomputes a different root and the strict #1e REJECTED the canonical block → the validator got stuck syncing → chain stalled until halt-all (val1, 2026-06-06). The env gate made only the fullnode tolerant, so validators kept stalling; making it the default fixes the stuck-sync at the root and removes the val-vs-fullnode divergence. Safety: this only changes the accept/reject decision on already-canonical (justification-gated) blocks; the stamped root is identical either way, so no computed-state change and no fork gate needed. The accept inherits the chain's existing justification trust level (the add_block_impl gate) — no weaker. NOTE: cryptographic per-signature justification verification (STRICT_JUSTIFICATION_HEIGHT) is a separate, currently-broken path (recover_signer disagrees with the producer's signing payload — validated 2026-06-06: enabling it rejects legitimate blocks); fixing that to enable strict verification fleet-wide is a tracked follow-up. Tests: justified Peer block + tampered root past fork height → ACCEPTED, root stamped; the no-justification counterpart → REJECTED (the updated C-03 rollback test). Both use the pad-past-STATE_ROOT_FORK_HEIGHT harness. sentrix-core 264 passed; clippy --all-targets -D warnings clean.
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
|
Warning Review limit reached
More reviews will be available in 44 minutes and 54 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Plus Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThe PR updates the state-root mismatch handling in Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/sentrix-core/src/block_executor.rs (2)
2808-2829:⚠️ Potential issue | 🟠 Major | ⚡ Quick winThis regression test no longer exercises the staking-state leak it describes.
Setting
block.justification = Nonepreventsapply_reward_bookkeeping_for_latest_block()from mutatingpending_rewardsat all — it returns early when the latest block has no justification (Lines 1888-1890). The#1erejection still happens, butpending_before == pending_afternow passes even if the C-03 rollback ofstake_registryis broken. Please keep this test driving a post-bundle Pass-2 failure instead of a no-justification fast path so it continues to pin the real RCA invariant.As per coding guidelines, test changes under
crates/sentrix-core/src/block_executor**must pin a specific invariant from a real bug RCA.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/sentrix-core/src/block_executor.rs` around lines 2808 - 2829, The test currently clears block.justification so apply_reward_bookkeeping_for_latest_block() returns early and never mutates pending_rewards; restore a justification so Pass-2 reward bookkeeping runs before you tamper the state_root. Concretely: revert/remove the line setting block.justification = None (or set it to the original Some(justification) used earlier), keep the tampered block.state_root and block.hash changes, then call bc.add_block_from_peer(...) to trigger the `#1e` rejection and verify pending_rewards rolls back; references: block.justification, block.state_root, block.calculate_hash(), add_block_from_peer, apply_reward_bookkeeping_for_latest_block, and stake_registry.validators.get("v1").pending_rewards.Source: Coding guidelines
1762-1779:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPreserve
SENTRIX_STATE_FINGERPRINTon this new early-return path.This return still bypasses the fingerprint emit at Lines 1834-1841. With
#1eacceptance now being the default, the exact blocks we need for cross-node drift forensics stop producing the load-bearing fingerprint on validators too. Mirroremit_state_fingerprint(self, self.height())before returning so this path keeps the existing debugging contract.As per coding guidelines,
SENTRIX_STATE_FINGERPRINTis load-bearing for debugging + chain replay, and removing it silently is a regression.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@crates/sentrix-core/src/block_executor.rs` around lines 1762 - 1779, This early-return path sets last.state_root, calls self.maybe_prune_trie() and emit_apply_profile(...) but skips emitting the SENTRIX_STATE_FINGERPRINT; before the return Ok(()) call add a call to emit_state_fingerprint(self, self.height()) (or the proper method signature) so the fingerprint is emitted on the `#1e` acceptance path; target the block containing last.state_root = Some(received_root), self.maybe_prune_trie(), emit_apply_profile(...) and the subsequent return to insert the emit_state_fingerprint call there.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@Cargo.toml`:
- Line 10: The PR changes default consensus behavior but only updated the
workspace version from "2.2.40" to "2.2.41"; check the project's versioning
policy to confirm whether consensus-behavior changes require a minor bump, and
if so update the version field (currently set via the version = "2.2.41" entry)
to the next minor (e.g., 2.3.0), update the changelog/release notes to call out
the consensus behavior change, and adjust the PR title/description and commit
message to reflect a minor-semver bump so operators are properly signaled; if
policy allows a patch, add an explicit note in the changelog explaining why a
patch bump was used.
- Line 10: The environment variable SENTRIX_OBSERVER_TOLERANT_STATE_ROOT was
removed instead of being preserved as a no-op; restore a benign read of
SENTRIX_OBSERVER_TOLERANT_STATE_ROOT (e.g., getenv and ignore or set to an
unused local) in the initialization path that previously referenced it (look for
block executor or observer initialization code such as functions in
block_executor.rs or any observer bootstrap function) so the env var remains
accepted but has no behavioral effect, or alternatively add a clear deprecation
comment and a documented removal note in the code and configuration docs and
ensure the fullnode config handling reflects the removal; update the code to
read the var and ignore its value or add the explicit deprecation handling where
SENTRIX_OBSERVER_TOLERANT_STATE_ROOT was formerly used.
In `@crates/sentrix-core/src/block_executor.rs`:
- Around line 1730-1760: The current `#1e` accept path checks only
last.justification.is_some() and can accept unvalidated justification blobs when
add_block_impl allowed bypass (e.g. via SENTRIX_REPLAY_BYPASS_AUTHZ or
voyager_mode_for(expected_index)), so modify the code to require the actual
Pass-1 "justification verified" result in Pass-2 instead of presence alone:
propagate the Pass-1 verification outcome (e.g. a boolean/enum like
justification_verified or the original Pass-1 predicate) from add_block_impl
into the BlockExecutor state used by add_block_impl/pass-2, or recompute the
exact Pass-1 predicate here (calling the same verification routine used in
Pass-1) before accepting under the BlockSource::Peer branch; ensure references
to last.justification and the new verification flag are used so blocks with only
a blob but not a verified 2/3 justification are rejected.
---
Outside diff comments:
In `@crates/sentrix-core/src/block_executor.rs`:
- Around line 2808-2829: The test currently clears block.justification so
apply_reward_bookkeeping_for_latest_block() returns early and never mutates
pending_rewards; restore a justification so Pass-2 reward bookkeeping runs
before you tamper the state_root. Concretely: revert/remove the line setting
block.justification = None (or set it to the original Some(justification) used
earlier), keep the tampered block.state_root and block.hash changes, then call
bc.add_block_from_peer(...) to trigger the `#1e` rejection and verify
pending_rewards rolls back; references: block.justification, block.state_root,
block.calculate_hash(), add_block_from_peer,
apply_reward_bookkeeping_for_latest_block, and
stake_registry.validators.get("v1").pending_rewards.
- Around line 1762-1779: This early-return path sets last.state_root, calls
self.maybe_prune_trie() and emit_apply_profile(...) but skips emitting the
SENTRIX_STATE_FINGERPRINT; before the return Ok(()) call add a call to
emit_state_fingerprint(self, self.height()) (or the proper method signature) so
the fingerprint is emitted on the `#1e` acceptance path; target the block
containing last.state_root = Some(received_root), self.maybe_prune_trie(),
emit_apply_profile(...) and the subsequent return to insert the
emit_state_fingerprint call there.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 98c07e0a-edd7-4445-9144-49494e48202c
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock,!**/*.lock
📒 Files selected for processing (2)
Cargo.tomlcrates/sentrix-core/src/block_executor.rs
| # they can't drift across crates. | ||
| [workspace.package] | ||
| version = "2.2.40" | ||
| version = "2.2.41" |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | 💤 Low value
Consider version-bump semantics for consensus behavior changes.
The workspace version was incremented by a patch level (2.2.40 → 2.2.41). While the PR states "no fork gate is required" because only the accept/reject decision changes for already-canonical blocks, this PR does alter the default consensus behavior by removing environment-variable gating and making justified peer block acceptance uniform across all nodes.
For blockchain systems, behavioral changes in consensus logic—even when deterministic and backward-compatible at the state level—may warrant a minor version bump to signal the operational change to node operators, especially when it affects validator/fullnode behavior divergence.
Verify that the project's versioning policy allows patch bumps for consensus behavior changes that don't require fork gates.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Cargo.toml` at line 10, The PR changes default consensus behavior but only
updated the workspace version from "2.2.40" to "2.2.41"; check the project's
versioning policy to confirm whether consensus-behavior changes require a minor
bump, and if so update the version field (currently set via the version =
"2.2.41" entry) to the next minor (e.g., 2.3.0), update the changelog/release
notes to call out the consensus behavior change, and adjust the PR
title/description and commit message to reflect a minor-semver bump so operators
are properly signaled; if policy allows a patch, add an explicit note in the
changelog explaining why a patch bump was used.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check how SENTRIX_OBSERVER_TOLERANT_STATE_ROOT is handled in block_executor
# Search for references to the env var in block_executor
echo "=== Checking SENTRIX_OBSERVER_TOLERANT_STATE_ROOT in block_executor ==="
rg -nC5 'SENTRIX_OBSERVER_TOLERANT_STATE_ROOT' crates/sentrix-core/src/block_executor.rs
# Check if any env var reads still exist
echo -e "\n=== Checking for env::var reads in block_executor ==="
rg -nC3 'env::var.*SENTRIX_OBSERVER' crates/sentrix-core/src/
# Verify the env var is mentioned in comments or kept as no-op
echo -e "\n=== Checking if variable is documented as ignored ==="
rg -nC2 'OBSERVER.*TOLERANT.*ignore|ignore.*OBSERVER.*TOLERANT' crates/sentrix-core/src/Repository: sentrix-labs/sentrix
Length of output: 246
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Find all references to the env var anywhere in the repo
echo "=== All references to SENTRIX_OBSERVER_TOLERANT_STATE_ROOT ==="
rg -n --hidden --glob '!.git/*' 'SENTRIX_OBSERVER_TOLERANT_STATE_ROOT' . || true
# 2) Find all env reads that mention OBSERVER or TOLERANT near them
echo -e "\n=== env::var reads mentioning OBSERVER/TOLERANT ==="
rg -n --hidden --glob '!.git/*' 'env::var\([^)]*(OBSERVER|TOLERANT)[^)]*\)|std::env::var\([^)]*(OBSERVER|TOLERANT)[^)]*\)' crates/ . || true
# 3) Look for docs/config mentioning the env var
echo -e "\n=== Mentions in README/docs/config ==="
rg -n --hidden --glob '!.git/*' 'SENTRIX_OBSERVER_TOLERANT_STATE_ROOT' -S README* docs* .github* config* crates/ || trueRepository: sentrix-labs/sentrix
Length of output: 332
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 4) Specifically inspect fullnode config parsing code for references/removal patterns
# (adjusted by searching for "fullnode" + env::var usage)
echo "=== Search for fullnode config env handling patterns ==="
rg -n --hidden --glob '!.git/*' 'fullnode' crates/ | head -n 50 || true
echo -e "\n=== Search for env var handling in fullnode-related code ==="
rg -n --hidden --glob '!.git/*' '(env::var|std::env::var)' crates/ | rg -n '(fullnode|config|observer|tolerant)' || trueRepository: sentrix-labs/sentrix
Length of output: 1715
Preserve SENTRIX_OBSERVER_TOLERANT_STATE_ROOT as an ignored/no-op env var (it’s currently gone)
SENTRIX_OBSERVER_TOLERANT_STATE_ROOT has no references anywhere in the repo (including crates/sentrix-core/src/block_executor.rs), so it appears to be deleted rather than kept as an ignored/no-op env var. This conflicts with the consensus-critical guideline to not delete existing env vars/features—either retain the env var read (even if it no longer affects behavior) or document a clear deprecation/removal rationale and ensure the fullnode config follow-up is consistent.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Cargo.toml` at line 10, The environment variable
SENTRIX_OBSERVER_TOLERANT_STATE_ROOT was removed instead of being preserved as a
no-op; restore a benign read of SENTRIX_OBSERVER_TOLERANT_STATE_ROOT (e.g.,
getenv and ignore or set to an unused local) in the initialization path that
previously referenced it (look for block executor or observer initialization
code such as functions in block_executor.rs or any observer bootstrap function)
so the env var remains accepted but has no behavioral effect, or alternatively
add a clear deprecation comment and a documented removal note in the code and
configuration docs and ensure the fullnode config handling reflects the removal;
update the code to read the var and ignore its value or add the explicit
deprecation handling where SENTRIX_OBSERVER_TOLERANT_STATE_ROOT was formerly
used.
Source: Coding guidelines
…ept (CodeRabbit) CodeRabbit #801 Major: keying the #1e accept off block.justification.is_some() was imprecise — add_block_impl SKIPS the 2/3 gate when SENTRIX_REPLAY_BYPASS_AUTHZ is set or voyager_mode_for(height) is false, so a peer block could carry an arbitrary (unvalidated) justification blob, reach Pass-2, and be accepted on #1e without ever passing BFT validation. Fix: a per-add transient `current_add_justification_verified` (serde-skipped), reset at the top of add_block_impl and set true ONLY at the end of the gate block — i.e. only when the gate actually RAN and PASSED. Pass-2's #1e accept now reads that flag instead of justification presence, so it tolerates a state_root mismatch strictly on gate-verified blocks; replay / pre-voyager / no-justification blocks fall through to the strict reject. Tests unchanged + still green (justified→accept goes through the gate→flag set; unjustified→reject, gate skipped→flag false). sentrix-core suite + clippy --all-targets -D warnings clean.
…802) test_c03_pass2_failure_rolls_back_state credits v1 to u64::MAX - reward and relies on the coinbase credit overflowing in Pass 2. It read get_block_reward() (which depends on the reward-fork env vars: VOYAGER_REWARD_V2_HEIGHT / TOKENOMICS_V2_HEIGHT / halving) WITHOUT holding env_test_lock, while sibling tests set those vars under that lock. Under the parallel `cargo test` run (e.g. the report-only coverage job) a concurrent mutation changed `reward` mid-test, the overflow didn't trigger, add_block returned Ok, and unwrap_err panicked — the recurring CI flake that needed re-runs on #795/#798/#800/#801. Acquire env_test_lock at the top of the test so it serializes against the env-mutating tests. Test-only change. sentrix-core suite + clippy --all-targets -D warnings clean.
Replaces the per-node
SENTRIX_OBSERVER_TOLERANT_STATE_ROOTenv gate (#795) with a uniform DEFAULT for all nodes: on a #1e state_root mismatch for a Peer block, ACCEPT it (stamp the proposer's root) iff the block carries a justification — i.e. it already passed the 2/3 justification gate inadd_block_implbefore pass-2 (absent/insufficient justification is rejected there). A no-justification Peer block still hits the strict #1e reject.Why
Consensus is on
block_hash(2/3 justification), notstate_root. The chain's state-commitment is imperfect (recurring/oscillating root), so a node that falls behind and catches up via sync recomputes a different root → the strict #1e rejected the canonical block → the validator got stuck syncing → chain stalled until halt-all (val1, 2026-06-06). The env gate made only the fullnode tolerant, so validators kept stalling. Making it the default fixes the stuck-sync at the root and removes the validator-vs-fullnode divergence the env created.Safety
Only the accept/reject decision on already-canonical (justification-gated) blocks changes; the stamped root is identical either way → no computed-state change, no fork gate needed. The accept inherits the chain's existing justification trust level (the
add_block_implgate) — no weaker.STRICT_JUSTIFICATION_HEIGHT) is a separate, currently-broken path:recover_signerdisagrees with the producer's signing payload (validated 2026-06-06 — enabling it on the fullnode rejected legitimate blocks). Fixing that to enable strict verification fleet-wide is a tracked follow-up; this PR does not depend on it and is no weaker than today.Tests
test_justified_peer_block_accepted_on_state_root_mismatch— justified Peer block + tampered root pastSTATE_ROOT_FORK_HEIGHT→ ACCEPTED, proposer root stamped.test_c03_pass2_failure_restores_staking_state(updated) — no-justification counterpart → REJECTED, staking state rolled back.cargo test -p sentrix-core: 264 passed;cargo clippy --workspace --all-targets -D warningsclean.Follow-up
Remove the now-ignored
SENTRIX_OBSERVER_TOLERANT_STATE_ROOT=1from the fullnode env (harmless; code no longer reads it). Fix the brokenrecover_signerpayload soSTRICT_JUSTIFICATION_HEIGHTcan be enabled for real cryptographic verification.Summary by CodeRabbit
Chores
Bug Fixes