Skip to content

fix(consensus): accept justified peer block on #1e by default (no env)#801

Merged
github-actions[bot] merged 2 commits into
mainfrom
fix/justified-1e-accept-default
Jun 6, 2026
Merged

fix(consensus): accept justified peer block on #1e by default (no env)#801
github-actions[bot] merged 2 commits into
mainfrom
fix/justified-1e-accept-default

Conversation

@satyakwok
Copy link
Copy Markdown
Member

@satyakwok satyakwok commented Jun 6, 2026

Replaces the per-node SENTRIX_OBSERVER_TOLERANT_STATE_ROOT env gate (#795) with a uniform DEFAULT for all nodes: on a #1e state_root mismatch for a Peer block, ACCEPT it (stamp the proposer's root) iff the block carries a justification — i.e. it already passed the 2/3 justification gate in add_block_impl before pass-2 (absent/insufficient justification is rejected there). A no-justification Peer block still hits the strict #1e reject.

Why

Consensus is on block_hash (2/3 justification), not state_root. The chain's state-commitment is imperfect (recurring/oscillating root), so a node that falls behind and catches up via sync recomputes a different root → the strict #1e rejected the canonical block → the validator got stuck syncing → chain stalled until halt-all (val1, 2026-06-06). The env gate made only the fullnode tolerant, so validators kept stalling. Making it the default fixes the stuck-sync at the root and removes the validator-vs-fullnode divergence the env created.

Safety

Only the accept/reject decision on already-canonical (justification-gated) blocks changes; the stamped root is identical either way → no computed-state change, no fork gate needed. The accept inherits the chain's existing justification trust level (the add_block_impl gate) — no weaker.

⚠️ Cryptographic per-signature justification verification (STRICT_JUSTIFICATION_HEIGHT) is a separate, currently-broken path: recover_signer disagrees with the producer's signing payload (validated 2026-06-06 — enabling it on the fullnode rejected legitimate blocks). Fixing that to enable strict verification fleet-wide is a tracked follow-up; this PR does not depend on it and is no weaker than today.

Tests

  • test_justified_peer_block_accepted_on_state_root_mismatch — justified Peer block + tampered root past STATE_ROOT_FORK_HEIGHT → ACCEPTED, proposer root stamped.
  • test_c03_pass2_failure_restores_staking_state (updated) — no-justification counterpart → REJECTED, staking state rolled back.
  • cargo test -p sentrix-core: 264 passed; cargo clippy --workspace --all-targets -D warnings clean.

Follow-up

Remove the now-ignored SENTRIX_OBSERVER_TOLERANT_STATE_ROOT=1 from the fullnode env (harmless; code no longer reads it). Fix the broken recover_signer payload so STRICT_JUSTIFICATION_HEIGHT can be enabled for real cryptographic verification.

Summary by CodeRabbit

  • Chores

    • Version bumped to 2.2.41
  • Bug Fixes

    • Block validation now accepts justified peer blocks when state-root mismatches occur

Replaces the per-node SENTRIX_OBSERVER_TOLERANT_STATE_ROOT env gate (#795)
with a uniform, default behavior for ALL nodes: on a #1e state_root mismatch
for a Peer block, ACCEPT it (stamp the proposer's root) iff the block carries
a justification — which means it already passed the 2/3 justification gate in
add_block_impl before pass-2 (an absent/insufficient justification is rejected
there). A no-justification Peer block still hits the strict #1e reject.

Why: consensus is on block_hash (2/3 justification), not state_root. The
chain's state-commitment is imperfect (recurring state_root), so a node that
falls behind and catches up via sync recomputes a different root and the
strict #1e REJECTED the canonical block → the validator got stuck syncing →
chain stalled until halt-all (val1, 2026-06-06). The env gate made only the
fullnode tolerant, so validators kept stalling; making it the default fixes
the stuck-sync at the root and removes the val-vs-fullnode divergence.

Safety: this only changes the accept/reject decision on already-canonical
(justification-gated) blocks; the stamped root is identical either way, so no
computed-state change and no fork gate needed. The accept inherits the chain's
existing justification trust level (the add_block_impl gate) — no weaker.
NOTE: cryptographic per-signature justification verification
(STRICT_JUSTIFICATION_HEIGHT) is a separate, currently-broken path
(recover_signer disagrees with the producer's signing payload — validated
2026-06-06: enabling it rejects legitimate blocks); fixing that to enable
strict verification fleet-wide is a tracked follow-up.

Tests: justified Peer block + tampered root past fork height → ACCEPTED, root
stamped; the no-justification counterpart → REJECTED (the updated C-03
rollback test). Both use the pad-past-STATE_ROOT_FORK_HEIGHT harness.
sentrix-core 264 passed; clippy --all-targets -D warnings clean.
@github-actions github-actions Bot enabled auto-merge (squash) June 6, 2026 11:36
@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 6, 2026

Review Change Stack

Warning

Review limit reached

@satyakwok, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 44 minutes and 54 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 2b54c0f3-a1c7-4626-b6d1-b9f32318799e

📥 Commits

Reviewing files that changed from the base of the PR and between a420286 and c204259.

📒 Files selected for processing (2)
  • crates/sentrix-core/src/block_executor.rs
  • crates/sentrix-core/src/blockchain.rs
📝 Walkthrough

Walkthrough

The PR updates the state-root mismatch handling in apply_block_pass2 and bumps the workspace version. Peer blocks with state-root mismatches are now accepted when they carry a justification (received root is stamped onto the chain), replacing prior environment-variable gating. Unjustified peer mismatches continue to be strictly rejected. The existing staking-state rollback test is refactored to use unjustified blocks, and a new test validates the justified acceptance path.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

  • sentrix-labs/sentrix#751: The core state-root stamping logic in apply_block_pass2 directly modifies the same block-apply and state-root handling path described in the issue.

Possibly related PRs

  • sentrix-labs/sentrix#792: Both PRs modify apply_block_pass2 around #1e state-root mismatch scenarios—this PR changes peer justification acceptance and test rejection criteria, while the related PR adds staking state restoration on Pass-2 rollback after #1e rejection.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description provides comprehensive context but does not follow the repository's required template structure with sections like Summary, Scope, Checks, and Deploy impact. Restructure the description to follow the repository template: add Summary (1-3 sentences), Scope (checkboxes), Checks (build/test verification), and Deploy impact sections.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: accepting justified peer blocks on state-root mismatch by default instead of via environment variable.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/justified-1e-accept-default

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
crates/sentrix-core/src/block_executor.rs (2)

2808-2829: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

This regression test no longer exercises the staking-state leak it describes.

Setting block.justification = None prevents apply_reward_bookkeeping_for_latest_block() from mutating pending_rewards at all — it returns early when the latest block has no justification (Lines 1888-1890). The #1e rejection still happens, but pending_before == pending_after now passes even if the C-03 rollback of stake_registry is broken. Please keep this test driving a post-bundle Pass-2 failure instead of a no-justification fast path so it continues to pin the real RCA invariant.

As per coding guidelines, test changes under crates/sentrix-core/src/block_executor** must pin a specific invariant from a real bug RCA.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/sentrix-core/src/block_executor.rs` around lines 2808 - 2829, The test
currently clears block.justification so
apply_reward_bookkeeping_for_latest_block() returns early and never mutates
pending_rewards; restore a justification so Pass-2 reward bookkeeping runs
before you tamper the state_root. Concretely: revert/remove the line setting
block.justification = None (or set it to the original Some(justification) used
earlier), keep the tampered block.state_root and block.hash changes, then call
bc.add_block_from_peer(...) to trigger the `#1e` rejection and verify
pending_rewards rolls back; references: block.justification, block.state_root,
block.calculate_hash(), add_block_from_peer,
apply_reward_bookkeeping_for_latest_block, and
stake_registry.validators.get("v1").pending_rewards.

Source: Coding guidelines


1762-1779: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve SENTRIX_STATE_FINGERPRINT on this new early-return path.

This return still bypasses the fingerprint emit at Lines 1834-1841. With #1e acceptance now being the default, the exact blocks we need for cross-node drift forensics stop producing the load-bearing fingerprint on validators too. Mirror emit_state_fingerprint(self, self.height()) before returning so this path keeps the existing debugging contract.

As per coding guidelines, SENTRIX_STATE_FINGERPRINT is load-bearing for debugging + chain replay, and removing it silently is a regression.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/sentrix-core/src/block_executor.rs` around lines 1762 - 1779, This
early-return path sets last.state_root, calls self.maybe_prune_trie() and
emit_apply_profile(...) but skips emitting the SENTRIX_STATE_FINGERPRINT; before
the return Ok(()) call add a call to emit_state_fingerprint(self, self.height())
(or the proper method signature) so the fingerprint is emitted on the `#1e`
acceptance path; target the block containing last.state_root =
Some(received_root), self.maybe_prune_trie(), emit_apply_profile(...) and the
subsequent return to insert the emit_state_fingerprint call there.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Cargo.toml`:
- Line 10: The PR changes default consensus behavior but only updated the
workspace version from "2.2.40" to "2.2.41"; check the project's versioning
policy to confirm whether consensus-behavior changes require a minor bump, and
if so update the version field (currently set via the version = "2.2.41" entry)
to the next minor (e.g., 2.3.0), update the changelog/release notes to call out
the consensus behavior change, and adjust the PR title/description and commit
message to reflect a minor-semver bump so operators are properly signaled; if
policy allows a patch, add an explicit note in the changelog explaining why a
patch bump was used.
- Line 10: The environment variable SENTRIX_OBSERVER_TOLERANT_STATE_ROOT was
removed instead of being preserved as a no-op; restore a benign read of
SENTRIX_OBSERVER_TOLERANT_STATE_ROOT (e.g., getenv and ignore or set to an
unused local) in the initialization path that previously referenced it (look for
block executor or observer initialization code such as functions in
block_executor.rs or any observer bootstrap function) so the env var remains
accepted but has no behavioral effect, or alternatively add a clear deprecation
comment and a documented removal note in the code and configuration docs and
ensure the fullnode config handling reflects the removal; update the code to
read the var and ignore its value or add the explicit deprecation handling where
SENTRIX_OBSERVER_TOLERANT_STATE_ROOT was formerly used.

In `@crates/sentrix-core/src/block_executor.rs`:
- Around line 1730-1760: The current `#1e` accept path checks only
last.justification.is_some() and can accept unvalidated justification blobs when
add_block_impl allowed bypass (e.g. via SENTRIX_REPLAY_BYPASS_AUTHZ or
voyager_mode_for(expected_index)), so modify the code to require the actual
Pass-1 "justification verified" result in Pass-2 instead of presence alone:
propagate the Pass-1 verification outcome (e.g. a boolean/enum like
justification_verified or the original Pass-1 predicate) from add_block_impl
into the BlockExecutor state used by add_block_impl/pass-2, or recompute the
exact Pass-1 predicate here (calling the same verification routine used in
Pass-1) before accepting under the BlockSource::Peer branch; ensure references
to last.justification and the new verification flag are used so blocks with only
a blob but not a verified 2/3 justification are rejected.

---

Outside diff comments:
In `@crates/sentrix-core/src/block_executor.rs`:
- Around line 2808-2829: The test currently clears block.justification so
apply_reward_bookkeeping_for_latest_block() returns early and never mutates
pending_rewards; restore a justification so Pass-2 reward bookkeeping runs
before you tamper the state_root. Concretely: revert/remove the line setting
block.justification = None (or set it to the original Some(justification) used
earlier), keep the tampered block.state_root and block.hash changes, then call
bc.add_block_from_peer(...) to trigger the `#1e` rejection and verify
pending_rewards rolls back; references: block.justification, block.state_root,
block.calculate_hash(), add_block_from_peer,
apply_reward_bookkeeping_for_latest_block, and
stake_registry.validators.get("v1").pending_rewards.
- Around line 1762-1779: This early-return path sets last.state_root, calls
self.maybe_prune_trie() and emit_apply_profile(...) but skips emitting the
SENTRIX_STATE_FINGERPRINT; before the return Ok(()) call add a call to
emit_state_fingerprint(self, self.height()) (or the proper method signature) so
the fingerprint is emitted on the `#1e` acceptance path; target the block
containing last.state_root = Some(received_root), self.maybe_prune_trie(),
emit_apply_profile(...) and the subsequent return to insert the
emit_state_fingerprint call there.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 98c07e0a-edd7-4445-9144-49494e48202c

📥 Commits

Reviewing files that changed from the base of the PR and between 71a080b and a420286.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock, !**/*.lock
📒 Files selected for processing (2)
  • Cargo.toml
  • crates/sentrix-core/src/block_executor.rs

Comment thread Cargo.toml
# they can't drift across crates.
[workspace.package]
version = "2.2.40"
version = "2.2.41"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | 💤 Low value

Consider version-bump semantics for consensus behavior changes.

The workspace version was incremented by a patch level (2.2.40 → 2.2.41). While the PR states "no fork gate is required" because only the accept/reject decision changes for already-canonical blocks, this PR does alter the default consensus behavior by removing environment-variable gating and making justified peer block acceptance uniform across all nodes.

For blockchain systems, behavioral changes in consensus logic—even when deterministic and backward-compatible at the state level—may warrant a minor version bump to signal the operational change to node operators, especially when it affects validator/fullnode behavior divergence.

Verify that the project's versioning policy allows patch bumps for consensus behavior changes that don't require fork gates.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Cargo.toml` at line 10, The PR changes default consensus behavior but only
updated the workspace version from "2.2.40" to "2.2.41"; check the project's
versioning policy to confirm whether consensus-behavior changes require a minor
bump, and if so update the version field (currently set via the version =
"2.2.41" entry) to the next minor (e.g., 2.3.0), update the changelog/release
notes to call out the consensus behavior change, and adjust the PR
title/description and commit message to reflect a minor-semver bump so operators
are properly signaled; if policy allows a patch, add an explicit note in the
changelog explaining why a patch bump was used.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check how SENTRIX_OBSERVER_TOLERANT_STATE_ROOT is handled in block_executor

# Search for references to the env var in block_executor
echo "=== Checking SENTRIX_OBSERVER_TOLERANT_STATE_ROOT in block_executor ==="
rg -nC5 'SENTRIX_OBSERVER_TOLERANT_STATE_ROOT' crates/sentrix-core/src/block_executor.rs

# Check if any env var reads still exist
echo -e "\n=== Checking for env::var reads in block_executor ==="
rg -nC3 'env::var.*SENTRIX_OBSERVER' crates/sentrix-core/src/

# Verify the env var is mentioned in comments or kept as no-op
echo -e "\n=== Checking if variable is documented as ignored ==="
rg -nC2 'OBSERVER.*TOLERANT.*ignore|ignore.*OBSERVER.*TOLERANT' crates/sentrix-core/src/

Repository: sentrix-labs/sentrix

Length of output: 246


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Find all references to the env var anywhere in the repo
echo "=== All references to SENTRIX_OBSERVER_TOLERANT_STATE_ROOT ==="
rg -n --hidden --glob '!.git/*' 'SENTRIX_OBSERVER_TOLERANT_STATE_ROOT' . || true

# 2) Find all env reads that mention OBSERVER or TOLERANT near them
echo -e "\n=== env::var reads mentioning OBSERVER/TOLERANT ==="
rg -n --hidden --glob '!.git/*' 'env::var\([^)]*(OBSERVER|TOLERANT)[^)]*\)|std::env::var\([^)]*(OBSERVER|TOLERANT)[^)]*\)' crates/ . || true

# 3) Look for docs/config mentioning the env var
echo -e "\n=== Mentions in README/docs/config ==="
rg -n --hidden --glob '!.git/*' 'SENTRIX_OBSERVER_TOLERANT_STATE_ROOT' -S README* docs* .github* config* crates/ || true

Repository: sentrix-labs/sentrix

Length of output: 332


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 4) Specifically inspect fullnode config parsing code for references/removal patterns
# (adjusted by searching for "fullnode" + env::var usage)
echo "=== Search for fullnode config env handling patterns ==="
rg -n --hidden --glob '!.git/*' 'fullnode' crates/ | head -n 50 || true

echo -e "\n=== Search for env var handling in fullnode-related code ==="
rg -n --hidden --glob '!.git/*' '(env::var|std::env::var)' crates/ | rg -n '(fullnode|config|observer|tolerant)' || true

Repository: sentrix-labs/sentrix

Length of output: 1715


Preserve SENTRIX_OBSERVER_TOLERANT_STATE_ROOT as an ignored/no-op env var (it’s currently gone)

SENTRIX_OBSERVER_TOLERANT_STATE_ROOT has no references anywhere in the repo (including crates/sentrix-core/src/block_executor.rs), so it appears to be deleted rather than kept as an ignored/no-op env var. This conflicts with the consensus-critical guideline to not delete existing env vars/features—either retain the env var read (even if it no longer affects behavior) or document a clear deprecation/removal rationale and ensure the fullnode config follow-up is consistent.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Cargo.toml` at line 10, The environment variable
SENTRIX_OBSERVER_TOLERANT_STATE_ROOT was removed instead of being preserved as a
no-op; restore a benign read of SENTRIX_OBSERVER_TOLERANT_STATE_ROOT (e.g.,
getenv and ignore or set to an unused local) in the initialization path that
previously referenced it (look for block executor or observer initialization
code such as functions in block_executor.rs or any observer bootstrap function)
so the env var remains accepted but has no behavioral effect, or alternatively
add a clear deprecation comment and a documented removal note in the code and
configuration docs and ensure the fullnode config handling reflects the removal;
update the code to read the var and ignore its value or add the explicit
deprecation handling where SENTRIX_OBSERVER_TOLERANT_STATE_ROOT was formerly
used.

Source: Coding guidelines

Comment thread crates/sentrix-core/src/block_executor.rs Outdated
@satyakwok satyakwok self-assigned this Jun 6, 2026
…ept (CodeRabbit)

CodeRabbit #801 Major: keying the #1e accept off block.justification.is_some()
was imprecise — add_block_impl SKIPS the 2/3 gate when SENTRIX_REPLAY_BYPASS_AUTHZ
is set or voyager_mode_for(height) is false, so a peer block could carry an
arbitrary (unvalidated) justification blob, reach Pass-2, and be accepted on
#1e without ever passing BFT validation.

Fix: a per-add transient `current_add_justification_verified` (serde-skipped),
reset at the top of add_block_impl and set true ONLY at the end of the gate
block — i.e. only when the gate actually RAN and PASSED. Pass-2's #1e accept
now reads that flag instead of justification presence, so it tolerates a
state_root mismatch strictly on gate-verified blocks; replay / pre-voyager /
no-justification blocks fall through to the strict reject.

Tests unchanged + still green (justified→accept goes through the gate→flag set;
unjustified→reject, gate skipped→flag false). sentrix-core suite + clippy
--all-targets -D warnings clean.
@github-actions github-actions Bot merged commit a859649 into main Jun 6, 2026
18 of 19 checks passed
@satyakwok satyakwok deleted the fix/justified-1e-accept-default branch June 6, 2026 12:15
github-actions Bot pushed a commit that referenced this pull request Jun 6, 2026
…802)

test_c03_pass2_failure_rolls_back_state credits v1 to u64::MAX - reward and
relies on the coinbase credit overflowing in Pass 2. It read get_block_reward()
(which depends on the reward-fork env vars: VOYAGER_REWARD_V2_HEIGHT /
TOKENOMICS_V2_HEIGHT / halving) WITHOUT holding env_test_lock, while sibling
tests set those vars under that lock. Under the parallel `cargo test` run (e.g.
the report-only coverage job) a concurrent mutation changed `reward` mid-test,
the overflow didn't trigger, add_block returned Ok, and unwrap_err panicked —
the recurring CI flake that needed re-runs on #795/#798/#800/#801.

Acquire env_test_lock at the top of the test so it serializes against the
env-mutating tests. Test-only change. sentrix-core suite + clippy --all-targets
-D warnings clean.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant