
feat: V10 RandomSampling pipeline + StakingStorage→ConvictionStakingStorage consolidation#357

Open
branarakic wants to merge 22 commits into main from
feat/v10-random-sampling-and-staking-consolidation

Conversation

@branarakic
Contributor

Summary

Two interlocking pre-mainnet changes that land together because the second is a hard prerequisite for the first to ever produce a non-zero score:

  1. End-to-end V10 RandomSampling — new @origintrail-official/dkg-random-sampling workspace package (prover orchestrator + KC extractor + worker-thread proof builder + WAL), bound into the agent lifecycle, surfaced through the daemon API + CLI. Plus the underlying chain reads (RS view methods, V10 stake reads, KC view methods on both EVM and mock adapters) and the merkleLeafCount field added to publish/update ACK digests so the on-chain ACK gate pins the leaf count alongside the merkle root.

  2. StakingStorage→ConvictionStakingStorage consolidation — V10 CSS becomes the single source of truth: TRAC vault (via Guardian), operator-fee accountant, and the canonical getNodeStakeV10 read used by RandomSampling.calculateNodeScore, Ask, ShardingTable, and StakingKPI. All TRAC sinks (KnowledgeCollection publish fees, Paymaster.coverCost, PublishingConvictionAccount, DKGStakingConvictionNFT, DKGPublishingConvictionNFT) reroute to CSS. Closes the trap that left nodeStakeV10 = 0 whenever bootstrap used the V8 Staking.stake() path.

The original symptom was `RandomSamplingStorage.getNodeEpochProofPeriodScore` returning 0 across the entire devnet — RandomSampling.calculateNodeScore reads getNodeStakeV10 exclusively, but the V8 staking path didn't update it, and there was no V10 stake bootstrap. Rather than shimming the dual-store coupling, we collapsed the two stores. Migration is mandatory: every V8 delegator becomes a V10 NFT position via StakingV10._convertToNFT. Post-cutover the V8 store is dead-but-deployed weight; deletion of Staking.sol + DelegatorsInfo.sol is tracked as a follow-up.

Commit map (8 commits, ~7700 LoC)

1. feat(core,chain): random-sampling read surface … core proof-material module, merkleLeafCount in ACK digests, chain RS reads + V10 stake reads + KC views, ABI refresh, mock-adapter parity
2. feat(random-sampling,agent): RS prover … new random-sampling workspace package (prover, kc-extractor, proof-builder worker, WAL), agent bind
3. feat(contracts): V10 staking consolidation … ConvictionStakingStorage v4.0.0 (Guardian-based TRAC vault + operator-fee accountant), StakingV10 v3.0.0 rewire, all vault-target consumers reroute to CSS, merkleLeafCount end-to-end, deploy script dependencies
4. feat(publisher): bind merkleLeafCount … publisher carries merkleLeafCount end-to-end; ACK collector includes it in identity-binding fingerprint
5. test(evm-module): refresh test suite … KC helper + KAv10 + Paymaster + NFT test refreshes for the new vault target; ConvictionStakingStorage targeted unit suite; V8-API-coupled tests describe.skip'd with notes pointing at followup-2
6. feat(cli): wire RS into daemon API + CLI … GET /api/random-sampling/status, dkg random-sampling status CLI command, lifecycle hooks
7. chore(scripts): devnet V10 stake bootstrap + RS smoke … devnet.sh switched to DKGStakingConvictionNFT.createConviction(uint72,uint96,uint40); new scripts/devnet-test-random-sampling.sh E2E smoke; chain-analysis + epoch-snapshot updated for V10 reads
8. chore(deps): pnpm lockfile … lockfile delta for the new workspace

Test plan

Local devnet smoke (already run on this branch's HEAD — PASSING)

$ ./scripts/devnet.sh stop && ./scripts/devnet.sh clean
$ pnpm install --frozen-lockfile
$ pnpm run build           # 20/20 successful
$ ./scripts/devnet.sh start 6
[devnet] === Devnet Ready ===   # 6 cores, 2 CGs registered, V10 stake bootstrap clean
$ ./scripts/devnet-test-random-sampling.sh
[rs-test] Submitted: node=2 idId=2 tx=0xde89fcb25d621503d572f135a05dfc875728a2087d61847eab071afcdc575a78
[rs-test] On-chain solved=true (epoch=1, periodStartBlock=300)
[rs-test] On-chain score=200000000000000   # non-zero — V10 stake routing confirmed
[rs-test]   WAL trail: challenge,extracted,built,submitted
[rs-test] === Random Sampling devnet smoke: PASS ===

Suggested reviewer pass

  • Read commits in order (commit 3 is the load-bearing contracts diff; commits 5 + 7 confirm the consumer + bootstrap migrations)
  • pnpm -r --filter @origintrail-official/dkg-random-sampling test — prover + WAL + extractor unit tests
  • pnpm -r --filter @origintrail-official/dkg-chain test — RS reads + mock parity
  • pnpm -r --filter @origintrail-official/dkg-evm-module test — contracts (note: 4 V8-API-coupled suites are intentionally describe.skip'd with inline references to followup-2)
  • pnpm -r --filter @origintrail-official/dkg-publisher test covers merkleLeafCount propagation
  • Repeat the local devnet smoke above

CI

Standard turbo build + test across the workspace.

Migration / cutover notes

  • Mandatory V8→V10 migration: every existing V8 delegator must be converted to a V10 NFT position via StakingV10._convertToNFT(stakingStorage, ...) before the V8 store is removed. The stakingStorage field on StakingV10 is retained ONLY for this drain path and explicitly commented as dead post-cutover.
  • ABI consumers (indexers, dashboards, SDK): getNodeStakeV10 is the canonical V10 stake read (CSS, not StakingStorage). Ask v2.0.0 and ShardingTable.getMultipleNodes switch to it.
  • Publish/update ACKs now require merkleLeafCount — publishers/relayers that compute the digest themselves must update their digest construction to the 9/11-field signature.

Follow-ups (separate PRs)

  • followup-1: delete V8 Staking.sol + DelegatorsInfo.sol + their tests + deploy scripts (depends on devnet-soak confirming V10-only operation).
  • followup-2: rewrite StakingKPI's per-delegator surface from V8-key to V10-tokenId-key; un-skip the 4 evm-module test suites that depend on it.

Made with Cursor

Branimir Rakic added 18 commits April 30, 2026 21:18
…rkleLeafCount in ACK digests

Foundations for the V10 Random Sampling pipeline + the V10 staking
consolidation read path. Pure additions / view-method plumbing — no
write paths, no behavioral changes outside what the new digest field
forces (publish/update intents now bind merkleLeafCount end-to-end).

packages/core
- New `proof-material` module: V10 flat-KC Merkle proof + leaf material
  shaping shared by the prover and verifier.
- `v10-merkle` exposes the leaf material the prover needs.
- `ack.computePublishACKDigest` / `computeUpdateACKDigest` add
  `merkleLeafCount` (now 9 / 11 fields) so the on-chain ACK gate can
  pin the leaf count in addition to the merkle root.
- `proto/publish-intent` carries the new field; tests updated.
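The point of pinning merkleLeafCount into the digest can be sketched as follows. This is illustrative only: the real computePublishACKDigest uses the core package's own canonical 9-field encoding, and the field names and hashing below are stand-ins, not the actual layout.

```typescript
import { createHash } from "node:crypto";

// Schematic digest over a handful of illustrative fields. The real ACK
// digest has 9 (publish) / 11 (update) fields; only merkleRoot and the
// new merkleLeafCount matter for this sketch.
interface PublishAckFieldsSketch {
  merkleRoot: string;      // 0x-prefixed hex root of the V10 flat-KC tree
  merkleLeafCount: number; // new in V10: pins tree width alongside the root
  publisher: string;
  chainId: number;
}

function computeDigestSketch(f: PublishAckFieldsSketch): string {
  const h = createHash("sha256");
  // Length-prefix each field so concatenation is unambiguous.
  for (const part of [f.merkleRoot, String(f.merkleLeafCount), f.publisher, String(f.chainId)]) {
    h.update(`${part.length}:${part}`);
  }
  return `0x${h.digest("hex")}`;
}

const base = { merkleRoot: "0xabc", merkleLeafCount: 4, publisher: "node-1", chainId: 2043 };
const a = computeDigestSketch(base);
const b = computeDigestSketch({ ...base, merkleLeafCount: 5 });
// Same root, different leaf count → different digest, so the on-chain ACK
// gate can reject a tree whose width was tampered with.
console.log(a !== b);
```

The design point is just that a signature over the root alone does not bind the tree's width; adding the leaf count to the signed material closes that gap.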

packages/chain
- `chain-adapter` adds RandomSampling read methods + V10 stake views
  + KC view methods used by the prover (KC root, leaf count, sigs).
- `evm-adapter` implements them, prefers `getNodeStakeV10` (CSS) for
  ACK identity verification, and adds the post-consolidation
  contracts (`ConvictionStakingStorage`, `StakingV10`, `StakingKPI`,
  `DKGStakingConvictionNFT`, `DKGPublishingConvictionNFT`) to the
  error-decode allowlist so reverts surface clean reasons.
- `mock-adapter` mirrors the new RS + KC views.
- ABIs refreshed for `RandomSampling`, `KnowledgeCollection(Storage)`,
  `KnowledgeAssetsV10`, `Ask`, `ShardingTable`, `StakingKPI`,
  `DKGStakingConvictionNFT`, plus net-new `ConvictionStakingStorage`
  and `StakingV10`.
- New test files cover RS reads end-to-end on both EVM and mock
  adapters; `mock-adapter-parity` exempts the new private requireKC /
  requireContextGraph helpers.

This commit is read-only at the wire level. The actual prover, the
contract write paths, and the publisher digest emission land in the
follow-up commits on this branch.

Made-with: Cursor
… agent bind

New `@origintrail-official/dkg-random-sampling` workspace package
implements the off-chain prover side of RFC-26 / V10 RandomSampling.
Cores-only, with optional core mutual aid as a future extension hook.

packages/random-sampling
- `prover` — orchestrator. Each tick: read on-chain
  `getActiveProofPeriodStatus`, ensure a node challenge exists
  (createChallenge auto-rotates per period), then for the open
  challenge: extract KC root entities + V10 leaves from local
  oxigraph (`kc-extractor`), build the Merkle proof off-thread
  (`proof-builder` → worker), submit on-chain.
- `kc-extractor` resolves cgId → cgName via the local `ontology`
  graph, opens `did:dkg:context-graph:<NAME>/context/<cgId>/_meta`,
  pulls the KC's root entities + private roots + V10 leaves.
- `proof-builder` runs the V10 Merkle build inside a `worker_threads`
  worker so prover ticks stay non-blocking even on large KCs.
- `wal` write-ahead log persists every step
  (`challenge → extracted → built → submitted`) for crash recovery
  and ops-side observability.
- Vitest suite covers prover state machine, WAL recovery, KC
  extractor URI mapping, and worker round-trip.
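The challenge → extracted → built → submitted loop above can be sketched as a single tick. This is a stand-in, not the package's actual API: the interface names are illustrative, and the real prover is async and chain-backed, whereas this version is synchronous in-memory for brevity.

```typescript
type WalStep = "challenge" | "extracted" | "built" | "submitted";
interface Challenge { kcId: string; periodStartBlock: number; solved: boolean }

// One prover tick (sketch): ensure a challenge exists for the current
// proof period, then extract → build → submit, journaling each step.
function tick(
  chain: {
    getActiveProofPeriodStatus(): { periodStartBlock: number };
    getOpenChallenge(): Challenge | null;
    createChallenge(): Challenge; // on-chain call; auto-rotates per period
  },
  wal: WalStep[],                              // stand-in for the WAL
  extract: (kcId: string) => string[],         // kc-extractor: KC leaves from oxigraph
  buildProof: (leaves: string[]) => string[],  // proof-builder: worker thread in the real package
  submit: (proof: string[]) => void,
): "submitted" | "already-solved" {
  const status = chain.getActiveProofPeriodStatus();
  let challenge = chain.getOpenChallenge();
  // Missing or stale challenge → rotate a fresh one in for this period.
  if (!challenge || challenge.periodStartBlock !== status.periodStartBlock) {
    challenge = chain.createChallenge();
  }
  if (challenge.solved) return "already-solved";
  wal.push("challenge");
  const leaves = extract(challenge.kcId);
  wal.push("extracted");
  const proof = buildProof(leaves);
  wal.push("built");
  submit(proof);
  wal.push("submitted");
  return "submitted";
}

// Usage with in-memory fakes:
const wal: WalStep[] = [];
const result = tick(
  {
    getActiveProofPeriodStatus: () => ({ periodStartBlock: 300 }),
    getOpenChallenge: () => ({ kcId: "kc-1", periodStartBlock: 300, solved: false }),
    createChallenge: () => ({ kcId: "kc-2", periodStartBlock: 300, solved: false }),
  },
  wal,
  () => ["leaf-a", "leaf-b"],
  (leaves) => leaves.map((l) => `sib(${l})`),
  () => {},
);
console.log(result, wal.join(","));
// → submitted challenge,extracted,built,submitted
```

Because every step is journaled before the next begins, a crash mid-tick leaves a WAL prefix (e.g. challenge,extracted) that tells recovery exactly where to resume.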

packages/agent
- `random-sampling-bind` wires the prover into the agent lifecycle:
  resolves chain adapter + oxigraph store + WAL path from agent
  config, schedules `prover.tick()` at the configured cadence
  (default 5s on devnet), and surfaces status to the daemon API.
- `dkg-agent` opt-in mounts the bind only on core nodes that have
  an on-chain identity. Edge nodes skip RS entirely; an
  edge → core upgrade picks it up on next agent restart.
- `index` re-exports the bind for downstream consumers.

This commit is purely additive on the agent surface (one new
behavior, gated). All write paths are RPCs to existing on-chain
contracts; no contract changes here.

Made-with: Cursor
…ator-fee accountant; merkleLeafCount + RS scoring inputs

Pre-mainnet V10 cleanup. The previous architecture had two parallel
staking storage contracts (V8 `StakingStorage`, V10
`ConvictionStakingStorage`) and a hidden coupling: any V8-stake-based
bootstrap left V10 `nodeStakeV10 = 0`, which forced
`RandomSampling.calculateNodeScore` to compute zero across the entire
network. Consolidating eliminates the trap and the ongoing dual-store
maintenance cost.

ConvictionStakingStorage v4.0.0 — single source of truth for V10
- Switches base from `HubDependent` to `Guardian` so CSS is the V10
  TRAC vault (holds `tokenContract`, exposes `transferStake` outflow).
- Absorbs the operator-fee surface from V8 StakingStorage:
  `nodeOperatorFeeBalance` mapping, `operatorFeeWithdrawals` queue,
  full set/increase/decrease/get balance accessors, and the
  create/delete/get withdrawal-request accessors.
- Header doc + version history pin the new role.

StakingV10 v3.0.0 — write-side rewire
- `stake` deposits TRAC into CSS (was StakingStorage).
- `withdraw` outflows via `cs.transferStake`.
- `_claim` operator-fee accrual writes to CSS.
- Net-new V10-native operator-fee withdrawal API:
  `requestOperatorFeeWithdrawal` / `finalizeOperatorFeeWithdrawal` /
  `cancelOperatorFeeWithdrawal`, gated by `onlyAdmin` (IdentityStorage
  admin-key check) and using `parametersStorage.stakeWithdrawalDelay`.
- `stakingStorage` field retained ONLY for `_convertToNFT`'s V8→V10
  drain at cutover; comment makes the dead-code status explicit.

Vault-target consumers all route to CSS
- `KnowledgeAssetsV10` ACK gate now reads `getNodeStakeV10`; publish
  fees flow into CSS. Adds `merkleLeafCount` to publish/update params
  and forwards it to the storage layer.
- `KnowledgeCollection` deposits publish fees into CSS; carries
  `merkleLeafCount` through the createKnowledgeCollection signature
  into `KnowledgeCollectionStorage`.
- `Paymaster.coverCost` resolves CSS as the TRAC sink.
- `PublishingConvictionAccount` and `DKGPublishingConvictionNFT`
  resolve CSS for vault deposits + topUps (NFT field name kept for
  storage layout stability; comments call out the new resolution).
- `DKGStakingConvictionNFT` drops the StakingStorage import + field —
  TRAC pulls happen via the V10 CSS path now.

V10 stake readers point at canonical V10 stake
- `Ask` v2.0.0 reads `getNodeStakeV10` for active-set recalculation.
- `ShardingTable.getMultipleNodes` likewise.
- `StakingKPI` v2.0.0 node-level stats read CSS; per-delegator V8-keyed
  surface is left in place with deprecation comments (followup-2).

KnowledgeCollectionStorage + KnowledgeCollectionLib + RandomSampling
- `merkleLeafCount` parameter pinned on createKnowledgeCollection +
  surfaced on the KC view; RandomSampling's V10 Merkle proof checker
  binds it.
- `RandomSampling` v1.1.0 wires the V10 leaf-count guard into the
  proof verification path; calculateNodeScore continues to read
  CSS-canonical V10 stake.

Guardian.initialize() made `virtual` so CSS can override and combine
its Token wiring with CSS-specific initialization.

Deploys
- `049b` adds `Token` dependency (CSS now needs it via Guardian).
- `019`, `020`, `052` add `ConvictionStakingStorage` dependency
  (Ask + ShardingTable + KAv10 read it at initialize).
- `053` adds `ConvictionStakingStorage` (DKGPublishingConvictionNFT
  resolves it as the vault).
- `055` adds `IdentityStorage` (StakingV10 needs it for the new
  operator-fee admin gate).

Migration is mandatory: every V8 delegator becomes a V10 NFT position
via `StakingV10._convertToNFT`. Post-cutover, the V8 StakingStorage
is dead-but-deployed weight; deletion of V8 `Staking.sol` +
`DelegatorsInfo.sol` is tracked as followup-1.

Made-with: Cursor
Wire the new V10 `merkleLeafCount` field end-to-end on the publisher
side so on-chain ACK signatures pin not just the merkle root but also
the unique-leaf count of the V10 flat-KC tree.

- `publisher.ts` / `dkg-publisher.ts` carry `merkleLeafCount` on every
  publish/update emission and pass it to the chain adapter
  `createKnowledgeAssetsV10` call alongside the merkle root.
- `merkle.ts` exposes the unique-leaf count from the V10 tree builder
  so the publisher and the prover compute identical values.
- `ack-collector` extends the per-receiver digest fingerprint to
  include `merkleLeafCount`; mismatched leaf counts now fail the
  collector's identity-binding check instead of silently ACK'ing.
- `storage-ack-handler` accepts + propagates the field.
- All publisher test suites (`ack-collector`, `ack-digest-v10-vs-legacy`,
  `ack-replay-cost-params`, `storage-ack-handler`,
  `storage-ack-roster-and-verify-mofn`, `v10-ack-edge-cases`,
  `v10-protocol-operations`, `v10-publish-e2e`, `v10-remap-wire`)
  updated to pass `merkleLeafCount` through every publish/update path.

Pairs with the contract-side digest update in the previous commit
(`KnowledgeAssetsV10` + `KnowledgeCollectionStorage`) and the
`computePublishACKDigest` 9-field signature change in core.

Made-with: Cursor
…merkleLeafCount

Aligns the contract test suite with the consolidated V10 vault model
(`ConvictionStakingStorage` is the TRAC sink + operator-fee accountant)
and the new `merkleLeafCount` ACK field.

Helpers
- `kc-helpers.createKnowledgeCollection` accepts + forwards
  `merkleLeafCount` (default 1).
- `v10-kc-helpers` updated to mirror the helper shape.

TRAC-vault rerouting
- `KnowledgeCollection` / `KnowledgeAssetsV10` / `Paymaster` /
  `DKGPublishingConvictionNFT` / `DKGStakingConvictionNFT` /
  `v10-e2e-conviction` / `v10-conviction` test suites assert
  TRAC balance changes against `ConvictionStakingStorage` (was
  StakingStorage). Vault invariant + topUp + coverCost +
  createConviction + createAccount paths all updated.
- `Paymaster.deployPaymasterFixture` registers a mock CSS in the Hub
  so `coverCost` resolves the new dependency.
- `DKGPublishingConvictionNFT.initialize` revert-cases updated for
  the new dependency-resolution order
  (Token → ConvictionStakingStorage → EpochStorageV8 → Chronos).

`merkleLeafCount` propagation
- All `createKnowledgeCollection` call sites pass the new field.
- `KnowledgeAssetsV10.test` + `RandomSampling.test` updated to use the
  9-field digest signature, `merkleLeafCount`-aware fixtures, and
  ZeroHash leaf in submitProof argument.
- `RandomSampling.test` version assertion bumped to v1.1.0.

ConvictionStakingStorage targeted unit suite
- New `ConvictionStakingStorage.test` covers Guardian-as-base
  (`tokenContract`, rescue), operator-fee balance set/inc/dec/get,
  withdrawal-request create/delete/get, and the `transferStake`
  outflow with permission checks — ensures the consolidation didn't
  leak around the V8 archive.

Tests skipped pending followup-2 (V8-API-coupled, will be rewritten
when the per-delegator KPI surface is V10-tokenId-keyed)
- `Ask.test` — relies on V8 stake mutation paths.
- `DKGStakingConvictionNFT-extra` — V8 delegator flows.
- `v10-conviction-extra` — uses removed V8 Staking helpers.
- `v10-conviction-nft-audit` — same.
Each `describe.skip` carries an inline note pointing at followup-2.

Made-with: Cursor
Surface for the RS prover added in the previous agent-side commit:

- `daemon/routes/status` adds `GET /api/random-sampling/status` —
  read-only snapshot (current challenge, last submitted score,
  enabled/disabled reason). Cheap; no chain calls.
- `daemon/lifecycle` calls the bind's start/stop hooks alongside the
  publisher and identity loops so the prover follows the daemon's
  lifecycle correctly across reload/shutdown.
- `cli` adds `dkg random-sampling status` for ops-side visibility
  without curl.
- `api-client` adds the matching client method.
- `config` adds RS-specific config keys (`tickIntervalMs`, WAL path
  override) with sensible defaults; opt-out via env for edge nodes
  that explicitly disable.
- `publisher-runner` exposes its merkle output to the prover so they
  share the same V10 leaf-count source.

Made-with: Cursor
- `devnet.sh` staking step rewritten to use the V10 path
  (`DKGStakingConvictionNFT.createConviction(uint72, uint96, uint40)`)
  instead of legacy `Staking.stake()`. The V8 path updated only the
  V8 archive and left `getNodeStakeV10 = 0`, which made
  `RandomSampling.calculateNodeScore` return 0 for the entire devnet
  network and made any local RS validation a false negative. Approves
  StakingV10 (now the TRAC-pull side via the NFT proxy) and uses the
  uint40 lockTier signature so the function selector matches the
  consolidated contract; old uint8 selector was silently
  reverting in `require(false)` with no error data.
- `scripts/devnet-test-random-sampling.sh` is a new E2E smoke for the
  full RS loop: starts the prover, polls the on-chain
  `RandomSamplingStorage.getNodeEpochProofPeriodScore`, and asserts a
  non-zero score on at least one core within the first proof period.
  Runnable as the devnet tests gate before each PR.
- `chain-analysis.ts` adds dual-source CSS-vs-StakingStorage diff
  reporting so post-cutover drift is visible during migration soak.
- `epoch-snapshot.ts` reads V10 stake from CSS for V10 epochs and
  falls back to V8 only for pre-cutover epochs.

Made-with: Cursor
…ing workspace

Lockfile delta only — picks up the new `packages/random-sampling`
workspace package + its transitive deps (no version bumps to
existing packages). Generated by `pnpm install` against the new
`packages/random-sampling/package.json`.

Made-with: Cursor
…tion (was V8 Staking.stake)

When a fresh node boots against a chain that has the V10 contract
suite without V8 `Staking` deployed, the agent's auto-stake step was
silently failing — exactly the same trap the consolidation PR fixes
elsewhere (devnet.sh, EVM adapter ACK gate, etc.). Two scenarios on
the new testnet (which is about to be reset to V10-only):

  1. V8 `Staking` not redeployed: `hub.getContractAddress("Staking")`
     returns 0x0, the cached `this.contracts.staking` is undefined,
     and `await this.contracts.staking!.stake(...)` crashes with NPE
     before reaching chain. Profile gets created but stake never
     lands → `nodeStakeV10 = 0` → `RandomSampling.calculateNodeScore`
     returns 0 forever (it reads `getNodeStakeV10` exclusively).

  2. V8 `Staking` redeployed alongside V10: stake goes into V8
     `StakingStorage`, V10 CSS stays empty → same zero-score outcome.

Mirroring the `scripts/devnet.sh` fix that landed in commit
`6d7f1c1c`: route the auto-stake through
`DKGStakingConvictionNFT.createConviction(identityId, amount,
lockTier)` instead. The NFT mints a V10 position, writes
`nodeStakeV10` in `ConvictionStakingStorage`, and pulls TRAC into the
V10 vault (CSS) via `StakingV10`. TRAC allowance is granted to
`StakingV10` (the actual `transferFrom` caller), NOT to the NFT —
the NFT is only the entry point and never custodies TRAC.

Surface change
- `ensureProfile` accepts `lockTier?: number` (default 1 — 1-month,
  cheapest non-zero multiplier; same default `scripts/devnet.sh` uses
  for its bootstrap). Updated on `ChainAdapter`, `EVMAdapter`,
  `MockAdapter`, `NoChainAdapter` to keep the signatures aligned.
- `MockAdapter` and `NoChainAdapter` accept the new option for type
  parity; the mock remains a pure in-memory identity allocator.

Test
- `no-chain-adapter-extra` adds an `ensureProfile(with lockTier)`
  rejection assertion so an accidental signature regression on either
  side gets caught at `pnpm test`.

Devnet smoke (clean → start 6 → devnet-test-random-sampling.sh)
re-run on this commit's HEAD: PASS, on-chain
`getNodeEpochProofPeriodScore` non-zero
(206758022818946494, ~0.21 in 18-decimal scale), full WAL trail
(challenge → extracted → built → submitted).

Made-with: Cursor
… removed nft.stake)

`EVMChainAdapter.stakeWithLock` still called the V8-era
`DKGStakingConvictionNFT.stake(identityId, amount, lockEpochs)`
method that was renamed to `createConviction(identityId, amount,
lockTier)` during the V10 NFT consolidation. Every call exploded
with `TypeError: nft.stake is not a function`, taking down the
3 `staking-conviction` tests. Same pattern as the `ensureProfile`
fix in commit cbde620, just for the test-helper / dev-API surface.

Two bugs in one method:
  1. Wrong contract method (renamed). Now calls `createConviction`.
  2. Wrong allowance target — was approving the NFT, but TRAC is
     pulled by `StakingV10` (the NFT is only the entry point and
     never custodies TRAC). Now approves `StakingV10`. Mirrors
     the pattern in `ensureProfile` and `scripts/devnet.sh`.

Also renames the `lockEpochs` param to `lockTier` everywhere
(`ChainAdapter`, `EVMAdapter`, `MockAdapter`, `MockAdapter`'s
internal `delegatorLocks` map) — the value has been a tier index
since the V10 widening from `uint8 → uint40`, not an epoch count.
The old name was actively misleading. No callsite changes needed:
the integer values 1/3/6/12 already worked as tier indices in the
existing tests.

Test
- `test/staking-conviction.test.ts` (3 cases): now pass under the
  V10 path.
  - `stakeWithLock stores lock and returns success`
  - `getDelegatorConvictionMultiplier returns value after stakeWithLock`
  - `stakeWithLock only extends, never shortens lock` (passes
    vacuously — V10 mints a NEW NFT per call and the address-keyed
    multiplier shim returns 1; the V8 "extend in place" semantic
    is gone, the test's invariant `m2 >= m1` still holds at 1 == 1)

Out of scope for this commit (pre-existing on main, both before
and after this fix): 15 failures in `abi-pinning.test.ts`,
`evm-e2e.test.ts`, `permanent-publishing.test.ts`,
`chain-lifecycle-extra.test.ts`, `enrich-evm-error-extra.test.ts`.
Two flavours: (a) ABI digest snapshot pins that intentionally
fire when contract surfaces drift — they need their pinned hashes
bumped now that V10 added merkleLeafCount; (b) EVM E2E suites
that need their Hardhat fixture refreshed for the consolidated
contract layout. Tracked for a separate cleanup PR.

Made-with: Cursor
…ation

Operator-facing procedure for resetting the testnet onto the V10-only
contract layout shipped in PR #357. Covers all four roles in order:

  Phase A — Maintainer release (tag + GH release + npm publish).
  Phase B — Contracts deploy + multisig batch
            (mark every non-Hub/Token contract `deployed:false` in
            the network deployments JSON, run hardhat-deploy, multisig
            executes the queued `Hub.setContractAddress` batch).
  Phase C — Per-node reset (stop daemon, wipe per-node chain-state-derived
            files — store.nq, publish-journal.*, random-sampling.wal —
            keep keystore so wallet/identity is preserved across reset,
            upgrade to v10.x.y, restart). Calls out exactly what goes
            wrong if you skip the wipe (gossip of orphaned merkle
            roots, idempotency-key collisions, stuck WAL challenges).
  Phase D — Smoke verification via devnet-test-random-sampling.sh
            against the live testnet. Pins the
            "non-zero on-chain score == consolidation works" signal.

Also documents the deliberate "no V8 vault drain" choice — on a true
reset there is no V8 TRAC to migrate, V8 contracts stay unregistered,
and the V10 stack starts empty. This is what makes the reset cheaper
than a stateful migration.

Cross-references the relevant codepaths (deploy helper, ensureProfile,
devnet scripts) so an operator who hits a snag has a single read path
from the runbook into the code.

Made-with: Cursor
The previous version of the runbook (committed cc0b90a) told operators
to manually pull a new release, rebuild, and run `./scripts/devnet.sh
stop && ./scripts/devnet.sh start`. Two errors in that:

  1. devnet.sh is the local Hardhat playground, NOT the testnet
     operator path. Confused the dev-loop tool with the production
     daemon control surface.

  2. The daemon HAS a built-in auto-update mechanism + supervised
     restart (packages/cli/src/daemon/auto-update.ts +
     daemon/lifecycle.ts:735-781 + cli.ts:163,210). It polls every
     30 min by default (npm version OR git commit on tracked branch),
     applies the update, exits with DAEMON_EXIT_CODE_RESTART, and
     the CLI parent supervisor respawns the daemon against the new
     code. Operators don't have to touch the code update themselves.

Corrected runbook reflects:

  - Phase A (maintainer): tag → release → operators auto-pick-up
    within 30 min.
  - Phase B (deployer + multisig): unchanged. Mark non-Hub/Token
    contracts deployed:false, hardhat-deploy, multisig executes the
    queued setContractAddress batch, finalizeMigrationBatch.
  - Phase C (operators): now ONLY the one-time per-node state wipe
    is manual (oxigraph/journal/WAL reference orphaned chain
    entities post-reset). Uses `dkg stop` / `dkg start` (the
    testnet daemon control), not `devnet.sh`. Calls out exactly
    why the wipe is still needed even with auto-update.
  - Phase D: smoke is a developer-side verification, not per-operator.
  - Followup section: tracks "make Phase C zero-touch via a
    network-config migration marker" as a separate concern.

Cross-references list now points at the actual auto-update
codepaths so a reviewer / future operator can verify the mechanism
end-to-end.

Made-with: Cursor
…tion

Trivial conflict in packages/cli/src/daemon/lifecycle.ts where both sides
added unrelated fields (random-sampling config from this branch, context-graph
subscription/membership stores from main) to the same agent config object.
Both kept.

Brings in 147 commits including openclaw chat-turn coordination, blue-green
slot fixes, dkg-memory integration, sharding-table sync improvements.

Made-with: Cursor
network/testnet.json overrides the daemon defaults — operators poll the
main branch tip every 5 min, not a release tag every 30 min. Update
Phase A so it reads "merge to main IS the trigger" and clarify that a
tag/npm publish is only needed for standalone-install operators (still
recommended on testnet because most operators run that mode).

Made-with: Cursor
…r shapes

The previous regex only matched Hardhat-shape `data="0x..."` (key="value",
quoted). Production traffic also surfaces revert data as:

  - Geth:           data: "0x..."          (key: value, JS-object form)
  - Geth no-quote:  data=0x...             (no quotes, no colon)
  - Infura/Alchemy: errorData="0x..."      (errorData= prefix variant)
  - JSON body:      "data":"0x..."         (provider error JSON-embedded)

All four cases were silently dropped on the floor by the existing regex,
which made decoder logs return raw `0x...` selectors that operators had
to manually decode. Fixes 4 RED tests in enrich-evm-error-extra.test.ts
that were marked PROD-BUG / CH-10.

Generalised the regex to `(?:^|[^a-zA-Z])(?:errorData|data)["':=\s]+(0x[0-9a-fA-F]+)`:
- leading non-letter ensures `errorData` doesn't match as `data`
- separator class `["':=\s]+` accepts every observed delimiter combo
- behaviour on the unknown-selector / non-Error guards is unchanged
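The generalised regex from the commit can be exercised against the four provider shapes listed above. The extraction helper name and the test strings here are illustrative; only the regex itself is from the commit message.

```typescript
// Regex from the commit: leading non-letter guard + separator class.
const REVERT_DATA_RE = /(?:^|[^a-zA-Z])(?:errorData|data)["':=\s]+(0x[0-9a-fA-F]+)/;

// Hypothetical helper wrapping the regex (the real code lives in
// enrich-evm-error; this is a stand-in for demonstration).
function extractRevertData(message: string): string | null {
  const m = REVERT_DATA_RE.exec(message);
  return m ? m[1] : null;
}

const samples = [
  'Error: VM Exception, data="0x08c379a0"', // Hardhat: key="value"
  'execution reverted, data: "0x08c379a0"', // Geth: key: value, JS-object form
  'execution reverted data=0x08c379a0',     // Geth no-quote
  'errorData="0x08c379a0" from provider',   // Infura/Alchemy errorData= variant
  '{"code":3,"data":"0x08c379a0"}',         // provider error JSON-embedded
];
for (const s of samples) console.log(extractRevertData(s));
// Every sample yields 0x08c379a0; a message with no data field yields null.
```

Note the leading `(?:^|[^a-zA-Z])` also prevents a false positive such as `metadata=0x...` being read as a `data` field.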

Made-with: Cursor
The shared E2E harness was still using the V8 Staking.stake path, which
writes to V8 StakingStorage but leaves AskStorage.totalActiveStake at
zero (V10 ConvictionStakingStorage is what AskStorage reads from now,
post the consolidation in this PR). The downstream symptom was
getStakeWeightedAverageAsk() == 0 → getRequiredPublishTokenAmount() == 0
→ every E2E publish test reverting at the first toBeGreaterThan(0n)
assertion.

Mirrors the same V10 conversion already applied to:
  - packages/chain/src/evm-adapter.ts:ensureProfile (PR #357 commit cbde620)
  - packages/chain/src/evm-adapter.ts:stakeWithLock (PR #357 commit d211bc5)
  - scripts/devnet.sh

Single root cause; fixes 8 failing tests across evm-e2e (6),
permanent-publishing (1), and chain-lifecycle-extra (1).

Made-with: Cursor
…fCount

Three changes, all consequences of PR #357 adding merkleLeafCount
(uint256) to the V10 publish/update surface:

1. abi-pinning.test.ts — refresh the pinned digests for the three V10
   contracts whose function signatures now carry merkleLeafCount:
     - KnowledgeAssetsV10 (publishDirect / updateDirect inputs)
     - KnowledgeCollection (createKnowledgeCollection / updateKnowledgeCollection)
     - KnowledgeCollectionStorage (knock-on from the function changes;
       event ABIs unchanged — pinned by content sanity tests below)

2. evm-e2e.test.ts — V10 multi-validator publish test:
     - extend ACK digest types to include uint256 merkleLeafCount (9 fields)
     - pass merkleLeafCount in the createKnowledgeAssetsV10 params struct

3. chain-lifecycle-extra.test.ts — full V10 lifecycle test:
     - same ACK digest extension as above (publishOneKCV10 helper)
     - same merkleLeafCount in createKnowledgeAssetsV10 params
     - add newMerkleLeafCount to updateKnowledgeCollectionV10 call

Mirrors the canonical helper at
  packages/evm-module/test/helpers/v10-kc-helpers.ts:buildPublishAckDigest
which is the ground truth for the digest layout.

Fixes 5 of the previously-failing chain tests.

Made-with: Cursor
Adds a maintainer-controlled signal (`chainResetMarker` in
`network/<env>.json`) that turns testnet/mainnet chain resets from a
manual per-operator drill into a fully automatic flow.

Mechanism:
- New `packages/cli/src/daemon/chain-reset-wipe.ts` hook runs on daemon
  boot, BEFORE the agent opens its oxigraph store.
- Compares `network.chainResetMarker` against the value persisted under
  `<dataDir>/.network-state.json`.
- On mismatch (or first boot with marker present) wipes:
    store.nq, store.nq.tmp, random-sampling.wal, publish-journal.*
- Preserves: wallets.json (operator identity), auth.token, config.json,
  node-ui.db (dashboard state), files/ (uploaded files), auto-update
  markers.
- Idempotent on subsequent boots; no-op when network config has no
  marker (back-compat for networks that haven't opted in).
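The decision at the heart of that hook can be sketched as a pure function. The file name matches the commit, but the function shape and type names below are illustrative, not the actual chain-reset-wipe.ts API.

```typescript
// Minimal sketch of the chain-reset-wipe decision (names assumed).
interface NetworkConfigSketch { chainResetMarker?: string }
interface PersistedStateSketch { chainResetMarker?: string }

// Decide whether this boot must wipe chain-state-derived files
// (store.nq, random-sampling.wal, publish-journal.*).
function shouldWipe(
  net: NetworkConfigSketch,
  persisted: PersistedStateSketch | null, // contents of <dataDir>/.network-state.json, if any
): boolean {
  if (!net.chainResetMarker) return false;       // network hasn't opted in → no-op
  if (!persisted?.chainResetMarker) return true; // first boot with a marker present
  return persisted.chainResetMarker !== net.chainResetMarker; // marker bump → wipe
}

const marker = "v10-rs-staking-consolidation-2026-04-30";
console.log(shouldWipe({}, null));                                                    // false: not opted in
console.log(shouldWipe({ chainResetMarker: marker }, null));                          // true: first boot
console.log(shouldWipe({ chainResetMarker: marker }, { chainResetMarker: marker }));  // false: steady state
console.log(shouldWipe({ chainResetMarker: "v11-example" }, { chainResetMarker: marker })); // true: bump
```

After a wipe the daemon persists the current marker, which is what makes the subsequent boots fall into the steady-state branch.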

Why a separate marker (not networkId): the existing `networkId` is a
SHA256 of the bundled genesis TriG and changes only when the genesis
itself does — orders of magnitude rarer than chain redeploys. Reusing
it would either never trigger or trip the FATAL genesis-mismatch guard.

Wired into `lifecycle.ts` between `loadNetworkConfig()` and
`DKGAgent.create()` so the wipe completes before any chain-state file
is opened by the agent.

For the imminent V10 staking consolidation reset (PR #357),
`network/testnet.json` ships with
`chainResetMarker: "v10-rs-staking-consolidation-2026-04-30"`. On
first boot of the new release, every operator's daemon detects no
prior marker, runs the wipe (which is a no-op for fresh installs and
correct for existing operators about to face the reset), and persists
the marker. Future resets only need a marker bump.

`docs/TESTNET_RESET.md` updated:
- Phase A now mentions the marker bump as the trigger.
- Phase C documents the auto-wipe and keeps the manual escape hatch
  as a fallback for exotic-environment operators.
- Removed the "Followup" section since the followup is now in this PR.

8 unit tests in `packages/cli/test/chain-reset-wipe.test.ts` cover:
opt-in semantics, first boot with marker, steady-state, marker change,
subset wipe, idempotency, corrupt state file.

Made-with: Cursor
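The marker-comparison logic described above can be sketched as follows. This is a minimal sketch under stated assumptions: the persisted state file is assumed to be JSON shaped as `{ chainResetMarker: string }`, and the real hook additionally performs the wipe and honours the preserve-list.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Minimal sketch of the wipe decision, assuming `.network-state.json`
// holds `{ chainResetMarker: string }`. Returns true when the daemon
// should wipe chain-derived state before the agent opens its store.
function shouldWipe(dataDir: string, currentMarker?: string): boolean {
  if (!currentMarker) return false; // network hasn't opted in: no-op
  const stateFile = path.join(dataDir, '.network-state.json');
  if (!fs.existsSync(stateFile)) return true; // first boot with marker present
  try {
    const state = JSON.parse(fs.readFileSync(stateFile, 'utf8'));
    return state.chainResetMarker !== currentMarker; // wipe on mismatch
  } catch {
    return true; // corrupt state file: treat persisted marker as unknown
  }
}
```

Subsequent boots with an unchanged marker return `false`, which gives the idempotency property the commit describes.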
// If status is fresh but existing is from a previous period
// (rotation happened), we discard existing and force a rotation
// by calling `createChallenge` below.
const existingIsCurrent =

🔴 Bug: this treats a cached challenge as "current" whenever the read-only status view returns the same activeProofPeriodStartBlock. If that view is stale, a solved challenge from the previous period still matches here, line 176 returns already-solved, and the prover never calls createChallenge() to trigger the on-chain auto-rotation. That leaves the node stuck after its first solved period until some unrelated tx advances the contract state. Please use the challenge's own expiry data / current block height to detect staleness, or force a fresh createChallenge() once the cached challenge is solved instead of trusting the view snapshot.

Comment thread packages/chain/src/evm-adapter.ts Outdated
// Legacy V9→V10 bridge: no triple-level payload here — assume a single
// Merkle leaf unless the caller migrates to `publishDirect` with an
// explicit `merkleLeafCount` from `V10MerkleTree`.
merkleLeafCount: 1,

🔴 Bug: hardcoding merkleLeafCount to 1 corrupts every bridged V9→V10 publish whose flat KC actually has more than one deduped leaf. RandomSampling now uses the stored leaf count to choose and verify chunkId, so those KCs become unprovable on-chain. Please thread the real leaf count through PublishParams/callers (or keep this bridge on the legacy contract until callers can supply it) instead of silently writing 1.

}

const removedFiles = performWipe(opts.dataDir, log);
saveState(opts.dataDir, opts.currentMarker);

🔴 Bug: performWipe() and saveState() run outside any try/catch, so a permission error or transient filesystem failure will throw out of startup and stop the daemon entirely. The comments/runbook for this feature say wipe failures should be logged and boot should continue with stale state; this implementation does the opposite. Please catch these failures here and return a non-fatal result instead of crashing the node.


wipeFixed('store.nq');
wipeFixed('store.nq.tmp');
wipeFixed('random-sampling.wal');

🔴 Bug: this only wipes the default dataDir/random-sampling.wal, but this PR also adds configurable randomSampling.walPath. Any operator who follows that config path keeps the real WAL across a chain reset, so the prover can come back with stale challenge state against a freshly redeployed chain. Please wipe the resolved runtime WAL path instead of assuming the default filename.

Branimir Rakic added 3 commits April 30, 2026 23:17
…circuiting on solved

Codex review on PR #357 found the prover could strand a node after its
first solved period: when no on-chain tx had advanced
`activeProofPeriodStartBlock`, the read-only `getActiveProofPeriodStatus`
view kept returning the same (stale) period start, the cached solved
challenge matched, and the short-circuit returned `already-solved` until
some unrelated tx rotated the contract state.

Fix: when both `existingIsCurrent` and `existing.solved` are true, peek
the actual chain block height and compare against
`existing.activeProofPeriodStartBlock + proofingPeriodDurationInBlocks`.
If we're past the on-chain boundary, fall through to `createChallenge`
(which calls `updateAndGetActiveProofPeriodStartBlock` and rotates the
period for us). Otherwise short-circuit as before — the cached solved
result is genuinely current.

Why not always force createChallenge when solved: the on-chain
`createChallenge` REVERTS with "already been solved" inside the same
period (RandomSampling.sol L191-200), so a naive always-call would
burn ticks and emit confusing reverts on every poll between solve
and period boundary.

Adapter capability gating: `getBlockNumber` is optional on `ChainAdapter`.
When absent (mock / test adapters), `isCachedSolvedStale` returns false
so the legacy short-circuit semantics are preserved — pinned by the
existing `prover.test.ts` "returns already-solved when ... solved is
true" test (mock has no getBlockNumber, still passes).

Made-with: Cursor
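The boundary check this commit describes can be sketched as below. The field names (`activeProofPeriodStartBlock`, `proofingPeriodDurationInBlocks`, optional `getBlockNumber`) follow the commit text; the surrounding prover types are simplified assumptions.

```typescript
// Sketch of the staleness check: the caller only invokes this when the
// cached challenge is both period-matching and solved.
interface CachedChallenge {
  solved: boolean;
  activeProofPeriodStartBlock: bigint;
}

async function isCachedSolvedStale(
  cached: CachedChallenge,
  proofingPeriodDurationInBlocks: bigint,
  getBlockNumber?: () => Promise<bigint>, // optional on ChainAdapter
): Promise<boolean> {
  // Mock/test adapters without getBlockNumber keep legacy semantics:
  // never treat the cached solved result as stale.
  if (!getBlockNumber) return false;
  const head = await getBlockNumber();
  // Past the on-chain period boundary => the read-only view snapshot is
  // stale; fall through to createChallenge(), which rotates the period.
  return head >= cached.activeProofPeriodStartBlock + proofingPeriodDurationInBlocks;
}
```

This preserves the short-circuit inside a genuinely current period (so `createChallenge` never reverts with "already been solved") while unsticking the node once the boundary passes.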
…extGraph bridge

Codex review on PR #357 found that the V9->V10 mirror in
`EVMChainAdapter.publishToContextGraph` was hardcoding `merkleLeafCount: 1`
when calling `createKnowledgeAssetsV10`. Since RandomSampling now uses
the stored `merkleLeafCount` to pick `chunkId` (V10 flat-KC Merkle leaf
index), every bridged KC whose tree had more than one leaf would become
unprovable on-chain — the prover would request a chunk past the tree's
leaf range.

Fix: thread the leaf count through `PublishToContextGraphParams.merkleLeafCount`
(now required, sourced from `V10MerkleTree.leafCount`) and refuse to mirror
when the caller didn't supply it. Hard-failing here is preferable to
silent corruption — `publishToContextGraph` has no production callers in
this repo today (only test references that check it exists on the
adapter interface), so no migration is required.

Made-with: Cursor
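The refuse-to-mirror guard this commit describes can be sketched as follows. The type name `PublishToContextGraphParams` comes from the commit; the field set and validator are simplified assumptions.

```typescript
// Sketch of the hardened bridge params: merkleLeafCount is required,
// sourced from V10MerkleTree.leafCount by the caller.
interface PublishToContextGraphParams {
  merkleRoot: string;      // bytes32 of the flat-KC Merkle tree
  merkleLeafCount: number; // required: RandomSampling uses it to pick chunkId
}

// Refuse to mirror rather than silently writing 1 — a wrong stored leaf
// count makes multi-leaf KCs unprovable on-chain.
function validateBridgeParams(p: PublishToContextGraphParams): void {
  if (!Number.isInteger(p.merkleLeafCount) || p.merkleLeafCount < 1) {
    throw new Error('publishToContextGraph requires an explicit merkleLeafCount');
  }
}
```

Hard-failing is cheap here precisely because the bridge has no production callers yet; adding the requirement before callers exist avoids a migration later.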
…domSampling.walPath

Codex review on PR #357 found two issues with the zero-touch chain reset
hook:

1. `performWipe()` and `saveState()` ran outside any try/catch. A
   permission error or transient FS failure would throw out of startup
   and stop the daemon entirely. The runbook (docs/TESTNET_RESET.md)
   says wipe failures should be logged and boot must continue with
   stale state — the implementation did the opposite.

2. The hook only wiped the default `dataDir/random-sampling.wal`, but
   the prover honours a configurable `randomSampling.walPath`. Operators
   following the config knob would keep a stale WAL across a chain
   reset, so the prover could come back with stale challenge state
   pointing at a freshly redeployed chain.

Fixes:
- Wrap `performWipe` / `saveState` (top-level) and per-file `rmSync`
  (inner) in try/catch. FS errors are logged with a `WARN: ...` line
  the operator can grep in journalctl; boot continues regardless.
- Add `randomSamplingWalPath?: string` option; lifecycle.ts threads
  `config.randomSampling?.walPath` through. When set, the hook wipes
  the resolved runtime path (which may live outside dataDir entirely);
  when empty, falls back to `dataDir/random-sampling.wal` (default).
- 5 new tests covering: custom WAL path inside dataDir, empty path
  fallback, absolute WAL path outside dataDir, FS denial of state
  persist (skipped when not reproducible, e.g. running as root), and
  graceful handling when target files don't exist.

Made-with: Cursor
);
}
try {
saveState(opts.dataDir, opts.currentMarker);

🔴 Bug: saveState() still runs even when performWipe() logged per-file failures. That records the new marker and suppresses all future retries, so a transient EACCES/EBUSY on store.nq or the WAL can leave the node permanently booting on stale chain state. Only persist currentMarker after every required delete succeeded, or return a success flag from performWipe() and skip saving on partial failure.

const rsRole: 'core' | 'edge' = effectiveRole === 'core' ? 'core' : 'edge';
let rsIdentityId = 0n;
if (this.chain.chainId !== 'none' && rsRole === 'core') {
try { rsIdentityId = await this.chain.getIdentityId(); } catch { /* ignore */ }

🔴 Bug: collapsing any getIdentityId() error to 0n disables the prover for the lifetime of this process. A transient RPC/startup failure now looks identical to "no identity yet", and because bind only happens once the node never retries until a manual restart. Please distinguish "lookup failed" from "identity is 0" and retry or defer binding instead of swallowing the error here.
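The distinction this comment asks for can be sketched with a discriminated result type. This is an assumed shape, not the PR's implementation; `getIdentityId` is the adapter method named above.

```typescript
// Sketch: separate "lookup failed" (transient, retryable) from a genuine
// identity of 0 (node not yet registered, prover stays off by design).
type IdentityLookup =
  | { status: 'ok'; identityId: bigint }
  | { status: 'error'; error: unknown };

async function lookupIdentity(
  getIdentityId: () => Promise<bigint>,
): Promise<IdentityLookup> {
  try {
    return { status: 'ok', identityId: await getIdentityId() };
  } catch (error) {
    // Transient RPC/startup failure: surface it so the caller can defer
    // binding and retry, instead of collapsing to 0n and disabling the
    // prover for the lifetime of the process.
    return { status: 'error', error };
  }
}
```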

'getKCContextGraphId',
] as const;
const missing = required.filter(
(m) => typeof (opts.chain as unknown as Record<string, unknown>)[m] !== 'function',

🟡 Issue: this only checks for method presence, but EVMChainAdapter implements these methods even when RandomSampling/RandomSamplingStorage are not deployed. On such networks the bind returns enabled: true and the loop just throws every tick. Gate on an actual runtime capability/deployment probe here (or expose an explicit isRandomSamplingReady() on the adapter) so the prover stays disabled when the contracts are absent.

this.store = deps.store;
this.identityId = deps.identityId;
this.builder = deps.builder ?? new InProcessProofBuilder();
this.wal = deps.wal ?? new InMemoryProverWal();

🔴 Bug: the new WAL is append-only from the prover's point of view, but nothing in this PR reads it back before ticking. After a crash that happens after submitted but before the process observes the chain result, the next boot will build and send again instead of reconciling the pending period, which is exactly the double-submit/gas-loss case the WAL comments describe. Either replay latestFor/readAll on startup or remove the crash-recovery guarantee from this flow.
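The startup replay this comment asks for can be sketched as below. The WAL entry shape and state names are assumptions loosely based on the review text (`submitted`, `latestFor`/`readAll`), not the package's actual types.

```typescript
// Sketch: before the first tick, consult the latest WAL entry for the
// active period. A `submitted` (or `confirmed`) entry means a proof may
// already be in flight/landed, so the prover should reconcile on-chain
// state instead of rebuilding and rebroadcasting.
interface WalEntry {
  periodStartBlock: bigint;
  state: 'built' | 'submitted' | 'confirmed';
}

function shouldSkipRebuild(
  entries: WalEntry[],
  activePeriodStartBlock: bigint,
): boolean {
  const latest = [...entries]
    .reverse()
    .find((e) => e.periodStartBlock === activePeriodStartBlock);
  return latest?.state === 'submitted' || latest?.state === 'confirmed';
}
```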

…ify Hub ownership path

- Network name is `base_sepolia_v10` (matches deployments JSON filename),
  not `base_sepolia` as the runbook had.
- Document the env vars the deploy command needs:
  RPC_BASE_SEPOLIA_V10 + EVM_PRIVATE_KEY_BASE_SEPOLIA_V10.
- Clarify Hub ownership: setAndReinitializeContracts is gated by
  onlyOwnerOrMultiSigOwner; deploy/998_initialize_contracts.ts calls
  it directly — works iff deployer is the Hub owner / a MultiSig
  owner. Document the manual fallback when it isn't (capture the
  emitted newContracts JSON, hand off for execution from the owner
  wallet / MultiSig UI).
- Note that the `deployed: false` snapshot edit is a scratch step
  (rewritten by 999_save_deployments.ts on success) — only commit the
  rewritten file post-deploy, not the bumped-to-false intermediate.

Made-with: Cursor
// would zero-gate every legitimate V10 ACK signer (this exactly mirrors
// the on-chain `KnowledgeAssetsV10` ACK-signer gate, also rewired in
// v4.0.0). Falls back to V8 if CSS is not registered (older deploys).
const cs = await this.resolveContract('ConvictionStakingStorage');

🔴 Bug: resolveContract('ConvictionStakingStorage') throws when CSS is not Hub-registered, so this never reaches the V8 fallback below even though older deployments are supposed to keep using StakingStorage. On those hubs ACK verification now fails hard instead of returning a boolean. Probe the Hub address or wrap the CSS lookup in try/catch before falling back.
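The fallback this comment asks for can be sketched as a try/catch around the CSS lookup. `resolveContract` throwing on an unregistered Hub name follows the review text; everything else is a simplified assumption.

```typescript
// Sketch: prefer the V10 ConvictionStakingStorage, but fall back to the
// V8 StakingStorage on older deployments where CSS is not Hub-registered
// (where resolveContract throws), so ACK verification stays a boolean
// check instead of failing hard.
async function resolveStakeStore<T>(
  resolveContract: (name: string) => Promise<T>,
): Promise<{ name: string; contract: T }> {
  try {
    const contract = await resolveContract('ConvictionStakingStorage');
    return { name: 'ConvictionStakingStorage', contract };
  } catch {
    const contract = await resolveContract('StakingStorage');
    return { name: 'StakingStorage', contract };
  }
}
```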

throw err;
}

await this.wal.append(

🔴 Bug: by the time this submitted entry is written, chain.submitProof() has already waited for the receipt (EVMChainAdapter.submitProof calls tx.wait()). A crash after broadcast but before confirmation leaves no WAL breadcrumb, so the new crash-recovery path cannot dedupe/recover pending proofs and may rebroadcast on restart. This needs a pre-broadcast hook or split send/confirm flow, plus startup replay of submitted entries.

swmGraphId?: string,
subGraphName?: string,
/** V10 flat-KC Merkle leaf count (sorted + deduped); binds ACK + on-chain KC to RandomSampling. */
merkleLeafCount?: number,

🔴 Bug: merkleLeafCount is now part of the ACK digest and on-chain KC shape, but this callback still treats it as optional. Existing custom v10ACKProviders will compile unchanged and, via the ?? 1 fallbacks in the agent/CLI factories, silently sign the wrong digest for any multi-leaf KC. Make this argument required and remove the default-to-1 path so callers fail fast.
