feat: multi-hop contract WASM migration via legacy_contracts.toml #5

Merged
sanity merged 2 commits into main from fix-multi-hop-contract-migration
Apr 12, 2026
@sanity sanity commented Apr 12, 2026

Problem

When Delta republishes with a new site_contract.wasm, every site's contract_key changes because contract_key = BLAKE3(BLAKE3(wasm) || params). Sites with a persisted KnownSiteRecord.contract_key_b58 migrate fine, but records restored from delegates older than b82d3bc have contract_key_b58 = None and were falling through to a hardcoded one-hop OLD_WASM_HASH constant. That constant had silently rotted across releases — pointing at 1188d108… (commit 2e664c3) while the actually-previous release shipped b92da83d… — so users of the previous release were stranded on a permanent "Loading..." screen after the V7 republish.

See also: 356e6b6 which updated OLD_WASM_HASH as an emergency hotfix. This PR replaces the single-constant mechanism with a proper multi-hop registry, on the same pattern as legacy_delegates.toml.

Approach

  • legacy_contracts.toml — single source of truth for previous contract WASM hashes.
  • ui/build.rs — parses it and emits LEGACY_CONTRACT_HASHES: &[[u8; 32]].
  • contract_id_for_prefix_with_hash — pure, unit-tested function governing contract-key derivation for any (prefix, hash) pair.
  • fire_legacy_contract_migrations — fires a migration GET for every historical hash at startup. Uses the existing PENDING_MIGRATIONS map to correlate responses.
  • clear_pending_migrations_for_prefix — on successful migration, cancels other in-flight probes for that prefix so a slower response from an older hash can't race ahead and overwrite fresh state.
  • restore_known_sites — now always issues current-key GET + stored-but-stale probe + generic legacy sweep. The NotFound handler no longer eagerly retries the current key on every legacy-probe miss.
  • scripts/add-contract-migration.sh — mirrors scripts/add-migration.sh. Run before touching common/ or the contract.
  • scripts/check-migration.sh — extended to enforce the contract-side recording. If the contract WASM changed since HEAD and the previous hash is not in legacy_contracts.toml, preflight fails loudly instead of silently stranding users.
  • AGENTS.md — updated upgrade workflow.
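
The code-generation step in the list above can be sketched as follows. This is a minimal illustration rather than the PR's actual `ui/build.rs`: the helper names `parse_hash_hex` and `render_legacy_contracts_rs` are hypothetical, the TOML parsing itself is left out, and the only detail taken from the description is the shape of the emitted `LEGACY_CONTRACT_HASHES: &[[u8; 32]]` constant.

```rust
/// Decodes a 64-character hex string into a 32-byte hash.
/// Hypothetical helper; returns `None` on any malformed input.
pub fn parse_hash_hex(hex: &str) -> Option<[u8; 32]> {
    if hex.len() != 64 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut out = [0u8; 32];
    for (i, byte) in out.iter_mut().enumerate() {
        // Each output byte comes from two consecutive hex characters.
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(out)
}

/// Renders the Rust source a build script could write into `OUT_DIR`
/// for later inclusion with `include!`. Hypothetical helper.
pub fn render_legacy_contracts_rs(hashes: &[[u8; 32]]) -> String {
    let mut out = String::from("pub const LEGACY_CONTRACT_HASHES: &[[u8; 32]] = &[\n");
    for hash in hashes {
        let bytes: Vec<String> = hash.iter().map(|b| format!("0x{:02x}", b)).collect();
        out.push_str(&format!("    [{}],\n", bytes.join(", ")));
    }
    out.push_str("];\n");
    out
}
```

Generating the table at build time keeps the registry file as the only place a hash is typed by hand, which is the property the old single constant lacked.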

Tests

  • contract_id_is_deterministic_and_depends_on_both_hash_and_prefix
  • legacy_ids_are_deduplicated_and_exclude_current
  • legacy_contract_hashes_table_is_populated — guards against an empty generated file.

All existing tests continue to pass: delta-core 19/19, delta-ui 6/6, site-delegate 4/4.

Initial entries

  • C1 = 1188d108… — pre-tombstone WASM (commit 2e664c3).
  • C2 = b92da83d… — pre-V7 WASM (f5ecff5), where today's stranded users' state actually lives.

WASMs

Contract and delegate WASMs are byte-identical to main. This PR is pure UI logic + scripts + registry.

[AI-assisted - Claude]

sanity added 2 commits April 12, 2026 12:47
When Delta republishes with a new `site_contract.wasm`, every site's
contract_key changes because `contract_key = BLAKE3(BLAKE3(wasm) ||
params)`. For sites whose delegate-stored `KnownSiteRecord.contract_key_b58`
is persisted, the stored key is authoritative and the UI migrates
state automatically. But records restored from delegates older than
b82d3bc have `contract_key_b58 = None` and fell through to a hardcoded
one-hop `OLD_WASM_HASH` constant that had silently rotted across
releases — pointing at `1188d108…` (commit 2e664c3) while the actual
previous release shipped `b92da83d…`. Users of that previous release
were stranded on a permanent "Loading..." screen after the V7
republish because the single fallback hash didn't match where their
state actually lived, and the OLD_WASM_HASH constant had no automation
to keep it accurate.

This is the same class of bug as the delegate WASM migration issue
already solved by `legacy_delegates.toml`, but for the contract side.

Introduce `legacy_contracts.toml` as the single source of truth for
previous contract WASM hashes, mirroring `legacy_delegates.toml`:

- `ui/build.rs` parses it and emits a generated
  `LEGACY_CONTRACT_HASHES: &[[u8; 32]]` const, consumed via
  `include!(concat!(env!("OUT_DIR"), "/legacy_contracts.rs"))`.
- `contract_id_for_prefix_with_hash(prefix, hash)` computes the
  ContractInstanceId for any (prefix, WASM hash) pair — the single
  piece of logic that governs contract-key derivation, now pure and
  unit-tested.
- `legacy_contract_ids_for_prefix(prefix, current)` builds the
  migration probe set, filters out the current key, and de-duplicates.
- `fire_legacy_contract_migrations(prefix, current_b58)` registers a
  `PENDING_MIGRATIONS` entry and issues a GET for every historical
  hash. The first response carrying state wins.
- `clear_pending_migrations_for_prefix` cancels still-in-flight
  probes for a prefix after one completes, so a slower response from
  an older hash cannot race ahead and overwrite freshly-migrated state.
- `restore_known_sites` now always issues a GET for the current key,
  plus a probe for the stored-but-stale `contract_key_b58` (if any),
  plus the generic legacy sweep. The NotFound handler no longer
  eagerly retries the current key on every legacy-probe miss, since
  the current-key GET is already in flight.
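
The probe-set construction described above (derive, exclude the current key, de-duplicate) can be sketched like this. It is an assumption-laden illustration: the function name `legacy_ids_for_prefix`, the string-typed ids, and the `derive` closure standing in for `contract_id_for_prefix_with_hash` are all hypothetical, chosen so the sketch runs without the real hashing code.

```rust
use std::collections::BTreeSet;

/// Builds the migration probe set for one site: derive an id for every
/// legacy hash, skip the id equal to the current one, and drop duplicates
/// while preserving the order of the legacy table.
/// `derive` is a stand-in for the real key-derivation function.
pub fn legacy_ids_for_prefix<F>(
    prefix: &str,
    current_id: &str,
    legacy_hashes: &[[u8; 32]],
    derive: F,
) -> Vec<String>
where
    F: Fn(&str, &[u8; 32]) -> String,
{
    let mut seen = BTreeSet::new();
    let mut ids = Vec::new();
    for hash in legacy_hashes {
        let id = derive(prefix, hash);
        // `insert` returns false for an id already seen, so repeats are skipped.
        if id != current_id && seen.insert(id.clone()) {
            ids.push(id);
        }
    }
    ids
}
```

Filtering out the current id matters because the current-key GET is already issued separately; probing it again would only add a redundant request.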

Release workflow automation mirrors the delegate side:

- `scripts/add-contract-migration.sh VERSION "DESCRIPTION"` captures
  the currently-committed `site_contract.wasm` BLAKE3 before rebuild.
  Run BEFORE touching `common/` or the contract.
- `scripts/check-migration.sh` is extended: if the contract WASM
  changed since git HEAD, the previous hash MUST be present in
  `legacy_contracts.toml` or the script fails the preflight. This
  turns "forgot to record the old hash" into a loud publish-time
  error instead of a silent post-release "Loading..." incident.
- `AGENTS.md` upgrade workflow documents the new step.
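
The preflight rule enforced by `scripts/check-migration.sh` can be restated as a small pure function. The real check is a shell script; this Rust sketch only illustrates the decision it makes, and the `Preflight` enum and `check_contract_migration` name are hypothetical.

```rust
/// Outcome of the contract-side preflight check (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Preflight {
    /// The contract WASM did not change, so nothing needs recording.
    Unchanged,
    /// The WASM changed and the previous hash is already in the registry.
    Recorded,
    /// The WASM changed but the previous hash was never recorded.
    MissingLegacyEntry { head_hash: String },
}

/// Compares the hash of the committed (HEAD) WASM with the freshly built
/// one and checks the registry entries when they differ.
pub fn check_contract_migration(
    head_hash: &str,
    built_hash: &str,
    recorded: &[&str],
) -> Preflight {
    if head_hash == built_hash {
        Preflight::Unchanged
    } else if recorded.iter().any(|h| h.eq_ignore_ascii_case(head_hash)) {
        Preflight::Recorded
    } else {
        Preflight::MissingLegacyEntry { head_hash: head_hash.to_string() }
    }
}
```

Only the `MissingLegacyEntry` outcome should fail the publish; the other two are safe to proceed.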

Tests:

- `contract_id_is_deterministic_and_depends_on_both_hash_and_prefix`:
  different WASM hashes produce different keys, same inputs are
  deterministic, different prefixes differ under the same hash.
- `legacy_ids_are_deduplicated_and_exclude_current`: when one legacy
  hash happens to compute to the "current" key, it's filtered out of
  the probe set; the returned set has no duplicates.
- `legacy_contract_hashes_table_is_populated`: guards against a
  silently-empty `legacy_contracts.rs` — without at least one entry,
  users of the immediately-preceding release have no fallback.

All existing tests continue to pass (delta-core 19/19, delta-ui 6/6,
site-delegate 4/4, site-contract 0/0).

`legacy_contracts.toml` seeds with two entries:
- C1 = `1188d108…` — pre-tombstone WASM (commit 2e664c3), inherited
  from the previous `OLD_WASM_HASH` constant.
- C2 = `b92da83d…` — pre-V7 WASM (f5ecff5), the hash of the release
  immediately before the per-prefix export signing-key fix. This is
  the hash where today's affected users' state actually lives.

Contract and delegate WASMs are byte-identical to main; this PR is
pure UI logic.

[AI-assisted - Claude]

Review findings on the multi-hop contract WASM migration PR:

**H1/H2 (high severity): late-response overwrite race.** The original
cancellation mechanism only removed entries from `PENDING_MIGRATIONS`,
but any GET response already in flight when cancellation fired would
take the non-migration branch of `handle_contract_response` and
last-write-wins over the freshly-captured state via `handle_site_state`.
A legacy-hash probe returning older state after a successful
current-key GET could silently clobber fresh data.

Fix: introduce a `MIGRATING_PREFIXES: BTreeSet<String>` populated by
`restore_known_sites` for each site entering its initial-capture
window, and an explicit `classify_get_response` state machine that
routes each incoming GET into one of four branches:

  - `PendingMigration { prefix }` — legacy/stale-key probe response;
    process if non-empty AND prefix still migrating, drop otherwise.
  - `InitialCurrentKey { prefix }` — current-key response while still
    capturing; process if non-empty, cancel siblings, exit the
    migration window.
  - `LiveUpdate` — prefix already captured or steady-state; process
    normally as an `UpdateNotification`-equivalent.
  - `Unknown` — unrecognized key; process as live update.

The `finalize_prefix_capture` helper removes the prefix from
`MIGRATING_PREFIXES` AND clears all `PENDING_MIGRATIONS` entries for
it atomically, so late responses land in the `LiveUpdate` branch but
are dropped from the migration PUT path.
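
A minimal sketch of the four-way routing described above, assuming string keys and plain `BTreeMap`/`BTreeSet` arguments in place of the real globals. The `GetRoute` enum name, the map shapes, and the `current_keys` lookup table are assumptions made for illustration; only the branch names and the pending-wins tiebreaker come from the description.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Where an incoming GET response should be routed.
#[derive(Debug, PartialEq, Eq)]
pub enum GetRoute {
    PendingMigration { prefix: String },
    InitialCurrentKey { prefix: String },
    LiveUpdate,
    Unknown,
}

/// Pure classifier with no I/O, so every branch can be unit-tested.
///
/// * `pending`: probed key -> prefix (assumed shape of PENDING_MIGRATIONS)
/// * `migrating`: prefixes still in their initial-capture window
/// * `current_keys`: current contract key -> prefix for each known site
pub fn classify_get_response(
    key: &str,
    pending: &BTreeMap<String, String>,
    migrating: &BTreeSet<String>,
    current_keys: &BTreeMap<String, String>,
) -> GetRoute {
    // Checked first: the pending branch wins the tiebreak so the
    // migration PUT still runs for a legacy or stale-key probe.
    if let Some(prefix) = pending.get(key) {
        return GetRoute::PendingMigration { prefix: prefix.clone() };
    }
    match current_keys.get(key) {
        Some(prefix) if migrating.contains(prefix) => {
            GetRoute::InitialCurrentKey { prefix: prefix.clone() }
        }
        // Known key whose prefix has already been captured.
        Some(_) => GetRoute::LiveUpdate,
        // Unrecognized key: the caller treats it like a live update.
        None => GetRoute::Unknown,
    }
}
```

Keeping the classifier free of side effects leaves the "process or drop" decision to the caller, which can consult the same sets after routing.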

**M2: startup thundering herd.** `fire_legacy_contract_migrations`
now runs only when the stored `contract_key_b58` is missing or
stale. In the steady-state case where the delegate's stored key
matches the current WASM, the site was created under the current
contract and no earlier WASM can have state for it — the sweep was
pure waste. Drops the startup-load cost from N × M GETs to just the
stored-key GET for up-to-date sites.

**M4 / cross-consistency test.** Added
`contract_id_matches_state_key_derivation_for_current_wasm` which
computes the current WASM's hash and asserts that
`contract_id_for_prefix_with_hash` agrees with the production
`state::contract_key_from_prefix` path. Guards against a
backwards-incompatible change to freenet-stdlib's
`ContractKey::from_params_and_code` silently breaking legacy probes.

**L1: test gap for the race logic.** Added five unit tests of the
pure `classify_get_response` state machine covering every branch:
pending-migration routing, current-key-during-capture, post-capture
live update, unknown key, and the tiebreaker where a key is in both
PENDING_MIGRATIONS and its prefix is in MIGRATING_PREFIXES (the
pending branch must win so the migration PUT runs).

**Code-first concern 6: `add-contract-migration.sh` race.** The
script now always hashes HEAD's tracked contract WASM via
`git show HEAD:...`, not the working-tree WASM. A developer who
accidentally ran `sync-wasm.sh` before recording the migration would
previously have recorded the *new* hash, silently defeating the
mechanism. The script now warns if the working tree and HEAD differ
and always records the HEAD hash.

**Rebase onto main.** Picked up the reproducible-WASM fix (#4) and
resolved the check-migration.sh conflict to use the new reproducible
`scripts/build-wasm.sh` wrapper for both the delegate and the
contract build-vs-committed verification. `legacy_contracts.toml`
gains a C3 entry for `53e3395f…`, the V7 contract hash shipped
immediately before reproducible builds landed — this is the hash
where the user's stranded site state currently lives on the network.

Tests: delta-core 19/19, delta-ui 12/12 (5 new classifier tests +
1 cross-consistency test + 3 legacy_contract tests + 3 export_key
tests), site-delegate 4/4. Clippy clean.

[AI-assisted - Claude]
@sanity sanity force-pushed the fix-multi-hop-contract-migration branch from 31bb9f2 to 02c5f5e April 12, 2026 17:51
sanity commented Apr 12, 2026

Addressed review feedback in 02c5f5e (force-pushed after rebase onto main for the reproducible-WASM fix).

Code-first / Skeptical H1+H2+H3 (critical race): late-response overwrites.

Before, cancellation only removed entries from PENDING_MIGRATIONS. Any GET response already in flight when cancel fired would land in handle_contract_response's non-migration branch, call handle_site_state, and last-write-wins over the freshly-captured state. Confirmed real — traced by both reviewers.

Fix: introduced MIGRATING_PREFIXES set populated at startup per site, plus an explicit classify_get_response state machine that routes each incoming GET into one of four branches — PendingMigration, InitialCurrentKey, LiveUpdate, Unknown. Late responses for a prefix whose capture has completed are dropped from the migration write path but still work normally for steady-state UpdateNotification merging via the LiveUpdate branch. finalize_prefix_capture atomically removes the prefix from MIGRATING_PREFIXES AND clears all PENDING_MIGRATIONS entries for it, so siblings can't sneak through after cancellation.
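
The atomic cleanup can be pictured as one method that owns both collections. This is a sketch under assumptions: the `MigrationState` struct and its field shapes are hypothetical, and the real code may guard its globals differently.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical container for the two pieces of migration bookkeeping.
#[derive(Default)]
pub struct MigrationState {
    /// Probed contract key -> prefix the probe was fired for.
    pub pending: BTreeMap<String, String>,
    /// Prefixes still inside their initial-capture window.
    pub migrating: BTreeSet<String>,
}

impl MigrationState {
    /// Ends the capture window for `prefix` and cancels its sibling probes.
    /// Both steps happen under one `&mut self` borrow, so no response can
    /// be classified between them. Returns the number of cancelled probes.
    pub fn finalize_prefix_capture(&mut self, prefix: &str) -> usize {
        self.migrating.remove(prefix);
        let before = self.pending.len();
        // Keep only the probes that belong to other prefixes.
        self.pending.retain(|_key, p| p.as_str() != prefix);
        before - self.pending.len()
    }
}
```

Putting both collections behind one owner is one way to make "remove the prefix" and "clear its probes" a single step; two independently locked globals would reopen the window between them.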

Skeptical M2 (thundering herd): fire_legacy_contract_migrations now runs only when old_key_b58.is_none() || stored_key_is_stale. In the steady state where the delegate's stored key matches the current WASM, the site was created under the current contract and no earlier WASM could have state for it — the sweep was pure waste. Drops startup load from N × M GETs to just the stored-key GET for up-to-date sites.
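
The gating condition amounts to a two-case check. A minimal sketch, assuming keys are compared as base58 strings; the function name `needs_legacy_sweep` is hypothetical.

```rust
/// Returns true when the legacy sweep should run for a site:
/// either no key was stored, or the stored key no longer matches
/// the key derived from the current WASM.
pub fn needs_legacy_sweep(stored_key_b58: Option<&str>, current_key_b58: &str) -> bool {
    match stored_key_b58 {
        // Record restored from an older delegate: nothing to go on.
        None => true,
        // Stored key is stale if it differs from the current key.
        Some(stored) => stored != current_key_b58,
    }
}
```

For an up-to-date site the function returns false, so startup issues no legacy probes for it.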

Skeptical M4 (derivation symmetry): added contract_id_matches_state_key_derivation_for_current_wasm which computes the current WASM's BLAKE3 hash and asserts that contract_id_for_prefix_with_hash agrees with the production state::contract_key_from_prefix path. Guards against any future backwards-incompatible change to freenet-stdlib's ContractKey::from_params_and_code.

Test gap L1: extracted classify_get_response as a pure state machine and added 5 unit tests covering all branches, including the tiebreaker where a key is both in PENDING_MIGRATIONS and its prefix in MIGRATING_PREFIXES (pending must win so the migration PUT runs).

Code-first concern 6: add-contract-migration.sh now always hashes git show HEAD:ui/public/contracts/site_contract.wasm and warns if the working tree differs, so a developer who accidentally rebuilt before recording won't silently record the new hash.

Skeptical M3 (C1 ghost hash): verified historically — git show 2e664c3:ui/public/contracts/site_contract.wasm | b3sum produces 1188d108…, so C1 really did ship. Kept.

Rebase: picked up #4's reproducible-WASM fix. check-migration.sh now uses the new scripts/build-wasm.sh wrapper for both delegate and contract build-vs-committed verification. legacy_contracts.toml gains C3 = 53e3395f…, the V7 contract hash shipped immediately before reproducible builds landed — this is where the currently-stranded user's site state actually lives.

Not addressed (minor, deferred):

  • Skeptical M1 (unbounded PENDING_MIGRATIONS on network silence): GET-timeout cleanup can be added as a follow-up; entries are bounded by N sites × M legacy hashes so not an immediate risk.
  • Skeptical L6 (backup-state restore race with legacy-hash probes): same race surface as H1, now protected by the same MIGRATING_PREFIXES gate since handle_restored_site_state also routes through handle_site_state. Worth a follow-up to verify end-to-end.

Tests: delta-core 19/19, delta-ui 12/12, site-delegate 4/4. Clippy clean.

[AI-assisted - Claude]

@sanity sanity merged commit df4fbe9 into main Apr 12, 2026
3 checks passed