fix: make WASM builds reproducible across checkout paths #4

Merged: sanity merged 4 commits into main from fix-3 on Apr 12, 2026

sanity commented Apr 12, 2026

Problem

The Delegate migration safety CI job fails on every PR because locally built WASM differs from CI-built WASM at the same commit. Absolute path strings survive strip = true (panic messages, file!(), dependency source references in debug tables), and the repo path, $CARGO_HOME, and $RUSTUP_HOME all differ between a developer machine (/home/ian/...) and GitHub Actions (/home/runner/...), so the hashes can never match. As a result, the check can never pass and is currently only decorative.

Approach

Add scripts/build-wasm.sh, which sets RUSTFLAGS with three --remap-path-prefix rules that rewrite the repo root, $CARGO_HOME, and $RUSTUP_HOME to stable placeholders (/delta, /cargo-home, /rustup-home) before invoking cargo build --release --target wasm32-unknown-unknown. Every producer of site WASM now goes through this wrapper, so the producers cannot drift apart:

  • scripts/sync-wasm.sh
  • cargo make check-migration (Makefile.toml)
  • CI Check contract builds (WASM) step
  • CI Verify delegate WASM matches source step
  • CI Verify contract WASM matches source step

The PR also rebuilds the committed WASMs with the new flags. Because this changes the delegate WASM, it adds a V8 entry to legacy_delegates.toml recording the pre-remap delegate key (8a78b8b5...) so that existing users' signing keys migrate cleanly.
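As a rough illustration of what the wrapper achieves, the Rust sketch below models the three rules together with rustc's documented behavior for `--remap-path-prefix`: a rule rewrites a leading path prefix, and when several rules match, the last matching one is applied. The helper names `remap_rules`, `rustflags`, and `apply` are hypothetical (the real logic is shell in scripts/build-wasm.sh), and the CI checkout path used in the usage check is an assumption.

```rust
use std::path::{Path, PathBuf};

/// One `--remap-path-prefix FROM=TO` rule (hypothetical model type).
struct Remap {
    from: PathBuf,
    to: PathBuf,
}

/// The three placeholder rules described above, in repo, cargo, rustup order.
fn remap_rules(repo_root: &Path, cargo_home: &Path, rustup_home: &Path) -> Vec<Remap> {
    vec![
        Remap { from: repo_root.to_path_buf(), to: PathBuf::from("/delta") },
        Remap { from: cargo_home.to_path_buf(), to: PathBuf::from("/cargo-home") },
        Remap { from: rustup_home.to_path_buf(), to: PathBuf::from("/rustup-home") },
    ]
}

/// Renders the rules as a RUSTFLAGS value. RUSTFLAGS is whitespace-split,
/// so this assumes none of the paths contain spaces.
fn rustflags(rules: &[Remap]) -> String {
    rules
        .iter()
        .map(|r| format!("--remap-path-prefix={}={}", r.from.display(), r.to.display()))
        .collect::<Vec<_>>()
        .join(" ")
}

/// Models how rustc rewrites a source path: the last matching rule wins,
/// and the match is on leading path components.
fn apply(rules: &[Remap], path: &Path) -> PathBuf {
    for r in rules.iter().rev() {
        if let Ok(rest) = path.strip_prefix(&r.from) {
            return r.to.join(rest);
        }
    }
    path.to_path_buf()
}
```

With both machines' rules in hand, the same source file maps to the same placeholder path, which is why the embedded strings (and therefore the hashes) can agree.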

Testing

Verified reproducibility by cloning the repo to a second path and rebuilding:

$ cd /tmp/delta-repro-test && ./scripts/build-wasm.sh -p site-contract -p site-delegate
$ b3sum target/wasm32-unknown-unknown/release/site_*.wasm
f90262ce22d3f921b5a3d1cacb9f76c7a5aaddc611fe17ef0b90fed153464d58  site_contract.wasm
f0f2060d603f08a4ded94eecd7b8ce1e06a08b9c1e398f36f568c458e8068de1  site_delegate.wasm

These match, byte for byte, the hashes of the build in /home/ian/code/freenet/delta-fix-3. The CI migration-check job is itself the regression test: it will now go green on this PR and on every subsequent PR, as long as the wrapper is used consistently.

Locally, cargo fmt --check, cargo clippy --all-targets -- -D warnings, and cargo test all pass.

Closes #3

[AI-assisted - Claude]

sanity and others added 4 commits April 12, 2026 12:35
The Delegate migration safety CI check was failing on every PR because
locally built WASM differed from CI-built WASM even at the same commit.
Absolute path strings survive `strip = true` (panic messages, file!(),
etc.), and the crate root, $CARGO_HOME, and $RUSTUP_HOME all differ
between a developer's machine and GitHub Actions, so the hashes could
never match.

Add scripts/build-wasm.sh, which sets RUSTFLAGS with --remap-path-prefix
rules for the repo root, $CARGO_HOME, and $RUSTUP_HOME so rustc emits
stable placeholders (/delta, /cargo-home, /rustup-home) regardless of
where the build runs. sync-wasm.sh, the CI contract-build step, the CI
migration-check job, and cargo make check-migration all call this
wrapper so they can't drift out of agreement.

Verified reproducibility: rebuilding from /tmp/delta-repro-test produces
site_contract.wasm and site_delegate.wasm byte-identical to those built
from /home/ian/code/freenet/delta-fix-3.

Rebuilds the committed WASMs with the new flags, which changes the
delegate key, and adds a V8 migration entry for the pre-remap delegate
(8a78b8b5...) so existing users' signing keys migrate cleanly.

Closes #3

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses review feedback:
- scripts/check-migration.sh now uses build-wasm.sh (was a missed drift path)
- build-wasm.sh uses pwd -P and realpath for symlink/trailing-slash safety
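The symlink and trailing-slash concern can be sketched in Rust with `Path::canonicalize`, which resolves symlinks much as `pwd -P` and `realpath` do and returns a normalized absolute path. `canonical_repo_root` and `repo_root_rule` are hypothetical names; the script itself does this in shell.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Resolves symlinks and `..`, and drops any trailing slash, so two spellings
/// of the same checkout produce one FROM value. Fails if `dir` does not exist.
fn canonical_repo_root(dir: &Path) -> io::Result<PathBuf> {
    dir.canonicalize()
}

/// Builds the repo-root remap rule from the canonical path.
fn repo_root_rule(dir: &Path) -> io::Result<String> {
    let root = canonical_repo_root(dir)?;
    Ok(format!("--remap-path-prefix={}=/delta", root.display()))
}
```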

Also temporarily uploads the CI-built WASM as an artifact so we can diff
it against the local build and identify the remaining source of
nondeterminism (the current remap pass is not yet sufficient).
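One way to do that diff, assuming both .wasm files have been read into memory side by side, is to find the first differing byte and list the absolute-path-looking strings each file embeds (similar in spirit to `strings | grep '^/'`). The helpers below are a hypothetical sketch, and the path heuristic is deliberately crude.

```rust
/// Offset of the first differing byte, or None if the inputs are identical.
/// A length mismatch counts as a difference at the shorter length.
fn first_difference(a: &[u8], b: &[u8]) -> Option<usize> {
    match a.iter().zip(b).position(|(x, y)| x != y) {
        Some(i) => Some(i),
        None if a.len() != b.len() => Some(a.len().min(b.len())),
        None => None,
    }
}

/// Collects runs of printable, non-space ASCII that start with '/' and are at
/// least `min_len` bytes long: candidates for leaked absolute paths.
fn embedded_paths(bytes: &[u8], min_len: usize) -> Vec<String> {
    let mut found = Vec::new();
    let mut run: Vec<u8> = Vec::new();
    // The trailing 0 flushes a run that ends at the last byte.
    for b in bytes.iter().copied().chain(std::iter::once(0u8)) {
        if b.is_ascii_graphic() {
            run.push(b);
        } else {
            if run.len() >= min_len && run.first() == Some(&b'/') {
                found.push(String::from_utf8_lossy(&run).into_owned());
            }
            run.clear();
        }
    }
    found
}
```

A `/home/runner/...` string showing up in one list but not the other points directly at the unremapped prefix.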

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
With the rust-src component installed (common on dev machines), rustc
substitutes the real on-disk std source path for the virtual /rustc/
paths baked into prebuilt std. CI doesn't have rust-src, so std panic
file paths stayed as /rustc/<commit>/... there. Remap the local
rust-src dir to the same virtual path so the two environments match.
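A sketch of how such a rule can be built, assuming the sysroot comes from `rustc --print sysroot` and the commit from the `commit-hash:` line of `rustc -vV`; rust-src is installed under `<sysroot>/lib/rustlib/src/rust`. The function names are hypothetical, and the PR's script may derive these values differently. Because rustc applies the last matching remap, a rule like this presumably has to come after the $RUSTUP_HOME rule when the toolchain sysroot sits under $RUSTUP_HOME; otherwise std paths would become /rustup-home/... instead of /rustc/<commit>/....

```rust
/// Pulls the `commit-hash:` value out of `rustc -vV` output.
fn rustc_commit_hash(version_verbose: &str) -> Option<&str> {
    version_verbose
        .lines()
        .find_map(|line| line.strip_prefix("commit-hash:"))
        .map(str::trim)
}

/// Maps the local rust-src checkout onto the virtual `/rustc/<commit>`
/// prefix that prebuilt std already carries, so both environments agree.
fn rust_src_remap(sysroot: &str, commit: &str) -> String {
    format!("--remap-path-prefix={sysroot}/lib/rustlib/src/rust=/rustc/{commit}")
}
```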

Verified: local site_delegate.wasm now hashes to b7179907..., which
matches the CI build from the previous run.

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sanity merged commit 89370af into main on Apr 12, 2026
3 checks passed
sanity added a commit that referenced this pull request Apr 12, 2026
Review findings on the multi-hop contract WASM migration PR:

**H1/H2 (high severity): late-response overwrite race.** The original
cancellation mechanism only removed entries from `PENDING_MIGRATIONS`,
but any GET response already in flight when cancellation fired would
take the non-migration branch of `handle_contract_response` and
last-write-wins over the freshly-captured state via `handle_site_state`.
A legacy-hash probe returning older state after a successful
current-key GET could silently clobber fresh data.

Fix: introduce a `MIGRATING_PREFIXES: BTreeSet<String>` populated by
`restore_known_sites` for each site entering its initial-capture
window, and an explicit `classify_get_response` state machine that
routes each incoming GET into one of four branches:

  - `PendingMigration { prefix }` — legacy/stale-key probe response;
    process if non-empty AND prefix still migrating, drop otherwise.
  - `InitialCurrentKey { prefix }` — current-key response while still
    capturing; process if non-empty, cancel siblings, exit the
    migration window.
  - `LiveUpdate` — prefix already captured or steady-state; process
    normally as an `UpdateNotification`-equivalent.
  - `Unknown` — unrecognized key; process as live update.
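The routing rules above are small enough to model as a pure function. This sketch assumes the pending-probe table maps a probe key to its prefix and that a separate table maps each site's current key to its prefix; the `GetRoute` type, the parameter list of `classify_get_response`, and `should_process` are illustrative, not the PR's actual signatures. Checking the pending table first is what makes the tiebreaker favor the migration PUT.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, PartialEq, Eq)]
enum GetRoute {
    PendingMigration { prefix: String },
    InitialCurrentKey { prefix: String },
    LiveUpdate,
    Unknown,
}

/// `pending`: probe key -> prefix (stand-in for PENDING_MIGRATIONS).
/// `current`: current key -> prefix. `migrating`: stand-in for MIGRATING_PREFIXES.
fn classify_get_response(
    key: &str,
    pending: &BTreeMap<String, String>,
    current: &BTreeMap<String, String>,
    migrating: &BTreeSet<String>,
) -> GetRoute {
    // Checked first: a key that is both pending and current routes to the
    // migration branch so the migration PUT still runs.
    if let Some(prefix) = pending.get(key) {
        return GetRoute::PendingMigration { prefix: prefix.clone() };
    }
    match current.get(key) {
        Some(prefix) if migrating.contains(prefix) => {
            GetRoute::InitialCurrentKey { prefix: prefix.clone() }
        }
        Some(_) => GetRoute::LiveUpdate,
        None => GetRoute::Unknown,
    }
}

/// Whether a response with this route should be applied.
fn should_process(route: &GetRoute, state_is_empty: bool, migrating: &BTreeSet<String>) -> bool {
    match route {
        GetRoute::PendingMigration { prefix } => !state_is_empty && migrating.contains(prefix),
        GetRoute::InitialCurrentKey { .. } => !state_is_empty,
        GetRoute::LiveUpdate | GetRoute::Unknown => true,
    }
}
```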

The `finalize_prefix_capture` helper removes the prefix from
`MIGRATING_PREFIXES` AND clears all `PENDING_MIGRATIONS` entries for
it atomically, so late responses land in the `LiveUpdate` branch but
are dropped from the migration PUT path.
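A minimal model of that helper, with the two globals folded into one hypothetical `MigrationState` struct. Doing both updates inside a single `&mut self` call is what keeps any other handler from observing the state half-updated.

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Default)]
struct MigrationState {
    migrating_prefixes: BTreeSet<String>,
    /// probe key -> prefix
    pending_migrations: BTreeMap<String, String>,
}

impl MigrationState {
    /// Exits the capture window for `prefix`: it stops migrating and every
    /// outstanding probe for it is forgotten, so a late probe response can no
    /// longer be routed to the migration PUT path.
    fn finalize_prefix_capture(&mut self, prefix: &str) {
        self.migrating_prefixes.remove(prefix);
        self.pending_migrations.retain(|_, p| p.as_str() != prefix);
    }
}
```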

**M2: startup thundering herd.** `fire_legacy_contract_migrations`
now runs only when the stored `contract_key_b58` is missing or
stale. In the steady-state case where the delegate's stored key
matches the current WASM, the site was created under the current
contract and no earlier WASM can have state for it — the sweep was
pure waste. For N sites and M legacy hashes, this drops the startup
load from N × M legacy-probe GETs to a single stored-key GET per
up-to-date site.
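The cost change can be made concrete with a small counting model. It assumes, following the restore_known_sites description, one current-key GET per site, one extra probe when the stored key is stale, and M legacy probes whenever the sweep runs; de-duplication between the stale key and the legacy set is ignored. Both function names are hypothetical.

```rust
/// The sweep is needed when there is no stored key or it no longer matches.
fn needs_legacy_sweep(stored_key_b58: Option<&str>, current_key_b58: &str) -> bool {
    stored_key_b58 != Some(current_key_b58)
}

/// Startup GETs for `(stored key, current key)` pairs and M legacy hashes.
fn startup_gets(sites: &[(Option<&str>, &str)], legacy_hashes: usize) -> usize {
    sites
        .iter()
        .map(|&(stored, current)| {
            if !needs_legacy_sweep(stored, current) {
                1 // up to date: current-key GET only
            } else if stored.is_some() {
                2 + legacy_hashes // current + stale-key probe + sweep
            } else {
                1 + legacy_hashes // current + sweep
            }
        })
        .sum()
}
```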

**M4 / cross-consistency test.** Added
`contract_id_matches_state_key_derivation_for_current_wasm` which
computes the current WASM's hash and asserts that
`contract_id_for_prefix_with_hash` agrees with the production
`state::contract_key_from_prefix` path. Guards against a
backwards-incompatible change to freenet-stdlib's
`ContractKey::from_params_and_code` silently breaking legacy probes.

**L1: test gap for the race logic.** Added five unit tests of the
pure `classify_get_response` state machine covering every branch:
pending-migration routing, current-key-during-capture, post-capture
live update, unknown key, and the tiebreaker where a key is in both
PENDING_MIGRATIONS and its prefix is in MIGRATING_PREFIXES (the
pending branch must win so the migration PUT runs).

**Code-first concern 6: `add-contract-migration.sh` race.** The
script now always hashes HEAD's tracked contract WASM via
`git show HEAD:...`, not the working-tree WASM. A developer who
accidentally ran `sync-wasm.sh` before recording the migration would
previously have recorded the *new* hash, silently defeating the
mechanism. The script now warns if the working tree and HEAD differ
and always records the HEAD hash.
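The decision the script now makes can be isolated as a pure function; reading the bytes via `git show HEAD:<path>` and hashing them stays in the script. `hash_to_record` is a hypothetical name.

```rust
/// Always records the HEAD hash; returns a warning when the working tree differs.
fn hash_to_record<'a>(head_hash: &'a str, worktree_hash: &str) -> (&'a str, Option<String>) {
    let warning = (head_hash != worktree_hash).then(|| {
        format!(
            "working-tree contract WASM ({worktree_hash}) differs from HEAD ({head_hash}); recording the HEAD hash"
        )
    });
    (head_hash, warning)
}
```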

**Rebase onto main.** Picked up the reproducible-WASM fix (#4) and
resolved the check-migration.sh conflict to use the new reproducible
`scripts/build-wasm.sh` wrapper for both the delegate and the
contract build-vs-committed verification. `legacy_contracts.toml`
gains a C3 entry for `53e3395f…`, the V7 contract hash shipped
immediately before reproducible builds landed — this is the hash
where the user's stranded site state currently lives on the network.

Tests: delta-core 19/19, delta-ui 12/12 (5 new classifier tests +
1 cross-consistency test + 3 legacy_contract tests + 3 export_key
tests), site-delegate 4/4. Clippy clean.

[AI-assisted - Claude]
sanity added a commit that referenced this pull request Apr 12, 2026
* feat: multi-hop contract WASM migration via legacy_contracts.toml

When Delta republishes with a new `site_contract.wasm`, every site's
contract_key changes because `contract_key = BLAKE3(BLAKE3(wasm) ||
params)`. For sites whose delegate-stored `KnownSiteRecord.contract_key_b58`
is persisted, the stored key is authoritative and the UI migrates
state automatically. But records restored from delegates older than
b82d3bc have `contract_key_b58 = None`, so they fell through to a hardcoded
one-hop `OLD_WASM_HASH` constant that had silently rotted across
releases: it pointed at `1188d108…` (commit 2e664c3), while the actual
previous release shipped `b92da83d…`. Users of that previous release
were stranded on a permanent "Loading..." screen after the V7
republish because the single fallback hash didn't match where their
state actually lived, and the OLD_WASM_HASH constant had no automation
to keep it accurate.
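Written out in the notation of the paragraph above (with ∥ as byte concatenation), the key derivation and the probe set the new code builds from it look roughly like this. Here $w$ is a contract WASM, $h_i$ a recorded BLAKE3 hash of a historical WASM, $H_{\mathrm{legacy}}$ the set of hashes in legacy_contracts.toml, and $w_{\mathrm{cur}}$ the current WASM; the second line paraphrases the legacy_contract_ids_for_prefix description rather than reproducing code.

```latex
\mathrm{key}(w, \mathit{params}) = \mathrm{BLAKE3}\big(\mathrm{BLAKE3}(w) \,\|\, \mathit{params}\big)

\mathrm{Probes}(\mathit{params}) = \big\{\, \mathrm{BLAKE3}(h_i \,\|\, \mathit{params}) : h_i \in H_{\mathrm{legacy}} \,\big\}
  \setminus \big\{\, \mathrm{key}(w_{\mathrm{cur}}, \mathit{params}) \,\big\}
```

With a single hardcoded entry $h_1$ = `1188d108…`, state stored under the key derived from `b92da83d…` has no matching probe (barring a BLAKE3 collision), which is exactly the stranding described above.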

This is the same class of bug as the delegate WASM migration issue
already solved by `legacy_delegates.toml`, but for the contract side.

Introduce `legacy_contracts.toml` as the single source of truth for
previous contract WASM hashes, mirroring `legacy_delegates.toml`:

- `ui/build.rs` parses it and emits a generated
  `LEGACY_CONTRACT_HASHES: &[[u8; 32]]` const, consumed via
  `include!(concat!(env!("OUT_DIR"), "/legacy_contracts.rs"))`.
- `contract_id_for_prefix_with_hash(prefix, hash)` computes the
  ContractInstanceId for any (prefix, WASM hash) pair — the single
  piece of logic that governs contract-key derivation, now pure and
  unit-tested.
- `legacy_contract_ids_for_prefix(prefix, current)` builds the
  migration probe set, filters out the current key, and de-duplicates.
- `fire_legacy_contract_migrations(prefix, current_b58)` registers a
  `PENDING_MIGRATIONS` entry and issues a GET for every historical
  hash. The first response carrying state wins.
- `clear_pending_migrations_for_prefix` cancels still-in-flight
  probes for a prefix after one completes, so a slower response from
  an older hash cannot race ahead and overwrite freshly-migrated state.
- `restore_known_sites` now always issues a GET for the current key,
  plus a probe for the stored-but-stale `contract_key_b58` (if any),
  plus the generic legacy sweep. The NotFound handler no longer
  eagerly retries the current key on every legacy-probe miss, since
  the current-key GET is already in flight.
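The filtering and de-duplication in the probe-set builder can be sketched independently of the real key derivation. Here `derive` stands in for contract_id_for_prefix_with_hash and IDs are plain strings; the name `probe_ids` and its signature are hypothetical. The first occurrence of each ID is kept.

```rust
use std::collections::BTreeSet;

/// Derives one candidate ID per legacy hash, drops the current ID, and
/// removes duplicates while preserving first-seen order.
fn probe_ids<F>(prefix: &str, current_id: &str, legacy_hashes: &[[u8; 32]], derive: F) -> Vec<String>
where
    F: Fn(&str, &[u8; 32]) -> String,
{
    let mut seen = BTreeSet::new();
    legacy_hashes
        .iter()
        .map(|hash| derive(prefix, hash))
        .filter(|id| id.as_str() != current_id)
        .filter(|id| seen.insert(id.clone()))
        .collect()
}
```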

Release workflow automation mirrors the delegate side:

- `scripts/add-contract-migration.sh VERSION "DESCRIPTION"` captures
  the currently-committed `site_contract.wasm` BLAKE3 before rebuild.
  Run BEFORE touching `common/` or the contract.
- `scripts/check-migration.sh` is extended: if the contract WASM
  changed since git HEAD, the previous hash MUST be present in
  `legacy_contracts.toml` or the script fails the preflight. This
  turns "forgot to record the old hash" into a loud publish-time
  error instead of a silent post-release "Loading..." incident.
- `AGENTS.md` upgrade workflow documents the new step.
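The preflight rule reduces to a small predicate, sketched below under the assumption that the caller has already hashed the HEAD and working-tree contract WASMs and parsed the recorded hashes out of legacy_contracts.toml. `contract_migration_preflight` is a hypothetical name.

```rust
/// Fails when the contract WASM changed since HEAD but the HEAD hash
/// was never recorded as a legacy hash.
fn contract_migration_preflight(
    head_hash: &str,
    worktree_hash: &str,
    legacy_hashes: &[&str],
) -> Result<(), String> {
    if head_hash == worktree_hash {
        return Ok(()); // contract WASM unchanged: nothing to record
    }
    if legacy_hashes.iter().any(|h| *h == head_hash) {
        Ok(())
    } else {
        Err(format!(
            "site_contract.wasm changed, but previous hash {head_hash} is not in legacy_contracts.toml; run scripts/add-contract-migration.sh first"
        ))
    }
}
```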

New unit tests:

- `contract_id_is_deterministic_and_depends_on_both_hash_and_prefix`:
  different WASM hashes produce different keys, same inputs are
  deterministic, different prefixes differ under the same hash.
- `legacy_ids_are_deduplicated_and_exclude_current`: when one legacy
  hash happens to compute to the "current" key, it's filtered out of
  the probe set; the returned set has no duplicates.
- `legacy_contract_hashes_table_is_populated`: guards against a
  silently-empty `legacy_contracts.rs` — without at least one entry,
  users of the immediately-preceding release have no fallback.

All existing tests continue to pass (delta-core 19/19, delta-ui 6/6,
site-delegate 4/4, site-contract 0/0).

`legacy_contracts.toml` seeds with two entries:
- C1 = `1188d108…` — pre-tombstone WASM (commit 2e664c3), inherited
  from the previous `OLD_WASM_HASH` constant.
- C2 = `b92da83d…` — pre-V7 WASM (f5ecff5), the hash of the release
  immediately before the per-prefix export signing-key fix. This is
  the hash where today's affected users' state actually lives.

Contract and delegate WASMs are byte-identical to main; this PR is
pure UI logic.

[AI-assisted - Claude]

* fix(migration): address review feedback — race, thundering herd, tests

Linked issue: CI: Delegate migration safety check fails on all PRs due to non-reproducible WASM builds