Skip to content

fix: persist site removals as tombstones to block legacy resurrection#1

Merged
sanity merged 2 commits intomainfrom
fix-legacy-empty-known-sites-race
Apr 12, 2026
Merged

fix: persist site removals as tombstones to block legacy resurrection#1
sanity merged 2 commits intomainfrom
fix-legacy-empty-known-sites-race

Conversation

@sanity
Copy link
Copy Markdown
Contributor

@sanity sanity commented Apr 12, 2026

Problem

Deleted Delta sites re-appeared after a page refresh. User report from Ivvor via Matrix: "I'm still having trouble with Delta sites coming back, but I have a theory. Are they being restored from legacy delegate stores on refresh sometimes?"

The theory was correct. Two bugs in ui/src/freenet_api/delegate.rs caused this:

  1. Race. register_delegate fired load_known_sites() and fire_legacy_migration() back-to-back. If a legacy delegate's KnownSites response arrived before the current delegate's, the CURRENT_SITES_LOADED guard was still false and legacy records restored the deleted sites.

  2. Empty-list gap. CURRENT_SITES_LOADED only flipped on a non-empty current-delegate response. If the user had deleted all their sites, the current delegate returned an empty list, the flag stayed off, and any legacy response arriving later would unconditionally restore the deleted sites.

Both bugs were amplified by the more fundamental problem that REMOVED_PREFIXES was session-only: even when restoration was blocked within a session, a refresh cleared the set and the next load's legacy response would resurrect the sites again.

Approach

Persist removals as tombstone KnownSiteRecord entries. When the user removes a site, save_known_sites now includes a tombstone alongside the real records. Tombstones use a NUL-prefixed sentinel in the name field ("\0__delta_removed__"), which cannot collide with any user-supplied site name (UI input strips NULs).

This requires zero schema changes and no delegate WASM rebuild — the delegate just stores and returns the Vec<KnownSiteRecord> as-is; the UI interprets sentinel entries. check-migration confirms the delegate WASM hash is unchanged because the new const and helpers are dead-code-eliminated from the delegate build.

On load, the KnownSites response handler now:

  • Partitions the response into real records and tombstones.
  • Seeds REMOVED_PREFIXES from tombstones (for both current and legacy responses — a legacy delegate may still hold a removal recorded before a delegate upgrade).
  • Flips CURRENT_SITES_LOADED whenever the current delegate holds any state (real records or tombstones), so "deleted everything" is treated as authoritative.
  • Leaves CURRENT_SITES_LOADED off when the current delegate returns nothing at all, so legitimate first-time legacy migration still works.

Finally, fire_legacy_migration() is deferred until the current delegate's KnownSites response is handled, eliminating the race entirely.

restore_known_sites and the network GET handler already filter by REMOVED_PREFIXES, so no changes were needed there.

Testing

  • Unit test known_site_record_tombstone_roundtrip in delta-core covers the sentinel through a ciborium encode/decode roundtrip, with:
    • Real records: is_tombstone() == false
    • Tombstone records: is_tombstone() == true
    • Negative case: a record whose name happens to start with NUL but differs ("\0not_removed") is NOT treated as a tombstone.
  • cargo fmt, cargo clippy --all-targets -- -D warnings, and cargo test --workspace all pass.
  • cargo make preflight passes.
  • cargo make check-migration confirms the delegate WASM hash is unchanged — no migration entry needed.

Migration notes for existing users

The first refresh after this fix ships will still see any "ghost" sites from the pre-fix buggy state because no tombstones exist yet. Removing them once more will now persist, and subsequent refreshes will not resurrect them.

[AI-assisted - Claude]

sanity and others added 2 commits April 12, 2026 08:46
Removed sites were re-appearing after a page refresh. Root cause: the
session-only REMOVED_PREFIXES set was lost on refresh, so legacy
delegates (held over from previous Delta versions) could return the
old KnownSites list and restore_known_sites would re-add the deleted
prefixes. Two bugs fed this:

1. Race: register_delegate fired load_known_sites() and
   fire_legacy_migration() back-to-back. If a legacy delegate's
   KnownSites response arrived before the current delegate's, the
   CURRENT_SITES_LOADED guard was still false and legacy records
   restored.

2. Empty-list gap: CURRENT_SITES_LOADED only flipped on a NON-empty
   current response. When a user had deleted all of their sites, the
   current delegate returned an empty list, the flag stayed off, and
   any legacy response arriving later would unconditionally restore
   the deleted sites.

Even without the race, deletions could not survive a refresh because
REMOVED_PREFIXES was never persisted.

Fix:
- Persist REMOVED_PREFIXES via tombstone KnownSiteRecord entries.
  Tombstones use a NUL-prefixed sentinel in the name field, which
  cannot collide with any user-supplied site name. This requires no
  schema change and no delegate WASM rebuild — the delegate just
  stores and returns the Vec<KnownSiteRecord> as-is.
- On load, partition the response into real records and tombstones,
  and seed REMOVED_PREFIXES from the tombstones before calling
  restore_known_sites (which already filters by REMOVED_PREFIXES).
- Flip CURRENT_SITES_LOADED whenever the current delegate holds ANY
  state (real records OR tombstones), so "deleted everything" is
  treated as authoritative.
- Defer fire_legacy_migration() until the current delegate's
  KnownSites response is handled, eliminating the race entirely.
- Preserve first-time legacy migration: if the current delegate has
  neither real records nor tombstones, it is pre-migration and
  legacy KnownSites are still accepted.

Added a serde roundtrip test in delta-core covering the tombstone
sentinel through ciborium encode/decode, including a negative case
where a NUL-prefixed but non-matching name is NOT treated as a
tombstone.

Delegate WASM hash unchanged — the new sentinel and helpers are
dead-code-eliminated from the delegate build — so no migration entry
is required.

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses review findings on the tombstone persistence approach:

1. **Silent re-add failure (skeptical #2, codex P2)**: after persisting a
   tombstone, any attempt to re-add the same prefix via create_new_site,
   import_site_key, or visit_site was silently filtered by
   restore_known_sites. Added clear_tombstone(prefix) helper in state.rs
   and call it from all three entry points before inserting into SITES.

2. **Belt-and-braces save guard**: save_known_sites now skips tombstones
   whose prefix is currently live in SITES, so an add/remove race or a
   missed clear_tombstone can never persist a contradicting tombstone.

3. **Live site not removed by late tombstone (skeptical #8)**: if a
   tombstone arrives for a prefix that's already in SITES (e.g. added
   via hash route before known_sites loaded), we now drop the live
   entry too.

4. **Legacy tombstone-only responses (skeptical #3)**: if a legacy
   delegate returns only tombstones and no real records, save_known_sites
   is now triggered anyway so the tombstones are persisted to the
   current delegate and survive the next refresh.

5. **Defensive debug_assert**: restore_known_sites now debug_asserts that
   records are not tombstones and skips them in release, catching future
   callers that bypass the partition in handle_delegate_response.

6. **Docs**: AGENTS.md now documents the tombstone sentinel convention
   and the invariant that every KnownSites consumer must filter via
   is_tombstone().

Delegate WASM hash unchanged (verified via cargo make check-migration).

[AI-assisted - Claude]

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@sanity
Copy link
Copy Markdown
Contributor Author

sanity commented Apr 12, 2026

Review feedback addressed

Four internal review agents + Codex completed. Critical findings and how they were addressed in 30ade2e:

Skeptical #2 / Codex P2 — silent re-add failure: if a user removed a site and later re-created / visited / imported the same prefix, restore_known_sites would filter it out because REMOVED_PREFIXES was repopulated from the persisted tombstone. Fixed by adding state::clear_tombstone(prefix) and calling it from create_new_site, import_site_key, and visit_site before inserting into SITES. Defensively, save_known_sites now also skips tombstones for prefixes currently live in SITES.

Skeptical #8 — live site not removed by late tombstone: if a tombstone arrived for a prefix already in SITES (e.g. added via hash route before known_sites loaded), only REMOVED_PREFIXES was updated but the live entry stayed. Fixed — tombstones now drop the matching SITES entry too.

Skeptical #3 — legacy tombstone-only responses not persisted: if a legacy delegate returned only tombstones and no real records, save_known_sites() wasn't called, so the tombstones leaked out of REMOVED_PREFIXES on next refresh. Fixed — save now triggers if legacy contributed any state, real or tombstone.

Big-picture — docs/invariant: AGENTS.md now documents the tombstone sentinel convention and the invariant that every KnownSites consumer must filter via is_tombstone(). Also added a debug_assert! in restore_known_sites that fails loudly if a tombstone slips through the partition.

Code-first / big-picture — debug_assert: added.

Not addressed (deferred)

  • Unbounded tombstone growth (skeptical fix: persist site removals as tombstones to block legacy resurrection #1 / code-first): REMOVED_PREFIXES is append-only and persisted forever. Real concern at scale but no correctness issue today. Will file a follow-up issue for TTL / cap strategy once we see how users actually churn sites.
  • Testing: pure-logic refactor of handle_delegate_response (testing reviewer critical): valid suggestion — the state-machine logic is all pure Rust and could be extracted into (records, is_legacy, flags) -> Action for native unit tests. Deferring to a follow-up because it touches more surface area than this bug fix should. Filing as an issue.
  • Skeptical feat: multi-hop contract WASM migration via legacy_contracts.toml #5/6 — LEGACY_MIGRATION_FIRED set before firing, no reset: minor; fire_legacy_migration has no failure modes that would warrant retry.
  • Skeptical Sites vanishing on upgrade #9 — malicious legacy delegate DoS via crafted tombstones: real but low-priority; legacy delegates are user-local (stored in their own node), so "malicious legacy delegate" would require the user to have previously installed a malicious Delta build. Out of scope for this fix.

CI should run shortly. Publishing to Freenet after CI passes.

[AI-assisted - Claude]

@sanity
Copy link
Copy Markdown
Contributor Author

sanity commented Apr 12, 2026

CI status

Two checks:

  • Check, lint, and test: passing
  • Delegate migration safety: FAILING — but this is a pre-existing issue on main, not something this PR introduced.

Every recent commit on main has red on this job (ad0fc2d, 16d24f8, 00df879, 19fee64 — all fail). Root cause: CI builds the delegate WASM from source and compares its hash against the committed binary, but CI's build environment produces a different hash than local builds for byte-identical source. A clean local rebuild on my machine reproduces the committed hash (eb5396ef3408...) exactly; CI produces 73af51ec1a17.... Likely rustc toolchain / linker nondeterminism.

This PR does NOT change the delegate WASM (verified via cargo make check-migration locally). The new sentinel const and helpers are dead-code-eliminated from the delegate build. No migration entry is required and none was added.

The CI job itself needs fixing as a separate issue — probably by pinning the Rust toolchain to a specific version in rust-toolchain.toml and matching that in CI, or by trusting cargo make check-migration locally as the source of truth.

Already published to Freenet successfully:

Contract published successfully, response_key: EqJ5YpEEV3XLqEvKWLQHFhGAac2qXzSUoE6k2zbdnXBr

Merging despite the pre-existing red check since the new code is in flight via publish and the test job is green.

[AI-assisted - Claude]

@sanity sanity merged commit 47d2b4a into main Apr 12, 2026
2 of 3 checks passed
@sanity sanity deleted the fix-legacy-empty-known-sites-race branch April 12, 2026 13:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant