feat(snapshots): major rewrite with CAS #50
Merged
… bounded listing Sprint 1 tourniquet for RFC: Replace Opaque CRDT Snapshots with File-Level Recovery Manifests.

Server changes:
- Add stateVectorHash (SHA-256 of Y.encodeStateVector) and semanticHash (SHA-256 of sorted active paths + blob hashes) to SnapshotIndex
- Daily snapshot creation now skips when semanticHash is unchanged, replacing the naive UTC-day-only dedup
- Falls back to day-based dedup for legacy snapshots without semanticHash
- Add latest-index.json pointer for O(1) latest snapshot lookup
- Add retention policy: 7 daily, 4 weekly, 12 monthly; always keep latest and pinned
- Add selectRetention() and applyRetention() with pruneSnapshots()
- Opportunistic retention runs after each new snapshot creation
- Add GET /snapshots/status endpoint (count, storage estimate, latest)
- Add POST /snapshots/prune endpoint (manual cleanup command)
- Bound snapshot listing with ?limit parameter (default 50, max 200)
- Record vault traces for retention events

Client changes:
- Add stateVectorHash, semanticHash, pinned fields to client SnapshotIndex
- Add requestPrune() and getSnapshotStatus() to snapshotClient
- Add pruneSnapshots() command to SnapshotService
- Snapshot listing now requests bounded results (?limit=50)
- Add storage warning in SnapshotListModal when count > 30 or size > 50 MB

Tests:
- Add tests/snapshot-retention.ts with unit tests for retention policy and semantic hash computation (24 assertions)
- Existing snapshot tests continue to pass (33 assertions)
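The retention policy above (7 daily, 4 weekly, 12 monthly, always keep the latest and pinned snapshots) can be pictured as a bucket walk over snapshots sorted newest first, keeping the first snapshot seen in each bucket. This is a minimal sketch, not the code in server/src/snapshot.ts: the `SnapshotMeta` shape and `selectRetentionSketch` name are hypothetical, and the week bucket is a rough "days since epoch / 7" key (in the spirit of the later roughWeekKey rename), not ISO 8601.

```typescript
// Hypothetical minimal shape; the real SnapshotIndex carries more fields.
interface SnapshotMeta {
  snapshotId: string;
  createdAt: string; // ISO timestamp, e.g. "2024-03-05T12:00:00Z"
  pinned?: boolean;
}

const POLICY = { daily: 7, weekly: 4, monthly: 12 };

const dayKey = (iso: string): string => iso.slice(0, 10); // "YYYY-MM-DD"
const monthKey = (iso: string): string => iso.slice(0, 7); // "YYYY-MM"
// Rough week bucket: whole days since the Unix epoch divided by 7.
// Deliberately NOT ISO 8601 week numbering.
const roughWeek = (iso: string): string =>
  String(Math.floor(Date.parse(iso) / 86_400_000 / 7));

/** Hypothetical sketch: returns the IDs to keep; the rest are prune candidates. */
function selectRetentionSketch(snapshots: SnapshotMeta[]): Set<string> {
  // Newest first, so the first snapshot seen in each bucket is the one kept.
  const sorted = [...snapshots].sort((a, b) =>
    a.createdAt < b.createdAt ? 1 : a.createdAt > b.createdAt ? -1 : 0,
  );
  const keep = new Set<string>();
  if (sorted.length > 0) keep.add(sorted[0].snapshotId); // always keep latest

  const tiers: Array<[keyof typeof POLICY, (iso: string) => string]> = [
    ["daily", dayKey],
    ["weekly", roughWeek],
    ["monthly", monthKey],
  ];
  for (const [tier, keyOf] of tiers) {
    const seen = new Set<string>();
    for (const s of sorted) {
      const k = keyOf(s.createdAt);
      if (seen.has(k)) continue; // bucket already represented
      if (seen.size >= POLICY[tier]) break; // tier quota exhausted
      seen.add(k);
      keep.add(s.snapshotId);
    }
  }
  for (const s of sorted) if (s.pinned) keep.add(s.snapshotId); // pinned always kept
  return keep;
}
```

With ten consecutive daily snapshots, the daily tier keeps the newest seven, the weekly tier adds the newest snapshot from the previous rough week, and anything pinned survives regardless of age.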
- Register 'snapshot-prune' command in command palette
- Manual snapshot creation now checks if semanticHash matches previous and shows '(vault content unchanged since last snapshot)' notice
- Completes Stage 1 acceptance criteria for manual snapshot dedup warning
Commits the full RFC as a living document. Phase 0 (tourniquet) is marked complete. Future phases reference this as the source of truth for design decisions, acceptance criteria, and release strategy.
semanticHash uses path:fileId pairs and does NOT detect content edits to existing files (fileId is stable across edits). stateVectorHash changes on ANY Yjs operation, making it the correct dedup gate. semanticHash is still computed and stored in snapshot indexes for future use by the CAS manifest system (file-level content dedup), but is not used for the skip/create decision.
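A sketch of why this hash is blind to edits: if it is SHA-256 over sorted "path:fileId" pairs, editing a file's text changes neither its path nor its fileId. The function name, the Map input, and the newline separator below are assumptions for illustration only.

```typescript
import { createHash } from "node:crypto";

/**
 * Hypothetical sketch of a structure hash: SHA-256 over sorted "path:fileId"
 * pairs. Content edits keep the same fileId, so they do NOT change this hash.
 * Do not use it for content dedup.
 */
function computeStructureHash(pathToId: Map<string, string>): string {
  const lines = [...pathToId.entries()]
    .map(([path, fileId]) => `${path}:${fileId}`)
    .sort(); // sorted so Map insertion order does not affect the hash
  return createHash("sha256").update(lines.join("\n"), "utf8").digest("hex");
}
```

Renames and moves change the hash; rewriting the body of notes/a.md leaves the map, and therefore the hash, identical, which is why it cannot gate the skip/create decision.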
Critical fixes:
- Replace stateVectorHash dedup with fullUpdateHash (SHA-256 of Y.encodeStateAsUpdate). State vectors do NOT track Yjs deletions; fullUpdateHash includes the delete set and catches all changes. Test proves: SV unchanged after delete, fullUpdateHash changed.
- Fix latest-index.json write ordering: payload + index written in Promise.all first, then latest pointer written sequentially after. Prevents poisoned pointer pointing to non-existent snapshot.
- Add reason/pinned semantics to createSnapshot. Manual snapshots default pinned=true, daily snapshots default pinned=false. Pre-upgrade/pre-migration also default pinned.
- Protect legacy snapshots: retention never auto-prunes snapshots without a 'reason' field (they may be old manual snapshots). Users can still prune them via explicit manual command.
- Make listing/status honest: response includes totalIndexKeys, fetchedCount, limited flag, and 'LowerBound' suffixes on estimates. UI says 'at least N snapshots' when listing was capped.
- Rename semanticHash -> structureHash (honest: it only tracks path:fileId structure, NOT file content). Legacy field preserved for backward compat reads.
- Rename isoWeekKey -> roughWeekKey with documentation that it's an approximation, not ISO 8601 compliant. Known edge cases documented.
- Retention is now awaited (not fire-and-forget). Errors are logged with per-snapshot detail in errors[] array. Prune endpoint also returns errors to trace store.
- Manual snapshot warning says 'file structure unchanged' (not 'content unchanged') and notes content may still differ.

Tests: 37 pass (was 24).
New cases cover:
- Delete-only transaction changes fullUpdateHash (the core Yjs bug)
- Content edit with same fileId changes fullUpdateHash
- structureHash honestly does not change on content edits
- Manual pinned snapshot survives retention
- Legacy snapshots without reason are conservatively kept
- Year/month boundary retention correctness
- Error surfacing in prune results
- Backward compat with old snapshot indexes
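The fullUpdateHash gate described in this commit boils down to hashing the full encoded update and comparing it with the hash stored for the latest snapshot. The sketch below takes raw bytes so it runs without Yjs installed; in the real code those bytes would come from Y.encodeStateAsUpdate(ydoc). The `decideDaily` name and `LatestPointer` shape are hypothetical, and the legacy day-based fallback is omitted.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of the latest-index.json pointer.
interface LatestPointer {
  snapshotId: string;
  fullUpdateHash?: string; // absent on legacy snapshots
}

function sha256Hex(bytes: Uint8Array): string {
  return createHash("sha256").update(bytes).digest("hex");
}

/**
 * Hypothetical sketch of the daily dedup decision. `rawUpdate` stands in for
 * Y.encodeStateAsUpdate(ydoc), which (unlike Y.encodeStateVector) carries the
 * delete set, so delete-only changes produce a different hash.
 */
function decideDaily(
  rawUpdate: Uint8Array,
  latest: LatestPointer | null,
): { skip: boolean; fullUpdateHash: string } {
  const fullUpdateHash = sha256Hex(rawUpdate);
  // No pointer, or a legacy pointer without a hash: never skip on hash grounds.
  const skip = latest?.fullUpdateHash === fullUpdateHash;
  return { skip, fullUpdateHash };
}
```

Returning the hash alongside the decision means a caller that does create a snapshot can pass the already-computed value on instead of hashing the same update twice.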
1. Verify latest pointer target before dedup skip:
- verifySnapshotExists() does HEAD on crdt.bin.gz + index.json
- If either is missing, pointer is poisoned → create new snapshot
- Prevents infinite skip from a stale/corrupt latest pointer
2. Real R2 behavioral tests (miniflare, not mocks):
- test1: payload + index exist after createSnapshot
- test2: poisoned pointer detected, dedup does NOT skip
- test3: listing excludes latest-index.json
- test4: limited listing reports honest totalIndexKeys
- test5: precomputed update produces valid snapshot
- test6: pruneLegacy=false protects, pruneLegacy=true prunes
- test7: fullUpdateHash dedup with real R2
- test8: delete-only change not skipped
- test9: same-day snapshots sort correctly
3. Legacy prune honesty:
- applyRetention() and selectRetention() accept RetentionOptions
- { pruneLegacy: false } (default): legacy snapshots kept
- { pruneLegacy: true }: legacy snapshots eligible for pruning
- POST /snapshots/prune accepts { pruneLegacy: boolean } in body
- Standard retention (after daily snapshot) never prunes legacy
4. Manual unchanged uses fullUpdateHash:
- createSnapshotFromLiveDoc compares fullUpdateHash, not structureHash
- Response field renamed to snapshotIdenticalToLatest
- UI says 'identical to latest snapshot' — actually meaningful
- No longer warns about 'structure unchanged' after content edits
5. pinnedCount → pinnedCountLowerBound:
- Status endpoint field renamed to be honest about capped listing
6. Precomputed raw update (avoids double O(doc) encode):
- CreateSnapshotOptions accepts precomputedRawUpdate + precomputedFullUpdateHash
- Daily dedup path encodes once, passes to createSnapshot if creating
- Eliminates redundant Y.encodeStateAsUpdate for large vaults
Removed fake tests (assert(true) for write ordering and poisoned pointer).
These are now properly tested with real R2 in server/tests/snapshot-r2.ts.
Test results: 34 retention + 33 snapshot + 37 R2 behavioral = 104 total, 0 failures.
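The legacy rule in item 3 reduces to a per-snapshot eligibility check applied after the policy has chosen what to keep. A minimal sketch, assuming the snapshot shape shown; `isPruneEligible` and `IndexedSnapshot` are hypothetical names, while `RetentionOptions { pruneLegacy }` comes from the commit.

```typescript
interface RetentionOptions {
  pruneLegacy?: boolean; // default false
}

// Hypothetical minimal snapshot shape for this sketch.
interface IndexedSnapshot {
  snapshotId: string;
  reason?: string; // legacy snapshots have no reason field
  pinned?: boolean;
}

/** Hypothetical sketch: true if a snapshot may be deleted. */
function isPruneEligible(
  snap: IndexedSnapshot,
  keep: ReadonlySet<string>,
  opts: RetentionOptions = {},
): boolean {
  if (keep.has(snap.snapshotId)) return false; // selected by the retention policy
  if (snap.pinned) return false; // pinned snapshots are never auto-pruned
  // Legacy snapshots may be old manual ones: keep them unless the caller opted in.
  if (snap.reason === undefined) return opts.pruneLegacy === true;
  return true;
}
```

Standard retention after a daily snapshot would call this with the default options, so only the explicit prune endpoint can ever reach the legacy branch with pruneLegacy set.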
1. verifySnapshotExists now validates consistency, not just existence:
- Reads target index.json and checks snapshotId + fullUpdateHash match
- Verifies payload size matches crdtSizeBytes
- Returns false on malformed JSON, mismatched IDs, or wrong payload size
2. pruneLegacy requires confirmation guard:
- POST /snapshots/prune with { pruneLegacy: true } now requires
{ confirmLegacyPrune: 'DELETE_LEGACY_SNAPSHOTS' }
- Returns 400 if pruneLegacy=true without the confirmation string
- Destructive backup cleanup should be ugly on purpose
3. R2 tests wired into regression runner:
- tests/snapshot-r2-runner.mjs wraps server/tests/snapshot-r2.ts
- All 3 snapshot suites registered in run-regressions.mjs
- Miniflare instances properly disposed for clean process exit
- 'npm run test:regressions --only snapshot' runs all 104 assertions
4. structureHash has explicit 'do not use for content dedup' warning
All suites: 34 + 33 + 37 = 104 assertions, 0 failures.
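The consistency check in item 1 can be sketched against a deliberately tiny store interface (a hypothetical stand-in, not the R2 binding API). The `prefix` field and `verifySnapshotConsistent` name are assumptions; the file names and the compared fields (snapshotId, fullUpdateHash, crdtSizeBytes) come from the commit. This sketch downloads the payload to measure it; against a real object store, the object's metadata size from a HEAD request would avoid that.

```typescript
// Hypothetical minimal object-store interface; a real R2 binding differs.
interface ObjectStore {
  get(key: string): Promise<Uint8Array | null>;
}

interface LatestPointer {
  snapshotId: string;
  fullUpdateHash: string;
  prefix: string; // hypothetical: key prefix of the snapshot's objects
}

/** Hypothetical sketch: false means the latest pointer is poisoned. */
async function verifySnapshotConsistent(store: ObjectStore, ptr: LatestPointer): Promise<boolean> {
  const [indexBytes, payload] = await Promise.all([
    store.get(`${ptr.prefix}/index.json`),
    store.get(`${ptr.prefix}/crdt.bin.gz`),
  ]);
  if (!indexBytes || !payload) return false; // a target object is missing

  let index: Record<string, unknown>;
  try {
    const parsed: unknown = JSON.parse(new TextDecoder().decode(indexBytes));
    if (typeof parsed !== "object" || parsed === null) return false;
    index = parsed as Record<string, unknown>;
  } catch {
    return false; // malformed index.json
  }
  return (
    index.snapshotId === ptr.snapshotId &&
    index.fullUpdateHash === ptr.fullUpdateHash &&
    index.crdtSizeBytes === payload.byteLength // payload size must match
  );
}
```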
Ensures all four combinations work without breakage:
old plugin + old server ✓
old plugin + new server ✓
new plugin + old server ✓
new plugin + new server ✓
Server changes:
- GET /snapshots returns legacy { snapshots } by default
- GET /snapshots?format=v2 returns { snapshots, totalIndexKeys, fetchedCount, limited }
- GET /snapshots/status returns BOTH old aliases (snapshotCount, estimatedStorageBytes,
pinnedCount) AND new fields (snapshotCountLowerBound, etc.)
- POST /snapshots manual response includes semanticUnchanged legacy alias alongside
snapshotIdenticalToLatest
Client changes:
- listSnapshots() parses both array and { snapshots } responses
- getSnapshotStatus() falls back to old field names when new ones are absent
- snapshotService uses snapshotIdenticalToLatest ?? semanticUnchanged
Compatibility test matrix (21 assertions):
- Old client parses new server default list response
- New client parses old server { snapshots } response
- New client handles bare array edge case
- Status field fallbacks in both directions
- Manual snapshot unchanged field fallbacks in both directions
- Default GET /snapshots does not include v2-only fields
Total test count: 34 + 33 + 21 + 37 = 125 assertions, 0 failures.
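On the server side, the compatibility rule for GET /snapshots is that v2 metadata is strictly opt-in, so the default body keeps exactly the old shape. A minimal sketch; `buildListResponse` and the summary type are hypothetical, while the field names and the format=v2 switch come from the commit.

```typescript
// Hypothetical minimal list entry for this sketch.
interface SnapshotSummary {
  snapshotId: string;
  createdAt: string;
}

interface ListingMeta {
  totalIndexKeys: number;
  fetchedCount: number;
  limited: boolean;
}

type LegacyListResponse = { snapshots: SnapshotSummary[] };
type V2ListResponse = LegacyListResponse & ListingMeta;

/** Hypothetical sketch: v2 fields only when the client asks for format=v2. */
function buildListResponse(
  snapshots: SnapshotSummary[],
  meta: ListingMeta,
  format: string | null,
): LegacyListResponse | V2ListResponse {
  if (format === "v2") return { snapshots, ...meta };
  return { snapshots }; // default: exactly the legacy shape old plugins expect
}
```

The format value would typically be read with `new URL(request.url).searchParams.get("format")`, which returns null when the parameter is absent.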
Extract parsing logic from snapshotClient.ts into exported helpers:
- normalizeSnapshotListResponse(raw)
- normalizeSnapshotStatusResponse(raw)
- normalizeSnapshotUnchanged(raw)

snapshotClient.ts and snapshotService.ts now use these helpers. tests/snapshot-compat.ts imports and tests the ACTUAL exported functions, not simulated parsers that could drift from real implementation. 29 assertions, 0 failures.
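The three helper names come from the commit, but their bodies are not shown; the versions below are hypothetical reconstructions of the behavior the compatibility commit describes (bare array or { snapshots }, new status fields with old-name fallbacks, snapshotIdenticalToLatest ?? semanticUnchanged). The normalized status field names and the default of 0 for missing numbers are assumptions.

```typescript
// Hypothetical minimal list entry for this sketch.
interface SnapshotSummary {
  snapshotId: string;
  createdAt: string;
}

// Hypothetical body: accepts a bare array or a { snapshots } object (old or v2 server).
function normalizeSnapshotListResponse(raw: unknown): SnapshotSummary[] {
  if (Array.isArray(raw)) return raw as SnapshotSummary[]; // bare-array edge case
  if (raw && typeof raw === "object" && Array.isArray((raw as { snapshots?: unknown }).snapshots)) {
    return (raw as { snapshots: SnapshotSummary[] }).snapshots;
  }
  return [];
}

// Hypothetical normalized shape (assumption).
interface NormalizedStatus {
  snapshotCount: number;
  estimatedStorageBytes: number;
  pinnedCount: number;
}

// Hypothetical body: prefer new field names, fall back to old-server names.
function normalizeSnapshotStatusResponse(raw: Record<string, unknown>): NormalizedStatus {
  const num = (...keys: string[]): number => {
    for (const k of keys) if (typeof raw[k] === "number") return raw[k] as number;
    return 0; // assumption: missing counts are reported as 0
  };
  return {
    snapshotCount: num("snapshotCountLowerBound", "snapshotCount"),
    estimatedStorageBytes: num("estimatedStorageBytesLowerBound", "estimatedStorageBytes"),
    pinnedCount: num("pinnedCountLowerBound", "pinnedCount"),
  };
}

// Hypothetical body: new field first; ?? only falls through on null/undefined.
function normalizeSnapshotUnchanged(raw: Record<string, unknown>): boolean | undefined {
  const v = raw.snapshotIdenticalToLatest ?? raw.semanticUnchanged;
  return typeof v === "boolean" ? v : undefined;
}
```

Using `??` rather than `||` matters here: a new server reporting snapshotIdenticalToLatest: false must not fall through to a stale semanticUnchanged: true.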
…t-redesign

# Conflicts:
#	server/src/snapshot.ts
Phase 0 (Tourniquet): Snapshot Safety Fixes
Reduces immediate harm from the current snapshot system. Does NOT implement the full CAS redesign (that is Sprint 2+).
What this PR does
Dedup safety (the critical fix):
- fullUpdateHash = SHA-256 of Y.encodeStateAsUpdate(ydoc).

Poisoned pointer verification:
- verifySnapshotExists() does HEAD on both crdt.bin.gz and index.json.

Write ordering (durability):
- Payload (crdt.bin.gz) and index (index.json) are written in Promise.all first.
- The latest-index.json pointer is written ONLY AFTER payload + index are durable.

Reason/pin semantics:
- SnapshotReason: "daily" | "manual" | "pre-upgrade" | "pre-migration" | "pre-bulk-operation"
- Manual, pre-upgrade, and pre-migration snapshots default to pinned: true. Daily defaults to pinned: false.

Legacy protection:
- selectRetention() and applyRetention() accept RetentionOptions { pruneLegacy?: boolean }.
- Default (pruneLegacy: false): legacy snapshots (no reason field) are NEVER auto-pruned.
- POST /snapshots/prune accepts { pruneLegacy: true } for explicit legacy cleanup.

Honest naming and messaging:
- semanticHash → structureHash (it only tracks path:fileId structure, NOT content).
- isoWeekKey → roughWeekKey (it is NOT ISO 8601 compliant; documented).
- The manual snapshot unchanged check uses fullUpdateHash comparison → says "identical to latest snapshot" (actually meaningful).

Honest listing/status:
- /snapshots returns { snapshots, totalIndexKeys, fetchedCount, limited }.
- /snapshots/status returns { snapshotCountLowerBound, listedSnapshotCount, listingLimited, estimatedStorageBytesLowerBound, pinnedCountLowerBound }.
- Estimates that a capped listing may undercount carry the suffix LowerBound.

Performance: avoid double-encoding:
- CreateSnapshotOptions accepts precomputedRawUpdate and precomputedFullUpdateHash.
- Eliminates a redundant Y.encodeStateAsUpdate call for large vaults.

Retention improvements:
- Retention is awaited instead of fire-and-forget (.catch(() => {})).
- pruneSnapshots returns { deleted, failed, errors: string[] }.

What this PR does NOT do (deferred to Sprint 2+)
- getSnapshotPayload() still does full listing by ID (needs day-aware route or by-id index)
- structureHash uses pathToId only (does not consider v2 meta path model yet)

Tests: 104 assertions, 0 failures
Real R2 tests (miniflare): 37 assertions
- Poisoned pointer: verifySnapshotExists returns false, dedup does NOT skip

Retention/hash tests (unit): 34 assertions
Snapshot integration tests: 33 assertions
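The "Write ordering (durability)" rule in this description can be sketched as two awaited phases: payload and index in parallel, then the pointer only if both succeeded. The `PutStore` interface, the per-snapshot `prefix`, and the pointer body are hypothetical; only the file names come from the PR.

```typescript
// Hypothetical minimal store; a real R2 binding has a richer put() API.
interface PutStore {
  put(key: string, value: Uint8Array | string): Promise<void>;
}

// Hypothetical input shape for this sketch.
interface SnapshotWrite {
  prefix: string; // hypothetical per-snapshot key prefix
  snapshotId: string;
  payloadGz: Uint8Array;
  indexJson: string;
  fullUpdateHash: string;
}

/** Hypothetical sketch of ordered snapshot writes. */
async function writeSnapshotOrdered(store: PutStore, w: SnapshotWrite): Promise<void> {
  // Phase 1: payload and index are independent, so write them in parallel.
  // If either put rejects, Promise.all rejects and phase 2 never runs.
  await Promise.all([
    store.put(`${w.prefix}/crdt.bin.gz`, w.payloadGz),
    store.put(`${w.prefix}/index.json`, w.indexJson),
  ]);
  // Phase 2: only now move the pointer, so it never references a missing snapshot.
  await store.put(
    "latest-index.json",
    JSON.stringify({ snapshotId: w.snapshotId, fullUpdateHash: w.fullUpdateHash, prefix: w.prefix }),
  );
}
```

A failed index write can still leave an orphaned payload behind, but that is harmless compared with a pointer to a snapshot that does not exist, and pointer verification covers any remaining inconsistency.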