fix(mneme): auto-migrate legacy MD5 narrative files to SHA-256 paths (#128) #161

Open
tcconnally wants to merge 1 commit into main from fix/128-mneme-md5-sha256-migration

Conversation

@tcconnally
Owner

Summary

Closes #128 — most complex of the v1.0.6 fixes (data migration + new CLI command + 6 regression tests).

Pre-1.0.3, Mnēmē derived per-workspace narrative file names from an MD5 hash of the canonicalized workspace path. v1.0.3 switched to SHA-256 without any migration path. On upgrade, every existing narrative file was silently orphaned: _mneme_path() returned a path that didn't exist, Mnēmē reported "No narrative found for this workspace", and started fresh. The old MD5 files sat on disk untouched — preserved, but unreachable through any documented command.

Fix

Lossless auto-migration on first access

_mneme_path() performs a one-shot in-place rename: if the SHA-256 path doesn't exist but the legacy MD5 path does, os.replace atomically renames it. Idempotent. If both paths exist (race or operator staging), SHA-256 wins and the legacy file is left untouched (no data destruction). If rename fails (cross-device, permission), both files are preserved and the caller creates a fresh narrative at the SHA-256 path (non-fatal degradation).
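The decision order described above can be sketched roughly as follows. This is a minimal illustration, not the code in mneme_narrative.py: `resolve_narrative_path` and its two path parameters are hypothetical names, and the real `_mneme_path()` presumably derives both paths from the workspace itself.

```python
import os
from pathlib import Path


def resolve_narrative_path(sha256_path: Path, legacy_md5_path: Path) -> Path:
    """Return the SHA-256 path, renaming a legacy file into place if needed.

    Hypothetical helper: both candidate paths are passed in by the caller.
    """
    if sha256_path.exists():
        # SHA-256 wins. os.replace would overwrite the destination, so the
        # rename is skipped entirely and any legacy file is left untouched.
        return sha256_path
    if legacy_md5_path.exists():
        try:
            # Atomic when source and destination are on the same filesystem.
            os.replace(legacy_md5_path, sha256_path)
        except OSError:
            # Cross-device or permission failure: nothing was moved, so the
            # caller can create a fresh narrative at the SHA-256 path.
            pass
    return sha256_path
```

Calling the function a second time is a no-op, because the first branch returns as soon as the SHA-256 file exists.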

New CLI: perseus memory doctor

Lets operators audit the memory store explicitly or migrate every workspace in one pass. The scan is read-only by default. Flags: --migrate (perform renames), --json (machine-readable output).

New public helpers (importable from perseus.py)

  • _workspace_hash_legacy_md5(workspace) — reproduces pre-1.0.3 hash exactly
  • _mneme_doctor_scan(cfg) — classifies every *.md in store as sha256/legacy_md5/orphan/unknown
  • _mneme_doctor_migrate(cfg) — bulk rename with structured report
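To make the two naming schemes concrete, here is a hedged sketch of how a legacy MD5 hash and its SHA-256 replacement might be computed side by side. The function names, the use of `os.path.realpath` for canonicalization, and the use of the full hex digest are all assumptions for illustration; the real `_workspace_hash_legacy_md5()` must match whatever pre-1.0.3 actually did. The `usedforsecurity=False` keyword and the fallback for older Pythons follow the commit message.

```python
import hashlib
import os


def canonicalize(workspace: str) -> bytes:
    # Assumption: canonical form is the resolved real path, UTF-8 encoded.
    return os.path.realpath(workspace).encode("utf-8")


def legacy_md5_hash(workspace: str) -> str:
    """Hypothetical stand-in for the pre-1.0.3 file-naming hash."""
    data = canonicalize(workspace)
    try:
        # File-naming hash, not a security primitive, so FIPS-mode
        # interpreters should not reject it (keyword exists on Python 3.9+).
        return hashlib.md5(data, usedforsecurity=False).hexdigest()
    except TypeError:
        # Python < 3.9 does not accept the keyword argument.
        return hashlib.md5(data).hexdigest()


def sha256_hash(workspace: str) -> str:
    """Hypothetical stand-in for the v1.0.3+ file-naming hash."""
    return hashlib.sha256(canonicalize(workspace)).hexdigest()
```

Under these assumptions the two digests differ in length (32 versus 64 hex characters), which is one plausible way a scanner could tell the two generations of file names apart.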

Files Changed (6)

  • src/perseus/mneme_narrative.py — legacy hash + auto-migrate + doctor helpers
  • src/perseus/agora.py — cmd_memory_doctor handler + subcommand dispatch
  • src/perseus/cli.py — argparse registration of memory doctor
  • tests/test_mneme.py — 6 regression tests
  • CHANGELOG.md — v1.0.6 entry
  • perseus.py — rebuilt artifact

Tests

6 new regression tests in tests/test_mneme.py:

  • test_mneme_path_auto_migrates_legacy_md5_file — happy path
  • test_mneme_path_no_migration_when_sha256_already_exists — no overwrite
  • test_mneme_path_is_idempotent_after_migration — double-call safe
  • test_memory_doctor_scan_classifies_files — 4-way classification
  • test_memory_doctor_migrate_renames_legacy_files — bulk migrate + idempotent
  • test_memory_doctor_migrate_skips_when_destination_exists — no clobber

Test results

  • All 6 new regression tests pass
  • All 19 tests in test_mneme.py pass
  • CLI smoke-tested: perseus memory doctor --help works end-to-end

Migration Notes

No manual action required. Migration happens automatically on first access for any operation that calls _mneme_path() (essentially all memory operations: update, compact, show, status, query, federation).

Operators with many workspaces can opt to run perseus memory doctor --migrate once after upgrading to surface and fix every workspace in one pass — useful for ops automation or audit.

CHANGELOG Note

This PR's CHANGELOG entry adds a [1.0.6] — UNRELEASED block; PRs #159 and #160 also add 1.0.6 entries. The merger should reconcile (keep all bullets, single header).


Third of 12 PRs in the v1.0.6 milestone. Suggested next: #131 (memory compact hang — second-most complex remaining).

…128)

Pre-1.0.3, Mnēmē derived per-workspace narrative file names from an MD5 hash
of the canonicalized workspace path. v1.0.3 switched to SHA-256 without any
migration path. On upgrade, every existing narrative file on disk was
silently orphaned: `_mneme_path()` returned a path that didn't exist, Mnēmē
reported "No narrative found for this workspace", and started fresh. The
old MD5 files sat on disk untouched (preserved, but unreachable through any
documented command).

This patch makes the upgrade lossless and gives operators a manual recovery
tool for edge cases.

Changes:

src/perseus/mneme_narrative.py:
- New _workspace_hash_legacy_md5(): reproduces the pre-1.0.3 hash exactly.
  Uses hashlib.md5(canonical, usedforsecurity=False) so FIPS-mode Pythons
  don't reject it (it's a file-naming hash, not a security primitive).
  Falls back to no-kwarg call on Python < 3.9.
- _mneme_path() now performs a one-shot in-place migration: if the SHA-256
  path doesn't exist but the legacy MD5 path does, os.replace atomically
  renames it. Idempotent. If both paths exist (race or operator staging),
  SHA-256 wins and legacy file is left untouched. If the rename fails
  (cross-device, permission), both files are preserved and the caller
  creates a fresh narrative at the SHA-256 path (non-fatal).
- New _mneme_doctor_scan(): classifies every *.md in the memory store as
  sha256, legacy_md5, orphan (frontmatter workspace doesn't match
  filename), or unknown (non-hex stem). Returns a structured dict.
- New _mneme_doctor_migrate(): walks scan output and renames every
  legacy MD5 file. Returns a report of migrated/skipped/errors tuples.
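
The four-way classification could look something like the sketch below. It is an illustration under stated assumptions, not the code in this patch: `classify` and `scan` are hypothetical names, the frontmatter is assumed to carry a `workspace: <path>` line holding the already-canonicalized path, the file stem is assumed to be the full hex digest, and a hex-named file with no usable frontmatter is treated as an orphan.

```python
import hashlib
import re
from pathlib import Path
from typing import Dict, List, Optional

_HEX_RE = re.compile(r"[0-9a-f]+")
# Assumption: frontmatter contains a line of the form "workspace: <path>".
_WORKSPACE_RE = re.compile(r"^workspace:\s*(.+)$", re.MULTILINE)


def _frontmatter_workspace(text: str) -> Optional[str]:
    match = _WORKSPACE_RE.search(text)
    return match.group(1).strip() if match else None


def classify(path: Path) -> str:
    """Label one narrative file as sha256, legacy_md5, orphan, or unknown."""
    stem = path.stem
    if not _HEX_RE.fullmatch(stem):
        return "unknown"  # non-hex stem
    workspace = _frontmatter_workspace(path.read_text(encoding="utf-8"))
    if workspace is not None:
        data = workspace.encode("utf-8")
        if stem == hashlib.sha256(data).hexdigest():
            return "sha256"
        # usedforsecurity requires Python 3.9+.
        if stem == hashlib.md5(data, usedforsecurity=False).hexdigest():
            return "legacy_md5"
    return "orphan"  # hex stem that matches neither hash of the workspace


def scan(store: Path) -> Dict[str, List[str]]:
    """Classify every *.md file in the store and group file names by label."""
    report: Dict[str, List[str]] = {
        "sha256": [], "legacy_md5": [], "orphan": [], "unknown": [],
    }
    for path in sorted(store.glob("*.md")):
        report[classify(path)].append(path.name)
    return report
```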

src/perseus/agora.py:
- New cmd_memory_doctor handler. Plain-text or JSON output. Read-only
  scan by default; `--migrate` flag performs the renames.
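
As a rough illustration of the bulk rename and the two output modes, the sketch below takes a list of (legacy, target) path pairs and returns a migrated/skipped/errors report. The names `migrate_all` and `render`, the tuple layout, and the plain-text format are assumptions for illustration; the actual handler's output may differ.

```python
import json
import os
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

Report = Dict[str, List[Tuple[str, str]]]


def migrate_all(pairs: Iterable[Tuple[Path, Path]]) -> Report:
    """Rename each legacy file to its target, never clobbering a target."""
    report: Report = {"migrated": [], "skipped": [], "errors": []}
    for legacy, target in pairs:
        if target.exists():
            # Destination already present: leave both files alone.
            report["skipped"].append((legacy.name, target.name))
            continue
        try:
            os.replace(legacy, target)
        except OSError as exc:
            # Covers a missing source, permission, and cross-device failures.
            report["errors"].append((legacy.name, str(exc)))
        else:
            report["migrated"].append((legacy.name, target.name))
    return report


def render(report: Report, as_json: bool) -> str:
    """Format the report as JSON or as a short plain-text summary."""
    if as_json:
        return json.dumps(report, indent=2)
    return "\n".join(f"{key}: {len(items)}" for key, items in report.items())
```

Running `migrate_all` twice over the same pairs is safe in this sketch: on the second pass every target already exists, so each pair lands in `skipped`.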

src/perseus/cli.py:
- Register `perseus memory doctor` subcommand with `--migrate` and `--json`.
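
A nested-subcommand registration of this shape might look like the following. This is a minimal sketch with a hypothetical `build_parser` function and placeholder help strings; only the `memory doctor` command name and the `--migrate` and `--json` flags come from this patch.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing `perseus memory doctor`."""
    parser = argparse.ArgumentParser(prog="perseus")
    commands = parser.add_subparsers(dest="command", required=True)

    memory = commands.add_parser("memory", help="memory store operations")
    memory_commands = memory.add_subparsers(dest="memory_command", required=True)

    doctor = memory_commands.add_parser("doctor", help="audit the memory store")
    # Both flags default to False, so a bare invocation is a read-only scan.
    doctor.add_argument("--migrate", action="store_true",
                        help="rename legacy MD5 files to SHA-256 paths")
    doctor.add_argument("--json", action="store_true",
                        help="emit machine-readable output")
    return parser
```

With this layout, `build_parser().parse_args(["memory", "doctor", "--migrate"])` yields a namespace whose `migrate` attribute is set and whose `json` attribute is not.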

Tests (tests/test_mneme.py):
- test_mneme_path_auto_migrates_legacy_md5_file
- test_mneme_path_no_migration_when_sha256_already_exists
- test_mneme_path_is_idempotent_after_migration
- test_memory_doctor_scan_classifies_files (4 file types)
- test_memory_doctor_migrate_renames_legacy_files (idempotent check)
- test_memory_doctor_migrate_skips_when_destination_exists

All 6 new regression tests pass. All 19 mneme tests pass.
CLI help confirmed: `perseus memory doctor --help` works end-to-end.

Closes #128
Refs milestone v1.0.6
tcconnally added this to the v1.0.6 milestone Jun 3, 2026
tcconnally added the bug, mneme, and P1-high labels Jun 3, 2026

Labels

bug, mneme, P1-high

Development

Successfully merging this pull request may close these issues.

Bug: workspace hash changed from MD5 to SHA256 between versions, silently breaking @memory for existing users

2 participants