
feat: store-migration endpoint (files granularity) — closes #8 #9

Merged: OriNachum merged 4 commits into main from feat/store-migration-endpoint-issue-8 on Jun 21, 2026
Conversation

@OriNachum

Contributor

What this does

Closes #8 — the store-migration endpoint, the first slice of Wave 3. A consumer (eidetic-cli first) upgrades a populated on-disk store to the current Envelope format by supplying only a transform — and never constructing a filesystem write path itself. The rewrite, and its path-construction concern, now live behind data-refinery's boundary (the component that owns the store layout). This is what lets eidetic delete its path-constructing migrate_store.py and clears its SonarCloud pythonsecurity:S2083 BLOCKER without rule suppression.

Surface

  • Importable: data_refinery.store.migrate(transform=None, *, backend="files", base_dir=None, dry_run=False) — the consumer passes a callable dict -> Envelope | None (return None to drop a record) and optionally a store root it already owns.
  • CLI: data-refinery store migrate [--backend files|mongo|neo4j] [--dry-run] [--json] — self-canonicalises data-refinery's own JSONL (no Python callable crosses argv), resolving the store dir from DR_DATA_DIR.
  • Files granularity only today; mongo (vectors) / neo4j (graph) raise a structured CliError — the files-first seam, in the order requested.
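A minimal sketch of the consumer side of this surface. The legacy record shape (`uid`, `text`, `private`), the `content` field and the store path are all hypothetical, and a plain dict stands in for the real `Envelope`, whose constructor is not shown in this PR. Only the `migrate(...)` argument names come from the signature above, and the call is left as a comment so the sketch runs without data-refinery installed.

```python
from typing import Optional


def legacy_to_envelope(record: dict) -> Optional[dict]:
    """Hypothetical consumer transform: one decoded legacy line -> envelope-like dict.

    Returning None drops the record, as the migrate() contract describes.
    A real transform would return data-refinery's Envelope, not a plain dict.
    """
    if not record.get("text"):
        return None  # drop empty legacy records
    return {
        "id": str(record["uid"]),
        "content": record["text"],  # assumed field name, for illustration only
        "scope": {"visibility": "private" if record.get("private") else "public"},
    }


# With data-refinery installed, the consumer would hand over only the transform
# and, optionally, a store root it already owns (the path is illustrative):
#
#   from data_refinery import store
#   store.migrate(legacy_to_envelope, base_dir="/srv/eidetic/store", dry_run=True)

print(legacy_to_envelope({"uid": 7, "text": "hello", "private": True}))
print(legacy_to_envelope({"uid": 8, "text": ""}))
```

The point of the shape is that the consumer code contains no path joins and no `open(..., "w")`: it maps one record to one record and nothing else.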

Load-bearing guarantees

  • Consumer never constructs a write path — data-refinery resolves the root, walks it, and owns every write.
  • Atomic per file — shared _atomic_write (temp sibling + os.replace); a crash leaves each file fully-old or fully-new, never partial. (This also hardened the day-to-day upsert/delete path.)
  • Idempotent — a 2nd run is byte-identical (migrated: 0); already-canonical lines are kept verbatim, so even a non-idempotent transform is applied exactly once. Rests on the Envelope round-trip being a stable fixpoint (test-guarded).
  • Whole-store pre-flight validation — every scope file is transformed + validated before any write, so a corrupt line / invalid transform output / symlink escape in any file aborts the whole migration before it touches disk.
  • Scope no-leak survives migration — a private record is never served to a public fetch; an unknown visibility aborts before any write (fails closed).
  • No traceback, ever — failures are CliError with a hint: and a documented exit code.
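The atomic, idempotent and pre-flight guarantees above can be sketched together as a two-pass rewrite. This is an illustration under stated assumptions, not the code in `files.py`: `migrate_store` and `canonicalise` are hypothetical names, sorted-key JSON stands in for the canonical Envelope serialisation, and error handling is reduced to letting the decode error propagate.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write(path: Path, text: str) -> None:
    """Write via a temp sibling, then rename over the target in one step."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(text, encoding="utf-8")
    os.replace(tmp, path)  # atomic rename on the same filesystem


def migrate_store(root: Path, canonicalise) -> int:
    """Two-pass sketch: plan and validate every file, then write.

    `canonicalise` is a hypothetical dict -> dict callable standing in for
    the transform-plus-validation step; it may raise to reject a record.
    Returns the number of files rewritten.
    """
    plan = {}
    # Pass 1: read, decode and canonicalise everything. Nothing is written.
    for path in sorted(root.glob("*.jsonl")):
        original = path.read_text(encoding="utf-8")
        lines = [ln for ln in original.splitlines() if ln.strip()]
        new_text = "".join(
            json.dumps(canonicalise(json.loads(ln)), sort_keys=True) + "\n"
            for ln in lines
        )
        if new_text != original:
            plan[path] = new_text
    # Pass 2: only reached when every file validated.
    for path, new_text in plan.items():
        atomic_write(path, new_text)
    return len(plan)


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "a.jsonl").write_text('{"id": "1", "b": 2, "a": 1}\n', encoding="utf-8")
        (root / "b.jsonl").write_text("not json\n", encoding="utf-8")
        before = (root / "a.jsonl").read_text(encoding="utf-8")
        try:
            migrate_store(root, lambda rec: rec)
            aborted = False
        except json.JSONDecodeError:
            aborted = True
        untouched = (root / "a.jsonl").read_text(encoding="utf-8") == before
        # Repair the corrupt file, then run twice to show idempotency.
        (root / "b.jsonl").write_text('{"id": "2"}\n', encoding="utf-8")
        first = migrate_store(root, lambda rec: rec)
        second = migrate_store(root, lambda rec: rec)
        return aborted, untouched, first, second


print(demo())
```

In the demo the corrupt second file aborts the run before the first file is rewritten, and after the repair the second run rewrites nothing.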

Verification

Three independent passes, all green:

  1. Live test — drove the CLI + importable seam end-to-end (dry-run, migrate, idempotent re-run, no-leak, crash-safety via monkeypatched os.replace, validation-abort).
  2. Colleague review of the diff (a diverse second mind) — confirmed atomicity / path-safety / scope-no-leak / files-first seam sound; its findings (whole-store pre-flight hardening + a docstring clarification) are folded into this PR.
  3. Colleague hands-on live-test — independently operated the real CLI against a throwaway store with sha256 before/after checks; all six guarantees pass, no bugs.

Gate: 157 tests pass (+4 skipped live-stack), agent-first rubric 26/26, black/isort/flake8/bandit/markdownlint clean, coverage migrate.py 100% / files.py 98%. Version 0.6.0.

Not in this PR (rest of Wave 3)

Freezing the full pinnable verb-JSON contract + eidetic consuming the surface over the subprocess boundary (eidetic drops/thins neo4j+pymongo and replaces eidetic migrate store with a thin call into store.migrate). Tracked in #1 / #3 / #8.

  • data-refinery-cli (Claude)

🤖 Generated with Claude Code

https://claude.ai/code/session_01AyLu1fvaV2Ys1ZQmUUeGAc

OriNachum and others added 3 commits June 21, 2026 18:12
Expose a files/migration endpoint so a consumer upgrades a populated on-disk
store to the current Envelope format WITHOUT constructing any filesystem write
path itself — moving the path-construction concern (and eidetic's S2083 BLOCKER
sink) to the component that owns the storage layout.

- data_refinery/store/migrate.py: public migrate(transform=None, *, backend,
  base_dir, dry_run). Consumer supplies only a transform (each decoded legacy
  line -> Envelope, or None to drop); data-refinery resolves the root and owns
  the atomic per-file rewrite. transform=None self-canonicalises DR's own
  Envelope-JSONL. Files-only today; mongo/neo4j raise a structured CliError.
- FilesBackend.migrate + shared _atomic_write (temp sibling + os.replace), which
  also hardens the day-to-day upsert/delete against truncate-on-crash. Symlink
  escape from the canonical root is refused (code 2). Idempotent: a file whose
  canonical re-serialisation equals its bytes is left untouched (byte-identical
  2nd run). Already-canonical lines are kept verbatim so a re-run never
  re-applies the consumer transform.
- CLI `data-refinery store migrate` (--backend/--dry-run/--json, no --data-dir):
  self-canonicalise only (no callable crosses argv); resolves root from
  DR_DATA_DIR, keeping the write sink off DR's own Sonar taint path.
- Docs: contract v3 (+ importable migrate + invariants), explain catalog,
  store overview, README, CLAUDE.md / AGENTS.colleague.md Wave-3 status.
- Tests: idempotency, atomic/abort-safety, transform path, self-canonicalise,
  validate-before-write, symlink refusal, dry-run, files-first seam, CLI verb.
- devague spec under docs/specs/ (the converged /think frame).

Bump 0.5.2 -> 0.6.0 (new verb). Local gate green: 169 tests, rubric 26/26,
black/isort/flake8/bandit/markdownlint clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AyLu1fvaV2Ys1ZQmUUeGAc
…able commonpath guard

Lift FilesBackend coverage 95% -> 97% on the migration path:
- blank/whitespace lines in a scope file are skipped (not preserved)
- an orphan *.jsonl.tmp from a prior interrupted run is reaped on the next migrate
- mark the commonpath ValueError branch (Windows different-drives / mixed roots)
  # pragma: no cover — two absolute resolved POSIX paths never raise it
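One way such a commonpath guard can look, as a hedged sketch: `is_inside_root` is a hypothetical helper, not the function in `files.py`. It resolves symlinks on both paths and then checks that their common prefix is the root itself.

```python
import os
import tempfile
from pathlib import Path


def is_inside_root(root: Path, candidate: Path) -> bool:
    """True when candidate, with symlinks resolved, still lives under root."""
    real_root = os.path.realpath(root)
    real_path = os.path.realpath(candidate)
    try:
        return os.path.commonpath([real_root, real_path]) == real_root
    except ValueError:
        # Raised for paths on different Windows drives or a mix of absolute
        # and relative paths; two resolved POSIX paths do not reach here.
        return False


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as d:
        base = Path(d)
        root = base / "store"
        outside = base / "outside"
        root.mkdir()
        outside.mkdir()
        (root / "public.jsonl").write_text("", encoding="utf-8")
        (outside / "secret.jsonl").write_text("", encoding="utf-8")
        link = root / "escape.jsonl"
        try:
            link.symlink_to(outside / "secret.jsonl")
        except OSError:
            # Symlink creation can be unavailable (e.g. unprivileged Windows).
            return is_inside_root(root, root / "public.jsonl"), None
        return is_inside_root(root, root / "public.jsonl"), is_inside_root(root, link)


print(demo())
```

Comparing with `commonpath` rather than a string prefix avoids treating `/data/store-old` as living inside `/data/store`.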

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AyLu1fvaV2Ys1ZQmUUeGAc
…old-in)

Folds the substantive findings from a colleague review + hands-on live-test of
the issue-#8 store-migration endpoint. The review confirmed atomicity,
path-safety, scope-no-leak, and the files-first seam are sound, and its one
"idempotency bug" was a false positive (it misread which dict is on disk on the
2nd run); the hands-on live-test passed all six guarantees with no bugs.

Real improvements taken:
- migrate() is now two-pass: transform + validate EVERY scope file before
  writing one byte, so a corrupt line / invalid transform output / symlink
  escape in any file aborts the whole migration before it touches disk
  (whole-store abort-safety, strictly stronger than the prior per-file abort).
- reap orphan *.jsonl.tmp at the START of a run (clear a prior crash's debris
  before planning), not after.
- sharpen the idempotency docstrings (migrate.py / _to_envelope): the consumer's
  transform need not be idempotent — already-canonical lines are kept verbatim,
  resting on the Envelope round-trip being a stable fixpoint (the thing the
  review misread).
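The "applied exactly once" property can be illustrated at the level of a single line. In this sketch `is_canonical` is a hypothetical stand-in for the real check (the Envelope round-trip reproducing the on-disk dict), and `stamp` is a deliberately non-idempotent transform with made-up field names.

```python
import json


def is_canonical(obj: dict) -> bool:
    """Hypothetical stand-in for 'Envelope.from_dict(obj).to_dict() == obj'."""
    return set(obj) == {"id", "content", "stamps"}


def stamp(obj: dict) -> dict:
    """A deliberately non-idempotent transform: each call adds one stamp."""
    return {
        "id": str(obj["id"]),
        "content": obj.get("content", obj.get("text", "")),
        "stamps": obj.get("stamps", 0) + 1,
    }


def migrate_line(line: str, transform) -> str:
    obj = json.loads(line)
    if is_canonical(obj):
        return line  # already canonical: kept verbatim, transform not called
    return json.dumps(transform(obj))


legacy = '{"id": 1, "text": "hello"}'
first = migrate_line(legacy, stamp)
second = migrate_line(first, stamp)
print(first)
print(second == first)
```

Because the second pass sees a line that already round-trips, it never reaches `stamp`, so the stamp count stays at one however many times the migration is re-run.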

Tests (+3, 22 total in the migrate suite; 157 pass overall):
- test_envelope_round_trip_is_a_fixpoint — guards the foundation of the contract
- test_non_idempotent_transform_is_applied_exactly_once — a marker-stamping
  transform runs once; the 2nd run keeps the canonical line verbatim
- test_whole_store_validation_aborts_before_any_write — a corrupt 2nd file
  leaves the (sorted-first) first file untouched

Gate: rubric 26/26, black/isort/flake8/bandit/markdownlint clean,
files.py 98% / migrate.py 100%. Version stays 0.6.0 (unreleased); CHANGELOG
entry amended.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AyLu1fvaV2Ys1ZQmUUeGAc
@OriNachum

Contributor Author

/agentic_review

@qodo-code-review

qodo-code-review Bot commented Jun 21, 2026


Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (0) 📎 Requirement gaps (0) 📜 Skill insights (0)

Context used
✅ Compliance rules (platform): 15 rules



Action required

1. _atomic_write re-raises OSError ✓ Resolved 📘 Rule violation ≡ Correctness
Description
FilesBackend migration/writes can raise raw OSError (e.g., unreadable scope file or failed
os.replace) instead of CliError, violating the exit-code/error-contract and causing the
top-level CLI to wrap it as an EXIT_USER_ERROR (1) with a generic "unexpected" message. This
breaks the requirement that environment/setup failures surface as structured CliError with exit
code 2 and actionable remediation.
Code

data_refinery/store/backends/files.py[R239-257]

+    def _atomic_write(self, path: Path, text: str) -> None:
+        """Write *text* to *path* atomically (temp sibling + ``os.replace``).
+
+        The temp is a sibling in the same directory, so ``os.replace`` is a
+        same-filesystem atomic rename: a crash leaves either the old file or the
+        new one — never a half-written file. Shared by ``upsert``/``delete`` and
+        the migration rewrite, so every write to a scope file is durable.
+        """
        path.parent.mkdir(parents=True, exist_ok=True)
-        with open(path, "w", encoding="utf-8") as f:
-            for r in records:
-                f.write(json.dumps(r.to_dict()) + "\n")
+        tmp = path.with_name(path.name + _TMP_SUFFIX)
+        try:
+            tmp.write_text(text, encoding="utf-8")
+            os.replace(tmp, path)
+        except OSError:
+            try:
+                tmp.unlink()
+            except OSError:  # pragma: no cover - best effort cleanup
+                pass
+            raise
Evidence
FilesBackend.migrate() reads each *.jsonl via path.read_text(...) without catching OSError,
and _atomic_write() explicitly catches OSError only to delete the temp file and then re-raises
the raw exception. The CLI dispatcher wraps any non-CliError exception into a new CliError with
EXIT_USER_ERROR (1), which violates the checklist requirement to use CliError consistently and
to use exit code 2 for environment/setup failures.

Rule 1077531: Standardize CLI exit codes and use CliError for all failures
data_refinery/store/backends/files.py[124-138]
data_refinery/store/backends/files.py[239-257]
data_refinery/cli/__init__.py[104-124]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`FilesBackend` migration and atomic writes can throw raw `OSError` (and `path.read_text(...)` can also raise `OSError`) rather than raising `CliError`. In CLI execution this gets wrapped as an `EXIT_USER_ERROR` (code 1) "unexpected" error, instead of a structured environment/setup error with exit code 2.

## Issue Context
Compliance requires: (1) all failures surfaced via the CLI raise `CliError`, and (2) environment/setup problems (permissions, unreadable files, disk/rename failures) use exit code 2.

## Fix Focus Areas
- data_refinery/store/backends/files.py[124-138]
- data_refinery/store/backends/files.py[239-257]
- data_refinery/cli/__init__.py[104-124]



2. Non-dict JSON breaks migrate ✓ Resolved 🐞 Bug ☼ Reliability
Description
FilesBackend.migrate() passes json.loads() results directly into Envelope.from_dict()
(self-canonicalize) or the consumer transform without verifying the decoded value is a dict and
without wrapping Envelope.from_dict shape errors. A JSONL line like [], "x", or {} can
therefore raise AttributeError/KeyError and bypass the intended code-2 “corrupt line” CliError
handling (CLI ends up emitting a generic wrapped “unexpected” error instead).
Code

data_refinery/store/backends/files.py[R295-304]

+    if transform is None:
+        return Envelope.from_dict(obj)  # type: ignore[arg-type]
+    if isinstance(obj, dict):
+        try:
+            already = Envelope.from_dict(obj)
+        except (KeyError, TypeError, AttributeError, ValueError, CliError):
+            already = None
+        if already is not None and already.to_dict() == obj:
+            return already
+    return transform(obj)  # type: ignore[arg-type]
Evidence
_to_envelope() directly calls Envelope.from_dict(obj) when transform is None and calls
transform(obj) even when obj may not be a dict. Envelope.from_dict() requires a dict and will
throw if given a non-dict or a dict missing id. Those exceptions are not caught in
_migrate_lines(), so in CLI mode they get wrapped into a generic code-1 “unexpected …” error by
the top-level dispatcher instead of a targeted code-2 “corrupt line …” error.

data_refinery/store/backends/files.py[148-166]
data_refinery/store/backends/files.py[284-305]
data_refinery/store/envelope.py[72-99]
data_refinery/cli/__init__.py[104-124]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`FilesBackend.migrate()` currently assumes each decoded JSONL line is a dict-shaped envelope/record. If a line is valid JSON but not an object (e.g. `[]`, `"s"`, `1`) or is a dict missing required keys (e.g. `{}`), `Envelope.from_dict()` will raise `AttributeError`/`KeyError`. Those exceptions are not converted into the structured `CliError(code=EXIT_ENV_ERROR, ...)` that the migration contract expects for corrupt/invalid source lines.

## Issue Context
- `_migrate_lines()` only wraps `json.JSONDecodeError`, but not failures from `_to_envelope()` / `Envelope.from_dict()`.
- `Envelope.from_dict()` assumes a `dict` and uses `data.get(...)` and `data["id"]`, which will throw if the input is not a dict or lacks keys.
- The CLI’s top-level exception wrapper will turn these into a generic `CliError(code=1, message="unexpected: ...")`, which loses the intended “corrupt line in <file>” message and exit-code semantics.

## Fix Focus Areas
- data_refinery/store/backends/files.py[148-166]
- data_refinery/store/backends/files.py[284-305]

### Recommended fix
1. In `_to_envelope()` (or in `_migrate_lines()` before calling it), enforce that `obj` is a `dict` when:
  - `transform is None` (self-canonicalize), and
  - `transform is not None` (since the public contract types `Transform` as `Callable[[dict], ...]`).
  If not a dict, raise `CliError(code=EXIT_ENV_ERROR, message=f"corrupt line in {path.name}: expected a JSON object", remediation=...)`.
2. Wrap `Envelope.from_dict(...)` failures (at least `KeyError`, `TypeError`, `AttributeError`, `ValueError`) into `CliError(code=EXIT_ENV_ERROR, message=f"corrupt line in {path.name}: ...", remediation=f"remove or repair the corrupt line in {path}")`.
3. After enforcing dict-ness, remove the `# type: ignore[arg-type]` calls where possible, since the types will now be correct.

This preserves the “no traceback”, structured error, and exit-code-2 behavior for malformed store lines during migration.




@qodo-code-review


PR Summary by Qodo

Add store migration endpoint with atomic, idempotent JSONL rewrites (files backend)
✨ Enhancement 🧪 Tests 📝 Documentation ⚙️ Configuration changes 🕐 40+ Minutes


Description

• Add importable store.migrate() to upgrade legacy stores without consumer path construction.
• Implement atomic, whole-store validated JSONL rewrites with idempotent re-runs.
• Expose data-refinery store migrate CLI for self-canonicalizing DR’s own format.
Diagram

graph TD
  consumer{{"Consumer (library)"}} --> migrate_api["store.migrate()"] --> files_backend["FilesBackend.migrate()"] --> jsonl_store[("JSONL store")]
  cli["CLI: store migrate"] --> migrate_api
  files_backend --> cli_error["CliError (no traceback)"]

  subgraph Legend
    direction LR
    _ext{{"External"}} ~~~ _mod["Module/API"] ~~~ _db[("Storage")]
  end
High-Level Assessment

The following are alternative approaches to this PR:

1. Make CLI accept a declarative transform spec
  • ➕ Enables non-Python consumers to perform true legacy-to-Envelope migrations via subprocess
  • ➕ Avoids import-time coupling to a Python callable in some deployments
  • ➖ Hard to standardize safely (schema/versioning, security review of arbitrary transforms)
  • ➖ Likely expands scope into an ETL framework (explicitly out of bounds here)
2. Journaled multi-file transaction (manifest + commit phase)
  • ➕ Could provide stronger all-or-nothing semantics across the entire store
  • ➕ Easier recovery/inspection for partially completed migrations
  • ➖ More complexity and more on-disk state to manage
  • ➖ Current per-file atomic + whole-store preflight already covers the primary safety goals

Recommendation: Keep the PR’s approach: library-based transform migration (for true legacy conversion) plus a CLI self-canonicalization verb. This cleanly preserves the security boundary (consumer never constructs write paths), keeps the CLI contract pinnable/simple, and achieves robust safety via whole-store preflight validation and per-file atomic writes.

Files changed (15) +981 / -17

Enhancement (4) +296 / -5
store.py: Add 'data-refinery store migrate' CLI verb (+40/-0)

• Introduces a new 'store migrate' subcommand supporting '--backend', '--dry-run', and '--json'. The CLI calls the library migration entrypoint in self-canonicalize mode (no Python transform over argv) and formats results or structured errors.

data_refinery/cli/_commands/store.py

__init__.py: Export 'migrate' from data_refinery.store public API (+2/-0)

• Re-exports the new migration entrypoint so consumers can call 'data_refinery.store.migrate(...)' directly.

data_refinery/store/__init__.py

files.py: Implement files backend migration + shared atomic-write helper (+191/-5)

• Adds 'FilesBackend.migrate()' implementing whole-store preflight planning/validation, symlink escape refusal, and atomic per-file rewrites via temp sibling + 'os.replace'. Refactors '_save' to use the shared '_atomic_write', hardening normal upsert/delete writes against truncate-on-crash, and adds canonical serialization/validation helpers to preserve idempotency.

data_refinery/store/backends/files.py

migrate.py: Add public migration entrypoint selecting backend (+63/-0)

• Introduces 'data_refinery.store.migrate(transform, *, backend, base_dir, dry_run)' with files backend support and structured errors for unsupported backends. Documents idempotency semantics (verbatim preservation of already-canonical lines) and the consumer boundary (no write-path construction).

data_refinery/store/migrate.py

Tests (1) +334 / -0
test_store_migrate.py: Add comprehensive tests for migration safety, idempotency, and CLI behavior (+334/-0)

• Introduces end-to-end tests covering self-canonicalization, consumer transforms (including non-idempotent transforms applied exactly once), whole-store abort safety, symlink escape refusal, atomic write failure handling, dry-run behavior, unsupported backend errors, and CLI output modes.

tests/test_store_migrate.py

Documentation (9) +350 / -11
current: Point devague tracking to new migration spec frame (+1/-1)

• Updates the current devague pointer to the new exported frame describing the store-migration endpoint.

.devague/current

data-refinery-now-owns-store-file-migration-a-cons.json: Add devague frame capturing migration endpoint guarantees and rationale (+181/-0)

• Introduces a structured frame documenting the endpoint’s purpose, boundary decisions, and load-bearing guarantees (atomicity, idempotency, path safety, whole-store validation).

.devague/frames/data-refinery-now-owns-store-file-migration-a-cons.json

AGENTS.colleague.md: Document Wave 3 slice: store migration endpoint and invariants (+8/-1)

• Updates the agent guidance to reflect issue #8 completion and summarizes atomic/idempotent migration behavior and files-first backend support.

AGENTS.colleague.md

CHANGELOG.md: Add 0.6.0 entry describing store migration endpoint and atomic writes (+13/-0)

• Adds release notes for v0.6.0 covering the new migration endpoint/CLI, whole-store preflight validation, and shared atomic write hardening.

CHANGELOG.md

CLAUDE.md: Update project overview to include migration endpoint (Wave 3 slice) (+27/-7)

• Refreshes the internal project narrative and roadmap to include 'store.migrate'/'store migrate', backend granularity limitations, and key safety guarantees.

CLAUDE.md

README.md: Expose migration usage in README (CLI + importable API) (+6/-0)

• Adds 'store migrate' CLI example and shows importing 'store.migrate(transform, base_dir=...)' for consumer-driven legacy upgrades.

README.md

catalog.py: Document store migrate verb and consumer migration guidance (+20/-0)

• Extends the explain catalog to describe 'store migrate' semantics, idempotency/atomicity, and how consumers should use the importable migration API for legacy formats.

data_refinery/explain/catalog.py

contract.md: Bump contract to v3 and specify migration API/CLI contract (+47/-2)

• Updates the pinned contract version and documents the 'store migrate' JSON shape, invariants (idempotent/atomic/validated), and backend limitations (files-only today).

docs/contract.md

2026-06-21-data-refinery-now-owns-store-file-migration-a-cons.md: Add design spec for store migration endpoint (issue #8) (+47/-0)

• Adds a narrative spec describing before/after state, requirements, boundary decisions, and success criteria for moving store rewrite responsibility into data-refinery.

docs/specs/2026-06-21-data-refinery-now-owns-store-file-migration-a-cons.md

Other (1) +1 / -1
pyproject.toml: Bump package version to 0.6.0 (+1/-1)

• Updates the project version to reflect the new stable contract surface.

pyproject.toml

Comment thread data_refinery/store/backends/files.py Outdated
Comment thread data_refinery/store/backends/files.py Outdated
…odo PR #9)

Triages the two findings from the Qodo agentic review of PR #9; both were real
gaps against the repo's own error contract (CLAUDE.md: code 2 for
environment/setup faults, structured CliError for every failure, no generic
code-1 "unexpected" wrap for known failure modes) — FIX, not pushback.

1. Raw OSError -> code-2 CliError (Qodo "_atomic_write re-raises OSError",
   Rule 1077531). A scope file that can't be read (permissions) or written
   (full disk, denied permission, failed os.replace) was re-raised as a bare
   OSError and caught by the dispatcher's last-resort wrapper as
   EXIT_USER_ERROR (1) "unexpected: OSError ..." with a "file a bug"
   remediation. Now the shared _atomic_write converts the fault (after temp
   cleanup) to CliError(code=2) with a space/permissions remediation, and
   migrate() wraps the read the same way — so upsert/delete/migrate all obey
   the contract.

2. Non-object / missing-key line -> code-2 "corrupt line" (Qodo "Non-dict
   JSON breaks migrate"). A valid-JSON-but-non-object line ([], "x", 1) or a
   record missing its id raised AttributeError/KeyError that _migrate_lines
   didn't catch (only JSONDecodeError), again becoming a generic code-1 wrap.
   Now both _migrate_lines and the day-to-day _load enforce dict-ness and map
   shape errors (KeyError/TypeError/AttributeError/ValueError) to a structured
   code-2 "corrupt line" via a shared _corrupt_line helper; a structured
   CliError (e.g. unknown visibility) is passed through with its own code.
   Dropped the now-unnecessary # type: ignore[arg-type] on _to_envelope.

Same fix applied to the read path (_load), which shared the latent bug.

Tests (+6, 31 in the migrate suite; 166 pass overall): non-object line,
missing-id line, unreadable scope file (read OSError), write OSError surfaces
code 2, self-canonicalise bad-visibility passthrough, and read-path parity for
_load (malformed lines, bad visibility, blank-line skip).

Gate: rubric 26/26, black/isort/flake8/bandit/markdownlint clean,
files.py 100% / migrate.py 100%. Version stays 0.6.0 (unreleased); CHANGELOG
amended.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AyLu1fvaV2Ys1ZQmUUeGAc
@OriNachum

Contributor Author

Triage of the Qodo review — both findings: FIX (pushed in a82770f).

Both were real gaps against this repo's own error contract (CLAUDE.md: exit code 2 for environment/setup faults, structured CliError for every failure, no generic code-1 "unexpected" wrap for a known failure mode), so no pushback — your Rule 1077531 is exactly the contract we hold.

1. _atomic_write re-raises OSError (Rule violation). Correct. A scope file that can't be read (permissions) or written (full disk, denied permission, failed os.replace) was re-raised as a bare OSError and caught by the dispatcher's last-resort wrapper as EXIT_USER_ERROR (1) "unexpected: OSError …" with a "file a bug" remediation — wrong code and wrong remediation for an environment fault.

  • The shared _atomic_write now converts the fault (after temp cleanup) into CliError(code=2) with a free-space/permissions remediation. Because it's the one write chokepoint, upsert/delete/migrate all obey the contract now, not just migration.
  • migrate() wraps the read_text the same way (could not read …, code 2).
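A sketch of the conversion described in finding 1, under stated assumptions: the `CliError` class below is a local stand-in that mirrors the `code` / `message` / `remediation` keywords quoted in the review, the message wording is invented, and the helper skips the parent-directory creation the real `_atomic_write` performs so that the demo can trigger a write fault.

```python
import os
import tempfile
from pathlib import Path

EXIT_ENV_ERROR = 2  # environment/setup fault, per the contract described above


class CliError(Exception):
    """Hypothetical stand-in for data-refinery's CliError."""

    def __init__(self, code: int, message: str, remediation: str) -> None:
        super().__init__(message)
        self.code = code
        self.message = message
        self.remediation = remediation


def atomic_write(path: Path, text: str) -> None:
    tmp = path.with_name(path.name + ".tmp")
    try:
        tmp.write_text(text, encoding="utf-8")
        os.replace(tmp, path)
    except OSError as exc:
        try:
            tmp.unlink()  # best-effort cleanup of the temp sibling
        except OSError:
            pass
        # Convert the environment fault instead of re-raising the bare OSError.
        raise CliError(
            code=EXIT_ENV_ERROR,
            message=f"could not write {path.name}: {exc.strerror or exc}",
            remediation="check free disk space and write permissions",
        ) from exc


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as d:
        # The parent directory does not exist, so the temp write fails.
        target = Path(d) / "missing-dir" / "public.jsonl"
        try:
            atomic_write(target, "{}\n")
        except CliError as err:
            return err.code, err.remediation
        return None, None


print(demo())
```

Doing the conversion inside the single write helper is what lets every caller inherit the exit-code behaviour without its own `try` block.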

2. Non-dict JSON breaks migrate (Bug). Correct. A valid-JSON-but-non-object line ([], "x", 1) or a record missing id raised AttributeError/KeyError that _migrate_lines didn't catch (only JSONDecodeError), so it became the same generic code-1 wrap instead of a code-2 "corrupt line".

  • _migrate_lines now enforces dict-ness (expected a JSON object, got <type>) and maps shape errors (KeyError/TypeError/AttributeError/ValueError) to a structured code-2 "corrupt line" via a shared _corrupt_line helper; a structured CliError (e.g. an unknown scope.visibility) is passed through with its own code.
  • Applied the same fix to the day-to-day _load path (get/list/all), which shared the latent bug, and dropped the now-unnecessary # type: ignore[arg-type] on _to_envelope per your cleanup note.
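The shape check from finding 2 can be sketched the same way. `CliError`, `corrupt_line`, `envelope_from_dict` and `decode_line` here are local, hypothetical stand-ins; only the "expected a JSON object, got <type>" wording, the exception tuple and the pass-through of an existing `CliError` follow the description above.

```python
import json

EXIT_ENV_ERROR = 2


class CliError(Exception):
    """Hypothetical stand-in for data-refinery's CliError."""

    def __init__(self, code: int, message: str, remediation: str) -> None:
        super().__init__(message)
        self.code = code
        self.message = message
        self.remediation = remediation


def corrupt_line(name: str, detail: str) -> CliError:
    """Build the structured code-2 error for one bad line."""
    return CliError(
        EXIT_ENV_ERROR,
        f"corrupt line in {name}: {detail}",
        f"remove or repair the corrupt line in {name}",
    )


def envelope_from_dict(data: dict) -> dict:
    """Stand-in for Envelope.from_dict: requires an 'id' key."""
    return {"id": data["id"], "scope": data.get("scope", {})}


def decode_line(name: str, line: str) -> dict:
    try:
        obj = json.loads(line)
    except json.JSONDecodeError as exc:
        raise corrupt_line(name, f"invalid JSON ({exc.msg})") from exc
    if not isinstance(obj, dict):
        raise corrupt_line(name, f"expected a JSON object, got {type(obj).__name__}")
    try:
        return envelope_from_dict(obj)
    except CliError:
        raise  # an already-structured error keeps its own code
    except (KeyError, TypeError, AttributeError, ValueError) as exc:
        raise corrupt_line(name, f"{type(exc).__name__}: {exc}") from exc


def code_for(line: str) -> tuple:
    try:
        decode_line("public.jsonl", line)
    except CliError as err:
        return err.code, err.message
    return 0, "ok"


for sample in ("[]", '"x"', "{}", "{bad", '{"id": "1"}'):
    print(sample, "->", code_for(sample))
```

Every malformed sample ends in the same structured error type, so a top-level dispatcher never needs its generic "unexpected" wrapper for these cases.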

Verification. +6 tests (non-object line, missing-id line, unreadable scope file, write-OSError → code 2, bad-visibility passthrough, and read-path parity for _load). Gate: rubric 26/26, files.py / migrate.py coverage 100%, black/isort/flake8/bandit/markdownlint clean. Both cases were re-checked live and now exit 2 with a structured remediation.

Thanks for the pass — both were worth fixing.

  • data-refinery-cli (Claude)

@sonarqubecloud


@OriNachum OriNachum merged commit 380ae7a into main Jun 21, 2026
8 checks passed
@OriNachum OriNachum deleted the feat/store-migration-endpoint-issue-8 branch June 21, 2026 17:44

Development

Successfully merging this pull request may close these issues.

Expose a files/migration endpoint so consumers don't construct write paths (unblocks eidetic-cli S2083 gate on #14)

1 participant