feat: opt-in fail-closed .gitignore on files-store materialization (closes #12)#13
feat: opt-in fail-closed .gitignore on files-store materialization (closes #12)#13OriNachum wants to merge 11 commits into
Conversation
#12) Converged devague frame + build plan for DR's files backend optionally writing a fail-closed .gitignore on store-dir materialization, so a consumer (eidetic-cli) keeps private shards out of git without constructing a write path. Surface decision (Option B): write_gitignore on FilesBackend init + store.migrate, and fix files.build to honor base_dir+write_gitignore so store.put/get/list flow them. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LAEeF8y7RrKft8de7rZfDM
…d o... Implement issue #12 in data-refinery-cli: the files backend optionally writes a fail-closed .gitignore on store-dir materialization. Work TEST-FIRST. Read first: docs/specs/2026-06-24-data-refinery-s-files-backend-can-write-a-fail-clo.md and CLAUDE.md. The target module is data_refinery/store/backends/files.py. Implement EXACTLY this (the agreed "Option B"): 1) data_refinery/store/backends/files.py - Add a keyword param `write_gitignore: bool = False` to FilesBackend.__init__ (after base_dir). Store it as self._write_gitignore. Do NOT change the existing eager `self._base.mkdir(parents=True, exist_ok=True)`. - Add a private method `_ensure_gitignore(self) -> None` that, ONLY when self._write_gitignore is True, creates `<base_dir>/.gitignore` ONLY IF it does not already exist (Path.exists() check), writing exactly these bytes (note trailing newline): *\n!.gitignore\n!*__public.jsonl\n Never overwrite an existing .gitignore (it may carry user edits). If writing it raises OSError, surface a structured CliError(code=EXIT_ENV_ERROR, ...) consistent with _atomic_write — never a traceback. - Call _ensure_gitignore() ONLY on write/materialize paths: at the very start of upsert(), and inside migrate() within the `if not dry_run:` block BEFORE the pass-2 apply loop (so a dry-run NEVER creates it, and an apply materializes it even if the plan is empty). Do NOT call it in __init__, get(), list(), or all(). Reads must never create the file. - Fix the module-level `build()` factory (currently `def build(**_kwargs): return FilesBackend()`, which DROPS kwargs) to honor base_dir and write_gitignore: def build(*, base_dir=None, write_gitignore=False, **_kwargs): return FilesBackend(base_dir, write_gitignore=write_gitignore) Keep accepting/ignoring other kwargs (e.g. timeout_ms) via **_kwargs. 2) Tests — NEW file tests/test_store_gitignore.py (write the tests FIRST, then implement). Cover: - .gitignore content is exactly "*\n!.gitignore\n!*__public.jsonl\n" after an upsert with write_gitignore=True. - Default (write_gitignore=False): after upsert NO .gitignore exists; the dir holds only the scope .jsonl (byte-identical to today). - A read get()/list() with write_gitignore=True does NOT create .gitignore. - An existing .gitignore with DIFFERENT content is never overwritten after an upsert. - In a real temp git repo (git init + set user.email/user.name), with write_gitignore=True after putting one private-scope and one public-scope envelope: `git check-ignore -q <scope>__private.jsonl` is ignored (exit 0), an arbitrary non-public sidecar name like `foo__index.bin` is ignored, and `git check-ignore -q <scope>__public.jsonl` is NOT ignored (exit 1). Skip this test gracefully when `git` is absent (shutil.which("git") is None -> pytest.skip). - build(base_dir=tmp, write_gitignore=True) returns a FilesBackend honoring both; and data_refinery.store.put(env, backend="files", base_dir=tmp, write_gitignore=True) forwards the kwargs (get_backend->build now honors them) and materializes the .gitignore. Constraints: - stdlib only; do NOT add runtime dependencies (the `dependencies = []` invariant). - black + isort + flake8 clean at line-length 100; bandit clean. - No traceback ever; raise CliError on faults; match the existing file's style/helpers (_atomic_write, _serialize, _VISIBILITIES, etc.). - Before finishing, run: uv run pytest tests/test_store_gitignore.py -q AND uv run black --check data_refinery tests && uv run isort --check-only data_refinery tests && uv run flake8 data_refinery tests and make them all pass. Deliver the change committed on your drive branch. Do not edit pyproject.toml, CHANGELOG.md, README.md, docs/, or data_refinery/store/migrate.py — those are other tasks. Implement the task above in this repository. Rules: - Make the SMALLEST change that correctly satisfies the task. - Follow the repository's existing patterns, style, and conventions — read the neighbouring files first so your change reads like the surrounding code. - Keep edits lint-clean: respect the project's maximum line length and end every text file with exactly one trailing newline. - You may read, create, modify files, and run commands as needed. - Don't widen the scope: do exactly what was asked, nothing more. When you are done, call finish with a short summary of exactly what you changed and why.
…ialization Colleague-implemented (Qwen3.6-27B via ask-colleague write), TDD-gated by the main agent. FilesBackend gains write_gitignore (default off); _ensure_gitignore creates base_dir/.gitignore (* / !.gitignore / !*__public.jsonl) create-when- absent on write paths only (upsert + migrate-apply, never reads/dry-run). build() fixed to honor base_dir+write_gitignore (was dropping kwargs) so store.put/get/list flow them. New tests/test_store_gitignore.py (7 tests). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LAEeF8y7RrKft8de7rZfDM
…nor... Issue #12 in data-refinery-cli, task t2: plumb write_gitignore through the importable store.migrate endpoint. Work TEST-FIRST. The files backend already supports write_gitignore (FilesBackend.__init__ accepts it and FilesBackend.migrate() calls _ensure_gitignore on apply) — t2 only wires the top-level migrate() function to forward it. Target file: data_refinery/store/migrate.py 1) data_refinery/store/migrate.py - The current top-level function is: def migrate(transform=None, *, backend="files", base_dir=None, dry_run=False) -> dict[str, Any]: if backend == "files": return FilesBackend(base_dir).migrate(transform, dry_run=dry_run) raise CliError(...) - Add a keyword param `write_gitignore: bool = False` (place it after base_dir, before dry_run). Forward it into the FilesBackend constructor: return FilesBackend(base_dir, write_gitignore=write_gitignore).migrate(transform, dry_run=dry_run) - Update the docstring: with write_gitignore=True the files backend materializes the fail-closed .gitignore (* / !.gitignore / !*__public.jsonl) during the apply pass; a dry_run never writes it; default False is byte-identical to today; files backend only. - Do NOT add a CLI flag — the CLI `store migrate` verb stays unchanged (Option B is import-surface only). 2) Tests — ADD to the EXISTING file tests/test_store_migrate.py (do NOT create a new test file, and do NOT touch tests/test_store_gitignore.py). Add tests covering: - store.migrate(base_dir=tmp, write_gitignore=True) creates tmp/.gitignore after a real (non-dry) migrate. Setup: first store.put(Envelope(id="a", content="x"), backend="files", base_dir=tmp) WITHOUT write_gitignore so a scope file exists and no .gitignore yet; then store.migrate(base_dir=tmp, write_gitignore=True); assert (tmp/".gitignore").exists() and its content == "*\n!.gitignore\n!*__public.jsonl\n". - store.migrate(base_dir=tmp, write_gitignore=True, dry_run=True) does NOT create tmp/.gitignore. - default store.migrate(base_dir=tmp) (write_gitignore omitted) creates no .gitignore. - import data_refinery.store as store; use store.migrate / store.put / store.Envelope (or import Envelope from data_refinery.store.envelope, matching the existing test file's import style — read the file first and match it). Constraints: - stdlib only; no new runtime dependencies. - black + isort + flake8 clean at line-length 100; bandit clean; no traceback (CliError only). - Match the existing migrate.py + test_store_migrate.py style. - Before finishing run: uv run pytest tests/test_store_migrate.py -q AND uv run black --check data_refinery tests && uv run isort --check-only data_refinery tests && uv run flake8 data_refinery tests — all must pass. Deliver committed on your drive branch. Do NOT edit files.py, pyproject.toml, CHANGELOG.md, README.md, docs/, or uv.lock. Implement the task above in this repository. Rules: - Make the SMALLEST change that correctly satisfies the task. - Follow the repository's existing patterns, style, and conventions — read the neighbouring files first so your change reads like the surrounding code. - Keep edits lint-clean: respect the project's maximum line length and end every text file with exactly one trailing newline. - You may read, create, modify files, and run commands as needed. - Don't widen the scope: do exactly what was asked, nothing more. When you are done, call finish with a short summary of exactly what you changed and why.
Colleague-implemented (Qwen3.6-27B via ask-colleague write), TDD-gated by the main agent. store.migrate() gains write_gitignore (default off), forwarded to FilesBackend so a non-dry files migrate materializes the fail-closed .gitignore; dry_run never writes it. No CLI flag (Option B is import-surface only). 3 new tests in tests/test_store_migrate.py asserting real on-disk behavior. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LAEeF8y7RrKft8de7rZfDM
Minor bump per the contract versioning policy (new optional param). CHANGELOG documents the fail-closed .gitignore opt-in and the build() kwarg-honoring fix. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LAEeF8y7RrKft8de7rZfDM
…ent... Issue #12 in data-refinery-cli, task t3 (DOCS ONLY): document the new opt-in `write_gitignore` files-store surface that was just implemented. Do NOT touch any .py file, pyproject.toml, CHANGELOG.md, or uv.lock — the version bump + CHANGELOG are handled separately. Read the just-merged code first so the docs are accurate: - data_refinery/store/backends/files.py (FilesBackend.__init__ now takes write_gitignore; _ensure_gitignore; build() honors base_dir+write_gitignore) - data_refinery/store/migrate.py (migrate() now takes write_gitignore) - docs/contract.md, README.md, AGENTS.colleague.md (the files you will edit) What the feature is (document it exactly): when a consumer opts in with `write_gitignore=True`, the files backend ensures a fail-closed `.gitignore` exists in the store `base_dir` with EXACTLY this content: * !.gitignore !*__public.jsonl It ignores everything and only ever allows public shards (and the .gitignore itself) to be tracked, so any future private filename or sidecar is excluded by default. Behavior: opt-in (default False; off is byte-identical to today); written only on a write/materialize (upsert + a non-dry store.migrate apply), NEVER on a read (get/list) or a dry-run; create-when-absent only (an existing .gitignore is never overwritten); files backend only (mongo/neo4j are a no-op). It is reachable WITHOUT the consumer constructing any filesystem write path, via: FilesBackend(base_dir, write_gitignore=True); data_refinery.store.put/get/list(..., backend="files", base_dir=..., write_gitignore=True) (get_backend forwards kwargs to the files build()); and data_refinery.store.migrate(transform, *, backend="files", base_dir=..., write_gitignore=..., dry_run=...). Rationale: data-refinery OWNS the `<scope>__<visibility>.jsonl` on-disk layout, so it owns the ignore pattern that tracks it; this keeps the .gitignore write-path sink (the consumer's prior pythonsecurity:S2083) on the storage owner. Continues issues #8 / #1. Edits to make: 1) docs/contract.md - In the Wave 3 section (after "the store-migration endpoint" subsection, before "## Versioning policy"), add a new subsection titled exactly: ### Fail-closed `.gitignore` opt-in (stable) documenting all of the above: the opt-in param, the exact whitelist content (in a ```gitignore code block), the materialize-on-write-not-read rule, create-when-absent/never-clobber, files-only no-op, default-off-byte-identical, the three reachable surfaces (FilesBackend init / store.put|get|list / store.migrate), and the boundary rationale. Keep the prose terse and in the same voice as the rest of the doc. - Update the migrate signature line currently reading: `migrate(transform=None, *, backend="files", base_dir=None, dry_run=False)` to include the new parameter: `migrate(transform=None, *, backend="files", base_dir=None, write_gitignore=False, dry_run=False)` 2) README.md — find the store / store-surface section and add a short note (1-3 sentences) that the files backend can optionally write a fail-closed `.gitignore` via `write_gitignore=True` (default off), so a consumer keeps private shards out of git without constructing a write path. Match the README's existing tone/format. If you cannot find an obviously-right store section, add it near where the store put/get/list or `[store]` extra is described. 3) AGENTS.colleague.md — find where the store surface is described and add a one-line mention of the `write_gitignore` opt-in (files-only, default off, fail-closed). If there is no store section, add a concise line under the most relevant heading. Constraints: - Markdown must pass: markdownlint-cli2 "docs/contract.md" "README.md" "AGENTS.colleague.md" (fix any violations you introduce; match the surrounding heading/list/code-fence style). - Do NOT edit code, tests, pyproject.toml, CHANGELOG.md, or uv.lock. - Be accurate to the code you read — do not invent flags or behavior. Deliver committed on your drive branch. Implement the task above in this repository. Rules: - Make the SMALLEST change that correctly satisfies the task. - Follow the repository's existing patterns, style, and conventions — read the neighbouring files first so your change reads like the surrounding code. - Keep edits lint-clean: respect the project's maximum line length and end every text file with exactly one trailing newline. - You may read, create, modify files, and run commands as needed. - Don't widen the scope: do exactly what was asked, nothing more. When you are done, call finish with a short summary of exactly what you changed and why.
…colleague.md) Colleague-drafted (Qwen3.6-27B via ask-colleague write), reviewed by the main agent. New "Fail-closed .gitignore opt-in" subsection in docs/contract.md (Wave 3) with the exact whitelist, behavior rules, reachable surfaces, and boundary rationale; migrate() signature updated; README example; AGENTS.colleague.md note. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LAEeF8y7RrKft8de7rZfDM
The devague-exported spec/plan used bare <scope>__private.jsonl tokens that markdownlint reads as inline HTML; wrap them in code spans so the CI markdown lint job stays green (docs/specs is not in the ignore list). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LAEeF8y7RrKft8de7rZfDM
…c_write Addresses the colleague review's one concrete finding: a crash mid-.gitignore write could leave a .gitignore.tmp that _reap_orphan_tmp (globbing only *.jsonl.tmp) never cleaned. _ensure_gitignore now reuses the shared _atomic_write (consistent structured CliError, no duplicated temp+replace), and _reap_orphan_tmp also reaps the .gitignore temp. New test locks in the reaping. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LAEeF8y7RrKft8de7rZfDM
|
/agentic_review |
Code Review by Qodo
Context used✅ Tickets:
🎫 Expose a files/migration endpoint so consumers don't construct write paths (unblocks eidetic-cli S2083 gate on #14) 🎫 Files backend: optionally write a fail-closed .gitignore on store-dir materialization (keeps consumers write-path-free)✅ Compliance rules (platform):
25 rules✅ Skills:
version-bump, doc-test-alignment 1. Gitignore creation race
|
PR Summary by QodoOpt-in fail-closed .gitignore materialization for files store backend Description
Diagram
High-Level Assessment
Files changed (15)
|
The version-bump skill updates pyproject.toml + CHANGELOG but not uv.lock; sync the lockfile's editable-package version so it matches the 0.9.0 bump. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LAEeF8y7RrKft8de7rZfDM
|



What & why
Closes #12. The files store backend can now optionally write a fail-closed
.gitignoreinto the storebase_diron materialization, so a consumer (eidetic-cli) keeps private shards out of git without ever constructing a filesystem write path itself. data-refinery owns the<scope>__<visibility>.jsonlon-disk layout, so it owns the ignore pattern that tracks it — this keeps the.gitignorewrite-path sink (eidetic's priorpythonsecurity:S2083) on the storage owner. Continues #8 / #1.The
.gitignorecontent is a whitelist that ignores everything but public shards:Any future private filename or sidecar is excluded by default rather than silently leaked.
Surface (the agreed "Option B")
write_gitignore: bool = False(default off) is reachable without the consumer building any path:FilesBackend(base_dir, write_gitignore=True)data_refinery.store.put/get/list(..., backend="files", base_dir=..., write_gitignore=True)— this also fixes a latent bug:files.build()previously dropped all kwargs, sobase_dir/write_gitignorenever reached the backend viaget_backend.data_refinery.store.migrate(transform, *, backend="files", base_dir=..., write_gitignore=..., dry_run=...)Behavior / invariants
upsertand a non-drystore migrateapply; never onget/listor a dry-run..gitignoreis never overwritten (atomic temp +os.replace; structuredCliErroron fault, never a traceback).mongo/neo4jare a no-op.can_serve) and idempotent dedup invariants untouched.Verification
tests/test_store_gitignore.py, 3 intests/test_store_migrate.py), incl. a real-git check-ignoreintegration test (skips gracefully without git).store.put(..., write_gitignore=True)→git check-ignorereports the private shard ignored, the public shard tracked, and a future sidecar ignored;git add -Astages only.gitignore+ the public shard.Consumer note (eidetic-cli)
A cross-check of eidetic's store call sites confirms it reaches this surface with no new write-path construction — its
StoreBackendalready forwards anyget_backend(...)kwarg through tostore.put. The only follow-up is on eidetic's side (itsmigrate_store()forwarding, tracked in eidetic#25). eidetic raises itsdata-refinery-clifloor to this release (0.9.0) and flips its private-in-repo cutover atomically with that bump.How it was built
Spec → plan → implementation via
/think→/spec-to-plan→/assign-to-workforce, with the colleague backend (Qwen3.6-27B viaask-colleague) as the explorer/worker/reviewer and the main agent TDD-gating each merge. The converged spec + plan are committed underdocs/specs/+docs/plans/.🤖 Generated with Claude Code