feat: opt-in fail-closed .gitignore on files-store materialization (closes #12) #13

Open
OriNachum wants to merge 11 commits into main from feat/issue-12-gitignore

Conversation

@OriNachum

Contributor

What & why

Closes #12. The files store backend can now optionally write a fail-closed .gitignore into the store base_dir on materialization, so a consumer (eidetic-cli) keeps private shards out of git without ever constructing a filesystem write path itself. data-refinery owns the <scope>__<visibility>.jsonl on-disk layout, so it owns the ignore pattern that tracks it — this keeps the .gitignore write-path sink (eidetic's prior pythonsecurity:S2083) on the storage owner. Continues #8 / #1.

The .gitignore content is a whitelist that ignores everything but public shards:

*
!.gitignore
!*__public.jsonl

Any future private filename or sidecar is excluded by default rather than silently leaked.
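
To make the last-match-wins reading of those three patterns concrete, here is a rough Python approximation. It is only a sketch: `is_tracked` is a hypothetical helper, and `fnmatch` stands in for git's real matcher, which is a fair approximation only for flat filenames directly inside the store directory.

```python
from fnmatch import fnmatchcase

# The three whitelist lines quoted above, in file order.
PATTERNS = ["*", "!.gitignore", "!*__public.jsonl"]


def is_tracked(filename: str) -> bool:
    """Approximate gitignore semantics for a flat filename: the last matching pattern wins."""
    ignored = False
    for pattern in PATTERNS:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        if fnmatchcase(filename, glob):
            # A plain pattern ignores the file; a "!" pattern re-includes it.
            ignored = not negated
    return not ignored


for name in ("proj__public.jsonl", "proj__private.jsonl", "foo__index.bin", ".gitignore"):
    print(name, "tracked" if is_tracked(name) else "ignored")
```

Because the first pattern ignores everything, a new filename has to match one of the `!` lines to be tracked, which is what makes the file fail-closed.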

Surface (the agreed "Option B")

write_gitignore: bool = False (default off) is reachable without the consumer building any path:

  • FilesBackend(base_dir, write_gitignore=True)
  • data_refinery.store.put/get/list(..., backend="files", base_dir=..., write_gitignore=True) — this also fixes a latent bug: files.build() previously dropped all kwargs, so base_dir/write_gitignore never reached the backend via get_backend.
  • data_refinery.store.migrate(transform, *, backend="files", base_dir=..., write_gitignore=..., dry_run=...)
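
The latent `build()` bug mentioned above can be illustrated in isolation. This is a sketch under assumptions: `SketchBackend` is a hypothetical stand-in for `FilesBackend`, and `build_before` / `build_after` are illustrative names for the old and new shapes of the factory as the PR describes them.

```python
class SketchBackend:
    """Hypothetical stand-in for FilesBackend; it only records its constructor arguments."""

    def __init__(self, base_dir=None, write_gitignore=False):
        self.base_dir = base_dir
        self.write_gitignore = write_gitignore


def build_before(**_kwargs):
    # Old shape: every keyword argument is swallowed, so the backend
    # always falls back to its defaults.
    return SketchBackend()


def build_after(*, base_dir=None, write_gitignore=False, **_kwargs):
    # New shape: the two relevant kwargs are honored; anything else
    # (for example timeout_ms) is still accepted and ignored.
    return SketchBackend(base_dir, write_gitignore=write_gitignore)


old = build_before(base_dir="/tmp/store", write_gitignore=True)
new = build_after(base_dir="/tmp/store", write_gitignore=True, timeout_ms=500)
print(old.base_dir, old.write_gitignore)
print(new.base_dir, new.write_gitignore)
```

With the old shape, a caller passing `base_dir` through `get_backend` would silently get the default directory, which is why the fix matters even with `write_gitignore` left off.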

Behavior / invariants

  • Opt-in — default off is byte-identical to today.
  • Materialize on write, never on read — written on upsert and a non-dry store migrate apply; never on get/list or a dry-run.
  • Create-when-absent only — an existing .gitignore is never overwritten (atomic temp + os.replace; structured CliError on fault, never a traceback).
  • Files backend only: mongo/neo4j are a no-op.
  • Scope no-leak (can_serve) and idempotent dedup invariants untouched.
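
The write-path invariants above can be sketched with a toy backend. This is not the project's `FilesBackend`: `SketchFilesBackend`, its `upsert(scope, visibility, line)` signature, and the plain `write_bytes` call are assumptions made to keep the example short (the PR says the real code uses an atomic temp file plus `os.replace` and raises `CliError` on faults).

```python
import tempfile
from pathlib import Path

GITIGNORE_BODY = b"*\n!.gitignore\n!*__public.jsonl\n"


class SketchFilesBackend:
    """Hypothetical stand-in; only the .gitignore rules are modelled."""

    def __init__(self, base_dir, write_gitignore=False):
        self._base = Path(base_dir)
        self._base.mkdir(parents=True, exist_ok=True)
        self._write_gitignore = write_gitignore

    def _ensure_gitignore(self):
        if not self._write_gitignore:
            return  # opt-in: default off writes nothing
        gi = self._base / ".gitignore"
        if gi.exists():
            return  # create-when-absent: never overwrite user edits
        gi.write_bytes(GITIGNORE_BODY)

    def upsert(self, scope, visibility, line):
        self._ensure_gitignore()  # write path: materialize here
        shard = self._base / f"{scope}__{visibility}.jsonl"
        with shard.open("a", encoding="utf-8") as fh:
            fh.write(line + "\n")

    def list(self):
        # Read path: deliberately does not call _ensure_gitignore().
        return sorted(p.name for p in self._base.glob("*.jsonl"))


with tempfile.TemporaryDirectory() as tmp:
    backend = SketchFilesBackend(tmp, write_gitignore=True)
    backend.list()
    print((Path(tmp) / ".gitignore").exists())  # still absent after a read
    backend.upsert("proj", "private", "{}")
    print((Path(tmp) / ".gitignore").read_bytes() == GITIGNORE_BODY)
```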

Verification

  • 176 tests pass (+10 new: 7 in tests/test_store_gitignore.py, 3 in tests/test_store_migrate.py), incl. a real-git check-ignore integration test (skips gracefully without git).
  • black / isort / flake8 / bandit / markdownlint clean; teken agent-first rubric passes.
  • Live end-to-end in a real git repo: store.put(..., write_gitignore=True) → git check-ignore reports the private shard ignored, the public shard tracked, and a future sidecar ignored; git add -A stages only .gitignore + the public shard.
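
The check-ignore verification can be approximated outside the test suite with a small script. A sketch only: `check_ignored` is a hypothetical helper, it writes the whitelist by hand instead of going through `store.put`, and it returns `None` when `git` is not on `PATH` (mirroring the graceful skip). It also assumes a reasonably recent git, where a path re-included by a `!` pattern is reported as not ignored.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

GITIGNORE_BODY = b"*\n!.gitignore\n!*__public.jsonl\n"


def check_ignored(names):
    """Return {name: is_ignored} according to `git check-ignore`, or None without git."""
    git = shutil.which("git")
    if git is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        subprocess.run([git, "init", "-q", str(repo)], check=True)
        (repo / ".gitignore").write_bytes(GITIGNORE_BODY)
        result = {}
        for name in names:
            (repo / name).write_text("{}\n", encoding="utf-8")
            # Exit status 0 means ignored, 1 means not ignored.
            proc = subprocess.run([git, "check-ignore", "-q", name], cwd=repo)
            result[name] = proc.returncode == 0
        return result


print(check_ignored(["proj__private.jsonl", "proj__public.jsonl", "foo__index.bin"]))
```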

Consumer note (eidetic-cli)

A cross-check of eidetic's store call sites confirms it reaches this surface with no new write-path construction — its StoreBackend already forwards any get_backend(...) kwarg through to store.put. The only follow-up is on eidetic's side (its migrate_store() forwarding, tracked in eidetic#25). eidetic raises its data-refinery-cli floor to this release (0.9.0) and flips its private-in-repo cutover atomically with that bump.

How it was built

Spec → plan → implementation via /think → /spec-to-plan → /assign-to-workforce, with the colleague backend (Qwen3.6-27B via ask-colleague) as the explorer/worker/reviewer and the main agent TDD-gating each merge. The converged spec + plan are committed under docs/specs/ + docs/plans/.

  • data-refinery-cli (Claude)

🤖 Generated with Claude Code

OriNachum and others added 10 commits June 24, 2026 17:14
#12)

Converged devague frame + build plan for DR's files backend optionally
writing a fail-closed .gitignore on store-dir materialization, so a consumer
(eidetic-cli) keeps private shards out of git without constructing a write path.

Surface decision (Option B): write_gitignore on FilesBackend init +
store.migrate, and fix files.build to honor base_dir+write_gitignore so
store.put/get/list flow them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LAEeF8y7RrKft8de7rZfDM
…d o...

Implement issue #12 in data-refinery-cli: the files backend optionally writes a fail-closed .gitignore on store-dir materialization. Work TEST-FIRST.

Read first: docs/specs/2026-06-24-data-refinery-s-files-backend-can-write-a-fail-clo.md and CLAUDE.md. The target module is data_refinery/store/backends/files.py.

Implement EXACTLY this (the agreed "Option B"):

1) data_refinery/store/backends/files.py
   - Add a keyword param `write_gitignore: bool = False` to FilesBackend.__init__ (after base_dir). Store it as self._write_gitignore. Do NOT change the existing eager `self._base.mkdir(parents=True, exist_ok=True)`.
   - Add a private method `_ensure_gitignore(self) -> None` that, ONLY when self._write_gitignore is True, creates `<base_dir>/.gitignore` ONLY IF it does not already exist (Path.exists() check), writing exactly these bytes (note trailing newline):
         *\n!.gitignore\n!*__public.jsonl\n
     Never overwrite an existing .gitignore (it may carry user edits). If writing it raises OSError, surface a structured CliError(code=EXIT_ENV_ERROR, ...) consistent with _atomic_write — never a traceback.
   - Call _ensure_gitignore() ONLY on write/materialize paths: at the very start of upsert(), and inside migrate() within the `if not dry_run:` block BEFORE the pass-2 apply loop (so a dry-run NEVER creates it, and an apply materializes it even if the plan is empty). Do NOT call it in __init__, get(), list(), or all(). Reads must never create the file.
   - Fix the module-level `build()` factory (currently `def build(**_kwargs): return FilesBackend()`, which DROPS kwargs) to honor base_dir and write_gitignore:
         def build(*, base_dir=None, write_gitignore=False, **_kwargs):
             return FilesBackend(base_dir, write_gitignore=write_gitignore)
     Keep accepting/ignoring other kwargs (e.g. timeout_ms) via **_kwargs.

2) Tests — NEW file tests/test_store_gitignore.py (write the tests FIRST, then implement). Cover:
   - .gitignore content is exactly "*\n!.gitignore\n!*__public.jsonl\n" after an upsert with write_gitignore=True.
   - Default (write_gitignore=False): after upsert NO .gitignore exists; the dir holds only the scope .jsonl (byte-identical to today).
   - A read get()/list() with write_gitignore=True does NOT create .gitignore.
   - An existing .gitignore with DIFFERENT content is never overwritten after an upsert.
   - In a real temp git repo (git init + set user.email/user.name), with write_gitignore=True after putting one private-scope and one public-scope envelope: `git check-ignore -q <scope>__private.jsonl` is ignored (exit 0), an arbitrary non-public sidecar name like `foo__index.bin` is ignored, and `git check-ignore -q <scope>__public.jsonl` is NOT ignored (exit 1). Skip this test gracefully when `git` is absent (shutil.which("git") is None -> pytest.skip).
   - build(base_dir=tmp, write_gitignore=True) returns a FilesBackend honoring both; and data_refinery.store.put(env, backend="files", base_dir=tmp, write_gitignore=True) forwards the kwargs (get_backend->build now honors them) and materializes the .gitignore.

Constraints:
   - stdlib only; do NOT add runtime dependencies (the `dependencies = []` invariant).
   - black + isort + flake8 clean at line-length 100; bandit clean.
   - No traceback ever; raise CliError on faults; match the existing file's style/helpers (_atomic_write, _serialize, _VISIBILITIES, etc.).
   - Before finishing, run: uv run pytest tests/test_store_gitignore.py -q  AND  uv run black --check data_refinery tests && uv run isort --check-only data_refinery tests && uv run flake8 data_refinery tests
     and make them all pass.

Deliver the change committed on your drive branch. Do not edit pyproject.toml, CHANGELOG.md, README.md, docs/, or data_refinery/store/migrate.py — those are other tasks.

Implement the task above in this repository.

Rules:
- Make the SMALLEST change that correctly satisfies the task.
- Follow the repository's existing patterns, style, and conventions — read the
  neighbouring files first so your change reads like the surrounding code.
- Keep edits lint-clean: respect the project's maximum line length and end every
  text file with exactly one trailing newline.
- You may read, create, modify files, and run commands as needed.
- Don't widen the scope: do exactly what was asked, nothing more.

When you are done, call finish with a short summary of exactly what you changed
and why.
…ialization

Colleague-implemented (Qwen3.6-27B via ask-colleague write), TDD-gated by the
main agent. FilesBackend gains write_gitignore (default off); _ensure_gitignore
creates base_dir/.gitignore (* / !.gitignore / !*__public.jsonl) create-when-
absent on write paths only (upsert + migrate-apply, never reads/dry-run).
build() fixed to honor base_dir+write_gitignore (was dropping kwargs) so
store.put/get/list flow them. New tests/test_store_gitignore.py (7 tests).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LAEeF8y7RrKft8de7rZfDM
…nor...

Issue #12 in data-refinery-cli, task t2: plumb write_gitignore through the importable store.migrate endpoint. Work TEST-FIRST. The files backend already supports write_gitignore (FilesBackend.__init__ accepts it and FilesBackend.migrate() calls _ensure_gitignore on apply) — t2 only wires the top-level migrate() function to forward it.

Target file: data_refinery/store/migrate.py

1) data_refinery/store/migrate.py
   - The current top-level function is:
         def migrate(transform=None, *, backend="files", base_dir=None, dry_run=False) -> dict[str, Any]:
             if backend == "files":
                 return FilesBackend(base_dir).migrate(transform, dry_run=dry_run)
             raise CliError(...)
   - Add a keyword param `write_gitignore: bool = False` (place it after base_dir, before dry_run). Forward it into the FilesBackend constructor:
         return FilesBackend(base_dir, write_gitignore=write_gitignore).migrate(transform, dry_run=dry_run)
   - Update the docstring: with write_gitignore=True the files backend materializes the fail-closed .gitignore (* / !.gitignore / !*__public.jsonl) during the apply pass; a dry_run never writes it; default False is byte-identical to today; files backend only.
   - Do NOT add a CLI flag — the CLI `store migrate` verb stays unchanged (Option B is import-surface only).

2) Tests — ADD to the EXISTING file tests/test_store_migrate.py (do NOT create a new test file, and do NOT touch tests/test_store_gitignore.py). Add tests covering:
   - store.migrate(base_dir=tmp, write_gitignore=True) creates tmp/.gitignore after a real (non-dry) migrate. Setup: first store.put(Envelope(id="a", content="x"), backend="files", base_dir=tmp) WITHOUT write_gitignore so a scope file exists and no .gitignore yet; then store.migrate(base_dir=tmp, write_gitignore=True); assert (tmp/".gitignore").exists() and its content == "*\n!.gitignore\n!*__public.jsonl\n".
   - store.migrate(base_dir=tmp, write_gitignore=True, dry_run=True) does NOT create tmp/.gitignore.
   - default store.migrate(base_dir=tmp) (write_gitignore omitted) creates no .gitignore.
   - import data_refinery.store as store; use store.migrate / store.put / store.Envelope (or import Envelope from data_refinery.store.envelope, matching the existing test file's import style — read the file first and match it).
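
As a rough picture of the forwarding and dry-run gating this task asks for, consider the sketch below. It is not the project's code: `SketchFilesBackend`, the line-by-line `transform`, the returned report dict, and the `ValueError` (the source raises `CliError`) are all assumptions for illustration, and unlike the real function it requires an explicit `base_dir`.

```python
import tempfile
from pathlib import Path

GITIGNORE_BODY = b"*\n!.gitignore\n!*__public.jsonl\n"


class SketchFilesBackend:
    """Hypothetical stand-in for FilesBackend; the report shape is invented."""

    def __init__(self, base_dir, write_gitignore=False):
        self._base = Path(base_dir)
        self._write_gitignore = write_gitignore

    def migrate(self, transform=None, *, dry_run=False):
        transform = transform or (lambda line: line)
        # Pass 1: build the plan without writing anything.
        plan = {}
        for shard in sorted(self._base.glob("*.jsonl")):
            lines = shard.read_text(encoding="utf-8").splitlines()
            plan[shard] = [transform(line) for line in lines]
        if not dry_run:
            # Materialize before the apply loop, so an empty plan still creates it.
            gi = self._base / ".gitignore"
            if self._write_gitignore and not gi.exists():
                gi.write_bytes(GITIGNORE_BODY)
            # Pass 2: apply the plan.
            for shard, lines in plan.items():
                shard.write_text("".join(f"{line}\n" for line in lines), encoding="utf-8")
        return {"dry_run": dry_run, "shards": len(plan)}


def migrate(
    transform=None, *, backend="files", base_dir=None, write_gitignore=False, dry_run=False
):
    if backend == "files":
        return SketchFilesBackend(base_dir, write_gitignore=write_gitignore).migrate(
            transform, dry_run=dry_run
        )
    raise ValueError(f"unsupported backend: {backend}")


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "proj__private.jsonl").write_text("{}\n", encoding="utf-8")
    migrate(base_dir=tmp, write_gitignore=True, dry_run=True)
    print((Path(tmp) / ".gitignore").exists())  # a dry run leaves the directory alone
    migrate(base_dir=tmp, write_gitignore=True)
    print((Path(tmp) / ".gitignore").exists())
```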

Constraints:
   - stdlib only; no new runtime dependencies.
   - black + isort + flake8 clean at line-length 100; bandit clean; no traceback (CliError only).
   - Match the existing migrate.py + test_store_migrate.py style.
   - Before finishing run: uv run pytest tests/test_store_migrate.py -q  AND  uv run black --check data_refinery tests && uv run isort --check-only data_refinery tests && uv run flake8 data_refinery tests  — all must pass.

Deliver committed on your drive branch. Do NOT edit files.py, pyproject.toml, CHANGELOG.md, README.md, docs/, or uv.lock.

Colleague-implemented (Qwen3.6-27B via ask-colleague write), TDD-gated by the
main agent. store.migrate() gains write_gitignore (default off), forwarded to
FilesBackend so a non-dry files migrate materializes the fail-closed .gitignore;
dry_run never writes it. No CLI flag (Option B is import-surface only).
3 new tests in tests/test_store_migrate.py asserting real on-disk behavior.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LAEeF8y7RrKft8de7rZfDM
Minor bump per the contract versioning policy (new optional param). CHANGELOG
documents the fail-closed .gitignore opt-in and the build() kwarg-honoring fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LAEeF8y7RrKft8de7rZfDM
…ent...

Issue #12 in data-refinery-cli, task t3 (DOCS ONLY): document the new opt-in `write_gitignore` files-store surface that was just implemented. Do NOT touch any .py file, pyproject.toml, CHANGELOG.md, or uv.lock — the version bump + CHANGELOG are handled separately. Read the just-merged code first so the docs are accurate:
  - data_refinery/store/backends/files.py (FilesBackend.__init__ now takes write_gitignore; _ensure_gitignore; build() honors base_dir+write_gitignore)
  - data_refinery/store/migrate.py (migrate() now takes write_gitignore)
  - docs/contract.md, README.md, AGENTS.colleague.md (the files you will edit)

What the feature is (document it exactly): when a consumer opts in with `write_gitignore=True`, the files backend ensures a fail-closed `.gitignore` exists in the store `base_dir` with EXACTLY this content:
    *
    !.gitignore
    !*__public.jsonl
It ignores everything and only ever allows public shards (and the .gitignore itself) to be tracked, so any future private filename or sidecar is excluded by default. Behavior: opt-in (default False; off is byte-identical to today); written only on a write/materialize (upsert + a non-dry store.migrate apply), NEVER on a read (get/list) or a dry-run; create-when-absent only (an existing .gitignore is never overwritten); files backend only (mongo/neo4j are a no-op). It is reachable WITHOUT the consumer constructing any filesystem write path, via: FilesBackend(base_dir, write_gitignore=True); data_refinery.store.put/get/list(..., backend="files", base_dir=..., write_gitignore=True) (get_backend forwards kwargs to the files build()); and data_refinery.store.migrate(transform, *, backend="files", base_dir=..., write_gitignore=..., dry_run=...). Rationale: data-refinery OWNS the `<scope>__<visibility>.jsonl` on-disk layout, so it owns the ignore pattern that tracks it; this keeps the .gitignore write-path sink (the consumer's prior pythonsecurity:S2083) on the storage owner. Continues issues #8 / #1.

Edits to make:

1) docs/contract.md
   - In the Wave 3 section (after "the store-migration endpoint" subsection, before "## Versioning policy"), add a new subsection titled exactly:
        ### Fail-closed `.gitignore` opt-in (stable)
     documenting all of the above: the opt-in param, the exact whitelist content (in a ```gitignore code block), the materialize-on-write-not-read rule, create-when-absent/never-clobber, files-only no-op, default-off-byte-identical, the three reachable surfaces (FilesBackend init / store.put|get|list / store.migrate), and the boundary rationale. Keep the prose terse and in the same voice as the rest of the doc.
   - Update the migrate signature line currently reading:
        `migrate(transform=None, *, backend="files", base_dir=None, dry_run=False)`
     to include the new parameter:
        `migrate(transform=None, *, backend="files", base_dir=None, write_gitignore=False, dry_run=False)`

2) README.md — find the store / store-surface section and add a short note (1-3 sentences) that the files backend can optionally write a fail-closed `.gitignore` via `write_gitignore=True` (default off), so a consumer keeps private shards out of git without constructing a write path. Match the README's existing tone/format. If you cannot find an obviously-right store section, add it near where the store put/get/list or `[store]` extra is described.

3) AGENTS.colleague.md — find where the store surface is described and add a one-line mention of the `write_gitignore` opt-in (files-only, default off, fail-closed). If there is no store section, add a concise line under the most relevant heading.

Constraints:
  - Markdown must pass: markdownlint-cli2 "docs/contract.md" "README.md" "AGENTS.colleague.md" (fix any violations you introduce; match the surrounding heading/list/code-fence style).
  - Do NOT edit code, tests, pyproject.toml, CHANGELOG.md, or uv.lock.
  - Be accurate to the code you read — do not invent flags or behavior.

Deliver committed on your drive branch.

…colleague.md)

Colleague-drafted (Qwen3.6-27B via ask-colleague write), reviewed by the main
agent. New "Fail-closed .gitignore opt-in" subsection in docs/contract.md (Wave
3) with the exact whitelist, behavior rules, reachable surfaces, and boundary
rationale; migrate() signature updated; README example; AGENTS.colleague.md note.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LAEeF8y7RrKft8de7rZfDM
The devague-exported spec/plan used bare <scope>__private.jsonl tokens that
markdownlint reads as inline HTML; wrap them in code spans so the CI markdown
lint job stays green (docs/specs is not in the ignore list).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LAEeF8y7RrKft8de7rZfDM
…c_write

Addresses the colleague review's one concrete finding: a crash mid-.gitignore
write could leave a .gitignore.tmp that _reap_orphan_tmp (globbing only
*.jsonl.tmp) never cleaned. _ensure_gitignore now reuses the shared
_atomic_write (consistent structured CliError, no duplicated temp+replace), and
_reap_orphan_tmp also reaps the .gitignore temp. New test locks in the reaping.
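
The reaping described in this commit might look roughly like the following. This is a sketch, not the project's `_reap_orphan_tmp`: the function name `reap_orphan_tmp`, its return value, and the exact temp names are assumptions taken from the wording above (`*.jsonl.tmp` plus `.gitignore.tmp`).

```python
import tempfile
from pathlib import Path


def reap_orphan_tmp(base: Path) -> list[str]:
    """Delete leftover temp files from interrupted writes; return the names removed."""
    candidates = [*base.glob("*.jsonl.tmp"), base / ".gitignore.tmp"]
    reaped = []
    for tmp in candidates:
        if tmp.is_file():
            tmp.unlink()
            reaped.append(tmp.name)
    return sorted(reaped)


with tempfile.TemporaryDirectory() as d:
    base = Path(d)
    for name in ("proj__public.jsonl", "proj__public.jsonl.tmp", ".gitignore.tmp"):
        (base / name).write_text("", encoding="utf-8")
    print(reap_orphan_tmp(base))
    print(sorted(p.name for p in base.iterdir()))
```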

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LAEeF8y7RrKft8de7rZfDM
@OriNachum

Contributor Author

/agentic_review

@qodo-code-review

qodo-code-review Bot commented Jun 24, 2026


Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (1) 📎 Requirement gaps (0) 📜 Skill insights (0)

Context used
✅ Compliance rules (platform): 25 rules


Remediation recommended

1. Gitignore creation race 🐞 Bug ☼ Reliability
Description
FilesBackend._ensure_gitignore() uses a non-atomic exists() check and then writes via
_atomic_write() (fixed .gitignore.tmp + os.replace), so concurrent upserts can collide on the
temp file or overwrite a .gitignore created after the check—violating the “never overwrite
existing .gitignore” guarantee under race conditions.
Code

data_refinery/store/backends/files.py[R57-62]

+        if not self._write_gitignore:
+            return
+        gi = self._base / _GITIGNORE_NAME
+        if gi.exists():
+            return
+        self._atomic_write(gi, _GITIGNORE_BODY)
Relevance

⭐⭐ Medium

No prior reviews on create-only race/atomic temp collisions; team accepted os.replace temp-write
pattern in PR9.

PR-#9

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
The .gitignore creation path is implemented as an exists() pre-check followed by
_atomic_write(), which uses a deterministic path.name + '.tmp' temp file and `os.replace(tmp,
path)` overwrite semantics; these together make the operation non-atomic for “create-only” and
unsafe under concurrent writers.

data_refinery/store/backends/files.py[48-63]
data_refinery/store/backends/files.py[298-326]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`FilesBackend._ensure_gitignore()` is implemented as `if gi.exists(): return` followed by `_atomic_write(gi, ...)`. This is a TOCTOU pattern and `_atomic_write()` uses a deterministic sibling temp name plus `os.replace()`, which overwrites an existing destination. Under concurrent writers, this can:
- collide on `.gitignore.tmp` (one writer moves it out from under the other), and/or
- overwrite a `.gitignore` created between the `exists()` check and the `replace()`.

## Issue Context
The feature is described as “create-when-absent only” (never clobber). That invariant currently holds only in single-writer scenarios.

## Fix Focus Areas
- data_refinery/store/backends/files.py[48-63]
- data_refinery/store/backends/files.py[298-326]

## Suggested fix
Implement an atomic create-if-absent strategy specifically for `.gitignore`:
- Option A (simplest): `try: gi.open('x')` and write `_GITIGNORE_BODY`; `except FileExistsError: return`. (This avoids overwrites and temp collisions; accept that the file write itself isn’t rename-atomic.)
- Option B (atomic + no-overwrite): write to a uniquely-named temp file (not `.gitignore.tmp`), then attempt an atomic “create only if absent” step (e.g., `os.link(tmp, gi)`; if it fails with EEXIST, drop the temp; if it succeeds, unlink temp). Fall back to Option A if hardlinks aren’t available.

Also consider tracking an in-memory `_gitignore_ensured` flag to avoid repeated stats once ensured.
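
The two suggested strategies can be sketched with the standard library alone. This is an illustration of the reviewer's Options A and B, not the fix that was merged: `create_exclusive` and `create_via_link` are hypothetical names, and the temp-file prefix is an assumption (a randomly named temp would also need to be covered by whatever reaps orphaned temp files).

```python
import os
import tempfile
from pathlib import Path

GITIGNORE_BODY = "*\n!.gitignore\n!*__public.jsonl\n"


def create_exclusive(path: Path, body: str) -> bool:
    """Option A: mode "x" fails if the file exists, so it can never clobber one.

    Returns True only when this call created the file. The write itself is not
    rename-atomic, so a crash mid-write could leave a partial file.
    """
    try:
        with path.open("x", encoding="utf-8", newline="\n") as fh:
            fh.write(body)
    except FileExistsError:
        return False
    return True


def create_via_link(path: Path, body: str) -> bool:
    """Option B: write a uniquely named temp fully, then hard-link it into place.

    os.link refuses to replace an existing destination, so the target is either
    absent or complete. Note that mkstemp creates the temp with owner-only
    permissions, which the linked file shares.
    """
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=".gitignore.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as fh:
            fh.write(body)
        try:
            os.link(tmp_name, path)
        except FileExistsError:
            return False  # someone else created it first; leave theirs alone
        except OSError:
            # Hard links unavailable on this filesystem: fall back to Option A.
            return create_exclusive(path, body)
        return True
    finally:
        os.unlink(tmp_name)  # the temp name is always dropped


with tempfile.TemporaryDirectory() as d:
    target = Path(d) / ".gitignore"
    print(create_via_link(target, GITIGNORE_BODY))
    print(create_via_link(target, "something else\n"))
```

Either variant removes the window between the `exists()` check and the `os.replace()` call, because the existence test and the creation happen in a single filesystem operation.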




Informational

2. store.migrate() adds write_gitignore 📘 Rule violation ⌂ Architecture
Description
The public data_refinery.store.migrate() signature now includes write_gitignore, which exceeds
the allowed parameter set. This violates the rule restricting store.migrate to transform plus
backend/base_dir/dry_run only.
Code

data_refinery/store/migrate.py[R28-33]

    *,
    backend: str = DEFAULT_BACKEND,
    base_dir: str | None = None,
+    write_gitignore: bool = False,
    dry_run: bool = False,
) -> dict[str, Any]:
Relevance

⭐ Low

Team is actively expanding migrate surface (this PR); only evidence is original minimal signature in
PR9.

PR-#9

ⓘ Recommendations generated based on similar findings in past PRs

Evidence
PR Compliance ID 1081899 restricts the public store.migrate signature to transform plus
backend/base_dir/dry_run only. The PR adds write_gitignore: bool = False to the
migrate(...) signature in data_refinery/store/migrate.py, violating that restriction.

Rule 1081899: Restrict store.migrate signature to transform + backend/base_dir/dry_run only
data_refinery/store/migrate.py[28-33]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`data_refinery.store.migrate()` now accepts `write_gitignore`, but the compliance rule requires `store.migrate` to only accept `transform` plus `backend`/`base_dir`/`dry_run`.

## Issue Context
The added parameter appears in the public `migrate(...)` function signature and is part of the PR’s new surface area.

## Fix Focus Areas
- data_refinery/store/migrate.py[28-33]




@qodo-code-review


PR Summary by Qodo

Opt-in fail-closed .gitignore materialization for files store backend
✨ Enhancement 🐞 Bug fix 🧪 Tests 📝 Documentation ⚙️ Configuration changes 🕐 20-40 Minutes


Description

• Add opt-in write_gitignore to materialize a fail-closed .gitignore on write.
• Plumb base_dir/write_gitignore through store.put/get/list and store.migrate; fix dropped kwargs.
• Add docs, version bump, and tests including real git check-ignore verification.
Diagram

graph TD
  A["Consumer (eidetic-cli)"] --> B["data_refinery.store.put/migrate"] --> C["get_backend() resolver"] --> D["backends.files.build()"] --> E["FilesBackend upsert/migrate"] -->|"materialize on write"| F[("Store base_dir")]
High-Level Assessment

The following are alternative approaches to this PR:

1. Use .git/info/exclude instead of writing .gitignore
  • ➕ Does not modify tracked repository files
  • ➕ Keeps ignore rules local to a clone
  • ➖ Git-specific and requires being inside a git repo
  • ➖ Harder to reason about for consumers; weaker portability (non-git contexts)
  • ➖ Still requires defining/locating an on-disk write target per repo
2. Keep .gitignore writing in the consumer (status quo)
  • ➕ No changes to the store backend
  • ➕ Consumer can tailor patterns to its repo needs
  • ➖ Reintroduces consumer-side filesystem write-path construction (the S2083 sink)
  • ➖ Risk of pattern drift from the storage layout owner
  • ➖ Harder to ensure fail-closed behavior across consumers
3. Expose a helper that returns the canonical ignore content for consumers to write
  • ➕ Data-refinery remains the source of truth for the ignore pattern
  • ➕ Minimal behavioral change in the backend
  • ➖ Still forces consumer-side file path construction and write semantics
  • ➖ Harder to enforce invariants like create-when-absent and write-only-on-materialize

Recommendation: Proceed with the PR’s approach: the storage-layout owner (data-refinery) materializes a deterministic, fail-closed .gitignore under an explicit opt-in flag, and only on write/apply paths. This best preserves the “consumer supplies only base_dir + bool” contract, avoids reintroducing consumer-side write sinks, and keeps the ignore pattern co-located with the on-disk layout it tracks.

Files changed (15) +791 / -13

Enhancement (2) +47 / -8
files.py: Implement write-time fail-closed .gitignore materialization; fix build kwargs (+38/-7)

• Adds 'write_gitignore' to 'FilesBackend' and materializes a canonical fail-closed '.gitignore' only during 'upsert' and non-dry 'migrate' apply, never on reads. Extends orphan temp reaping to include '.gitignore.tmp' and updates 'build()' to honor 'base_dir'/'write_gitignore' so kwargs flow correctly through 'get_backend'.

data_refinery/store/backends/files.py

migrate.py: Plumb write_gitignore through the importable migrate endpoint (+9/-1)

• Extends 'store.migrate()' signature with 'write_gitignore' and passes it into 'FilesBackend(..., write_gitignore=...)'; documents that dry-run does not materialize '.gitignore'.

data_refinery/store/migrate.py

Tests (2) +188 / -0
test_store_gitignore.py: Add unit + integration tests for .gitignore materialization (+161/-0)

• Adds tests for canonical content, default-off behavior, no materialization on reads, create-when-absent behavior, and temp-file reaping. Includes an integration test using 'git check-ignore' (skipped when git is unavailable).

tests/test_store_gitignore.py

test_store_migrate.py: Add migrate() tests covering write_gitignore and dry-run behavior (+27/-0)

• Adds coverage ensuring '.gitignore' is created during non-dry migrate when opted in, is not created during dry-run, and remains absent by default.

tests/test_store_migrate.py

Documentation (8) +553 / -2
data-refinery-s-files-backend-can-write-a-fail-clo.json: Add exported devague frame for fail-closed .gitignore feature (+192/-0)

• Introduces the structured frame capturing requirements, boundaries, and success signals for the files-backend '.gitignore' materialization behavior.

.devague/frames/data-refinery-s-files-backend-can-write-a-fail-clo.json

data-refinery-s-files-backend-can-write-a-fail-clo.json: Add exported devague plan for implementing write_gitignore (+206/-0)

• Adds the task plan describing implementation steps, acceptance criteria, and risks (including git-binary availability for integration tests).

.devague/plans/data-refinery-s-files-backend-can-write-a-fail-clo.json

AGENTS.colleague.md: Document files-backend write_gitignore opt-in (+3/-1)

• Updates the agent-facing project description to mention the files backend’s opt-in fail-closed '.gitignore' behavior.

AGENTS.colleague.md

CHANGELOG.md: Add 0.9.0 release notes for write_gitignore and build kwarg fix (+10/-0)

• Documents the new opt-in 'write_gitignore' behavior, its invariants (write-only, create-when-absent, files-only), and the fix to files backend factory kwarg handling.

CHANGELOG.md

README.md: Add usage example for write_gitignore on store.put (+4/-0)

• Adds a short snippet showing how consumers can opt in to '.gitignore' materialization via 'store.put(..., write_gitignore=True)'.

README.md

contract.md: Document stable write_gitignore contract and invariants (+37/-1)

• Updates the migrate signature docs and adds a dedicated section describing the fail-closed '.gitignore' contents, opt-in behavior, write-only materialization, and reachable surfaces.

docs/contract.md

2026-06-24-data-refinery-s-files-backend-can-write-a-fail-clo.md: Add build plan document for issue #12 implementation (+49/-0)

• Adds a human-readable plan mirroring the exported devague plan: tasks, acceptance checks, and noted risks.

docs/plans/2026-06-24-data-refinery-s-files-backend-can-write-a-fail-clo.md

2026-06-24-data-refinery-s-files-backend-can-write-a-fail-clo.md: Add spec document for issue #12 behavior and rationale (+52/-0)

• Adds a spec describing before/after state, security rationale, required surfaces, and behavioral boundaries for '.gitignore' materialization.

docs/specs/2026-06-24-data-refinery-s-files-backend-can-write-a-fail-clo.md

Other (3) +3 / -3
current: Point devague 'current' at the new frame slug (+1/-1)

• Updates the devague pointer to the new exported frame for issue #12 work.

.devague/current

current_plan: Point devague 'current_plan' at the new plan slug (+1/-1)

• Updates the devague pointer to the new exported implementation plan.

.devague/current_plan

pyproject.toml: Bump project version to 0.9.0 (+1/-1)

• Updates the package version to match the new feature release documented in the changelog.

pyproject.toml

The version-bump skill updates pyproject.toml + CHANGELOG but not uv.lock; sync
the lockfile's editable-package version so it matches the 0.9.0 bump.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01LAEeF8y7RrKft8de7rZfDM
@sonarqubecloud


Development

Successfully merging this pull request may close these issues.

Files backend: optionally write a fail-closed .gitignore on store-dir materialization (keeps consumers write-path-free)

1 participant