From 45a3ee2207a17a1ec9a3f90d9d11719fc130d814 Mon Sep 17 00:00:00 2001
From: tikankika
Date: Wed, 24 Jun 2026 19:26:26 +0200
Subject: [PATCH] chore(acdm): track repo-policy rules in the repo (selective .claude gitignore)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

ACDM rule-distribution principle (PR #18/#20): repo-policy rules describe
the repo / are read by repo-tooling and must travel with a clone / CI /
contributor (ADR-015 "protection travels with the repo"). They were
gitignored under a wholesale .claude/ ignore → not in the repo.

Selectively un-ignore the three so they are tracked:
- data-protection.md (PII / data-protection policy)
- publish-readiness.md (read by /publish-check)
- internal-docs-boundary.md (what belongs in the repo)

Everything else under .claude/ stays gitignored — process rules, commands,
hooks, acdm.json, .mcp.json, CLAUDE.md (local / path-sensitive config,
verified via git check-ignore). No config or paths are exposed.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 .claude/rules/data-protection.md        |  58 +++++++++++++
 .claude/rules/internal-docs-boundary.md |  37 +++++++++
 .claude/rules/publish-readiness.md      | 106 ++++++++++++++++++++++++
 .gitignore                              |  12 ++-
 4 files changed, 211 insertions(+), 2 deletions(-)
 create mode 100644 .claude/rules/data-protection.md
 create mode 100644 .claude/rules/internal-docs-boundary.md
 create mode 100644 .claude/rules/publish-readiness.md

diff --git a/.claude/rules/data-protection.md b/.claude/rules/data-protection.md
new file mode 100644
index 0000000..47808c4
--- /dev/null
+++ b/.claude/rules/data-protection.md
@@ -0,0 +1,58 @@
+---
+paths:
+  - "**/*"
+---
+
+# Data Protection — Treat As If Public
+
+Whether or not this repo is private today, treat everything in it as if it were
+already public. A private repo can be made public, forked, cloned, or leaked, and
+anything committed is permanent. The only safe assumption is that every file and
+every past commit is visible to the world.
+
+## Hard rule (non-negotiable)
+
+Real person or student data must **NEVER** exist in this repo — or any repo — in
+any form, anywhere in the working tree **or its git history**. Prevention is the
+only safe path: once committed it lives in the history forever (see
+"Already committed?" below).
+
+## NEVER write in files or commit messages:
+- Personal names (colleagues, research participants, teachers, students)
+- School names or abbreviations that identify specific schools
+- University or institution names
+- Research programme names (funded projects, grants)
+- Place names (streets, buildings, venues) that identify locations
+- Hardcoded file paths containing usernames (`/Users/...`, `/home/...`)
+- Research questions specific enough to identify a study
+- Chat history or session transcripts
+- Secrets: API keys, tokens, passwords, `.env` contents, credentials
+
+## ALWAYS use instead:
+- `School A`, `School B`, `Colleague_A` for anonymised references
+- `/path/to/project` for file path examples
+- `SPEAKER_01`, `L1` for participant references
+- Generic descriptions for research programmes
+- Synthetic/fabricated data in examples
+
+## Check before committing:
+- Think before writing — does this text contain any personal names, paths, or identifiers?
+- Would a reader identify a specific person, school, or study from this text —
+  **directly**, OR by combining quasi-identifiers (e.g. class + date + subject can
+  identify a student without naming them)?
+- This is your judgement. The `pii_scan` commit gate is the deterministic backstop —
+  it catches what you miss, but it is not a substitute for the check above.
+
+## Already committed? Deletion is NOT enough.
+
+If real data is found already in the repo, removing the file in a new commit does
+**not** remove it from git history — it remains in every past commit, clone, and
+fork. To actually remove it you must scrub the history (fresh-repo rebuild or a
+history filter) **and** rotate any exposed secret. Stop and escalate before
+publishing or flipping such a repo.
+
+## This applies to ALL content:
+- Source code, comments, error messages
+- Documentation, RFCs, changelogs, roadmaps
+- Commit messages
+- Test data and examples
diff --git a/.claude/rules/internal-docs-boundary.md b/.claude/rules/internal-docs-boundary.md
new file mode 100644
index 0000000..193f36d
--- /dev/null
+++ b/.claude/rules/internal-docs-boundary.md
@@ -0,0 +1,37 @@
+---
+paths:
+  - "**/*"
+---
+
+# Internal Documentation Boundary
+
+## These belong in the repo (public):
+- Source code, tests, build config
+- **Decision records** — `ADR-NNN` in `docs/decisions/` (the repo's design-record)
+- Methodology documents (`methodology/`)
+- Example files (`examples/`) — with fabricated data only
+- User-facing docs: README, GETTING_STARTED, API, CONTRIBUTING, CHANGELOG
+- Public design specs (`specs/`) — if intentionally public
+- Templates (`templates/`)
+
+## These do NOT belong in the repo (→ project's Nextcloud internal-documentation):
+- Handoff documents (`type: handoff`, HANDOFF_*, *_HANDOFF_*)
+- **Ideas** (`type: idea`, `docs/ideas/`) — quick internal captures
+- **RFCs** (`type: rfc`) — design proposals; internal until ratified, then they become an ADR in `docs/decisions/`
+- **Explorations and shapes** (`type: exploration` / `type: shape`) — strategic deliberation
+- Internal planning docs / plans (CODE_HANDOFF_*)
+- Development notes (`notes/`, `_internal/`)
+- Chat history or session exports
+- Process memos from actual research projects
+- Files from Nextcloud, Dropbox, OneDrive or other external sync services
+
+## If you are about to create or edit a file:
+- Is this something a user who clones the repo needs? → Repo
+- Is this internal planning, handoff, or development thinking? → NOT repo (use `save_document(doc_type=…)` → Nextcloud)
+
+## The document model (ratified 2026-06-21)
+Repo = **ADR** (the ratified decision-record) + code / methodology / examples / user-docs. ALL
+deliberation (idea / rfc / exploration / shape / handoff) is **internal** → the project's Nextcloud
+`_internal_documentation/` (routed via `save_document(doc_type=…)`). Enforced deterministically
+by `internal_docs_guard` (gate on a doc's frontmatter `type:`): unambiguously-internal types are blocked
+from the repo (git pre-commit), optional-public (rfc/plan/todo/spec) only warn.
diff --git a/.claude/rules/publish-readiness.md b/.claude/rules/publish-readiness.md
new file mode 100644
index 0000000..d333b31
--- /dev/null
+++ b/.claude/rules/publish-readiness.md
@@ -0,0 +1,106 @@
+---
+paths:
+  - "**/*"
+---
+
+# Publish Readiness — Pre-publish checklist
+
+This rule defines what makes a repository ready to flip from private to public. Used by the `/publish-check` slash-command.
+
+## Severity rubric
+
+### Blocker — MUST fix before public
+
+A finding that, if left in, exposes personal data, breaks user trust, or makes the repository misleading. Public-flip is unsafe until resolved.
+
+Examples:
+- Personal names, hardcoded user-home paths, e-mail addresses
+- README claims that contradict source code (false advertising)
+
+### Warning — SHOULD fix before public
+
+A finding that signals carelessness or inconsistency to public readers. Public-flip is possible but degrades reception.
+
+Examples:
+- British-English drift in user-facing prose
+- Missing community files referenced from README
+- Outdated supported-versions in `SECURITY.md`
+
+### Nice-to-have — MAY add or improve
+
+A finding that, if added, increases professionalism but is not expected by readers.
+
+Examples:
+- `CODE_OF_CONDUCT.md`
+- `.github/dependabot.yml`
+- Issue / pull-request templates
+
+## Scan axes
+
+The `/publish-check` command runs five scans:
+
+1. **data-protection** — sources truth from `data-protection.md` rule (user-home paths, personal names, e-mail addresses)
+2. **language** — sources truth from `language-british-english.md` rule (American-English drift in prose)
+3. **docs-freshness** — `README`, `ROADMAP`, `SECURITY` versions and counts vs the source (`package.json`, `src/`)
+4. **release-hygiene** — community files exist and are current
+5. **readme-sections** — README carries the golden-standard mandatory sections (see "README golden standard" below); sources truth from `readme_check.py`
+
+## README golden standard
+
+Every project README must clear one bar: **a newcomer understands what the project is
+within the first ~15 lines** — the situation in plain language, before any architecture,
+philosophy or jargon, defining terms the first time they are used. Complete structure is
+not enough; comprehension is the test.
+
+The canonical template lives in ACDM at `templates/README.template.md`. It is a *reference*,
+**not** seeded into projects (per ADR-016 + the doc-model: `templates/` is a repo-side
+artifact ACDM owns; `init_project` distributes enforcement, not content scaffold). Copy its
+structure when writing or revising a README.
+
+**Mandatory sections** (enforced by Scan 5 / `readme_check.py`):
+
+- **What is ``?** — the plain-language on-ramp.
+- **Development status** (or **Status and maturity**) — honest maturity; early publication
+  is fine, overclaiming is not.
+- **Data & privacy** — mandatory *only* when the tool touches personal data (human
+  judgement; deliberately not auto-checked).
+
+Recommended (not auto-enforced): ecosystem block (if part of a family), "who is this for?"
+doors, how it works, Documentation, Requirements, Licence, Support, Acknowledgements. See
+the template for the full shape and per-section guidance.
+
+## Out of scope (v1)
+
+- Auto-fix (report-only)
+- Continuous-integration enforcement
+- Pre-commit hook integration
+- Security review (`/security-review` — separate skill)
+- Code-quality review (`/simplify` — separate skill)
+- README *quality* / textual review — does the prose actually communicate? (manual pass, or the `doc-reviewer` agent; Scan 5 checks section *presence*, not quality)
+- INSTALL / LICENSE textual review (manual pass required)
+- Version-bump decisions (project-internal)
+
+These are documented as v2 promotions or out-of-tool concerns.
+
+## Consuming the report safely
+
+When `/publish-check` produces findings and you start fixing them:
+
+1. **Verify the working tree is clean first** — `git status` shows no untracked or modified files you didn't expect. After `init_project(update=True)` the disk holds new files not yet visible to git.
+2. **Stage explicitly per finding** — `git add `, not `git add -A`. The `-A` form picks up unrelated upstream drift.
+3. **Verify the diff per file before commit** — `git diff --staged `.
+4. **Be extra careful immediately after `init_project --update`** — distributed templates may overwrite earlier per-project fixes; the report you ran against may not reflect the disk state.
+
+Context: Teacher_MCP PR #61 (2026-05-05) used `git add -A` against undetected upstream drift and introduced 2 new BE-drift findings while fixing 9. Explicit staging would have prevented this.
+
+## Building the public artifact (fresh-repo flips)
+
+When the flip strategy is a fresh repository (no carried-over git history), the published repo is *built* from the working tree through an include/exclude step. Three principles keep that build trustworthy:
+
+1. **Verify the built artifact, not the working tree.** A scan against the source tree never tests the include/exclude list itself — a file the list fails to exclude still sits in the tree the scan passed. Build the fresh repo into a staging location, then run the publish scans against *that*, before publishing. The artifact is what readers get; the artifact is what you verify. (This is distinct from "verify the working tree is clean" above: that guards the fixing step; this guards the published output.)
+2. **Allowlist what ships; do not denylist what doesn't.** Start the fresh repo from empty and copy in only named paths. A denylist (copy everything, minus exclusions) fails open — anything you forget to list is published. An allowlist fails closed.
+3. **Run the gate as a reproducible script against a committed checkpoint.** A single ad-hoc grep pass is not a gate — globs and mounts misfire silently. Commit the prep work to the still-private branch first, so there is an auditable diff and a stable state to build from, then run the scan as a script the human can re-run.
+
+A token grep (names, course codes) finds known strings; it cannot find sensitive content that lacks them (personal reflections, opinions about colleagues, self-flagged private documents). Where shipping files carry a privacy field in front-matter (e.g. `privacy: private`), treat that field — not a name grep — as the primary ship / no-ship filter.
+
+Context: the Teacher_MCP private→public flip (2026-05-27) scanned the working tree before the fresh-repo build, leaving the include/exclude list unverified; an independent grep pass misfired (wrong glob, slow mount) before being corrected; and a self-flagged `privacy: private` document was caught only by chance through a name grep.
diff --git a/.gitignore b/.gitignore
index 49d9b65..f071327 100644
--- a/.gitignore
+++ b/.gitignore
@@ -66,7 +66,15 @@ coverage/
 .env.local
 *.local
 
-# ACDM-specific (personal tooling, not project content)
-.claude/
+# ACDM config — gitignored (local / path-sensitive). EXCEPT the repo-policy rules
+# below, which describe the repo / are read by repo-tooling and must travel with a
+# clone (ADR-015 "protection travels with the repo"). Process rules + acdm.json +
+# .mcp.json + CLAUDE.md stay ignored.
+.claude/*
+!.claude/rules/
+.claude/rules/*
+!.claude/rules/data-protection.md
+!.claude/rules/internal-docs-boundary.md
+!.claude/rules/publish-readiness.md
 .mcp.json
 CLAUDE.md
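
The ordering in the `.gitignore` hunk matters because git cannot re-include a file whose parent directory is excluded. The old `.claude/` pattern excluded the directory itself, so no `!` line could rescue a file inside it; `.claude/*` excludes only the directory's entries, which lets `!.claude/rules/` re-include the rules directory before `.claude/rules/*` and the three per-file negations narrow it again. The following is a minimal sketch for reproducing the `git check-ignore` verification mentioned in the commit message in a throwaway repository. It assumes a POSIX shell with `git` and `mktemp` on `PATH`; the stand-in paths `.claude/acdm.json`, `.claude/commands/example.md` and `.claude/rules/process.md` are hypothetical examples of "everything else", not files this patch adds.

```shell
#!/bin/sh
# Sketch: rebuild the ACDM block of this patch's .gitignore in a throwaway
# repository and ask git which paths it would ignore. Assumes a POSIX shell
# and `git` on PATH.
set -eu

repo=$(mktemp -d "${TMPDIR:-/tmp}/gitignore-check.XXXXXX")
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q .

# ACDM block from the .gitignore hunk (comments omitted).
cat > .gitignore <<'EOF'
.claude/*
!.claude/rules/
.claude/rules/*
!.claude/rules/data-protection.md
!.claude/rules/internal-docs-boundary.md
!.claude/rules/publish-readiness.md
.mcp.json
CLAUDE.md
EOF

# Empty stand-in files; only their paths matter. The first three names are
# hypothetical examples of "everything else under .claude/".
paths=".claude/acdm.json
.claude/commands/example.md
.claude/rules/process.md
.claude/rules/data-protection.md
.claude/rules/internal-docs-boundary.md
.claude/rules/publish-readiness.md
.mcp.json
CLAUDE.md"
mkdir -p .claude/commands .claude/rules
for p in $paths; do : > "$p"; done

# `git check-ignore -q` takes exactly one path and exits 0 if it is ignored,
# 1 if it is not, and 128 on a fatal error.
classify() {
  status=0
  git check-ignore -q "$1" || status=$?
  case $status in
    0) echo ignored ;;
    1) echo not-ignored ;;
    *) echo "git check-ignore failed for $1" >&2; return 1 ;;
  esac
}

for p in $paths; do
  printf '%-12s %s\n' "$(classify "$p")" "$p"
done
```

Run it from anywhere: the three policy files should print `not-ignored` and every other path `ignored`. A user-level `core.excludesFile` that happens to match one of these paths would also report it as ignored, so run the check where no global excludes apply if a result looks surprising.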