Merged
58 changes: 58 additions & 0 deletions .claude/rules/data-protection.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
---
paths:
- "**/*"
---

# Data Protection — Treat As If Public

Whether or not this repo is private today, treat everything in it as if it were
already public. A private repo can be made public, forked, cloned, or leaked, and
anything committed is permanent. The only safe assumption is that every file and
every past commit is visible to the world.

## Hard rule (non-negotiable)

Real person or student data must **NEVER** exist in this repo — or any repo — in
any form, anywhere in the working tree **or its git history**. Prevention is the
only safe path: once committed it lives in the history forever (see
"Already committed?" below).

## NEVER write in files or commit messages:
- Personal names (colleagues, research participants, teachers, students)
- School names or abbreviations that identify specific schools
- University or institution names
- Research programme names (funded projects, grants)
- Place names (streets, buildings, venues) that identify locations
- Hardcoded file paths containing usernames (`/Users/...`, `/home/...`)
- Research questions specific enough to identify a study
- Chat history or session transcripts
- Secrets: API keys, tokens, passwords, `.env` contents, credentials

## ALWAYS use instead:
- `School A`, `School B`, `Colleague_A` for anonymised references
- `/path/to/project` for file path examples
- `SPEAKER_01`, `L1` for participant references
- Generic descriptions for research programmes
- Synthetic/fabricated data in examples

## Check before committing:
- Think before writing — does this text contain any personal names, paths, or identifiers?
- Would a reader identify a specific person, school, or study from this text —
**directly**, OR by combining quasi-identifiers (e.g. class + date + subject can
identify a student without naming them)?
- This is your judgement. The `pii_scan` commit gate is the deterministic backstop —
it catches what you miss, but it is not a substitute for the check above.
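
The mechanical part of this check can be sketched as a small scan. This is a minimal illustration, not the actual `pii_scan` gate: the pattern set and the name `find_identifiers` are assumptions made for the example. It covers only user-home paths, e-mail addresses and secret-like assignments, and it cannot detect personal names or quasi-identifier combinations, which is why the judgement check above still applies.

```python
import re

# Hypothetical patterns for illustration; the real `pii_scan` rules are not shown in this file.
PATTERNS = {
    "user-home path": re.compile(r"(?:/Users|/home)/[A-Za-z0-9._-]+"),
    "e-mail address": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "secret assignment": re.compile(
        r"\b(?:api[_-]?key|token|password)\s*[:=]\s*\S+", re.IGNORECASE
    ),
}


def find_identifiers(text):
    """Return (line_number, label, matched_text) for every pattern hit in text."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((number, label, match.group(0)))
    return findings
```

An empty result means only that none of these patterns matched; it says nothing about names, schools or study-specific wording.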

## Already committed? Deletion is NOT enough.

If real data is found already in the repo, removing the file in a new commit does
**not** remove it from git history — it remains in every past commit, clone, and
fork. To actually remove it you must scrub the history (fresh-repo rebuild or a
history filter) **and** rotate any exposed secret. Stop and escalate before
publishing or flipping such a repo.

## This applies to ALL content:
- Source code, comments, error messages
- Documentation, RFCs, changelogs, roadmaps
- Commit messages
- Test data and examples
37 changes: 37 additions & 0 deletions .claude/rules/internal-docs-boundary.md
@@ -0,0 +1,37 @@
---
paths:
- "**/*"
---

# Internal Documentation Boundary

## These belong in the repo (public):
- Source code, tests, build config
- **Decision records** — `ADR-NNN` in `docs/decisions/` (the repo's design-record)
- Methodology documents (`methodology/`)
- Example files (`examples/`) — with fabricated data only
- User-facing docs: README, GETTING_STARTED, API, CONTRIBUTING, CHANGELOG
- Public design specs (`specs/`) — if intentionally public
- Templates (`templates/`)

## These do NOT belong in the repo (→ project's Nextcloud internal-documentation):
- Handoff documents (`type: handoff`, HANDOFF_*, *_HANDOFF_*)
- **Ideas** (`type: idea`, `docs/ideas/`) — quick internal captures
- **RFCs** (`type: rfc`) — design proposals; internal until ratified, then they become an ADR in `docs/decisions/`
- **Explorations and shapes** (`type: exploration` / `type: shape`) — strategic deliberation
- Internal planning docs / plans (CODE_HANDOFF_*)
- Development notes (`notes/`, `_internal/`)
- Chat history or session exports
- Process memos from actual research projects
- Files from Nextcloud, Dropbox, OneDrive or other external sync services

## If you are about to create or edit a file:
- Is this something a user who clones the repo needs? → Repo
- Is this internal planning, handoff, or development thinking? → NOT repo (use `save_document(doc_type=…)` → Nextcloud)

## The document model (ratified 2026-06-21)
Repo = **ADR** (the ratified decision-record) + code / methodology / examples / user-docs. ALL
deliberation (idea / rfc / exploration / shape / handoff) is **internal** → the project's Nextcloud
`<Project>_internal_documentation/` (routed via `save_document(doc_type=…)`). Enforced deterministically
by `internal_docs_guard` (gate on a doc's frontmatter `type:`): unambiguously internal types are blocked
from the repo (git pre-commit); optional-public types (rfc/plan/todo/spec) only warn.
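
The gate described above can be sketched as follows. This is an illustration under stated assumptions, not the `internal_docs_guard` implementation: the type lists are taken from this rule, while the function names, the `block` / `warn` / `allow` return values and the simple line-based front-matter parsing are hypothetical.

```python
# Type lists come from the rule text; everything else here is a hypothetical sketch.
BLOCKED_TYPES = {"idea", "exploration", "shape", "handoff"}
WARN_TYPES = {"rfc", "plan", "todo", "spec"}


def frontmatter_type(text):
    """Return the `type:` value from a leading `---` front-matter block, or None."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of front-matter; a `type:` in the body is ignored
        key, sep, value = line.partition(":")
        if sep and key.strip() == "type":
            return value.strip().strip("'\"").lower()
    return None


def gate(text):
    """Classify a document as 'block', 'warn' or 'allow' from its front-matter type."""
    doc_type = frontmatter_type(text)
    if doc_type in BLOCKED_TYPES:
        return "block"
    if doc_type in WARN_TYPES:
        return "warn"
    return "allow"
```

A document without front-matter, or with a type outside both lists (for example an ADR), is allowed in this sketch.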
106 changes: 106 additions & 0 deletions .claude/rules/publish-readiness.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,106 @@
---
paths:
- "**/*"
---

# Publish Readiness — Pre-publish checklist

This rule defines what makes a repository ready to flip from private to public. It is used by the `/publish-check` slash-command.

## Severity rubric

### Blocker — MUST fix before public

A finding that, if left in, exposes personal data, breaks user trust, or makes the repository misleading. Public-flip is unsafe until resolved.

Examples:
- Personal names, hardcoded user-home paths, e-mail addresses
- README claims that contradict source code (false advertising)

### Warning — SHOULD fix before public

A finding that signals carelessness or inconsistency to public readers. Public-flip is possible but degrades reception.

Examples:
- Drift from British English in user-facing prose
- Missing community files referenced from README
- Outdated supported-versions in `SECURITY.md`

### Nice-to-have — MAY add or improve

A finding that, if added, increases professionalism but is not expected by readers.

Examples:
- `CODE_OF_CONDUCT.md`
- `.github/dependabot.yml`
- Issue / pull-request templates

## Scan axes

The `/publish-check` command runs five scans:

1. **data-protection** — sources truth from `data-protection.md` rule (user-home paths, personal names, e-mail addresses)
2. **language** — sources truth from `language-british-english.md` rule (American-English drift in prose)
3. **docs-freshness** — `README`, `ROADMAP`, `SECURITY` versions and counts vs the source (`package.json`, `src/`)
4. **release-hygiene** — community files exist and are current
5. **readme-sections** — README carries the golden-standard mandatory sections (see "README golden standard" below); sources truth from `readme_check.py`

## README golden standard

Every project README must clear one bar: **a newcomer understands what the project is
within the first ~15 lines** — the situation in plain language, before any architecture,
philosophy or jargon, defining terms the first time they are used. Complete structure is
not enough; comprehension is the test.

The canonical template lives in ACDM at `templates/README.template.md`. It is a *reference*,
**not** seeded into projects (per ADR-016 + the doc-model: `templates/` is a repo-side
artifact ACDM owns; `init_project` distributes enforcement, not content scaffold). Copy its
structure when writing or revising a README.

**Mandatory sections** (enforced by Scan 5 / `readme_check.py`):

- **What is `<Project>`?** — the plain-language on-ramp.
- **Development status** (or **Status and maturity**) — honest maturity; early publication
is fine, overclaiming is not.
- **Data & privacy** — mandatory *only* when the tool touches personal data (human
judgement; deliberately not auto-checked).

Recommended (not auto-enforced): ecosystem block (if part of a family), "who is this for?"
doors, how it works, Documentation, Requirements, Licence, Support, Acknowledgements. See
the template for the full shape and per-section guidance.
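
A presence check for the two auto-enforced sections could look like the sketch below. It is an assumption about how such a check might work, not the contents of `readme_check.py`: the heading texts come from the list above, while the regular expressions and the name `missing_sections` are hypothetical. **Data & privacy** is left out on purpose, because the rule keeps it a human judgement.

```python
import re

# Heading texts come from the rule; the matching logic is a hypothetical sketch.
MANDATORY = {
    "What is <Project>?": re.compile(r"^what is .+\?$", re.IGNORECASE),
    "Development status": re.compile(
        r"^(development status|status and maturity)$", re.IGNORECASE
    ),
}


def missing_sections(readme_text):
    """Return the names of mandatory sections with no matching Markdown heading."""
    headings = []
    for line in readme_text.splitlines():
        match = re.match(r"^#{1,6}\s+(.*?)\s*$", line)
        if match:
            # Drop backticks so "What is `Demo`?" matches as well.
            headings.append(match.group(1).replace("`", ""))
    return [
        name
        for name, pattern in MANDATORY.items()
        if not any(pattern.match(heading) for heading in headings)
    ]
```

The sketch treats every line starting with `#` as a heading, including comment lines inside fenced code blocks, and like Scan 5 it checks presence only, not whether the prose communicates.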

## Out of scope (v1)

- Auto-fix (report-only)
- Continuous-integration enforcement
- Pre-commit hook integration
- Security review (`/security-review` — separate skill)
- Code-quality review (`/simplify` — separate skill)
- README *quality* / textual review — does the prose actually communicate? (manual pass, or the `doc-reviewer` agent; Scan 5 checks section *presence*, not quality)
- INSTALL / LICENSE textual review (manual pass required)
- Version-bump decisions (project-internal)

These are documented as v2 promotions or out-of-tool concerns.

## Consuming the report safely

When `/publish-check` produces findings and you start fixing them:

1. **Verify the working tree is clean first**: `git status` shows no untracked or modified files you didn't expect. After `init_project(update=True)` the disk holds new files not yet tracked by git.
2. **Stage explicitly per finding**: `git add <file>`, not `git add -A`. The `-A` form picks up unrelated upstream drift.
3. **Verify the diff per file before commit**: `git diff --staged <file>`.
4. **Be extra careful immediately after `init_project --update`**: distributed templates may overwrite earlier per-project fixes; the report you ran against may not reflect the disk state.

Context: Teacher_MCP PR #61 (2026-05-05) used `git add -A` against undetected upstream drift and introduced 2 new BE-drift findings while fixing 9. Explicit staging would have prevented this.
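
The clean-tree check in step 1 can be made mechanical by parsing `git status --porcelain` output. The helper below is a sketch, not part of `/publish-check`: the name `unexpected_changes` and the `expected_paths` parameter are hypothetical, and it assumes the version 1 porcelain format (two status characters, a space, then the path) with no quoted paths.

```python
def unexpected_changes(porcelain_output, expected_paths=()):
    """Return paths from `git status --porcelain` output that are not expected."""
    expected = set(expected_paths)
    unexpected = []
    for line in porcelain_output.splitlines():
        if len(line) < 4:
            continue  # too short to hold "XY <path>"
        path = line[3:]
        # Renames are reported as "old -> new"; keep the new path.
        if " -> " in path:
            path = path.split(" -> ", 1)[1]
        if path not in expected:
            unexpected.append(path)
    return unexpected
```

The output can be obtained with `subprocess.run(["git", "status", "--porcelain"], capture_output=True, text=True, check=True).stdout`. Pass it unstripped, because a leading space is part of the status column. A non-empty result is the signal to stop and inspect before staging anything.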

## Building the public artifact (fresh-repo flips)

When the flip strategy is a fresh repository (no carried-over git history), the published repo is *built* from the working tree through an include/exclude step. Three principles keep that build trustworthy:

1. **Verify the built artifact, not the working tree.** A scan against the source tree never tests the include/exclude list itself — a file the list fails to exclude still sits in the tree the scan passed. Build the fresh repo into a staging location, then run the publish scans against *that*, before publishing. The artifact is what readers get; the artifact is what you verify. (This is distinct from "verify the working tree is clean" above: that guards the fixing step; this guards the published output.)
2. **Allowlist what ships; do not denylist what doesn't.** Start the fresh repo from empty and copy in only named paths. A denylist (copy everything, minus exclusions) fails open — anything you forget to list is published. An allowlist fails closed.
3. **Run the gate as a reproducible script against a committed checkpoint.** A single ad-hoc grep pass is not a gate — globs and mounts misfire silently. Commit the prep work to the still-private branch first, so there is an auditable diff and a stable state to build from, then run the scan as a script the human can re-run.

A token grep (names, course codes) finds known strings; it cannot find sensitive content that lacks them (personal reflections, opinions about colleagues, self-flagged private documents). Where shipping files carry a privacy field in front-matter (e.g. `privacy: private`), treat that field — not a name grep — as the primary ship / no-ship filter.
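
The allowlist principle and the privacy-field filter can be combined in one build step, sketched below. This is an illustration under assumptions, not the procedure used in any actual flip: the names `build_artifact` and `is_private`, the staging layout and the line-based front-matter parsing are all hypothetical. The publish scans would then run against the staging directory, not the source tree.

```python
import shutil
from pathlib import Path


def is_private(path):
    """True if a leading `---` front-matter block contains `privacy: private`."""
    try:
        lines = path.read_text(encoding="utf-8").splitlines()
    except UnicodeDecodeError:
        return False  # not UTF-8 text, so no front-matter to read
    if not lines or lines[0].strip() != "---":
        return False
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep and key.strip() == "privacy" and value.strip() == "private":
            return True
    return False


def build_artifact(source, staging, allowlist):
    """Copy only allowlisted paths into a new staging directory; return what shipped."""
    source, staging = Path(source), Path(staging)
    staging.mkdir(parents=True, exist_ok=False)  # fail if staging already exists
    shipped = []
    for relative in allowlist:
        root = source / relative
        if not root.exists():
            raise FileNotFoundError(f"allowlisted path not found: {relative}")
        if root.is_file():
            files = [root]
        else:
            files = sorted(p for p in root.rglob("*") if p.is_file())
        for file in files:
            if is_private(file):
                continue  # self-flagged private documents never ship
            target = staging / file.relative_to(source)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(file, target)
            shipped.append(file.relative_to(source).as_posix())
    return shipped
```

Anything not named in `allowlist` is never copied, so a forgotten entry results in a missing file, not a leaked one. The returned list gives a reviewable record of exactly what the artifact contains.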

Context: the Teacher_MCP private→public flip (2026-05-27) scanned the working tree before the fresh-repo build, leaving the include/exclude list unverified; an independent grep pass misfired (wrong glob, slow mount) before being corrected; and a self-flagged `privacy: private` document was caught only by chance through a name grep.
12 changes: 10 additions & 2 deletions .gitignore
@@ -66,7 +66,15 @@ coverage/
.env.local
*.local

# ACDM-specific (personal tooling, not project content)
.claude/
# ACDM config — gitignored (local / path-sensitive). EXCEPT the repo-policy rules
# below, which describe the repo / are read by repo-tooling and must travel with a
# clone (ADR-015 "protection travels with the repo"). Process rules + acdm.json +
# .mcp.json + CLAUDE.md stay ignored.
.claude/*
!.claude/rules/
.claude/rules/*
!.claude/rules/data-protection.md
!.claude/rules/internal-docs-boundary.md
!.claude/rules/publish-readiness.md
.mcp.json
CLAUDE.md