docs: adopt agent-neutral AGENTS.md + REVIEW_GUIDE.md pattern #66

Merged
Marius1311 merged 3 commits into main from docs/agent-neutral-docs
Apr 27, 2026

@Marius1311
Member

Summary

Adopt the agent-neutral documentation pattern from avanti (and
sinkhorn_embeddings): split repo guidance into an authoritative
AGENTS.md at the root, a review-only REVIEW_GUIDE.md for
GitHub-launched review agents, a thin CLAUDE.md entry point, and a
structured PR template. Shrink .github/copilot-instructions.md to a
shim that points to the two canonical docs.

Every invariant in AGENTS.md was verified against the actual source
(not the previous copilot-instructions.md, which contained outdated
claims like _predicted/_confidence postfixes and "mapping matrices
are typically dense"). Tunable thresholds
(SKLEARN_WARNING_CUTOFF, SPECTRAL_METHOD_THRESHOLD) are not
duplicated — they live in src/cellmapper/constants.py and
AGENTS.md's "Where To Find What" table points there.

Behavior Or Invariants Changed

None. Documentation-only change.

Tests Run

None required — docs-only. pre-commit passes on all edited files.

Reviewer Focus

  • AGENTS.md "Critical Invariants" section: each bullet has a
    traceable source (file / method name). Verified against:
    • cellmapper.py self-mapping activation + read-only reference.
    • mapping_operator.py _validate_and_normalize_mapping_matrix
      (row-stochastic, float32; sparse→CSR / dense→ndarray) and
      _validate_power (t > 1 self-mapping-only).
    • constants.py kernel taxonomy + SELF_MAPPING_ONLY_KERNELS.
    • check.py::check_deps fail-fast with install hints.
    • neighbors.py::__post_init__ self-edge stripping.
    • utils.py::create_imputed_anndata signature and output layout.
    • __init__.py __all__ = public API surface.
  • REVIEW_GUIDE.md "High-Risk Areas": pointers only, no restatement
    of invariants (same deduplication approach as sinkhorn_embeddings).
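
For reviewers unfamiliar with the "fail-fast with install hints" invariant listed above, here is a minimal sketch of the general pattern. It is not the code in `check.py`: the function name `check_deps_sketch`, the `INSTALL_HINTS` table, and the message format are all hypothetical and chosen only for illustration.

```python
import importlib.util

# Hypothetical hint table; the real package may map names to extras differently.
INSTALL_HINTS = {
    "faiss": "pip install faiss-cpu",
}


def check_deps_sketch(*modules: str) -> None:
    """Raise ImportError naming every missing top-level module with a hint."""
    # find_spec returns None for a top-level module that cannot be located,
    # so nothing is imported just to test availability.
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    if missing:
        hints = "; ".join(
            f"{m} (try: {INSTALL_HINTS.get(m, f'pip install {m}')})"
            for m in missing
        )
        raise ImportError(f"Missing optional dependencies: {hints}")
```

Calling the check at the top of an entry point surfaces a missing optional dependency immediately, with an actionable message, instead of failing deep inside a computation.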

Context

This follows the same structure already in avanti and
sinkhorn_embeddings, with two explicit properties:

  1. REVIEW_GUIDE.md is self-contained for GitHub-launched review
    agents — no references to local user settings, personal workflow
    rules, or machine-specific paths.
  2. AGENTS.md delegates to existing docs (README.md, docs/api.md,
    docs/notebooks/tutorials/, docs/contributing.md, source
    docstrings) rather than duplicating their content. Only information
    that does not have another owner lives in AGENTS.md.

Open Questions Or Follow-Ups

  • Sibling repos (cell-annotator, cellrank) could follow the same
    pattern in separate PRs if useful.

Mirror the agent-neutral documentation structure from avanti:

- AGENTS.md — canonical source of truth for agents: trust order,
  "Where To Find What" table delegating to existing docs, 12 critical
  invariants sourced directly from the current source code, and
  development commands.
- REVIEW_GUIDE.md — standalone PR review guide for GitHub-launched
  agents. No references to local user settings or machine-specific
  paths. Covers review-first workflow, high-risk areas, a changed-path
  test-lookup table, testing/docs-impact checklists, and a review
  checklist.
- CLAUDE.md — thin entry point that includes AGENTS.md and points to
  REVIEW_GUIDE.md.
- .github/PULL_REQUEST_TEMPLATE.md — structured template matching
  avanti's: summary, behavior/invariants changed, tests run, reviewer
  focus, context, open questions.
- .github/copilot-instructions.md — shrunk to a 5-line shim pointing
  to AGENTS.md and REVIEW_GUIDE.md. The previous long-form content is
  either captured in AGENTS.md (as verified invariants) or delegated
  to existing docs (README.md, docs/api.md, docs/notebooks/tutorials/,
  docs/contributing.md), so nothing is duplicated.

Audit pass after the initial port. Two concerns:

1. AGENTS.md had hardcoded values ("sklearn warns above 50k cells",
   "warning suggests spectral when t > 10") that actually live in
   `PackageConstants.SKLEARN_WARNING_CUTOFF` and
   `SPECTRAL_METHOD_THRESHOLD`. Encoding them here means two places
   to update and silent drift. Replaced with a pointer to
   `src/cellmapper/constants.py` in the "Where To Find What" table;
   dropped the hardcoded numbers.
2. REVIEW_GUIDE.md's "High-Risk Areas" restated 7 of 12 invariants
   from AGENTS.md almost verbatim. Same unmaintainability problem as
   sinkhorn_embeddings: every invariant change needs two edits.
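
To illustrate the first concern, a rough sketch of the single-source-of-truth idea follows. The class below is only a stand-in for the one in `src/cellmapper/constants.py`; its field values are assumptions copied from the numbers quoted above, and `suggest_spectral` is a hypothetical helper, not a function in the package.

```python
import warnings
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageConstants:
    """Illustrative stand-in for the class in src/cellmapper/constants.py."""

    # Assumed values, taken from the numbers quoted in the old AGENTS.md text.
    SKLEARN_WARNING_CUTOFF: int = 50_000
    SPECTRAL_METHOD_THRESHOLD: int = 10


def suggest_spectral(t: int) -> bool:
    """Hypothetical helper: warn when the power t exceeds the shared threshold."""
    # The threshold is read from the constants class, never written inline,
    # so code and docs that point here cannot drift apart.
    if t > PackageConstants.SPECTRAL_METHOD_THRESHOLD:
        warnings.warn(
            f"t={t} exceeds {PackageConstants.SPECTRAL_METHOD_THRESHOLD}; "
            "consider the spectral method.",
            stacklevel=2,
        )
        return True
    return False
```

Docs that link to the constants module instead of repeating `50_000` or `10` stay correct when a threshold is retuned.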

Changes:

- AGENTS.md: collapse "Iterative mode preserves sparsity; spectral
  returns dense" (implementation detail — stays in the
  mapping_operator docstring). Drop specific integration test file
  names from the tests invariant (they live in REVIEW_GUIDE's
  changed-path table). Add constants.py to the "Where To Find What"
  row so the SKLEARN_WARNING_CUTOFF / SPECTRAL_METHOD_THRESHOLD
  thresholds have a discoverable home.
- REVIEW_GUIDE.md: rewrite "High-Risk Areas" as file-path pointers
  with a short hazard, not restatements. Add explicit "Do not
  restate them here — link" note at the top. Trim the checklist and
  docs-impact sections to match. Net: 140 -> 83 lines.

Every surviving claim in AGENTS.md was checked against the actual
source:
- Self-mapping detection: `cellmapper.py` lines 41-48.
- Row-stochastic CSR float32: `mapping_operator.py`
  `_validate_and_normalize_mapping_matrix`, lines 206-211.
- `t > 1` self-mapping gate: `_validate_power`, lines 215-221.
- `check_deps` fail-fast: `check.py::check_deps`, line 96.
- Kernel taxonomy: `constants.py` lines 12-17.
- `Neighbors` self-edge stripping: `neighbors.py::__post_init__`
  line 72.
- `create_imputed_anndata` signature: `utils.py` line 17.
- Public API: `src/cellmapper/__init__.py` `__all__`.

- Mapping matrix is not always CSR. `_validate_and_normalize_mapping_matrix`
  returns CSR for sparse inputs and ndarray for dense inputs; both are
  float32, both row-stochastic. Replace "row-stochastic CSR" with the
  accurate "row-stochastic and float32. Sparse inputs are stored as CSR;
  dense inputs stay dense."
- Postfix claim was inaccurate. Only `prediction_postfix` is exposed on
  `.map()`; `confidence_postfix` is only on the per-method entrypoints
  (`map_obs`, `map_obsm`). Rephrase accordingly.
- Python version line was vague ("3.11 and newer"). Match the actual
  `hatch-test` matrix (3.11 and 3.14) and point readers to pyproject.toml.
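
The corrected matrix invariant (row-stochastic and float32, CSR for sparse inputs, ndarray for dense inputs) can be sketched as below. This is an illustration under stated assumptions, not the body of `_validate_and_normalize_mapping_matrix`: the function name is hypothetical, and rejecting all-zero rows is a choice made for this example only.

```python
import numpy as np
import scipy.sparse as sp


def normalize_mapping_matrix_sketch(matrix):
    """Return a row-stochastic float32 matrix: CSR if sparse, ndarray if dense."""
    if sp.issparse(matrix):
        csr = sp.csr_matrix(matrix, dtype=np.float32)
        row_sums = np.asarray(csr.sum(axis=1)).ravel()
        if np.any(row_sums == 0):
            raise ValueError("Every row needs at least one nonzero weight.")
        # Left-multiplying by diag(1 / row_sums) rescales each row to sum to 1
        # without densifying the matrix.
        scaled = sp.diags(1.0 / row_sums) @ csr
        return sp.csr_matrix(scaled, dtype=np.float32)

    dense = np.asarray(matrix, dtype=np.float32)
    row_sums = dense.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("Every row needs at least one nonzero weight.")
    return (dense / row_sums).astype(np.float32)
```

Keeping the sparse branch in CSR avoids materializing a dense cell-by-cell matrix, while dense inputs are returned as plain arrays, matching the wording "Sparse inputs are stored as CSR; dense inputs stay dense."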

No information lost from the original `.github/copilot-instructions.md`
that isn't already covered by delegation:
- Domain context / joint-embedding requirement → README.md.
- Ruff / docstring conventions → pyproject.toml + docs/contributing.md.
- `hatch-vcs` / semver release → docs/contributing.md.
- Outdated bits (mapping matrices "typically dense", `_predicted`/
  `_confidence` postfixes) are correctly dropped: the current code uses
  CSR for sparse inputs and the `_pred` / `_conf` postfixes.
@codecov

codecov Bot commented Apr 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.37%. Comparing base (1b3f5d6) to head (0cd9555).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #66   +/-   ##
=======================================
  Coverage   86.37%   86.37%           
=======================================
  Files          13       13           
  Lines        1387     1387           
=======================================
  Hits         1198     1198           
  Misses        189      189           

@Marius1311 Marius1311 merged commit ea38130 into main Apr 27, 2026
9 checks passed
@Marius1311 Marius1311 deleted the docs/agent-neutral-docs branch April 27, 2026 12:22