docs: adopt agent-neutral AGENTS.md + REVIEW_GUIDE.md pattern #66
Merged
Marius1311 merged 3 commits into main on Apr 27, 2026
Mirror the agent-neutral documentation structure from avanti:
- AGENTS.md: canonical source of truth for agents, with a trust order, a
  "Where To Find What" table delegating to existing docs, 12 critical
  invariants sourced directly from the current source code, and
  development commands.
- REVIEW_GUIDE.md: standalone PR review guide for GitHub-launched agents.
  No references to local user settings or machine-specific paths. Covers
  review-first workflow, high-risk areas, a changed-path test-lookup
  table, testing/docs-impact checklists, and a review checklist.
- CLAUDE.md: thin entry point that includes AGENTS.md and points to
  REVIEW_GUIDE.md.
- .github/PULL_REQUEST_TEMPLATE.md: structured template matching avanti's:
  summary, behavior/invariants changed, tests run, reviewer focus,
  context, open questions.
- .github/copilot-instructions.md: shrunk to a 5-line shim pointing to
  AGENTS.md and REVIEW_GUIDE.md.
The previous long-form content is either captured in AGENTS.md (as
verified invariants) or delegated to existing docs (README.md,
docs/api.md, docs/notebooks/tutorials/, docs/contributing.md), with no
duplication.
Audit pass after the initial port. Two concerns:
1. AGENTS.md had hardcoded values ("sklearn warns above 50k cells",
"warning suggests spectral when t > 10") that actually live in
`PackageConstants.SKLEARN_WARNING_CUTOFF` and
`SPECTRAL_METHOD_THRESHOLD`. Encoding them here means two places
to update and silent drift. Replaced with a pointer to
`src/cellmapper/constants.py` in the "Where To Find What" table;
dropped the hardcoded numbers.
2. REVIEW_GUIDE.md's "High-Risk Areas" restated 7 of 12 invariants
from AGENTS.md almost verbatim. Same unmaintainability problem as
sinkhorn_embeddings: every invariant change needs two edits.
Changes:
- AGENTS.md: collapse "Iterative mode preserves sparsity; spectral
returns dense" (implementation detail — stays in the
mapping_operator docstring). Drop specific integration test file
names from the tests invariant (they live in REVIEW_GUIDE's
changed-path table). Add constants.py to the "Where To Find What"
row so the SKLEARN_WARNING_CUTOFF / SPECTRAL_METHOD_THRESHOLD
thresholds have a discoverable home.
- REVIEW_GUIDE.md: rewrite "High-Risk Areas" as file-path pointers
with a short hazard, not restatements. Add explicit "Do not
restate them here — link" note at the top. Trim the checklist and
docs-impact sections to match. Net: 140 -> 83 lines.
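The single-source-of-truth idea behind moving the thresholds out of AGENTS.md can be sketched roughly as follows. This is an illustration only: the `PackageConstants` class below is a hypothetical stand-in, the values (50k cells, `t > 10`) are the ones quoted from the removed AGENTS.md prose, and `threshold_notes` is an invented helper. The real definitions in `src/cellmapper/constants.py` may look different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageConstants:
    """Hypothetical stand-in for the constants in src/cellmapper/constants.py."""

    # Values quoted from the prose removed from AGENTS.md (assumption:
    # the real module may define them differently).
    SKLEARN_WARNING_CUTOFF: int = 50_000
    SPECTRAL_METHOD_THRESHOLD: int = 10


CONSTANTS = PackageConstants()


def threshold_notes(n_cells: int, t: int) -> list[str]:
    """Build advisory messages using only the shared constants (invented helper)."""
    notes = []
    if n_cells > CONSTANTS.SKLEARN_WARNING_CUTOFF:
        notes.append(
            f"sklearn may be slow above {CONSTANTS.SKLEARN_WARNING_CUTOFF} cells"
        )
    if t > CONSTANTS.SPECTRAL_METHOD_THRESHOLD:
        notes.append(
            f"consider the spectral method when t > {CONSTANTS.SPECTRAL_METHOD_THRESHOLD}"
        )
    return notes
```

Because messages and checks read the numbers from one place, changing a threshold requires a single edit, and documentation only needs to point at the constants module.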
Every surviving claim in AGENTS.md was checked against the actual
source:
- Self-mapping detection: `cellmapper.py` lines 41-48.
- Row-stochastic CSR float32: `mapping_operator.py`
`_validate_and_normalize_mapping_matrix`, lines 206-211.
- `t > 1` self-mapping gate: `_validate_power`, lines 215-221.
- `check_deps` fail-fast: `check.py::check_deps`, line 96.
- Kernel taxonomy: `constants.py` lines 12-17.
- `Neighbors` self-edge stripping: `neighbors.py::__post_init__`
line 72.
- `create_imputed_anndata` signature: `utils.py` line 17.
- Public API: `src/cellmapper/__init__.py` `__all__`.
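As an illustration of the `t > 1` self-mapping gate listed above, here is a minimal sketch of what such a check could look like. It is not the code in `_validate_power`: the function name, the `is_self_mapping` parameter, and the error messages are all assumptions. The rationale offered here (taking powers of a mapping matrix presumably requires a square, cell-by-cell matrix, which is what self-mapping produces) is likewise an interpretation rather than something the PR states.

```python
def validate_power(t: int, is_self_mapping: bool) -> int:
    """Return t if it is an acceptable matrix power, otherwise raise ValueError.

    Hypothetical sketch; names and messages are not taken from cellmapper.
    """
    if not isinstance(t, int) or t < 1:
        raise ValueError(f"t must be a positive integer, got {t!r}")
    # Powers above 1 are only allowed when query and reference coincide.
    if t > 1 and not is_self_mapping:
        raise ValueError("t > 1 is only supported in self-mapping mode")
    return t
```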
- Mapping matrix is not always CSR. `_validate_and_normalize_mapping_matrix`
returns CSR for sparse inputs and ndarray for dense inputs; both are
float32, both row-stochastic. Replace "row-stochastic CSR" with the
accurate "row-stochastic and float32. Sparse inputs are stored as CSR;
dense inputs stay dense."
- Postfix claim was inaccurate. Only `prediction_postfix` is exposed on
`.map()`; `confidence_postfix` is only on the per-method entrypoints
(`map_obs`, `map_obsm`). Rephrase accordingly.
- Python version line was vague ("3.11 and newer"). Match the actual
`hatch-test` matrix (3.11 and 3.14) and point readers to pyproject.toml.
No information lost from the original `.github/copilot-instructions.md`
that isn't already covered by delegation:
- Domain context / joint-embedding requirement → README.md.
- Ruff / docstring conventions → pyproject.toml + docs/contributing.md.
- `hatch-vcs` / semver release → docs/contributing.md.
- Outdated bits (mapping matrices "typically dense", `_predicted`/
`_confidence` postfixes) are correctly dropped — the current code says
sparse CSR / `_pred` / `_conf`.
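The corrected invariant ("row-stochastic and float32; sparse inputs are stored as CSR, dense inputs stay dense") can be illustrated with a small sketch. This is not the body of `_validate_and_normalize_mapping_matrix`; `normalize_rows` is a hypothetical function, and leaving all-zero rows as zeros is an assumption made here for simplicity.

```python
import numpy as np
import scipy.sparse as sp


def normalize_rows(matrix):
    """Row-normalize a matrix: sparse -> CSR float32, dense -> ndarray float32.

    Hypothetical sketch, not the cellmapper implementation.
    """
    if sp.issparse(matrix):
        csr = sp.csr_matrix(matrix, dtype=np.float32)
        row_sums = np.asarray(csr.sum(axis=1)).ravel()
        row_sums[row_sums == 0] = 1.0  # assumption: all-zero rows stay zero
        scale = sp.diags(1.0 / row_sums)
        return (scale @ csr).tocsr().astype(np.float32)
    dense = np.asarray(matrix, dtype=np.float32)
    row_sums = dense.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # assumption: all-zero rows stay zero
    return dense / row_sums
```

Either branch yields rows that sum to 1 (for non-empty rows) in float32; only the container type differs, which is the distinction the reworded invariant captures.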
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage diff, main vs. #66:

| Metric   | main   | #66    |
|----------|--------|--------|
| Coverage | 86.37% | 86.37% |
| Files    | 13     | 13     |
| Lines    | 1387   | 1387   |
| Hits     | 1198   | 1198   |
| Misses   | 189    | 189    |
Summary

Adopt the agent-neutral documentation pattern from `avanti` (and `sinkhorn_embeddings`): split repo guidance into an authoritative `AGENTS.md` at the root, a review-only `REVIEW_GUIDE.md` for GitHub-launched review agents, a thin `CLAUDE.md` entry point, and a structured PR template. Shrink `.github/copilot-instructions.md` to a shim that points to the two canonical docs.

Every invariant in `AGENTS.md` was verified against the actual source (not the previous `copilot-instructions.md`, which contained outdated claims like `_predicted`/`_confidence` postfixes and "mapping matrices are typically dense"). Tunable thresholds (`SKLEARN_WARNING_CUTOFF`, `SPECTRAL_METHOD_THRESHOLD`) are not duplicated: they live in `src/cellmapper/constants.py`, and `AGENTS.md`'s "Where To Find What" table points there.

Behavior Or Invariants Changed

None. Documentation-only change.

Tests Run

None required: docs-only. `pre-commit` passes on all edited files.

Reviewer Focus

- `AGENTS.md` "Critical Invariants" section: each bullet has a traceable source (file / method name). Verified against:
  - `cellmapper.py` self-mapping activation + read-only reference.
  - `mapping_operator.py` `_validate_and_normalize_mapping_matrix` (row-stochastic, float32; sparse→CSR / dense→ndarray) and `_validate_power` (`t > 1` self-mapping-only).
  - `constants.py` kernel taxonomy + `SELF_MAPPING_ONLY_KERNELS`.
  - `check.py::check_deps` fail-fast with install hints.
  - `neighbors.py::__post_init__` self-edge stripping.
  - `utils.py::create_imputed_anndata` signature and output layout.
  - `__init__.py` `__all__` = public API surface.
- `REVIEW_GUIDE.md` "High-Risk Areas": pointers only, no restatement of invariants (same deduplication approach as sinkhorn_embeddings).

Context

This follows the same structure already in `avanti` and `sinkhorn_embeddings`, with two explicit properties:

- `REVIEW_GUIDE.md` is self-contained for GitHub-launched review agents: no references to local user settings, personal workflow rules, or machine-specific paths.
- `AGENTS.md` delegates to existing docs (`README.md`, `docs/api.md`, `docs/notebooks/tutorials/`, `docs/contributing.md`, source docstrings) rather than duplicating their content. Only information that does not have another owner lives in `AGENTS.md`.

Open Questions Or Follow-Ups

`cell-annotator`, `cellrank`) could follow the same pattern in separate PRs if useful.