[repo] Review by the simian: strong ideas, packaged for the wrong shelf #3

@the-simian

I read the whole repository twice, once on my own and once with an agent reading it, and spent a good while with it. This is repo-wide feedback, separate from #1. Up front, and I mean this: this is a good set of markdown files with real ideas in it. The thing I keep running into isn't the ideas; it's that they're packaged in a way that doesn't play to their strengths. None of what follows is "this is wrong"; it's "this is good, and here's how it could land as what it actually is." Once the packaging and a few model refinements settle, this is something I'd genuinely consider adopting in my own org, and I'd want to watch how it behaves at scale. It's an interesting idea and it deserves real, academic-style criticism, so that's the spirit here.

TL;DR, do this. Rename the project around governance and inheritance, not "multi-agent planning." Lead each document with its recommendation, and move the two scans into a drill-down reference/ locker. Encode Origin and Mode as enums, not prose. Reference the canonical from meta-docs instead of re-summarizing it. Make the lifecycle a property pair (deprecated_on, superseded_by) and warehouse retired principles. Everything below is the evidence and reasoning for these.

What works, and is worth amplifying

  • The architecture is sound. Separating canonical principles from the agents that consume them is the right call. (docs/principles/ as the canon; docs/agents/PLANNING_AGENT_PROMPT_GUIDELINES.md as a consumer.)
  • The Origin/Mode distinction is a real contribution. Separating where a principle comes from (Origin) from how you consume it (Mode) is the clever bit, and your own related-work scan is right that it's the novel part. (docs/model.md, "Origin vs. Mode.")
  • It dogfoods its own structure (principles stored as principles, carried by the model). The places it doesn't yet are in Part B.
  • Reference-as-default is a genuinely good, uncommon choice; live resolution with drift-as-normal beats copy-and-install for maintenance. (docs/model.md, "Inheritance modes.")

Part A: name it for what it is, and order it like it

1. The title shelves it in the wrong aisle. "Multi-agent planning" drops this into the most crowded lane in the field, where a reader expects orchestration (spawning, arenas, parallel agents) or a planning methodology, and finds neither. But your own one-line description already says what it really is: "a system for sharing agent-coordination disciplines across projects" (README.md, first line; docs/model.md, line 3). That is a multi-project discipline-governance system, and that is probably the less-explored space; it is also the more interesting one. Most published multi-agent work is orchestration mechanics: one operator running many parallel agents, arenas, parallelization, merge strategies. This project is doing something orthogonal; it centers on the principles and axioms that downstream implementations inherit. That is org-wide idea-centralization, not one-person-many-parallel-agents, and it is the part the current title hides. Don't call it multi-agent planning like everyone else; call it what it is, and you move from outgunned to genuinely differentiated. The four principles and the PAG then become two instances you dogfood on top of the system (principles that use it; one planner agent that uses it). Rename around adoption, governance, and inheritance, and every artifact gets an obvious role.

2. Lead with the claim and the instructions; put evidence behind them. Right now the documents read like persuasive essays, not instructions: the granular, actionable recommendations are concentrated at the end, and a reader should not have to consume the whole essay to extract them. Invert it: lead with the recommendations, up front and succinct, and put the evidence and citations behind them for drill-down. The worst offenders are the two scans (docs/related-work.md, docs/principles-prior-art.md), which are almost all evidence with the actionable conclusions in their final sections; move those into a reference/ locker disclosed on demand. And yes, I see that both scans open with an executive summary; that is not the lead I mean. An executive summary states findings, whereas a lead should state actions: "the inheritance model is the novel contribution" is a conclusion, "rename around governance, shelve the scans for drill-down" is an instruction. A summary of what you found still makes the reader work to extract what to do; leading with the behavior to take does not, which is why it is the stronger opening. The README's job is to convince a reader the governance model is worth adopting, then show how to use it, then support it with evidence, in that order. The "Why a model, not just a list of principles" section (docs/model.md) is a symptom: it argues a case where it should just present the model. Don't sell me on it being a model; show me the model.

3. Encode the model as types, not prose. Most of these documents spend a lot of words on a few ideas, which is cognitive load, not just length. "The model factors these concerns into three layers. Each layer has a clear role; the boundaries between them are explicit" (docs/model.md) is just "the governance model is three explicitly bounded layers." Better: define the model as explicit types. Origin is an enum of exactly {upstream, adapted-upstream, self}; Mode is exactly {reference, adapted}. Stated as types, you don't need prose explaining they don't overlap, because types don't overlap. This also does double duty with point 5: typed definitions can't collide; prose ones can. (docs/model.md, "Inheritance modes" and "The operator's registry.")
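To make point 3 concrete, here is a minimal sketch of what "types, not prose" could look like. The enum members mirror the two value sets named above; the `RegistryEntry` shape and the `parse_entry` helper are my own hypothetical names, not anything in the repo.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    """Where a principle comes from (closed set, per point 3)."""
    UPSTREAM = "upstream"
    ADAPTED_UPSTREAM = "adapted-upstream"
    SELF = "self"


class Mode(Enum):
    """How a principle is consumed (closed set, per point 3)."""
    REFERENCE = "reference"
    ADAPTED = "adapted"


@dataclass(frozen=True)
class RegistryEntry:
    # Hypothetical record shape, for illustration only.
    name: str
    origin: Origin
    mode: Mode


def parse_entry(name: str, origin: str, mode: str) -> RegistryEntry:
    # Enum lookup by value raises ValueError for anything outside the
    # closed set, so an unknown origin or mode cannot enter the registry.
    return RegistryEntry(name, Origin(origin), Mode(mode))
```

The point of the sketch is only that the closed sets become checkable: a typo such as "adapted-upsteam" fails at parse time instead of surviving in prose.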

4. Reference your canonical; don't summarize it, because you preach exactly this. Reference mode's whole pitch is "read the canonical, no local copy, drift eliminated" (docs/model.md, "Inheritance modes"). But docs/agents/PLANNING_AGENT_PROMPT_GUIDELINES.md summarizes all four principles inline and links them (its "Disclosure Is Not Correction," "Hard Evidence," "Documentation-First," and "State Preconditions" sections), and the scans summarize them too. Each summary is a second copy of the canonical's load-bearing claim with nothing keeping the two in sync; the Disclosure section paraphrases the canonical's "coordination state IS git state" thesis almost verbatim (compare PLANNING_AGENT_PROMPT_GUIDELINES.md, "Disclosure Is Not Correction," against docs/principles/disclosure_is_not_correction.md). That costs you twice, and both costs are avoidable: it drifts (the moment the #1 disclosure change lands, the main-coupled paraphrase goes stale), and it's dead weight (the summary spends words re-saying what a link would resolve). Referencing buys both back at once; call it referential DRYness, the principle stated once, read live, with no second copy to drift or to maintain. The planning-agent-specific guidance in those sections is net-new and stays; only the canonical paraphrase should collapse to a link. Rule: instructions and meta-docs reference the canonical, never re-summarize it; where a genuine handle is needed, use point 6's skill-frontmatter shape (name, one-line descriptor, path), which carries no thesis to go stale. (Points 2, 4, and 6 compose: one home, right order, one copy, flyweight handles only.)

5. Make the governance model first-class, with an agent that guards coherence. A governance system has three legs: policy (your principles), a record (the registry, docs/model.md, "The operator's registry"), and enforcement, which is currently missing. Nothing checks whether a new principle overlaps an existing one, and your own scan already flags candidates that do (docs/principles-prior-art.md, "Candidate new principles"; e.g. Closed-Loop Completion overlapping State Preconditions). So treat the model as first-class: authoring a new principle or agent runs through a coherence check (non-overlap, principle-vs-principle and agent-vs-agent; reference-not-restate). At this size that's a convention; at the multi-tenant scale the model targets, it's an agent whose whole job is that check, and it would be a third dogfooded instance, the strongest demonstration that the system is content-agnostic. To be clear what this is and isn't: this checker is framework tooling, run at authoring time the way a linter runs in CI, and excluded from day-to-day ingestion. It is not a principle loaded by reference every session, and it is not the adopter-side runtime the README's "no tooling to maintain" disclaims; that line is about discipline content shipping as plain markdown with a bring-your-own model. It's a linter for the framework itself, and it would already earn its keep: it would flag that the PAG's "Disclosure Is Not Correction" section summarizes the canonical inline where a link would do.
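As a rough illustration of the "reference-not-restate" half of that check, the sketch below flags paragraphs in a meta-doc that closely mirror a paragraph of the canonical. Everything here is an assumption on my part: the function name, the paragraph split on blank lines, and the 0.8 threshold are placeholders, and textual similarity is only a crude proxy for the semantic overlap a real coherence agent would need to judge.

```python
from difflib import SequenceMatcher


def restated_paragraphs(meta_doc: str, canonical: str,
                        threshold: float = 0.8) -> list[str]:
    """Return meta-doc paragraphs that look like copies of canonical text.

    Hypothetical authoring-time lint: the threshold is an arbitrary
    starting point, not a tuned value.
    """
    canon_paras = [p.strip() for p in canonical.split("\n\n") if p.strip()]
    flagged = []
    for para in (p.strip() for p in meta_doc.split("\n\n")):
        if not para:
            continue
        # ratio() is 1.0 for identical strings and falls toward 0.0
        # as the two strings share less text.
        if any(SequenceMatcher(None, para.lower(), c.lower()).ratio() >= threshold
               for c in canon_paras):
            flagged.append(para)
    return flagged
```

Run against a guidelines doc and a canonical principle, anything it returns is a candidate to collapse into a link; net-new agent-specific guidance should fall below the threshold and pass untouched.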

Part B: model refinements to pressure-test before scale

These are deeper than packaging, and they're where I'd most want to see real behavior at scale. Grouped because they share a root concern: context and lifecycle as the document set grows.

6. Reference mode risks context explosion; load an index, not the full text. Reference mode reads canonical at session start (docs/model.md, "Inheritance modes"; docs/tenant-onboarding.md, Phase C). At four short principles that's nothing; at the scale this is built for (many inherited, verbose principles), you eagerly spend a large slice of the context budget every session whether or not the work needs them. The fix is a flyweight: load a registry of name + one-line summary + path, and lazy-load full canonical only when relevant, the same way agent skills load a short frontmatter description and pull the body on demand. Combined with the verbosity in point 3, this is the most likely thing to bite at scale.
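A minimal sketch of the flyweight, assuming a handle of name, one-line summary, and path as described above. The class and method names are hypothetical, and the reader function is injected so the sketch does not assume how canonical text is actually fetched.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class Handle:
    # The flyweight: cheap enough to load for every principle, every session.
    name: str
    summary: str
    path: str


class PrincipleIndex:
    """Hypothetical index: handles load eagerly, full text loads lazily."""

    def __init__(self, handles: Iterable[Handle],
                 read_text: Callable[[str], str]) -> None:
        self._handles: Dict[str, Handle] = {h.name: h for h in handles}
        self._read_text = read_text
        self._cache: Dict[str, str] = {}

    def session_context(self) -> str:
        # What gets spent at session start: one line per principle.
        return "\n".join(f"{h.name}: {h.summary} ({h.path})"
                         for h in self._handles.values())

    def load(self, name: str) -> str:
        # Full canonical text is read only when the work needs it,
        # and at most once per session.
        if name not in self._cache:
            self._cache[name] = self._read_text(self._handles[name].path)
        return self._cache[name]
```

The session-start cost then scales with the number of principles times one line, not with their total length.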

7. If adapted mode is templating, make it rigorous templating. Adapted mode is currently manual re-authoring with a SHA pin (docs/model.md, "Inheritance modes"; docs/tenant-onboarding.md, Phases D and F). Its repeated failure mode, in your own words, is "bumping the SHA without re-translating content" (docs/tenant-onboarding.md, "Common pitfalls"; docs/operator-onboarding.md, "Public canonical updates"). That failure exists because adaptation is manual. Formalize it as a template: canonical carries front matter marking what's templatable, adaptation renders rather than hand-rewrites, with a worked example. Re-rendering a template can't silently drift the way manual re-translation does. Worth noting: only adapted mode has this churn at all. Reference-mode consumers are auto-current by definition (by the next session they're up to date), so migration cost is an adapted-mode-only concern, which narrows where this matters.
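One way the rendering step could look, as a sketch only: it assumes the canonical body uses `$name` placeholders and that its front matter has already been parsed into a set of templatable field names. The function name and that front-matter shape are my assumptions, not the repo's format.

```python
from string import Template


def render_adaptation(canonical_body: str, templatable: set[str],
                      values: dict[str, str]) -> str:
    """Render a tenant adaptation from canonical text (hypothetical helper).

    Literal dollar signs in the canonical body would need escaping as $$.
    """
    undeclared = set(values) - templatable
    if undeclared:
        # A tenant may only fill slots the canonical declared templatable.
        raise ValueError(f"values for non-templatable fields: {sorted(undeclared)}")
    # substitute() raises KeyError when a placeholder has no value, so a
    # canonical update that adds a slot fails loudly instead of drifting.
    return Template(canonical_body).substitute(values)
```

With this shape, bumping the SHA means re-running the render against the new canonical; there is no hand-translated copy left to forget.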

8. The lifecycle model is lossy; use properties and a warehouse, not a status enum. The registry's Status: {active, superseded, deprecated} (docs/model.md, "The operator's registry") conflates two orthogonal facts: a lifecycle bit (retired or not) and a relationship (is there a successor). It can record neither when something was deprecated nor where its successor lives, because there is no date column and no successor column. Cleaner and lossless:

  • deprecated_on: date | null (null means active);
  • superseded_by: location | null (the redirect; null means retired with no replacement).

superseded and deprecated then collapse correctly: both are "retired," superseded just carries a pointer. And rather than a status flag, warehouse retired principles: move them out of the active registry to a separate location (still readable for stragglers mid-migration, so nothing breaks), so liveness is structural ("if it's not in the warehouse, it's active") and a person or agent scanning the registry sees only live content instead of opening each doc to discover it's dead. The current deprecation handling (docs/operator-onboarding.md, "Public canonical removal or deprecation") gestures at eventual removal but leaves dead docs in-list until then; the warehouse makes the active set self-evidently the active set.
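A minimal sketch of the property pair and the warehouse split, to show that the old three-value status is recoverable from the two properties. The `Lifecycle` and `partition` names are hypothetical, and the invariant that a successor pointer requires a retirement date is my reading of the definitions above, not something the repo states.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Lifecycle:
    deprecated_on: Optional[date] = None   # None means active
    superseded_by: Optional[str] = None    # None means no successor

    def __post_init__(self) -> None:
        # Assumed invariant: only a retired entry can point at a successor.
        if self.superseded_by is not None and self.deprecated_on is None:
            raise ValueError("superseded_by requires deprecated_on")

    @property
    def active(self) -> bool:
        return self.deprecated_on is None

    def legacy_status(self) -> str:
        # The old enum is derivable, so nothing is lost by dropping it.
        if self.active:
            return "active"
        return "superseded" if self.superseded_by else "deprecated"


def partition(registry: Dict[str, Lifecycle]
              ) -> Tuple[Dict[str, Lifecycle], Dict[str, Lifecycle]]:
    """Split entries into (active registry, warehouse)."""
    active = {n: lc for n, lc in registry.items() if lc.active}
    warehouse = {n: lc for n, lc in registry.items() if not lc.active}
    return active, warehouse
```

The mapping only runs one way: the status enum cannot give back the date or the successor location, which is the sense in which it is lossy.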

Bottom line

This is a strong idea, built well, labeled for the wrong shelf. Rename it to the governance and inheritance system it actually is, lead with the recipe and shelve the provenance, reference your own canon instead of summarizing it, and tighten the model so it stays light at scale. Do that and I would put it in front of my own org and watch how it holds up. Happy to dig into any of these further.
