Context
document_shape is probably the most important raw-source inspector for the onboarding path: it should help a user understand which files appear to belong together as candidate collections. Today its generic classes / outliers output can be technically accurate but still fail to tell a coherent story.
For katalyst inspect ., the reader should come away believing Katalyst saw the actual documents, inferred useful similarities, and can point to the evidence.
Goal
Make document_shape answer, at a glance:
- What candidate document groups or collections did Katalyst see?
- What makes each group coherent: frontmatter keys, body sections, naming, location, or file type?
- How many files are in each group?
- Which representative files belong to each group?
- Which documents are exceptions, and why?
- Are there too few documents to infer a meaningful shape?
Possible shape
Render candidate groups as named or numbered groups with a short explanation and representative members. Translate feature tokens into user-facing evidence, for example:
frontmatter: title, status
sections: Review
naming: kebab-case markdown files
Outliers should be framed as exceptions to an understandable pattern, not as the primary organizing principle.
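The token translation described above could look roughly like the sketch below. It assumes, hypothetically, that feature tokens are strings of the form `kind:value` (such as `fm:title` or `section:Review`). The real internal token format of document_shape is not specified in this issue, and `LABELS` and `translate_tokens` are illustrative names, not existing Katalyst code.

```python
from collections import defaultdict

# Hypothetical mapping from internal token prefixes to user-facing labels.
# The prefixes ("fm", "section", "naming") are assumptions for illustration.
LABELS = {
    "fm": "frontmatter",
    "section": "sections",
    "naming": "naming",
}


def translate_tokens(tokens):
    """Group 'kind:value' tokens into readable 'label: v1, v2' evidence lines."""
    grouped = defaultdict(list)
    for token in tokens:
        kind, _, value = token.partition(":")
        # Fall back to the raw prefix when no friendly label is known.
        label = LABELS.get(kind, kind)
        grouped[label].append(value)
    # Dicts preserve insertion order, so labels appear in first-seen order.
    return [f"{label}: {', '.join(values)}" for label, values in grouped.items()]


evidence = translate_tokens(
    ["fm:title", "fm:status", "section:Review", "naming:kebab-case markdown files"]
)
for line in evidence:
    print(line)
```

Keeping the translation in one small function would make it easy to reuse the same evidence strings in both the human-readable output and the JSON output.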
Acceptance criteria
- katalyst inspect . --inspector document_shape presents candidate document groups in a way humans can scan and AI agents can cite.
- Each group includes count, concrete representative paths, and the visible evidence behind the grouping.
- Outliers include concrete paths and an explanation of what differs.
- Very small inputs avoid overclaiming; the output can say there is not enough evidence to infer a stable document shape.
- Existing JSON output remains complete and parseable; any schema changes are intentional and covered by tests.
- Snapshot tests cover a coherent collection, a mixed directory, and a tiny directory.
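As a rough illustration of how the criteria above could fit together, the sketch below renders groups with a count, representative paths, and evidence, lists outliers as exceptions with a reason, and declines to infer a shape for very small inputs. Everything here is an assumption made for illustration: the `Group` and `Outlier` types, the `render` function, the `MIN_FILES_FOR_SHAPE` threshold of 3, and the example paths are hypothetical and do not reflect Katalyst's actual data model or output format.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical threshold: below this many files, no shape is claimed.
MIN_FILES_FOR_SHAPE = 3


@dataclass
class Group:
    name: str
    paths: List[str]      # every file assigned to the group
    evidence: List[str]   # user-facing evidence lines


@dataclass
class Outlier:
    path: str
    reason: str           # what differs from the surrounding pattern


def render(groups, outliers, total_files, max_representatives=3):
    """Render candidate groups first, then exceptions, as plain text."""
    if total_files < MIN_FILES_FOR_SHAPE:
        return (
            f"Only {total_files} document(s) found: "
            "not enough evidence to infer a stable document shape."
        )
    lines = []
    for index, group in enumerate(groups, start=1):
        lines.append(f"Group {index}: {group.name} ({len(group.paths)} files)")
        for item in group.evidence:
            lines.append(f"  {item}")
        # Show only a few representative members, not the full listing.
        for path in group.paths[:max_representatives]:
            lines.append(f"  e.g. {path}")
    if outliers:
        lines.append("Exceptions:")
        for outlier in outliers:
            lines.append(f"  {outlier.path}: {outlier.reason}")
    return "\n".join(lines)


# Hypothetical example data.
groups = [
    Group(
        name="reviews",
        paths=["reviews/a.md", "reviews/b.md", "reviews/c.md", "reviews/d.md"],
        evidence=["frontmatter: title, status", "sections: Review"],
    )
]
outliers = [Outlier("notes.txt", "no frontmatter; not a markdown file")]

report = render(groups, outliers, total_files=5)
tiny = render([], [], total_files=1)
print(report)
print(tiny)
```

The three snapshot cases named above (coherent collection, mixed directory, tiny directory) map naturally onto inputs with no outliers, with outliers, and with `total_files` below the threshold.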