diff --git a/AGENTS.md b/AGENTS.md index 141aba29..896f5d99 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -26,8 +26,8 @@ Tests should always pass on `main`. Run `make test` before sending a PR. ``` cmd/ cobra commands (root, init, check, fix, inspect, collection, item, schema, rules) -internal/project project domain layer: the .katalyst/ loader (loader.go: schemas + storage instances, which embed their collections), the whole workspace, selectors, item enumeration -internal/storage backend-kind registry: StorageType, Known, Granularity, Reference +internal/project project domain layer: the .katalyst/ loader (loader.go: schemas + bases, which embed their collections), the whole workspace, selectors, item enumeration +internal/storage backend-kind registry: BaseType, Known, Scope, Reference internal/storage/collection the read stack: CollectionDefinition + the thin Item internal/storage/collection/listing item list filter/grep/sort/skip/limit pipeline internal/storage/collection/predicate metadata predicate grammar (item list --filter, collection variants) @@ -64,8 +64,8 @@ reconstruction), implemented per backend under `storage/collection/` (filesystem today). Don't inline filesystem assumptions (globbing, stem-as-id, path joins) elsewhere, a second backend (SQLite) attaches by implementing that interface. The `internal/project` loader (`loader.go`) owns the `.katalyst/` -*vocabulary*: it reads the workspace, resolves schemas, and assembles storage -instances. Each object type owns the parse of its own config — the storage +*vocabulary*: it reads the workspace, resolves schemas, and assembles bases. +Each object type owns the parse of its own config — the storage registry validates a declared `type` (`storage.Known`), and a collection parses its own block, including variant predicates, in `storage/collection` (which imports the sibling `predicate` grammar intra-subtree). The loader depends on @@ -159,7 +159,7 @@ you add a fixture. 
descriptor is malformed, so a new check type ships with a complete descriptor. The `json:` tags on `Descriptor`/`Field` are the published wire contract for `katalyst check-types list --json`; keep them stable. - A check type's **family** groups it by source-data kind, and is orthogonal to - its granularity: a collection-scoped check is filed by the data it reads + its scope: a collection-scoped check is filed by the data it reads (`unique_field` → `structuredObject`, `unique_filename` → `fileSystem`). The `kind` id is the wire contract and never changes, even when the family does. - A **CheckLibrary** (`internal/checks`, `CheckLibrary`/`SchemaLibrary`) is the diff --git a/README.md b/README.md index 45fcf673..fd4259b0 100644 --- a/README.md +++ b/README.md @@ -49,7 +49,7 @@ As your content evolves, Katalyst gives you tools to navigate change. - *Add or change checks* - *Change the structure of your content* -- *Change your storage layer* +- *Change your base* ## Design principles diff --git a/cmd/AGENTS.md b/cmd/AGENTS.md index 1df0b6fd..83aff261 100644 --- a/cmd/AGENTS.md +++ b/cmd/AGENTS.md @@ -21,7 +21,7 @@ When adding a top-level command, decide which family it joins: selectors (`check`, `fix`). `init` and `inspect` are verbs too, they take flags or a path rather than a selector. `inspect` infers its inspector **layer** from the single argument: a configured collection name runs the - collection layer; anything else is a filesystem path for the raw-source layer + collection layer; anything else is a filesystem path for the raw base layer (with no project, always raw). Layer selection is by argument, deliberately not a flag, to keep the onboarding case (`inspect ./wiki`) flag-free. 
- **Resource noun:** `katalyst `, a group whose diff --git a/cmd/check.go b/cmd/check.go index 6608879d..38261990 100644 --- a/cmd/check.go +++ b/cmd/check.go @@ -25,9 +25,9 @@ func newCheckCmd() *cobra.Command { Use: "check [selector ...]", Short: "Run configured checks against the selected items", Long: `check parses each selected item's frontmatter (YAML, TOML, or JSON) -and runs the checks configured for its collection under .katalyst/storage/. +and runs the checks configured for its collection under .katalyst/bases/. -Selectors (see docs/content/deep-dives/domain-model.md): +Selectors (see docs/content/deep-dives/domain-model/_index.md): (none) the whole project (every collection) one collection (all its items) diff --git a/cmd/check_test.go b/cmd/check_test.go index 59d23f01..f0d14fff 100644 --- a/cmd/check_test.go +++ b/cmd/check_test.go @@ -14,9 +14,9 @@ func setupNotesRepo(t *testing.T, notesCollection string) string { t.Helper() dir := t.TempDir() writeProject(t, dir, map[string]string{ - "config.yaml": schemaFormatJSON, - "schemas/book.json": bookSchemaFixture, - "storage/local.yaml": storageLocal(map[string]string{"notes": notesCollection}), + "config.yaml": schemaFormatJSON, + "schemas/book.json": bookSchemaFixture, + "bases/local.yaml": baseLocal(map[string]string{"notes": notesCollection}), }) chdir(t, dir) return dir @@ -167,7 +167,7 @@ func setupVariantRepo(t *testing.T, pagesBody string) string { "schemas/page.yaml": "type: object\nrequired: [title]\nproperties:\n title: {type: string}\n", "schemas/section.yaml": "type: object\n", "schemas/content.yaml": "type: object\nrequired: [weight]\nproperties:\n weight: {type: integer}\n", - "storage/local.yaml": storageLocal(map[string]string{"pages": pagesBody}), + "bases/local.yaml": baseLocal(map[string]string{"pages": pagesBody}), }) chdir(t, dir) return dir @@ -335,7 +335,7 @@ func TestCheck_inlineSchemaKeyTakesPrecedence(t *testing.T) { "config.yaml": schemaFormatJSON, "schemas/book.json": 
bookSchemaFixture, "schemas/strict-book.json": strictBookSchemaFixture, - "storage/local.yaml": storageLocal(map[string]string{"notes": "path: notes\nschema: book\n"}), + "bases/local.yaml": baseLocal(map[string]string{"notes": "path: notes\nschema: book\n"}), }) chdir(t, dir) @@ -356,7 +356,7 @@ func TestCheck_inlineSchemaKeyTakesPrecedence(t *testing.T) { func TestCheck_markdownAndFilesystemChecks(t *testing.T) { dir := t.TempDir() writeProject(t, dir, map[string]string{ - "storage/local.yaml": storageLocal(map[string]string{"notes": "path: notes\nchecks:\n - kind: markdown_title_matches_h1\n field: title\n"}), + "bases/local.yaml": baseLocal(map[string]string{"notes": "path: notes\nchecks:\n - kind: markdown_title_matches_h1\n field: title\n"}), }) chdir(t, dir) mustWrite(t, filepath.Join(dir, "notes/dune.md"), "---\ntitle: Dune\n---\n# Children of Dune\n") @@ -376,7 +376,7 @@ func TestCheck_markdownAndFilesystemChecks(t *testing.T) { func TestCheck_collectionScoped_rescanFullCollectionForSingleItemSelector(t *testing.T) { dir := t.TempDir() writeProject(t, dir, map[string]string{ - "storage/local.yaml": storageLocal(map[string]string{"notes": "path: notes\nchecks:\n - kind: filesystem_unique_field\n field: slug\n"}), + "bases/local.yaml": baseLocal(map[string]string{"notes": "path: notes\nchecks:\n - kind: filesystem_unique_field\n field: slug\n"}), }) chdir(t, dir) mustWrite(t, filepath.Join(dir, "notes/a.md"), "---\nslug: dune\n---\n# A\n") @@ -395,7 +395,7 @@ func TestCheck_collectionScoped_rescanFullCollectionForSingleItemSelector(t *tes func TestCheck_writingTells_warnButPass(t *testing.T) { dir := t.TempDir() writeProject(t, dir, map[string]string{ - "storage/local.yaml": storageLocal(map[string]string{"notes": "path: notes\nchecks:\n - kind: markdown_writing_tells\n"}), + "bases/local.yaml": baseLocal(map[string]string{"notes": "path: notes\nchecks:\n - kind: markdown_writing_tells\n"}), }) chdir(t, dir) mustWrite(t, filepath.Join(dir, "notes/x.md"), 
diff --git a/cmd/check_types_test.go b/cmd/check_types_test.go index bacdaaa4..a78dc45e 100644 --- a/cmd/check_types_test.go +++ b/cmd/check_types_test.go @@ -25,10 +25,10 @@ func TestCheckTypes_listsEveryTypeGroupedByFamily(t *testing.T) { // Family titles appear in Families() order. last := -1 for _, title := range []string{ - "Structured object check types", - "Markdown body text check types", - "File system check types", - "Plain text check types", + "Structured object", + "Markdown body text", + "File system", + "Plain text", } { i := strings.Index(stdout, title) if i < 0 { diff --git a/cmd/collection.go b/cmd/collection.go index bcc27208..7a981e7b 100644 --- a/cmd/collection.go +++ b/cmd/collection.go @@ -10,7 +10,7 @@ import ( func newCollectionCmd() *cobra.Command { c := &cobra.Command{ Use: "collection", - Short: "Inspect collections declared by storage instances under .katalyst/storage/", + Short: "Inspect collections declared by bases under .katalyst/bases/", } c.AddCommand(newCollectionListCmd(), newCollectionGetCmd()) return c diff --git a/cmd/fix.go b/cmd/fix.go index 0d0843df..f7d9fc40 100644 --- a/cmd/fix.go +++ b/cmd/fix.go @@ -23,7 +23,7 @@ top-level keys sorted alphabetically, yaml.v3 default block style, and exactly one trailing newline. The body is preserved verbatim. fix never invents semantic values: it will not inject placeholders for -missing required keys. See docs/content/deep-dives/formatting.md for why. +missing required keys. See docs/content/deep-dives/domain-model/fix.md for why. Selectors follow the same grammar as 'check'. With no selector, every item in the project is considered. 
diff --git a/cmd/fix_test.go b/cmd/fix_test.go index b60c795c..23725f66 100644 --- a/cmd/fix_test.go +++ b/cmd/fix_test.go @@ -17,7 +17,7 @@ func setupFixRepo(t *testing.T) string { t.Helper() dir := t.TempDir() writeProject(t, dir, map[string]string{ - "storage/local.yaml": storageLocal(map[string]string{"notes": fixNotesConfig}), + "bases/local.yaml": baseLocal(map[string]string{"notes": fixNotesConfig}), }) chdir(t, dir) return dir @@ -27,7 +27,7 @@ func setupFixRepoWith(t *testing.T, notesConfig string) string { t.Helper() dir := t.TempDir() writeProject(t, dir, map[string]string{ - "storage/local.yaml": storageLocal(map[string]string{"notes": notesConfig}), + "bases/local.yaml": baseLocal(map[string]string{"notes": notesConfig}), }) chdir(t, dir) return dir diff --git a/cmd/gendocs/main.go b/cmd/gendocs/main.go index 71713599..f963f369 100644 --- a/cmd/gendocs/main.go +++ b/cmd/gendocs/main.go @@ -232,7 +232,7 @@ func inspectorsIndex(layers []inspect.Layer, byLayer map[string][]inspect.Descri fmt.Fprint(&b, "distributions, never recommendations. They are the descriptive dual of ") fmt.Fprint(&b, "[check types]({{< relref \"../check-types/_index.md\" >}}) and drive the ") fmt.Fprint(&b, "[`inspect`]({{< relref \"../cli.md\" >}}) command. They come in two layers: ") - fmt.Fprint(&b, "raw-source inspectors profile a store before configuration, collection ") + fmt.Fprint(&b, "raw base inspectors profile a base before configuration, collection ") fmt.Fprint(&b, "inspectors profile a configured collection. These pages are generated from the ") fmt.Fprint(&b, "inspector registry, so they always match the shipped binary.\n") for _, layer := range layers { diff --git a/cmd/helpers_test.go b/cmd/helpers_test.go index 54f78710..a73071c0 100644 --- a/cmd/helpers_test.go +++ b/cmd/helpers_test.go @@ -61,20 +61,20 @@ func chdir(t *testing.T, dir string) { const schemaFormatJSON = "schemas:\n format: json\n" // writeProject scaffolds a .katalyst/ tree. 
Keys are paths relative to the -// .katalyst/ directory (e.g. "schemas/book.json", "storage/local.yaml", +// .katalyst/ directory (e.g. "schemas/book.json", "bases/local.yaml", // "config.yaml"); values are file contents. func writeProject(t *testing.T, dir string, files map[string]string) { t.Helper() projecttest.WriteProject(t, dir, files) } -// storageLocal builds a .katalyst/storage/local.yaml body: a filesystem -// instance rooted at the project, declaring the given collections. Each value +// baseLocal builds a .katalyst/bases/local.yaml body: a filesystem base rooted +// at the project, declaring the given collections. Each value // is the collection's YAML body, re-indented under its name. Collections now -// live inside their storage instance, so tests scaffold them this way instead +// live inside their base, so tests scaffold them this way instead // of one file per collection. -func storageLocal(collections map[string]string) string { - return projecttest.LocalStorage(collections) +func baseLocal(collections map[string]string) string { + return projecttest.LocalBase(collections) } // writeConfigDir writes the two-schema book-and-person project (book and @@ -87,7 +87,7 @@ func writeConfigDir(t *testing.T) string { "config.yaml": schemaFormatJSON, "schemas/book.json": bookSchemaFixture, "schemas/person.json": personSchemaFixture, - "storage/local.yaml": storageLocal(map[string]string{ + "bases/local.yaml": baseLocal(map[string]string{ "books": "path: notes/books\nschema: book\n", "people": "path: notes/people\nschema: person\n", }), diff --git a/cmd/init.go b/cmd/init.go index 9c0c9e95..630c27f1 100644 --- a/cmd/init.go +++ b/cmd/init.go @@ -15,24 +15,24 @@ import ( // the available knobs. const scaffoldConfig = `# katalyst project configuration. # -# Schemas live in .katalyst/schemas/.yaml. Storage instances live in -# .katalyst/storage/.yaml, and each instance declares the collections it -# maps. 
The settings below are optional and shown at their defaults; uncomment -# to change them. +# Schemas live in .katalyst/schemas/.yaml. Bases live in +# .katalyst/bases/.yaml, and each base declares the collections it maps. +# The settings below are optional and shown at their defaults; uncomment to +# change them. # # schemas: # discovery: convention # convention | explicit # format: yaml # yaml | json | both -# storage: +# bases: # discovery: convention # format: yaml ` -// scaffoldLocalStorage is the default storage instance written by init: the -// local filesystem rooted at the project. There is no implicit instance, -// this file is what makes the default explicit. Collections are declared -// inline here (or split into .katalyst/storage/local/.yaml). -const scaffoldLocalStorage = `# The default storage instance: the local filesystem, rooted at the project. +// scaffoldLocalBase is the default base written by init: the local filesystem +// rooted at the project. There is no implicit base, this file is what makes the +// default explicit. Collections are declared inline here (or split into +// .katalyst/bases/local/.yaml). +const scaffoldLocalBase = `# The default base: the local filesystem, rooted at the project. # Declare collections under "collections:", e.g. # # collections: @@ -68,7 +68,7 @@ func newInitCmd() *cobra.Command { return usageErr(fmt.Sprintf("%s already exists; refusing to overwrite", katalystDir)) } - for _, sub := range []string{"schemas", "storage"} { + for _, sub := range []string{"schemas", "bases"} { rel := filepath.Join(project.Dir, sub) if err := os.MkdirAll(filepath.Join(target, rel), 0o755); err != nil { return err @@ -76,13 +76,13 @@ func newInitCmd() *cobra.Command { fmt.Fprintf(cmd.OutOrStdout(), "created %s/\n", rel) } - // Write the default storage instance explicitly; katalyst never - // synthesizes one at runtime. 
- storageRel := filepath.Join(project.Dir, "storage", "local.yaml") - if err := os.WriteFile(filepath.Join(target, storageRel), []byte(scaffoldLocalStorage), 0o644); err != nil { + // Write the default base explicitly; katalyst never synthesizes one + // at runtime. + baseRel := filepath.Join(project.Dir, "bases", "local.yaml") + if err := os.WriteFile(filepath.Join(target, baseRel), []byte(scaffoldLocalBase), 0o644); err != nil { return err } - fmt.Fprintf(cmd.OutOrStdout(), "created %s\n", storageRel) + fmt.Fprintf(cmd.OutOrStdout(), "created %s\n", baseRel) cfgRel := filepath.Join(project.Dir, "config.yaml") if err := os.WriteFile(filepath.Join(target, cfgRel), []byte(scaffoldConfig), 0o644); err != nil { diff --git a/cmd/init_test.go b/cmd/init_test.go index e8cd1279..b5c445dd 100644 --- a/cmd/init_test.go +++ b/cmd/init_test.go @@ -16,8 +16,8 @@ func TestInit_preparesKatalystDir(t *testing.T) { for _, want := range []string{ ".katalyst", ".katalyst/schemas", - ".katalyst/storage", - ".katalyst/storage/local.yaml", + ".katalyst/bases", + ".katalyst/bases/local.yaml", ".katalyst/config.yaml", } { if _, err := os.Stat(filepath.Join(dir, want)); err != nil { @@ -39,7 +39,7 @@ func TestInit_writesNoExampleContent(t *testing.T) { "schemas", "notes", ".katalyst/schemas/book.yaml", - ".katalyst/storage/local/notes.yaml", + ".katalyst/bases/local/notes.yaml", } { if _, err := os.Stat(filepath.Join(dir, unwanted)); err == nil { t.Errorf("did not expect %s to exist", unwanted) diff --git a/cmd/inspect.go b/cmd/inspect.go index a8eb68a8..d65157fa 100644 --- a/cmd/inspect.go +++ b/cmd/inspect.go @@ -28,7 +28,7 @@ finds as evidence: counts and distributions, never recommendations. The layer is inferred from the argument. Inside a katalyst project, a configured collection name (e.g. notes) runs the collection inspectors over that collection's items. 
Otherwise the argument is a filesystem path and the -raw-source inspectors profile the tree (the onboarding case: "what's here?"). +raw base inspectors profile the tree (the onboarding case: "what's here?"). Inspectors describe; they never recommend. inspect writes no schema and mutates nothing. Output is Markdown by default; --json emits the same evidence as JSON.`, @@ -37,7 +37,7 @@ nothing. Output is Markdown by default; --json emits the same evidence as JSON.` params := inspect.Params{} if selectExpr != "" { if len(inspectors) != 1 || inspectors[0] != "file_content_shape" { - return usageErr("--select requires exactly one source inspector: --inspector file_content_shape") + return usageErr("--select requires exactly one raw base inspector: --inspector file_content_shape") } params = params.WithSelection(inspect.ParseSelection(selectExpr)) } @@ -82,7 +82,7 @@ nothing. Output is Markdown by default; --json emits the same evidence as JSON.` // runInspect selects the layer from the argument and runs its inspectors. A // configured collection name runs the collection layer; anything else is a -// filesystem path for the raw-source layer. +// filesystem path for the raw base layer. func runInspect(arg string, names []string, params inspect.Params) ([]inspect.Evidence, error) { if proj, c, ok := resolveCollection(arg); ok { return runCollectionLayer(proj, c, names, params) diff --git a/cmd/inspect_test.go b/cmd/inspect_test.go index 772262ea..094f91a1 100644 --- a/cmd/inspect_test.go +++ b/cmd/inspect_test.go @@ -35,7 +35,7 @@ func TestInspect_rawPathRunsSourceLayer(t *testing.T) { func TestInspect_collectionLayerWhenConfigured(t *testing.T) { dir := t.TempDir() - writeFile(t, dir, ".katalyst/storage/local.yaml", `type: filesystem + writeFile(t, dir, ".katalyst/bases/local.yaml", `type: filesystem root: . 
collections: notes: diff --git a/cmd/inspectors.go b/cmd/inspectors.go index 71bcedea..974336da 100644 --- a/cmd/inspectors.go +++ b/cmd/inspectors.go @@ -21,7 +21,7 @@ func newInspectorsCmd() *cobra.Command { Short: "Inspect the inspectors katalyst can run, grouped by layer", Long: `inspectors is a read-only view of katalyst's inspector registry, the same catalog cmd/gendocs renders and that the inspect command runs. List every -inspector grouped by layer (raw-source, collection), or show one inspector's +inspector grouped by layer (raw base, collection), or show one inspector's docs-style readout. It reads no project, so it runs in any directory.`, } c.AddCommand(newInspectorsListCmd(), newInspectorsShowCmd()) diff --git a/cmd/inspectors_test.go b/cmd/inspectors_test.go index 9e6e34c1..9ac09c26 100644 --- a/cmd/inspectors_test.go +++ b/cmd/inspectors_test.go @@ -21,7 +21,7 @@ func TestInspectors_listsEveryInspectorGroupedByLayer(t *testing.T) { } last := -1 - for _, title := range []string{"Raw-source inspectors", "Collection inspectors"} { + for _, title := range []string{"Raw base inspectors", "Collection inspectors"} { i := strings.Index(stdout, title) if i < 0 { t.Errorf("expected layer title %q in output", title) diff --git a/cmd/item_test.go b/cmd/item_test.go index 4af2d7f1..daac1797 100644 --- a/cmd/item_test.go +++ b/cmd/item_test.go @@ -17,7 +17,7 @@ func setupItemRepo(t *testing.T) string { "config.yaml": schemaFormatJSON, "schemas/book.json": bookSchemaFixture, "schemas/strict-book.json": strictBookSchemaFixture, - "storage/local.yaml": storageLocal(map[string]string{"notes": objectNotesConfig}), + "bases/local.yaml": baseLocal(map[string]string{"notes": objectNotesConfig}), }) chdir(t, dir) return dir diff --git a/cmd/testdata/snapshots/check-types/list-family-markdown.txt b/cmd/testdata/snapshots/check-types/list-family-markdown.txt index 5e6a6dd1..f676b5c0 100644 --- a/cmd/testdata/snapshots/check-types/list-family-markdown.txt +++ 
b/cmd/testdata/snapshots/check-types/list-family-markdown.txt @@ -1,5 +1,5 @@ -Markdown body text check types (7) ----------------------------------- +Markdown body text (7) +---------------------- - markdown_code_fence_language_required purpose: Require that opening fenced code blocks include a language tag. required: - diff --git a/cmd/testdata/snapshots/check-types/list.txt b/cmd/testdata/snapshots/check-types/list.txt index f9e275a8..ea4b3436 100644 --- a/cmd/testdata/snapshots/check-types/list.txt +++ b/cmd/testdata/snapshots/check-types/list.txt @@ -1,5 +1,5 @@ -Structured object check types (8) ---------------------------------- +Structured object (8) +--------------------- - object_field_enum purpose: Require that a field is one of a fixed set of values. required: field, values @@ -33,8 +33,8 @@ Structured object check types (8) required: schema optional: - -Markdown body text check types (7) ----------------------------------- +Markdown body text (7) +---------------------- - markdown_code_fence_language_required purpose: Require that opening fenced code blocks include a language tag. required: - @@ -64,8 +64,8 @@ Markdown body text check types (7) required: - optional: - -File system check types (13) ----------------------------- +File system (13) +---------------- - filesystem_extension_in purpose: Allow only specific file extensions. required: values @@ -119,8 +119,8 @@ File system check types (13) required: - optional: - -Plain text check types (4) --------------------------- +Plain text (4) +-------------- - text_denylist purpose: Forbid any of a list of literal substrings in the body text. 
required: values diff --git a/cmd/testdata/snapshots/check-types/show-markdown_single_h1.txt b/cmd/testdata/snapshots/check-types/show-markdown_single_h1.txt index 86e8f8b8..ff74a52d 100644 --- a/cmd/testdata/snapshots/check-types/show-markdown_single_h1.txt +++ b/cmd/testdata/snapshots/check-types/show-markdown_single_h1.txt @@ -1,5 +1,5 @@ -Markdown body text check types › Single H1 --------------------------------------------- +Markdown body text › Single H1 +-------------------------------- - kind: markdown_single_h1 - family: markdownBodyText - scope: item @@ -22,8 +22,8 @@ Example checks: - kind: markdown_single_h1 -Other markdown body text check types (6) ----------------------------------------- +Other markdown body text (6) +---------------------------- - markdown_code_fence_language_required - markdown_no_heading_level_jumps - markdown_required_section diff --git a/cmd/testdata/snapshots/check-types/show-object_field_enum.txt b/cmd/testdata/snapshots/check-types/show-object_field_enum.txt index bc4823d2..ef4263db 100644 --- a/cmd/testdata/snapshots/check-types/show-object_field_enum.txt +++ b/cmd/testdata/snapshots/check-types/show-object_field_enum.txt @@ -1,5 +1,5 @@ -Structured object check types › Field enum --------------------------------------------- +Structured object › Field enum +-------------------------------- - kind: object_field_enum - family: structuredObject - scope: item @@ -31,8 +31,8 @@ Example field: status values: [draft, published, archived] -Other structured object check types (7) ---------------------------------------- +Other structured object (7) +--------------------------- - object_field_type - object_number_range - object_required_field diff --git a/cmd/testdata/snapshots/check-types/show-object_required_field.txt b/cmd/testdata/snapshots/check-types/show-object_required_field.txt index 7b042da4..db0f40c2 100644 --- a/cmd/testdata/snapshots/check-types/show-object_required_field.txt +++ 
b/cmd/testdata/snapshots/check-types/show-object_required_field.txt @@ -1,5 +1,5 @@ -Structured object check types › Required field ------------------------------------------------- +Structured object › Required field +------------------------------------ - kind: object_required_field - family: structuredObject - scope: item @@ -26,8 +26,8 @@ Example - kind: object_required_field field: year -Other structured object check types (7) ---------------------------------------- +Other structured object (7) +--------------------------- - object_field_enum - object_field_type - object_number_range diff --git a/cmd/testdata/snapshots/help/check.txt b/cmd/testdata/snapshots/help/check.txt index 203ddce1..c106dd9b 100644 --- a/cmd/testdata/snapshots/help/check.txt +++ b/cmd/testdata/snapshots/help/check.txt @@ -1,7 +1,7 @@ check parses each selected item's frontmatter (YAML, TOML, or JSON) -and runs the checks configured for its collection under .katalyst/storage/. +and runs the checks configured for its collection under .katalyst/bases/. -Selectors (see docs/content/deep-dives/domain-model.md): +Selectors (see docs/content/deep-dives/domain-model/_index.md): (none) the whole project (every collection) one collection (all its items) diff --git a/cmd/testdata/snapshots/help/fix.txt b/cmd/testdata/snapshots/help/fix.txt index 29c82677..163edead 100644 --- a/cmd/testdata/snapshots/help/fix.txt +++ b/cmd/testdata/snapshots/help/fix.txt @@ -3,7 +3,7 @@ top-level keys sorted alphabetically, yaml.v3 default block style, and exactly one trailing newline. The body is preserved verbatim. fix never invents semantic values: it will not inject placeholders for -missing required keys. See docs/content/deep-dives/formatting.md for why. +missing required keys. See docs/content/deep-dives/domain-model/fix.md for why. Selectors follow the same grammar as 'check'. With no selector, every item in the project is considered. 
diff --git a/cmd/testdata/snapshots/help/inspect.txt b/cmd/testdata/snapshots/help/inspect.txt index 0b5f7b2e..d3f29929 100644 --- a/cmd/testdata/snapshots/help/inspect.txt +++ b/cmd/testdata/snapshots/help/inspect.txt @@ -4,7 +4,7 @@ finds as evidence: counts and distributions, never recommendations. The layer is inferred from the argument. Inside a katalyst project, a configured collection name (e.g. notes) runs the collection inspectors over that collection's items. Otherwise the argument is a filesystem path and the -raw-source inspectors profile the tree (the onboarding case: "what's here?"). +raw base inspectors profile the tree (the onboarding case: "what's here?"). Inspectors describe; they never recommend. inspect writes no schema and mutates nothing. Output is Markdown by default; --json emits the same evidence as JSON. diff --git a/cmd/testdata/snapshots/help/inspectors.txt b/cmd/testdata/snapshots/help/inspectors.txt index bac45b7b..5aa7c3ce 100644 --- a/cmd/testdata/snapshots/help/inspectors.txt +++ b/cmd/testdata/snapshots/help/inspectors.txt @@ -1,6 +1,6 @@ inspectors is a read-only view of katalyst's inspector registry, the same catalog cmd/gendocs renders and that the inspect command runs. List every -inspector grouped by layer (raw-source, collection), or show one inspector's +inspector grouped by layer (raw base, collection), or show one inspector's docs-style readout. It reads no project, so it runs in any directory. Usage: diff --git a/cmd/testdata/snapshots/inspectors/list.txt b/cmd/testdata/snapshots/inspectors/list.txt index e96a3c36..4867a21d 100644 --- a/cmd/testdata/snapshots/inspectors/list.txt +++ b/cmd/testdata/snapshots/inspectors/list.txt @@ -1,5 +1,5 @@ -Raw-source inspectors (2) -------------------------- +Raw base inspectors (2) +----------------------- - file_tree Map files, directories, extensions, regions, and filename conventions, opening no files. 
- file_content_shape diff --git a/cmd/testdata/snapshots/inspectors/show-file_content_shape.txt b/cmd/testdata/snapshots/inspectors/show-file_content_shape.txt index da333b43..8c58c31e 100644 --- a/cmd/testdata/snapshots/inspectors/show-file_content_shape.txt +++ b/cmd/testdata/snapshots/inspectors/show-file_content_shape.txt @@ -1,5 +1,5 @@ -Raw-source inspectors › File content shape --------------------------------------------- +Raw base inspectors › File content shape +------------------------------------------ - inspector: file_content_shape - layer: source - family: structural @@ -7,8 +7,8 @@ Raw-source inspectors › File content shape Layer context ------------- -Raw-source inspectors profile a backend store directly, before any collection configuration: what files are present, how they parse, and how they are named. +Raw base inspectors profile a base directly, before any collection configuration: what files are present, how they parse, and how they are named. -Other raw-source inspectors (1) -------------------------------- +Other raw base inspectors (1) +----------------------------- - file_tree diff --git a/docs/assets/_custom.scss b/docs/assets/_custom.scss index e0609105..56f56d0d 100644 --- a/docs/assets/_custom.scss +++ b/docs/assets/_custom.scss @@ -23,6 +23,14 @@ margin-inline-start: 1.25rem; } +.markdown.book-article img.diagram--domain-model { + display: block; + margin: 1rem auto; + width: auto; + max-width: 600px; + height: auto; +} + // Global status note rendered under the TOC. 
.book-toc :is(.book-toc-note, .book-toc-warning) { margin-top: 0.9rem; diff --git a/docs/content/contributing/how-we-document.md b/docs/content/contributing/how-we-document.md index 5a3b7883..8b7d4f6d 100644 --- a/docs/content/contributing/how-we-document.md +++ b/docs/content/contributing/how-we-document.md @@ -34,8 +34,8 @@ The durable home for everything a user needs, organized by - **`reference/`:** information-oriented lookup: configuration, the generated check-type reference, the glossary, the command surface. - **`deep-dives/`:** understanding-oriented "why" (the Diátaxis *explanation* - quadrant): the vision and scope, the core concepts, the storage layer, - progressive operations, and **design rationale at the behavioral altitude** - + quadrant): the vision and scope, the domain model, bases, progressive + operations, and **design rationale at the behavioral altitude** - any *why* a user can observe, whatever subsystem it touches. A short **Why Katalyst?** orientation page sits at the top level. The narrower *why* that only matters once you are reading a package's code lives with that code (see @@ -95,13 +95,14 @@ feature. They are tests that double as documentation; see ## Templates -New reference and explanation pages start from a template under -`templates/`. Each carries the Diátaxis "this page IS X, is NOT Y" -guardrail. The templates are marked `draft = true` so the public build +New reference, explanation, and domain-model deep-dive pages start from a +template under `templates/`. Each carries the Diátaxis "this page IS X, is NOT +Y" guardrail. The templates are marked `draft = true` so the public build excludes them; they are in-repo for contributors only. - [Reference template](templates/reference.md) - [Explanation template](templates/explanation.md) +- [Domain model deep-dive template](templates/domain-model-deep-dive.md) Tutorial and how-to templates are derived from the first real page of each type rather than guessed up front. 
diff --git a/docs/content/contributing/how-we-plan.md b/docs/content/contributing/how-we-plan.md index 1bf0e03a..5f219a9f 100644 --- a/docs/content/contributing/how-we-plan.md +++ b/docs/content/contributing/how-we-plan.md @@ -76,7 +76,7 @@ document]({{< relref "how-we-document.md" >}}) for what belongs where: - **`docs/reference/glossary.md`:** new vocabulary. - **`README.md`:** pointer/overview updates. -Evergreen deep-dive docs (the storage layer, progressive operations) and the +Evergreen deep-dive docs (bases, progressive operations) and the per-package `AGENTS.md` files are *not* specs and don't get retired: they're updated in place. diff --git a/docs/content/contributing/templates/_index.md b/docs/content/contributing/templates/_index.md index 5427dbb3..973906a2 100644 --- a/docs/content/contributing/templates/_index.md +++ b/docs/content/contributing/templates/_index.md @@ -14,5 +14,6 @@ matching section and fill it in. Each template names what its page **is** and - [Reference page template](reference.md) - [Explanation page template](explanation.md) +- [Domain model deep-dive template](domain-model-deep-dive.md) - [Tutorial page template](tutorial.md) - [How-to page template](how-to.md) diff --git a/docs/content/contributing/templates/domain-model-deep-dive.md b/docs/content/contributing/templates/domain-model-deep-dive.md new file mode 100644 index 00000000..69300668 --- /dev/null +++ b/docs/content/contributing/templates/domain-model-deep-dive.md @@ -0,0 +1,65 @@ ++++ +title = "Domain model deep-dive template" +weight = 25 +draft = true ++++ + + + +# + +One or two short paragraphs that define the concept, name what owns it, and +explain what it connects to. Link to the reference page when the reader needs +precise syntax rather than design rationale. + +## Terms + +| Term | Meaning | +|---|---| +| **** | Definition in user-facing vocabulary. Mention code identifiers only when they clarify the seam. 
| + +## Model + +Explain how the concept works structurally. Prefer one coherent model section +over several scattered "why" sections. Use diagrams or tables when they make +relationships easier to scan. + +## Lifecycle + +Use this section only when the page describes a command or process. Describe +the ordered flow from input to output, including where errors accumulate and +where state changes happen. + +## Design rationale + +**Decision name.** Explain why the system works this way, including the +trade-off. When this choice replaced an earlier approach, record that history +here rather than in a separate decision log. + +## Invariants + +1. **Invariant name.** The rule that must stay true. +2. **Invariant name.** The rule that must stay true. + +## Extension points + +Use this section only when there is a code seam, planned backend, or future +expansion path worth naming. + +## See also + +- The reference page for precise syntax. +- Related domain-model pages. +- `go doc ./internal/` for the code-level contract. diff --git a/docs/content/deep-dives/_index.md b/docs/content/deep-dives/_index.md index 51b20406..cac0fce7 100644 --- a/docs/content/deep-dives/_index.md +++ b/docs/content/deep-dives/_index.md @@ -6,15 +6,15 @@ bookCollapseSection = true # Deep dives -Understanding-oriented discussion of the *why* behind Katalyst, the -[vision and scope]({{< relref "vision.md" >}}), the [core -concepts]({{< relref "core-concepts.md" >}}) the tool is built on, the -[domain model]({{< relref "domain-model.md" >}}) that instantiates them in -katalyst, and the deeper design discussions that no single page or package -owns: how [checks work]({{< relref "checks.md" >}}) and the libraries that run them, -how the [storage layer]({{< relref "storage.md" >}}) maps stores onto the model, -and how operations grow richer as a backend's capabilities increase. For the -short version, start with [Welcome]({{< relref "../welcome.md" >}}). 
+Understanding-oriented discussion of the *why* behind Katalyst: the +[vision and scope]({{< relref "vision.md" >}}), the +[domain model]({{< relref "domain-model/_index.md" >}}) the tool is built on, +and the deeper design discussions that no single page or package owns: how +[checks work]({{< relref "domain-model/checks.md" >}}) and the libraries that +run them, how [bases]({{< relref "domain-model/base.md" >}}) map backend +sources onto the model, and how operations grow richer as a backend's +capabilities increase. For the short version, start with +[Welcome]({{< relref "../welcome.md" >}}). These pages carry the **behavioral *why*** - any rationale a user can observe - and each subsystem's **architecture**: how it is built, its entities, and the diff --git a/docs/content/deep-dives/collections.md b/docs/content/deep-dives/collections.md deleted file mode 100644 index 8b055694..00000000 --- a/docs/content/deep-dives/collections.md +++ /dev/null @@ -1,194 +0,0 @@ -+++ -title = "Collections" -weight = 42 -+++ - -# Collections - -The `internal/project` loader (`loader.go`) is the orchestration hub: it loads a -project's `.katalyst/` directory, resolves named schemas, and assembles storage -instances and their collections (each object type parses its own config — the -storage registry validates a declared `type`, and a collection parses its own -block in `storage/collection`). It decides which schema applies to a given item, -and the `check` lifecycle is driven from here. -This page is the model and the *why*; for the key-by-key surface see the -[configuration reference]({{< relref "../reference/configuration.md" >}}). - -## The `.katalyst/` directory - -Configuration lives in a `.katalyst/` directory, discovered by walking **up** -from the working directory to the nearest ancestor that contains one. That -ancestor becomes the repo root for all path resolution. 
- -The directory holds an optional `config.yaml`, one schema file per definition -under `schemas/`, and one storage-instance file per definition under `storage/`. -A directory (rather than one big file) keeps each schema and instance in its own -reviewable file and lets the name fall out of the filename by convention. A -nearest-ancestor lookup mirrors `.git`, `.editorconfig`, and `go.mod`: familiar -and predictable. Discovery resolves symlinks on both the root and the input -path, because on macOS `$TMPDIR` lives behind `/var` to `/private/var` and -relative-path resolution would otherwise produce garbage. - -`config.yaml` is YAML; schema and storage files default to YAML/JSON and the -accepted format is set per kind there. Default discovery is **convention** (one -file per definition); a kind can be switched to **explicit** to list its -definitions inline in `config.yaml` instead. - -Collections are declared *inside* a [storage instance]({{< relref "storage.md" >}}), -which owns the backend-to-collection mapping. This page covers the collection -model and schema resolution; the storage layer covers how an instance maps a -backend onto those collections. - -## The model - -- **Collection** - a named group of items backed by a directory; the unit you - select on the command line and the unit that owns a set of checks. `path` - defaults to the collection name; `pattern` defaults to `*.md`. Collection - names are unique project-wide, since a selector carries no instance qualifier. - - ```yaml - # inside .katalyst/storage/local.yaml - collections: - books: - path: notes/books # directory, relative to the repo root - pattern: "*.md" # filename glob; default "*.md" - schema: book # shorthand for a single leading object check - checks: # any additional checks - - kind: markdown_title_matches_h1 - ``` - -- **Item** - a single member of a collection: one file matching the - collection's `pattern`. Its **id** is the filename stem (`notes/books/dune.md` - gives `dune`). 
-- **Selector** - how commands (`check`, `fix`, the `item` subcommands) name what - to operate on, broad to narrow: *(none)* is the whole project, `` - is one collection, `/` is a single item. -- **Schema** - a JSON Schema (draft 2020-12 by default) describing the legal - shape of an item's parsed `Meta`. A schema has two identities: a **path** on - disk and a **name** (its filename stem under `.katalyst/schemas/`). The name - is the stable public handle; paths can change. `--schema ` bypasses the - name layer entirely. -- **Schema directive** (`schema:` in frontmatter) - a per-document opt-in to a - specific schema. It is **metadata about katalyst, not user data**: the - resolver reads it to choose a schema, then strips it from `Meta` before - validating, so a schema with `additionalProperties: false` is not tripped by - katalyst's own key. - -The `config.Config` loaded from disk is the single source of truth for "what -schemas exist and what each collection checks." It is validated at load: every -collection's object schema must reference a known schema, and a collection must -configure at least one check (via the `schema:` shorthand or an explicit -`checks:` list). 
- -## Why schema resolution has three tiers - -When `check` validates an item against an object schema, it resolves which -schema, highest precedence first: - -| # | Source | When it wins | -|---|-------------------------------|--------------| -| 1 | `--schema ` flag | Always, for every item in the invocation | -| 2 | Inline `schema: ` in FM | When (1) absent and `Meta["schema"]` is a known name | -| 3 | The collection's object check | When (1) and (2) absent | -| - | None | The item simply runs no object check | - -Command-line beats inline beats config because that orders the sources from most -specific intent to most general: the flag is the operator's override for this -run, the file's author has the most local information about what the file is, -and the collection is the bulk-association default. Markdown and filesystem -checks are *not* subject to this precedence; they always come from the -collection, since they describe the item's place in the project rather than its -object shape. - -Resolution runs through a per-invocation **resolver** that owns this policy and a -compiled-schema cache keyed by absolute path, so "check 10,000 files against the -same schema" costs one compile. - -## Why variants discriminate by metadata, not path - -A collection's `variants:` run extra checks on a subset of items, chosen by the -item's metadata. The discriminator (`when`) reuses the `item list --filter` -predicate grammar (`internal/storage/collection/predicate`), validated at load -via `predicate.Parse` so a bad expression fails fast. A variant's `schema:` -folds into a leading object check exactly like a collection's, so the engine -compiles base and variant through one path. - -The discriminator is metadata, not a glob, on purpose: metadata is the one -property every item yields on every backend (frontmatter for a file, columns for -a future row), so routing stays portable and the engine never depends on the -storage type. 
Selecting by *path* is a storage-type-scoped condition, deferred. -(The storage layer covers [how variants route checks rather than -membership]({{< relref "storage.md" >}}).) - -## Why a file inside a collection must match - -A file that sits inside a collection's directory but does not match its -`pattern` is reported as an **error**, not silently skipped. Silent skips hide -config drift: a typo'd pattern or a misfiled document would simply disappear -from validation. Opt-outs (`--allow-unmatched` and a config knob) are deferred -until real usage shows the need. The storage layer frames the same decision as -[unmatched references being first-class]({{< relref "storage.md" >}}). - -## Why named collections replaced the old `rules:` list - -Earlier versions used a flat, ordered `rules:` list of `{paths: , schema: -}` pairs, where the *first matching glob wins*. Named collections replaced -it for three reasons: - -- **Identity.** A collection has a name, so commands can address it (`check - books`, `item list books`). An anonymous glob rule cannot be named or - selected. -- **No precedence puzzles.** Glob ordering made the active rule for a file - depend on the order of unrelated entries. A file now belongs to exactly one - collection - the one whose directory contains it - so there is no "first match - wins" to reason about. -- **More than schemas.** A collection carries a whole `checks:` list (markdown - and filesystem checks, not just an object schema), which the old `{paths, - schema}` shape could not express cleanly. - -The `schema: ` shorthand is the one piece of the old model that survived: -sugar for a single leading `object` check. - -## Lifecycle of `check` - -The data flow per item, end to end: - -1. **Load config** (or take the `--schema` flag). Discover the `.katalyst/` - directory from the working directory; failing to find one is a usage error. -2. 
**Resolve selectors to items.** No selector means every collection; - `` means all its items; `/` means one. Files - inside a collection directory that do not match its `pattern` are unmatched - references (errors). -3. **Read file bytes.** Read errors are reported per item but do not abort the - run; exit-1 status accumulates. -4. **Parse frontmatter.** Malformed YAML/TOML/JSON is a per-item failure; no - frontmatter is itself an error. -5. **Resolve the object schema** via the precedence above, then **strip the - `schema:` directive** so user schemas with `additionalProperties: false` - are not tripped by katalyst's own metadata. -6. **Build the check list** from the resolved object check plus the collection's - markdown and filesystem checks. -7. **Run checks** (see [Checks]({{< relref "checks.md" >}})). -8. **Format output**: `path:line: /pointer: message` per violation; valid items - print `path: OK`. - -## Invariants - -1. **Schema names are stable; paths can move.** The `.katalyst/` config is the - only place that maps names to paths. -2. **The `schema:` directive is katalyst metadata, not user data.** It - influences resolution but never reaches the validator. -3. **A collection owns its checks; an item belongs to one collection.** There is - no glob-ordering "first match wins" - an item's checks are those of the - collection whose directory contains it. -4. **Unmatched is an error, not a warning.** Silent skips hide config drift. - -## See also - -- The [configuration reference]({{< relref "../reference/configuration.md" >}}) - for the precise `.katalyst/` surface. -- The [storage layer]({{< relref "storage.md" >}}) for how a backend maps onto - collections, and the instance model. -- The [domain model]({{< relref "domain-model.md" >}}) for the cross-subsystem - entity map and invariants. -- `go doc ./internal/project` for the code-level contract. 
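The three-tier schema precedence and the directive strip described in the removed `collections.md` page can be sketched in Go. This is only an illustration under stated assumptions: `resolveSchema`, `stripDirective`, and their parameters are hypothetical names, not Katalyst's actual resolver API, and the fall-through from an unknown inline name to the collection default is inferred from the precedence table rather than confirmed by the source.

```go
package main

import "fmt"

// resolveSchema is a hypothetical helper (not the real Katalyst resolver).
// It applies the documented precedence: flag, then inline directive, then
// the collection's object check. The boolean is false when no tier applies,
// in which case the item simply runs no object check.
func resolveSchema(flag string, meta map[string]any, collectionSchema string, known map[string]bool) (string, bool) {
	// Tier 1: the --schema flag wins for every item in the invocation.
	if flag != "" {
		return flag, true
	}
	// Tier 2: an inline `schema:` key wins only when it names a known schema.
	// Assumption: an unknown name falls through to tier 3.
	if name, ok := meta["schema"].(string); ok && known[name] {
		return name, true
	}
	// Tier 3: the collection's configured object schema.
	if collectionSchema != "" {
		return collectionSchema, true
	}
	return "", false
}

// stripDirective returns a copy of meta without the katalyst-owned `schema`
// key, so a schema with additionalProperties: false is not tripped by it.
func stripDirective(meta map[string]any) map[string]any {
	clean := make(map[string]any, len(meta))
	for k, v := range meta {
		if k != "schema" {
			clean[k] = v
		}
	}
	return clean
}

func main() {
	known := map[string]bool{"book": true}
	meta := map[string]any{"schema": "book", "title": "Dune"}

	name, ok := resolveSchema("", meta, "note", known)
	fmt.Println(name, ok)
	fmt.Println(stripDirective(meta))
}
```

Returning a copy from `stripDirective` keeps the parsed document untouched, which matters if the same `Meta` is later reused for source-line reporting; whether Katalyst copies or mutates is not stated in the source.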
diff --git a/docs/content/deep-dives/core-concepts.md b/docs/content/deep-dives/core-concepts.md deleted file mode 100644 index 376edf5a..00000000 --- a/docs/content/deep-dives/core-concepts.md +++ /dev/null @@ -1,59 +0,0 @@ -+++ -title = "Core concepts" -weight = 30 -+++ - -# Core concepts - -> **Status: work in progress.** A deliberately abstract sketch. These concepts -> are not about `katalyst` specifically; katalyst is one instantiation among -> many. Expect revisions until they settle. - -The vocabulary katalyst reasons in, general enough to describe a Postgres table, -a directory of markdown files, and a MongoDB collection the same way, so the -abstractions built on top bridge them too. Each term's canonical definition -lives in the [glossary]({{< relref "../reference/glossary.md" >}}); this page -introduces the concepts and how they fit. For the katalyst-specific -instantiation, see the [domain model]({{< relref "domain-model.md" >}}). - -## The concepts - -- **Project** is the whole workspace katalyst operates over: a configured root - that binds one or more storage backends into named collections. Its - configuration is the **Config** (katalyst's is the `.katalyst/` directory); - the collections, items, checks, and inspectors below all live within a - project. It is the scope an empty selector addresses. -- **Storage** is a backend that holds data: a filesystem, a SQLite database, a - Postgres instance, an S3 bucket. Katalyst's realization is the - [storage layer]({{< relref "storage.md" >}}). -- **Collection** is a group of items sharing structure: a directory of similar - files, a relational table, a Mongo collection. See - [collections]({{< relref "collections.md" >}}). -- **Item** is one unit of data in a collection: a markdown file, a table row, a - Mongo document. -- **Attribute** is a named characteristic of an item: a column, a frontmatter - key, a response field, even its name or path. A key in a structured object - specifically is a **field**. 
-- **Operation** is something storage lets you do with its data: read, list, - aggregate, write, and eventually query. Which operations a backend supports is - the subject of [progressive operations]({{< relref "progressive-operations.md" >}}). -- **Check** asserts a condition on an item or its attributes and reports a - violation when it fails. See [checks]({{< relref "checks.md" >}}). -- **Inspector** is the descriptive dual of a check: it measures a distribution - and returns evidence, never a verdict. See - [inspectors]({{< relref "inspectors.md" >}}). - -## The same vocabulary across backends - -| System | Storage | Collection | Item | Attribute | -|----------------------|---------------|-----------------|------------|------------------| -| Postgres | The database | A table | A row | A column | -| MongoDB | The database | A collection | A document | A field | -| A directory of CSVs | The directory | A CSV file | A row | A column | -| A REST API | The API | A resource type | A resource | A response field | -| An S3 bucket of JSON | The bucket | A key prefix | An object | A JSON key | - -An operation defined once in this vocabulary, check an attribute, aggregate over -a collection, applies to every backend that supports it. Which operations a -backend supports, and the structural commitments each demands, is the -[progressive operations]({{< relref "progressive-operations.md" >}}) story. diff --git a/docs/content/deep-dives/domain-model.md b/docs/content/deep-dives/domain-model.md deleted file mode 100644 index 7c3bd7f3..00000000 --- a/docs/content/deep-dives/domain-model.md +++ /dev/null @@ -1,114 +0,0 @@ -+++ -title = "Domain model" -weight = 40 -+++ - -# Domain model - -What `katalyst` is *about*: the concepts it manipulates and how they relate. -This is the conceptual map and the entry point to the subsystem deep-dives - -each piece is summarized here and documented in full on its own page. 
- -This page is the katalyst-specific map; [core concepts]({{< relref "core-concepts.md" >}}) -is the same map at the general, tool-agnostic altitude (the same vocabulary -applied to a Postgres table or a MongoDB collection). For *what* the commands do, -see the [getting-started tutorial]({{< relref "../getting-started.md" >}}) and the -[configuration reference]({{< relref "../reference/configuration.md" >}}). - -## At a glance - -```mermaid -flowchart LR - subgraph Disk["On disk"] - MD["Markdown file<br/>(frontmatter + body)"] - SF["Schema file<br/>(JSON Schema)"] - CF[".katalyst/<br/>(config)"] - end - - subgraph Parsed["In memory"] - DOC["Document<br/>Meta + Body + Lines"] - SCH["Schema<br/>(compiled)"] - CFG["Config<br/>Schemas + Collections"] - end - - subgraph Decide["Schema selection"] - FLAG["--schema flag"] - INLINE["inline 'schema:' key"] - COL["collection's object check"] - RES["Resolver"] - end - - MD --> DOC - SF --> SCH - CF --> CFG - - FLAG --> RES - INLINE --> RES - COL --> RES - CFG --> COL - - DOC -- "Meta (minus schema directive)" --> VAL["Run checks"] - RES -- "selected Schema" --> VAL - VAL --> RESULT["Result<br/>(OK + Violations)"] - DOC -- "Lines" --> RESULT -``` - -## The pieces - -Each entity is summarized here; follow the link for its full treatment. - -**Content and parsing** - see [Frontmatter and fix]({{< relref "formatting.md" >}}): - -- **Markdown document** - a file's frontmatter (`Meta`) plus its body, parsed - into a `Document` with source-line tracking. - -**Configuration and collections** - see [Collections]({{< relref "collections.md" >}}): - -- **Config** - the loaded `.katalyst/` directory: which schemas exist and what - each collection checks. -- **Collection** - a named, directory-backed group of items that owns a set of - checks. **Item** - one file in it, addressed by a **Selector** - (`/`). -- **Schema** - a named JSON Schema describing an item's legal `Meta`. The - **schema directive** (`schema:` in frontmatter) opts a document into a schema, - and the **Resolver** picks the applicable one by the three-tier precedence. - -**Checks and inspectors**: - -- **Check** and **CheckLibrary** - a check asserts one condition; a library - provides and runs it. The product is a **validation result**: a flat list of - violations, or `path: OK`. See [Checks]({{< relref "checks.md" >}}). -- **Inspector** - the descriptive dual of a check, reporting the distribution a - check would assert against. See [Inspectors]({{< relref "inspectors.md" >}}). - -## Lifecycles - -- **`check`** resolves each item's schema and check list and runs them; the - end-to-end flow is in [Collections]({{< relref "collections.md" >}}). -- **`fix`** rewrites frontmatter into canonical form without touching the body - - see [Frontmatter and fix]({{< relref "formatting.md" >}}). - -Each subsystem page lists its own invariants; the repo-wide engineering rules -(such as "production code lives in `internal/`") are in the root `AGENTS.md`.
- -## Vocabulary - -The canonical definitions of frontmatter, metadata, schema, collection, item, -selector, check, and the rest live in the -[glossary]({{< relref "../reference/glossary.md" >}}). Use those terms -consistently in code, docs, and user-facing copy. - -## Out of scope (today) - -Absences worth being explicit about; they shape what katalyst currently is -*not*: - -- **Relations between items.** A schema constrains one item at a time; no - cross-item `$ref`, no foreign keys. Planned. -- **Schema evolution.** No "this field was renamed in v2" migrations. Planned. -- **Query.** Katalyst has `item list` filters and sort keys for one collection, - implemented as an in-memory listing pipeline. A first-class storage query - operation, "find all docs where year > 1980" pushed into the backend, is - planned. -- **Derived state.** `.katalyst/` holds only hand-authored config; nothing is - generated into it. Every run is stateless. diff --git a/docs/content/deep-dives/domain-model/_index.md b/docs/content/deep-dives/domain-model/_index.md new file mode 100644 index 00000000..f40de4f2 --- /dev/null +++ b/docs/content/deep-dives/domain-model/_index.md @@ -0,0 +1,43 @@ ++++ +title = "Domain model" +weight = 30 +bookCollapseSection = true ++++ + +# Domain model + +This page introduces core concepts in the Katalyst domain model and how they relate to each other. + +## Bases + +The most central concept in Katalyst is a **base**: a storage system that holds **content** (data) and supports a specific set of **operations**. Katalyst is compatible with several different types of backend: filesystems, key-value stores, relational databases, etc. + +An **operation** is something a base lets you do with data: read, list, +aggregate, write, and eventually query. Which operations a base supports, +and what structural commitments those operations require, is the subject of +[progressive operations]({{< relref "../progressive-operations.md" >}}). 
+ +In addition to natively-supported operations for various backends, Katalyst provides two very useful kinds of operation. + +- A **check** makes an assertion about content and reports a violation if the condition fails. See + [Checks]({{< relref "checks.md" >}}). +- An **inspector** is the descriptive dual of a check: it gathers and reports the state of content. See + [Inspectors]({{< relref "inspectors.md" >}}). + +Domain model diagram showing project containing base, collection, item, and attribute, with checks and inspectors operating on the data model. + +## Raw vs collection-configured bases + +When configuring a base, the most important division is between **raw content** and **collectionized content**. A base configured only for raw content supports only a limited set of operations: checks, inspections and a small set of fixes. Most operations that require writes are not permitted, because the system would not have the context necessary to guarantee that the new content is correct. + +When a base is configured with **collections**, it can guarantee correctness and consistency for more operations. Check and inspect operations can be more specific and context-aware. Far more write operations are available, since the system now has more context to enable correctness and consistency. + +Within a given base, collection configs do not replace raw configs. Instead, they stack on top. Similarly, operations that require a collection stack on top of those available when the base was only configured for raw access to content. + +## Projects + +A **project** is the whole workspace Katalyst operates over: a configured root that includes one or more bases, plus some additional metadata. 
diff --git a/docs/content/deep-dives/domain-model/base.md b/docs/content/deep-dives/domain-model/base.md new file mode 100644 index 00000000..92ba87d3 --- /dev/null +++ b/docs/content/deep-dives/domain-model/base.md @@ -0,0 +1,130 @@ ++++ +title = "Bases" +weight = 40 ++++ + +# Bases + +A **base** is how Katalyst reaches a backend source and maps that source into the domain model. + +Every base must include configuration for **raw** access. Raw access gives Katalyst a stable way to locate content in the source. For a filesystem, that can be a root directory. For SQL, that can be connection information for a specific instance. + +A **collectionized** base keeps that raw access and adds collection definitions. Those definitions map base-native references into named collections and item identities that Katalyst commands can address directly. This is where two-way mapping applies. + +Katalyst's base model covers filesystem and SQLite backends today and is designed to extend to backends such as Postgres, S3, and hosted APIs. + +## Terms + +The base model uses several named pieces: + +| Term | Meaning | +|---|---| +| **Base type** | A known backend source kind capable of holding collections and items: `filesystem` and `sqlite` today; `postgresql`, `mongodb`, and others later. | +| **Base instance** | A specific, connectable instance of a base type, plus the information needed to reach it. | +| **Collection mapping** | The two-way mapping from a base instance's contents to collections and items. One mapping may yield more than one collection. | +| **Base reference** | A base-native locator: a file path, S3 key, table name, or similar backend address. | +| **Coordinates** | The captured fields that identify a unit within its collection. | +| **Scope** | The domain level, item or collection, at which a base type attaches a base's units to the model. | + +In config, a base instance declares the collections it maps, and the instance file is where the collection mapping lives. 
In code, the implementation seam is `internal/storage/collection.CollectionDefinition`; `internal/project` consumes it rather than implementing the filesystem mapping inline. + +Base readers use codecs to decode a matched unit's content into the shape checks and inspectors consume. The markdown filesystem reader uses `internal/codec/markdownbodytext` for frontmatter/body parsing; codecs are shared content adapters, not base backends. + +## Collectionized bases use a two-way mapping + +When a base is collectionized, mapping has two directions: + +- **Forward (discovery):** `path -> match pattern -> captured groups` become + the unit's *coordinates*. +- **Reverse (reconstruction):** `coordinates -> fill a template -> path`. + +The reverse direction is **not optional**. Katalyst needs it the moment +`item add notes/dune` has to decide what file to create; that is the same +path-reconstruction problem. Today it is the degenerate, stem-only case +(`Reference(c, id) -> /.md`); it grows with the layout. + +## The scope principle + +**"What does one matched source unit become?" has no global answer; it is a property each base type declares.** + +- **Markdown filesystem:** one file = one **item**; a directory of files = + a **collection**. +- **Tabular (CSV / SQL):** one file/table = one **collection**; its rows = + **items**. + +Both mappings are correct because item and collection are domain roles, not file counts. The definition absorbs that difference: alongside the path-to-coordinates mapping, it declares the **scope** at which a base's units attach to the collection/item hierarchy. + +Implication: **item and collection are roles, not file counts.** A base that packs many items into one physical unit (rows in a table) and one that spreads a single item across a whole unit (a markdown file) are both valid. + +## Base capability stack + +- **Raw base:** Katalyst can connect to the source and reference base-native content. 
+- **Collectionized base:** a raw base plus collection definitions that map base-native references into domain collections and items. + +## Unmatched references are first-class + +Katalyst treats unmatched references as errors rather than silently dropping them. A file inside a configured collection's scope that matches no pattern is usually a signal of config drift: the pattern is wrong, the file is misplaced, or the project has gained a new shape that has not been modeled yet. + +The same evidence can power a future `doctor` / `explain` command: list the collections, show representative examples, and surface the base references that matched nothing. + +## Variants route checks, not membership + +A collection may run different checks on different items via +[variants]({{< relref "../../reference/configuration.md" >}}#variants), but that is +a *check-engine* concern, not a base one. A variant's discriminator is a +predicate over an item's **metadata**: portable across every base type, since +each yields a metadata map (frontmatter for a file, columns for a row). It never +touches the seam: membership, `Unmatched`, and `Reference` stay governed by the +definition's `pattern`. Discriminating by *path* would be a base-type-scoped +condition; it is deferred precisely to keep the seam closed for now. + +## Coordinates are the selector + +In Katalyst, the flat `stem` identity is the degenerate one-coordinate case: +`notes/dune.md` becomes the item id `dune`. Richer layouts (`notes/2020/dune`) +grow into multiple coordinates parsed from the path. The selector grammar and +the definition's pattern are two views of the same thing. + +## Design rationale + +- **The contract is two-way, not one-way.** Discovery and reconstruction are + both core base operations. +- **Raw and collectionized are one progression.** A base starts with + base-native references, then gains collection definitions that make + collection-aware operations possible. 
+- **Surface unmatched references.** Silent skips hide drift between the + base's real contents and the configured model. +- **Coordinates and selectors are one concept.** The fields captured from a + base reference should be the same fields users and agents use to address + the item. +- **Prefer an inherently two-way template** (`{name}_{year}.md`) over inverting + an arbitrary regex. A template is bidirectional by construction; a regex is + not. +- **The pattern must own the file extension**, or reconstruction is ambiguous + when several extensions are allowed. +- **Keep collection identity separate from within-collection coordinates.** + Collection names and item coordinates answer different questions and should + stay distinct. + +## Extension points + +- **Core seam:** `internal/storage` defines `BaseType`, `Scope`, and + `Reference`; `internal/project` assembles `BaseInstance` values, and + `internal/storage/collection` defines `CollectionDefinition`. The filesystem + implementation maps a directory to a collection and each `*.md` file to an + item with a stem id; the SQLite implementation maps a table to a collection + and each row to an item. +- **Extension point:** anything that turns a base-native reference into an item + identity (or back) passes through `CollectionDefinition`, so backends can be + added without touching the check engine, the CRUD verbs, or selector parsing. + Multi-coordinate templates, inferred mode, and additional non-filesystem + types slot in there. + +## See also + +- [Domain model]({{< relref "_index.md" >}}) for the cross-subsystem entity map. +- [Collections]({{< relref "collections.md" >}}) for the collection and item + hierarchy bases expose. +- [Configuration]({{< relref "../../reference/configuration.md" >}}) for the + precise `.katalyst/bases/` surface. +- `go doc ./internal/storage` for the code-level base contracts. 
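The two-way mapping that the new `base.md` page describes (discovery from a base reference to coordinates, reconstruction from coordinates back to a reference) can be sketched over a template such as `{name}_{year}.md`. This is a minimal sketch, not the `CollectionDefinition` implementation: `Match`, `Fill`, and the `placeholder` syntax are hypothetical names chosen for illustration, and the non-greedy capture rule is an assumption about how ambiguous stems might be split.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// placeholder recognizes a {field} marker in a layout template (assumed syntax).
var placeholder = regexp.MustCompile(`\{([a-z_]+)\}`)

// Match is the forward (discovery) direction: path -> coordinates.
// It compiles the template into an anchored regular expression in which every
// literal is quoted and every {field} becomes one capture group.
func Match(template, path string) (map[string]string, bool) {
	var pattern strings.Builder
	var fields []string
	pattern.WriteString("^")
	last := 0
	for _, loc := range placeholder.FindAllStringSubmatchIndex(template, -1) {
		pattern.WriteString(regexp.QuoteMeta(template[last:loc[0]]))
		pattern.WriteString("([^/]+?)") // one coordinate, never spanning a directory
		fields = append(fields, template[loc[2]:loc[3]])
		last = loc[1]
	}
	pattern.WriteString(regexp.QuoteMeta(template[last:]))
	pattern.WriteString("$")

	re, err := regexp.Compile(pattern.String())
	if err != nil {
		return nil, false
	}
	m := re.FindStringSubmatch(path)
	if m == nil {
		return nil, false // an unmatched reference; the caller should surface it
	}
	coords := make(map[string]string, len(fields))
	for i, f := range fields {
		coords[f] = m[i+1]
	}
	return coords, true
}

// Fill is the reverse (reconstruction) direction: coordinates -> path.
// It fails when a coordinate the template needs is missing.
func Fill(template string, coords map[string]string) (string, error) {
	var missing []string
	out := placeholder.ReplaceAllStringFunc(template, func(tok string) string {
		field := tok[1 : len(tok)-1]
		v, ok := coords[field]
		if !ok {
			missing = append(missing, field)
		}
		return v
	})
	if len(missing) > 0 {
		return "", fmt.Errorf("missing coordinates: %s", strings.Join(missing, ", "))
	}
	return out, nil
}

func main() {
	const layout = "{name}_{year}.md"

	coords, ok := Match(layout, "dune_1965.md")
	fmt.Println(coords, ok)

	path, err := Fill(layout, coords)
	fmt.Println(path, err)

	_, ok = Match(layout, "README.txt")
	fmt.Println(ok)
}
```

Because the template carries the `.md` suffix itself, `Fill` never has to guess an extension, which is the point of the "pattern must own the file extension" rule. The same template drives both directions, so no arbitrary regex has to be inverted.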
diff --git a/docs/content/deep-dives/checks.md b/docs/content/deep-dives/domain-model/checks.md similarity index 68% rename from docs/content/deep-dives/checks.md rename to docs/content/deep-dives/domain-model/checks.md index 0d9a2d8a..bdcbb314 100644 --- a/docs/content/deep-dives/checks.md +++ b/docs/content/deep-dives/domain-model/checks.md @@ -11,50 +11,33 @@ verdict on each item: it resolves the checks that apply, runs them, and collects their violations. This page explains the model the engine is built on, how check libraries supply and run checks, and why the pieces are shaped the way they are. For the per-type catalog see the [check types -reference]({{< relref "../reference/check-types/_index.md" >}}); for the +reference]({{< relref "../../reference/check-types/_index.md" >}}); for the end-to-end data flow of one `check` invocation see the [domain -model]({{< relref "domain-model.md" >}}). - -## The model - -Four distinctions carry the whole engine. Keep them separate and the rest -follows. - -**Check type vs. check instance.** A *check type* is the reusable definition of -a constraint (`object_required_field`, `markdown_single_h1`), selected by its -`kind:` id. A *check instance* is one check type configured on a collection: a -`kind` plus its arguments, one entry under `checks:`. The type is the rule; the -instance is the rule applied here. - -**The registry is the single source of truth.** Each check type lives in its own -file and self-registers a `Descriptor` (its id, family, docs metadata) and a -constructor from an `init()`. `cmd/engine` builds the runnable list by registry -lookup; the docs generator and `katalyst check-types list` read the same -descriptors. A parity test fails if a configured kind has no descriptor, so a -check type cannot ship undocumented. Adding one touches a single file, not a -central switch. - -**Family vs. 
library.** A check type's *family* is the kind of source data it -reads: `structuredObject` (frontmatter), `markdownBodyText` (the body), -`fileSystem` (names and paths), `plainText` (raw body text). A check type's -*library* is the provider that supplies and runs it (below). The two are -orthogonal, and a single family spans libraries: `structuredObject` holds both -`object` (the JSON Schema library) and `object_required_field` (the native -structured-object library). Family answers *what data*; library answers *who -runs the engine*. - -**Granularity.** Most checks run once per item, implementing `Run(Context) -[]Violation`. A few reason across an entire collection (uniqueness, a required -index file) and implement `RunCollection(CollectionContext) []Violation`: they -run once per collection, after the per-item pass, so even a single-item selector -re-scans every sibling (a uniqueness verdict is only correct against the whole -set). Granularity is independent of family: `unique_field` is collection-scoped -and `structuredObject`; `unique_filename` is collection-scoped and `fileSystem`. - -A check also carries a **severity**. The default is `error`, which fails the -run; `warning` is advisory and never changes the exit code. Warnings exist for -judgment-call checks (prose tells, style nits) where a human decides per -instance. +model]({{< relref "_index.md" >}}). + +## Terms + +| Term | Meaning | +|---|---| +| **Check** | Shorthand for a check instance when context is unambiguous. A check asserts one condition and reports a violation when the condition fails. | +| **Check type** | The reusable definition of a constraint: `object_required_field`, `markdown_single_h1`, and so on. A check type is selected by its `kind:` id and appears in the generated check types reference. | +| **Check instance** | One configured check attached to a collection: a check type plus its arguments, written as one YAML object under `checks:`. 
The type is the rule; the instance is the rule applied here. | +| **Family** | The kind of source data a check type reads: `structuredObject` (frontmatter), `markdownBodyText` (the body), `fileSystem` (names and paths), or `plainText` (raw body text). | +| **Check library** | The provider that supplies and runs a check type. Native libraries wrap hand-written checks; schema-backed libraries delegate to an external validation engine. | +| **Scope** | The level where a check runs. Most checks are item-scoped; a few are collection-scoped and reason across every item in the collection. | +| **Severity** | The consequence of a violation. `error` fails the run; `warning` is advisory and does not change the exit code. | +| **Violation** | One failed check result, with a message, source location, JSON pointer when applicable, severity, and sometimes a sibling file for collection-scoped findings. | + +Family and library are separate axes. Family answers *what data does this check +read?* Library answers *who runs it?* A single family can span libraries: +`structuredObject` includes both `object` from the JSON Schema library and +`object_required_field` from the native structured-object library. + +The registry is the single source of truth for check types. Each check type +self-registers a `Descriptor` (its id, family, docs metadata) and a constructor. +`cmd/engine` builds the runnable list by registry lookup; the docs generator and +`katalyst check-types list` read the same descriptors. A parity test fails if a +configured kind has no descriptor, so a check type cannot ship undocumented. ## Check libraries @@ -106,11 +89,11 @@ item, the simplest correct path. Per item, the engine resolves which checks apply, then runs them. 
Resolution starts from the collection's configured checks and adds the checks of -the first [variant]({{< relref "../reference/configuration.md" >}}) whose `when` +the first [variant]({{< relref "../../reference/configuration.md" >}}) whose `when` predicates the item's metadata satisfies. The object schema is selected by a precedence the JSON Schema library owns (a forced `--schema`, then an inline `schema:` directive, then the collection's object checks); see the [domain -model]({{< relref "domain-model.md" >}}) for the precedence table and the +model]({{< relref "_index.md" >}}) for the precedence table and the full per-item lifecycle. Before any schema compiles, the engine confirms the owning libraries are available. @@ -167,15 +150,24 @@ into one invocation is the optimization, deferred to [#68](https://github.com/abegong/katalyst/issues/68) rather than built before a real out-of-process library exists. +## Invariants + +1. **The registry is authoritative.** Every runnable check type has a + descriptor, and generated docs read the same registry as the engine. +2. **Family and library stay separate.** Family describes the data a check + reads; library describes the provider that runs it. +3. **Collection-scoped checks see the whole collection.** A selector may narrow + output, but a collection-level verdict still needs the full sibling set. + ## See also -- The [check types reference]({{< relref "../reference/check-types/_index.md" >}}) +- The [check types reference]({{< relref "../../reference/check-types/_index.md" >}}) for the precise per-type surface, generated from the registry. -- The [domain model]({{< relref "domain-model.md" >}}) for the per-`check` +- The [domain model]({{< relref "_index.md" >}}) for the per-`check` lifecycle, the schema resolver, and the validation result. 
-- The [glossary]({{< relref "../reference/glossary.md" >}}) for the canonical +- The [glossary]({{< relref "../../reference/glossary.md" >}}) for the canonical terms (check type, check instance, CheckLibrary, schema, violation). -- The [storage layer]({{< relref "storage.md" >}}) for the collection and item +- The [base]({{< relref "base.md" >}}) for the collection and item identities checks run against, and the inspector that is a check's descriptive dual. - `go doc ./internal/checks` for the code-level engine contract. diff --git a/docs/content/deep-dives/domain-model/collections.md b/docs/content/deep-dives/domain-model/collections.md new file mode 100644 index 00000000..3cb31137 --- /dev/null +++ b/docs/content/deep-dives/domain-model/collections.md @@ -0,0 +1,129 @@ ++++ +title = "Collections" +weight = 42 ++++ + +# Collections + +The `internal/project` loader (`loader.go`) is the orchestration hub: it loads a +project's `.katalyst/` directory, resolves named schemas, and assembles bases +and their collections. Each object type parses its own config: the base +registry validates a declared `type`, and a collection parses its own block in +`storage/collection`. It decides which schema applies to a given item, and the +`check` lifecycle is driven from here. + +This page is the model and the *why*; for the key-by-key surface see the +[configuration reference]({{< relref "../../reference/configuration.md" >}}). + +Collections are declared *inside* a [base]({{< relref "base.md" >}}), which owns +the base-to-collection mapping. This page covers the collection model and +schema resolution; the base page covers how a base maps a backend source onto +those collections. + +## Terms + +| Term | Meaning | +|---|---| +| **Collection** | A group of items that share structure: a directory of similar files, a relational table, a Mongo collection, or a family of API resources. Collections are the unit that owns checks and that users address by name. 
| +| **Item** | One unit of data in a collection: a markdown file, a table row, a Mongo document, or one API resource. In the filesystem base, an item is one file matching the collection's `pattern`; its **id** is the filename stem (`notes/books/dune.md` gives `dune`). | +| **Attribute** | A named characteristic of an item: a column, a frontmatter key, a response field, its filename, its path, or another backend-derived property. A key in a structured object specifically is a **field**. | +| **Selector** | How commands (`check`, `fix`, the `item` subcommands) name what to operate on, broad to narrow: *(none)* is the whole project, `` is one collection, `/` is a single item. | +| **Schema** | A JSON Schema (draft 2020-12 by default) describing the legal shape of an item's parsed `Meta`. A schema has two identities: a **path** on disk and a **name** (its filename stem under `.katalyst/schemas/`). The name is the stable public handle; paths can change. `--schema ` bypasses the name layer entirely. | +| **Schema directive** | A per-document `schema:` frontmatter key that opts the document into a specific schema. It is **metadata about katalyst, not user data**: the resolver reads it to choose a schema, then strips it from `Meta` before validating, so a schema with `additionalProperties: false` is not tripped by katalyst's own key. | + +The `config.Config` loaded from disk is the single source of truth for "what +schemas exist and what each collection checks." It is validated at load: every +collection's object schema must reference a known schema, and a collection must +configure at least one check via the `schema:` shorthand or an explicit +`checks:` list. + +## Collections across backends + +The collection model is intentionally broader than "a directory of markdown +files." A collection is the named group Katalyst can list, select, inspect, and +check, even when the backing base has a different native vocabulary. 
+ +| System | Base | Collection | Item | Attribute | +|----------------------|---------------|-----------------|------------|------------------| +| Postgres | The database | A table | A row | A column | +| MongoDB | The database | A collection | A document | A field | +| A directory of CSVs | The directory | A CSV file | A row | A column | +| A REST API | The API | A resource type | A resource | A response field | +| An S3 bucket of JSON | The bucket | A key prefix | An object | A JSON key | + +An operation defined against this vocabulary, such as checking an attribute or +aggregating over a collection, applies to every base that can support it. The +base still decides the mechanics: a filesystem may list files and parse +frontmatter in memory, while a database may push filtering and aggregation into +queries. The collection name stays the user's handle either way. + +## Design rationale + +**Schema resolution has three tiers.** + +When `check` validates an item against an object schema, it resolves which +schema, highest precedence first: + +| # | Source | When it wins | +|---|-------------------------------|--------------| +| 1 | `--schema ` flag | Always, for every item in the invocation | +| 2 | Inline `schema: ` in FM | When (1) absent and `Meta["schema"]` is a known name | +| 3 | The collection's object check | When (1) and (2) absent | +| - | None | The item simply runs no object check | + +Command-line beats inline beats config because that orders the sources from most +specific intent to most general: the flag is the operator's override for this +run, the file's author has the most local information about what the file is, +and the collection is the bulk-association default. Markdown and filesystem +checks are *not* subject to this precedence; they always come from the +collection, since they describe the item's place in the project rather than its +object shape. 
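
The precedence table above can be sketched as a small Go function. This is an illustration only, not the resolver in `internal/project`: `resolveSchema`, the `resolution` type, and the `known` set are hypothetical names, and falling through to the collection tier when the inline directive names an unknown schema is an assumption of this sketch.

```go
package main

import "fmt"

// resolution records which schema was chosen and which tier chose it.
// Schema is a path for the flag tier and a schema name otherwise;
// an empty Schema means the item runs no object check.
type resolution struct {
	Schema string
	Tier   string // "flag", "inline", "collection", or "none"
}

// resolveSchema applies the three tiers, highest precedence first.
// All parameter names are hypothetical stand-ins for the real inputs.
func resolveSchema(forced string, meta map[string]any, collectionSchema string, known map[string]bool) resolution {
	// Tier 1: the --schema flag wins for every item in the invocation.
	if forced != "" {
		return resolution{Schema: forced, Tier: "flag"}
	}
	// Tier 2: the inline directive wins only when it names a known schema.
	if name, ok := meta["schema"].(string); ok && known[name] {
		return resolution{Schema: name, Tier: "inline"}
	}
	// Tier 3: the collection's object check is the bulk default.
	if collectionSchema != "" {
		return resolution{Schema: collectionSchema, Tier: "collection"}
	}
	return resolution{Tier: "none"}
}

func main() {
	known := map[string]bool{"book": true, "note": true}
	meta := map[string]any{"schema": "book", "title": "Dune"}

	fmt.Println(resolveSchema("", meta, "note", known).Tier)             // inline
	fmt.Println(resolveSchema("./draft.json", meta, "note", known).Tier) // flag
	fmt.Println(resolveSchema("", map[string]any{}, "note", known).Tier) // collection
	fmt.Println(resolveSchema("", map[string]any{}, "", known).Tier)     // none
}
```

Returning the winning tier alongside the schema keeps the policy inspectable: a caller can report why a given schema applied without re-deriving the precedence.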
+ +Resolution runs through a per-invocation **resolver** that owns this policy and a +compiled-schema cache keyed by absolute path, so "check 10,000 files against the +same schema" costs one compile. + +**Variants discriminate by metadata, not path.** + +A collection's `variants:` run extra checks on a subset of items, chosen by the +item's metadata. The discriminator (`when`) reuses the `item list --filter` +predicate grammar (`internal/storage/collection/predicate`), validated at load +via `predicate.Parse` so a bad expression fails fast. A variant's `schema:` +folds into a leading object check exactly like a collection's, so the engine +compiles base and variant through one path. + +The discriminator is metadata, not a glob, on purpose: metadata is the one +property every item yields on every backend (frontmatter for a file, columns for +a future row), so routing stays portable and the engine never depends on the +base type. Selecting by *path* is a base-type-scoped condition, deferred. The +base page covers [how variants route checks rather than membership]({{< relref "base.md" >}}). + +**Files inside a collection must match.** + +A file that sits inside a collection's directory but does not match its +`pattern` is reported as an **error**, not silently skipped. Silent skips hide +config drift: a typo'd pattern or a misfiled document would simply disappear +from validation. Opt-outs (`--allow-unmatched` and a config knob) are deferred +until real usage shows the need. The base page frames the same decision as +[unmatched references being first-class]({{< relref "base.md" >}}). + +## Invariants + +1. **Schema names are stable; paths can move.** The `.katalyst/` config is the + only place that maps names to paths. +2. **The `schema:` directive is katalyst metadata, not user data.** It + influences resolution but never reaches the validator. +3. 
**A collection owns its checks; an item belongs to one collection.** There is + no glob-ordering "first match wins" - an item's checks are those of the + collection whose directory contains it. +4. **Unmatched is an error, not a warning.** Silent skips hide config drift. + +## See also + +- The [configuration reference]({{< relref "../../reference/configuration.md" >}}) + for the precise `.katalyst/` surface. +- The [base]({{< relref "base.md" >}}) for how a backend source maps onto + collections, and the base model. +- The [domain model]({{< relref "_index.md" >}}) for the cross-subsystem + entity map and invariants. +- `go doc ./internal/project` for the code-level contract. diff --git a/docs/content/deep-dives/formatting.md b/docs/content/deep-dives/domain-model/fix.md similarity index 54% rename from docs/content/deep-dives/formatting.md rename to docs/content/deep-dives/domain-model/fix.md index bd608439..64a99bf3 100644 --- a/docs/content/deep-dives/formatting.md +++ b/docs/content/deep-dives/domain-model/fix.md @@ -1,59 +1,28 @@ +++ -title = "Frontmatter and fix" -weight = 60 +title = "Fix" +weight = 62 +++ -# Frontmatter and fix +# Fix -How Katalyst parses a markdown file's frontmatter, the in-memory document that -produces, and why [`fix`]({{< relref "../reference/cli.md" >}}) rewrites -that frontmatter the opinionated way it does. The codec (parse and encode) lives -in `internal/codec/markdownbodytext`; the `fix` transform that drives the -canonical form, and the backend write that persists it, live in `internal/fix` -and `internal/storage/collection/filesystem` respectively. +Why [`katalyst fix`]({{< relref "../../reference/cli.md" >}}) rewrites +frontmatter the opinionated way it does. The parser and encoder live in +`internal/codec/markdownbodytext`; the transform that drives the canonical +form, and the backend write that persists it, live in `internal/fix` and +`internal/storage/collection/filesystem` respectively. 
-## The markdown document +## Terms -The unit of work is a file on disk with two optional regions: +| Term | Meaning | +|---|---| +| **Fix** | A command that rewrites existing content into Katalyst's canonical form when a check can supply a safe transformation. | +| **Canonical form** | The deterministic output format `fix` writes: preserved frontmatter syntax, sorted top-level keys, native encoder style, preserved body bytes, and one trailing newline. | +| **Report-only check** | A check that can report violations but cannot safely rewrite content. | +| **Check mode** | The `--check` form of `fix`: print what would change, write nothing, and exit 1 if any item is non-canonical. | -- A **frontmatter** block at the very top of the file, in one of three formats - detected by the opening fence: +## Design rationale - | Format | Fence | Example openers | - |--------|-------|-----------------| - | YAML | `---` | Jekyll, Obsidian, Hugo | - | TOML | `+++` | Hugo, Obsidian, Jekyll | - | JSON | `{` ... `}` | Hugo | - - These are the three formats Hugo, Obsidian, and Jekyll emit. Whatever the - source format, the parsed `Meta` is a plain `map[string]any`, so checks and - inspectors never branch on format. `Document.Format` records the detected - syntax so `fix` can re-emit a file in its own format rather than rewriting, - say, TOML as YAML. -- A **body**, everything after the closing fence. - -A document *may* have no frontmatter, in which case `check` reports it as an -error (the file claimed no metadata, so we couldn't check anything). - -When parsed, a markdown document becomes a `markdownbodytext.Document`: - -| Field | Meaning | -|------------------|---------| -| `HasFrontmatter` | Did the file open with a recognized fence? 
| -| `Format` | Detected syntax: `KindYAML`, `KindTOML`, or `KindJSON` | -| `Meta` | Parsed frontmatter, normalized to `map[string]any` | -| `Body` | Bytes after the closing fence, **never modified** except by `fix` | -| `Lines` | JSON-pointer-path to 1-indexed source line | - -The `Lines` index is what makes error messages locatable. It accounts for the -opening fence offset, so `Lines["/title"] = 2` means the `title` key is on line -2 of the original file. - -**Line tracking is full for YAML only.** For TOML and JSON, `Lines` is empty -today; checks degrade gracefully (they emit the error without a line number). -Richer line tracking for the other formats is a planned follow-up. - -## Why fix is deliberately opinionated +**Fix is deliberately opinionated.** `katalyst fix` rewrites frontmatter in one canonical form **in the file's own format**: TOML stays TOML, JSON stays JSON, YAML stays YAML. `fix` never @@ -84,11 +53,7 @@ if it hurts in practice. `--check` makes `fix` non-destructive: it writes nothing, prints the items that *would* change, and exits 1. That is the CI form. -## Worked example - -{{< katalyst-example-full "fix-normalize-frontmatter" >}} - -## Why fix never injects missing values +**Fix never injects missing values.** An earlier idea had a mode that would add "sentinel" placeholder values for missing required keys. It was dropped, and the safe-mutation story moved to a @@ -102,6 +67,10 @@ that. A safer design, interactive or constrained to filling a schema's declared `fix` only ever normalizes what is already there; it never creates structure (a frontmatter-less file is returned untouched). +## Worked example + +{{< katalyst-example-full "fix-normalize-frontmatter" >}} + ## Lifecycle of fix For each item: @@ -123,8 +92,13 @@ For each item: 1. **Body bytes are sacred.** No command except `fix` modifies them. Even `fix` only normalizes trailing whitespace and the leading separator; interior body bytes round-trip exactly. -2. 
**Line numbers are file-relative and 1-indexed.** The opening fence is line - 1, so the first key is typically line 2. (Populated for YAML today; see the - line-tracking note above.) -3. **Format is preserved.** `fix` re-emits each file in its own frontmatter +2. **Format is preserved.** `fix` re-emits each file in its own frontmatter syntax and never converts between YAML, TOML, and JSON. +3. **No semantic values are invented.** `fix` only normalizes existing + frontmatter and configured text fixes; it does not create missing metadata. + +## See also + +- [Markdown body text]({{< relref "../../reference/data-surfaces/markdown-body-text.md" >}}) + for how markdown documents parse before `fix` rewrites them. +- `go doc ./internal/fix` for the code-level transform contract. diff --git a/docs/content/deep-dives/inspectors.md b/docs/content/deep-dives/domain-model/inspectors.md similarity index 62% rename from docs/content/deep-dives/inspectors.md rename to docs/content/deep-dives/domain-model/inspectors.md index 58bbca46..9db0f6d0 100644 --- a/docs/content/deep-dives/inspectors.md +++ b/docs/content/deep-dives/domain-model/inspectors.md @@ -9,20 +9,29 @@ An **inspector** profiles content and returns *evidence*: counts and distributions, never recommendations. Inspectors are the descriptive dual of [checks]({{< relref "checks.md" >}}) - a check asserts a predicate and reports violations; an inspector reports the distribution that predicate would be tested -against. They drive the [`inspect`]({{< relref "../reference/cli.md" >}}) +against. They drive the [`inspect`]({{< relref "../../reference/cli.md" >}}) command. For the per-inspector catalog see the [inspectors -reference]({{< relref "../reference/inspectors/_index.md" >}}); this page is the +reference]({{< relref "../../reference/inspectors/_index.md" >}}); this page is the model and the rationale behind it. 
-## Two layers +## Terms + +| Term | Meaning | +|---|---| +| **Inspector** | A read-only operation that profiles content and returns evidence. | +| **Evidence** | The measured counts, distributions, classes, or summaries an inspector reports. Evidence is not a recommendation or verdict. | +| **Raw base layer** | Inspectors that measure a base directly before collection configuration. | +| **Collection layer** | Inspectors that measure configured collection items by domain identity. | +| **Measurement primitive** | A reusable profiler such as `objectFields`, `markdownBody`, `fileMetadata`, or content-shape parsing that inspectors point at a specific input. | + +## Model Inspectors come in two layers, distinguished by *how they reference the data*: -- **The raw-source layer** (`SourceInspector` over a `SourceView`) measures a - backend store directly, before any collection configuration, addressed by - backend-native reference (a relative path today). It answers "what is in this - store?" - the onboarding case. `file_tree` and `file_content_shape` live - here. +- **The raw base layer** (`SourceInspector` over a `SourceView`) measures a + base directly, before any collection configuration, addressed by + base-native reference (a relative path today). It answers "what is in this + base?" - the onboarding case. `file_tree` and `file_content_shape` live here. - **The collection layer** (`CollectionInspector` over a `CollectionView`) measures a configured collection's items, addressed by domain identity (collection + item id) and reached through the project's @@ -31,9 +40,9 @@ Inspectors come in two layers, distinguished by *how they reference the data*: The two are **distinct interfaces, not one type at two scopes**, precisely because they reference the data through different machinery. This mirrors the -seam in the [storage layer]({{< relref "storage.md" >}}). +seam in the [base]({{< relref "base.md" >}}). 
-## Built from primitives +**Measurement is built from primitives.** Most measurement lives in three reusable, layer-agnostic primitives, so the inspectors themselves are thin wrappers that point a primitive at an input: @@ -48,11 +57,16 @@ inspectors themselves are thin wrappers that point a primitive at an input: shape (types, naming, depth, regions, directory density) over references, opening no files. -The same small primitives are reused where the layer makes sense, but raw-source -inspectors avoid proposing collections. They report store and content facts; a -human or agent decides what collection boundaries those facts imply. +The same small primitives are reused where the layer makes sense. +`file_content_shape` opens selected raw files and reports their frontmatter keys +and body structure; once those files belong to a collection, `object_fields` +measures item frontmatter by domain identity. Raw base inspectors still avoid +proposing collections: they report store and content facts, and a human or agent +decides what collection boundaries those facts imply. + +## Design rationale -## Evidence, not recommendations +**Evidence, not recommendations.** An inspector reports that a field appears in 94% of items; it does **not** say "make it required." The threshold that turns 94% into a required field, or a @@ -65,7 +79,7 @@ become something to second-guess rather than trust. Reporting only counts, with the unit count `n` as denominator, keeps the evidence trustable: the reader sees why a conclusion holds and decides. -## The determinism dividing line +**The determinism dividing line.** Deterministic measurement is an inspector's job; threshold-picking and structure-proposing are not. Counting field presence, histogramming types, @@ -74,7 +88,7 @@ all deterministic, all inspectors. Deciding that 94% is "required", that a directory should be a collection, or what to name a schema are all judgment, none of it here. 
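
As a rough illustration of the dividing line, the deterministic half can be as small as a presence count reported against the unit count `n`. The sketch below is not the `objectFields` primitive; `fieldPresence` and the sample items are hypothetical, and it only counts top-level keys.

```go
package main

import (
	"fmt"
	"sort"
)

// fieldPresence counts, for each top-level key, how many items carry it.
// It returns the counts and the unit count n used as the denominator.
// It deliberately stops there: no threshold, no "required" verdict.
func fieldPresence(items []map[string]any) (counts map[string]int, n int) {
	counts = make(map[string]int)
	for _, meta := range items {
		for key := range meta {
			counts[key]++
		}
	}
	return counts, len(items)
}

func main() {
	items := []map[string]any{
		{"title": "Dune", "author": "Herbert"},
		{"title": "Emma"},
		{"title": "Ulysses", "author": "Joyce"},
	}
	counts, n := fieldPresence(items)

	// Sort keys so the report is deterministic across runs.
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s: %d/%d\n", k, counts[k], n)
	}
	// author: 2/3
	// title: 3/3
}
```

Whether `author` at 2 of 3 should become a required field is exactly the judgment the function leaves to the reader.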
-## Keeping output small +**Keep output small.** `file_tree` and `file_content_shape` keep Markdown output small with deterministic caps: small trees get an actual tree; content-shape reports show @@ -94,12 +108,21 @@ intended workflow is a loop - inspect, draft a schema, check, fix the holdouts - but the forming, drafting, and threshold-choosing live with whoever drives the tool, not in the engine. +## Invariants + +1. **Inspectors do not mutate content.** They report evidence and write no + schemas, checks, or files. +2. **Evidence stays separate from recommendations.** Threshold choices and + schema proposals belong to the human or agent driving the workflow. +3. **Layer boundaries stay explicit.** Raw base inspectors use base-native + references; collection-layer inspectors use collection and item identity. + ## See also -- The [inspectors reference]({{< relref "../reference/inspectors/_index.md" >}}) +- The [inspectors reference]({{< relref "../../reference/inspectors/_index.md" >}}) for the per-inspector surface, generated from the registry. - [Checks]({{< relref "checks.md" >}}) - the prescriptive dual; an inspector measures the distribution a check would assert against. -- [Core concepts]({{< relref "core-concepts.md" >}}) for where profiling sits in +- [Domain model]({{< relref "_index.md" >}}) for where profiling sits in the catalog-define-enforce loop. - `go doc ./internal/inspect` for the code-level engine contract. diff --git a/docs/content/deep-dives/storage.md b/docs/content/deep-dives/storage.md deleted file mode 100644 index 5f801274..00000000 --- a/docs/content/deep-dives/storage.md +++ /dev/null @@ -1,188 +0,0 @@ -+++ -title = "Storage layer" -weight = 40 -+++ - -# Storage layer - -> **Status: partly shipped.** The seam and the config model exist -> (`internal/storage`, storage instances under `.katalyst/storage/`); the -> richer mapping (multi-coordinate templates, inferred mode, non-filesystem -> backends) is still ahead. 
This page describes the whole arc, and notes what -> is built versus planned. - -## What the storage layer is - -The **storage layer** is the two-way mapping between a backend store and the -Katalyst domain model. It answers: *what collections and items does this store -contain, and where does each one live?*, in both directions. It is Katalyst's -realization of the general **storage** concept from -[core concepts]({{< relref "core-concepts.md" >}}): the filesystem is one -backend; SQLite, directories of CSVs, S3 buckets, and hosted APIs are others. -The first real stress test is **SQLite**, because it is the first backend that -forces the granularity question below. - -## Three concepts - -The layer is three named pieces, not one. Earlier drafts called the whole thing -a *connector*; that single word was doing two jobs, *how do I reach the store* -and *how does its content map to the model*, so it was split: - -| Concept | Meaning | -|---|---| -| **StorageType** | A known backend kind capable of holding collections and items: `filesystem` today; `sqlite`, `postgresql`, `mongodb` later. | -| **StorageInstance** | A specific, connectable instance of a StorageType, plus the information needed to reach it (for `filesystem`, a root directory). | -| **CollectionDefinition** | The two-way mapping from a StorageInstance's contents to collections and items. `FilesystemCollectionDefinition` is the first; one definition may yield **more than one** collection. | - -In config, a StorageInstance declares the collections it maps, the instance -file *is* where the CollectionDefinition lives (see -[Configuration]({{< relref "../reference/configuration.md" >}})). In code, the -seam is `internal/storage/collection.CollectionDefinition`; `internal/project` consumes it -rather than implementing the filesystem mapping inline. - -Storage readers use codecs to decode a matched unit's content into the shape -checks and inspectors consume. 
The markdown filesystem reader uses -`internal/codec/markdownbodytext` for frontmatter/body parsing; codecs are -shared content adapters, not storage backends. - -## Lineage: GX legacy DataConnectors - -The design is adapted from Great Expectations' V3 `DataConnector` layer -(recovered for reference *outside this repo*; originally GX commit -`6cd804579`, removed in `27eb8d28b`). A GX `DataConnector` defined a -`regex + group_names` naming convention that mapped each file/key in a store -to a `BatchDefinition`, plus the inverse mapping back to a path. GX's -`Datasource` (the store) versus `DataConnector` (the mapping) split is exactly -the StorageInstance versus CollectionDefinition split here. - -### The heart: a two-way mapping - -- **Forward (discovery):** `path → match pattern → captured groups` become - the unit's *coordinates*. -- **Reverse (reconstruction):** `coordinates → fill a template → path`. - -The reverse direction is **not optional**. Katalyst needs it the moment -`item add notes/dune` has to decide *what file to create*, that is the same -path-reconstruction problem. Today it is the degenerate, stem-only case -(`Reference(c, id) → /.md`); it grows with the layout. - -## Concept mapping: GX → Katalyst - -| GX (legacy V3) | Katalyst | -|----------------|----------| -| Datasource | **StorageInstance** (+ its StorageType) | -| **DataConnector** | **CollectionDefinition** | -| DataAsset (`data_asset_name`) | **Collection** | -| **Batch / BatchDefinition** | **Item** *(markdown)* / Collection *(tabular)*, see granularity | -| PartitionDefinition (`group_names` → values) | the item's **coordinates** (today: the stem) | -| BatchRequest / PartitionQuery | a **selector** (the [addressing] grammar) | -| BatchSpec | the resolved fetch instruction (a `Reference`, the file path) | -| Configured vs. Inferred | `check` (declared) vs. `infer` / `profile` (discovered) | - -### The granularity principle (locked) - -**"What does one matched store unit become?" 
has no global answer, it is a -property each StorageType declares for its backend.** - -- **Markdown filesystem:** one file = one **Item**; a directory of files = - a **Collection** (`Granularity` is `FileIsItem`). -- **Tabular (CSV / SQL):** one file/table = one **Collection**; its rows = - **Items** (`UnitIsCollection`). - -This is why a GX *Batch* maps to a Katalyst *Item* in the markdown world but -to a *Collection* in the tabular world, and both are correct. The definition -absorbs that impedance: alongside the path↔coordinates mapping, it declares the -**level** at which a store's units attach to the collection/item hierarchy. - -Implication: **Item and Collection are roles, not file counts.** A backend that -packs many items into one physical unit (rows in a table) and one that spreads a -single item across a whole unit (a markdown file) are both valid. - -## Two modes: Configured vs Inferred - -GX shipped both, and they map cleanly onto Katalyst verbs: - -- **Configured:** collections and their patterns are declared explicitly (the - instance's `collections:` block). This is the `check` path: known structure, - enforced. *Shipped.* -- **Inferred:** collection names and structure are *discovered* by applying - the pattern to whatever is in the store. This is the `infer` / `profile` - path: structure read out of the data. *Planned.* - -## Unmatched references are first-class - -GX tracked files that matched no pattern (`get_unmatched_data_references`) -rather than silently dropping them. Katalyst already treats unmatched as an -error. GX's `self_check`, "here are your -collections, some examples, and the files that matched nothing", is the -template for a future `doctor` / `explain` that diagnoses a definition's mapping. - -## Variants route checks, not membership - -A collection may run different checks on different items via -[variants]({{< relref "../reference/configuration.md" >}}#variants), but that is -a *check-engine* concern, not a storage one. 
A variant's discriminator is a -predicate over an item's **attributes**: portable across every StorageType, -since each yields a structured attribute object (frontmatter fields for a file, -configured column captures for a row). It never touches the seam: membership, -`Unmatched`, and `Reference` stay governed by the definition's `pattern`. -Discriminating by *path* would be a storage-type-scoped condition; it is -deferred precisely to keep the seam closed for now. - -## Coordinates are the selector - -GX's `group_names` *are* the addressing grammar: a batch is addressed by its -asset plus its captured coordinates (`{year, letter, …}`). In Katalyst, the -flat `stem` identity is the degenerate one-coordinate case; richer layouts -(`notes/2020/dune`) grow into multiple coordinates parsed from the path. The -selector grammar and the definition's pattern are two views of the same thing. - -## Design lessons (carried + corrected) - -Reuse as-is: - -- The contract is **two-way**, not one-way (discovery *and* reconstruction). -- **Configured / Inferred ≙ `check` / `infer`:** same axis, already planned. -- **Surface unmatched**, don't swallow it. -- **Coordinates = selector:** design them as one concept. - -Do better than GX did (straight from its own TODOs in the recovered code): - -- **Prefer an inherently two-way template** (`{name}_{year}.md`) over inverting - an arbitrary regex. GX inverted a capture-group regex into a `str.format` - template and the author flagged it as *"almost certainly still brittle"*, a - template is bidirectional by construction; a regex is not. -- **The pattern must own the file extension**, or reconstruction is ambiguous - when several extensions are allowed (a GX limitation noted in `util.py`). -- **Keep collection identity separate from within-collection coordinates.** - GX leaked `data_asset_name` into the coordinate map and regretted it; keep - them distinct fields. 
- -## What is built, and the seam left open - -- **Built:** the `internal/storage` seam (`StorageType`, `StorageInstance`, - `CollectionDefinition`, `Granularity`, `Reference`), the filesystem - collection definition (collection = directory, item = each `*.md` file, id = - stem, granularity = *file-is-item*), the SQLite collection definition - (collection = table, item = row, id = configured column, attributes = - configured column captures, granularity = *unit-is-collection*), and the - config model where an instance declares its collections. -- **Open seam:** anything that turns a path into an item identity (or back) - passes through `CollectionDefinition`, so a second backend (SQLite) can be - added later without touching the check engine, the CRUD verbs, or selector - parsing. Multi-coordinate templates, inferred mode, and non-filesystem types - slot in there. - -## Terms - -| Term | Meaning | -|---|---| -| **StorageType** | A known backend kind (filesystem, sqlite, ...). | -| **StorageInstance** | A configured instance of a StorageType plus how to reach it. | -| **CollectionDefinition** | The backend↔domain two-way mapping; yields one or more collections. | -| **Data reference** | A backend-native locator (file path, S3 key, table name). | -| **Coordinates** | The captured fields that identify a unit within its collection. | -| **Granularity** | The level (item vs. collection) at which a StorageType attaches a store's units to the domain model. | - -[addressing]: {{< relref "core-concepts.md" >}} - diff --git a/docs/content/deep-dives/vision.md b/docs/content/deep-dives/vision.md index 509f4e71..20f8bd71 100644 --- a/docs/content/deep-dives/vision.md +++ b/docs/content/deep-dives/vision.md @@ -7,8 +7,8 @@ weight = 10 Traditional data management often forces teams into binary choices: structured or unstructured, rigid or chaotic. Katalyst is an experimental -framework aimed at enabling fast, low-risk evolution through progressive -typing in the storage layer. 
+framework aimed at enabling fast, low-risk evolution through progressive typing +across bases and operations. ## Database management is risky and rigid @@ -88,7 +88,7 @@ These form factors share one core idea: schemas and linters are closely related and should compose across storage boundaries. The conceptual basis, why each backend tier unlocks new operations, is in [Progressive operations]({{< relref "progressive-operations.md" >}}) and the -[core concepts]({{< relref "core-concepts.md" >}}). +[domain model]({{< relref "domain-model/_index.md" >}}). ## Current implementation status diff --git a/docs/content/getting-started.md b/docs/content/getting-started.md index 49ffcfef..da618b2d 100644 --- a/docs/content/getting-started.md +++ b/docs/content/getting-started.md @@ -59,11 +59,11 @@ katalyst check - `.katalyst/config.yaml`, commented project settings - `.katalyst/schemas/`, one schema per file (empty to start) -- `.katalyst/storage/local.yaml`, the default storage instance (the local +- `.katalyst/bases/local.yaml`, the default base (the local filesystem), where you declare collections It writes no example content. Add a schema under `.katalyst/schemas/` and -declare a collection inside `.katalyst/storage/local.yaml`, then run +declare a collection inside `.katalyst/bases/local.yaml`, then run `katalyst check`. Next: diff --git a/docs/content/how-to/add-a-schema.md b/docs/content/how-to/add-a-schema.md index 54fdaaab..20258bbc 100644 --- a/docs/content/how-to/add-a-schema.md +++ b/docs/content/how-to/add-a-schema.md @@ -39,7 +39,7 @@ The shortest way is the `schema:` shorthand, which adds a single `object` check: ```yaml -# .katalyst/storage/local.yaml +# .katalyst/bases/local.yaml type: filesystem root: . 
collections: @@ -52,7 +52,7 @@ Equivalently, add an explicit object check to `checks`, useful when you mix it with markdown or filesystem checks: ```yaml -# .katalyst/storage/local.yaml — under collections: books: +# .katalyst/bases/local.yaml — under collections: books: path: notes/books checks: - kind: object @@ -88,7 +88,7 @@ katalyst check books --schema ./schemas/strict-book.json The precedence is `--schema` > inline `schema:` key > the collection's object check. See the [configuration reference]({{< relref "../reference/configuration.md" >}}) for the key surface, -or [Collections]({{< relref "../deep-dives/collections.md" >}}) for why. +or [Collections]({{< relref "../deep-dives/domain-model/collections.md" >}}) for why. ## See also diff --git a/docs/content/how-to/configure-rules.md b/docs/content/how-to/configure-rules.md index 7c52d42b..004195ae 100644 --- a/docs/content/how-to/configure-rules.md +++ b/docs/content/how-to/configure-rules.md @@ -10,13 +10,13 @@ them. This guide adds a collection and attaches checks to it. ## 1. Point a collection at the directory -Collections are declared inside a storage instance. In a fresh project that is -`.katalyst/storage/local.yaml` (the default filesystem instance). Add the +Collections are declared inside a base. In a fresh project that is +`.katalyst/bases/local.yaml` (the default filesystem base). Add the collection under `collections:`, keyed by its name; `path` is the directory -relative to the instance root: +relative to the base root: ```yaml -# .katalyst/storage/local.yaml +# .katalyst/bases/local.yaml type: filesystem root: . collections: @@ -34,7 +34,7 @@ the [check types reference]({{< relref "../reference/check-types/_index.md" >}}) for every check type: ```yaml -# .katalyst/storage/local.yaml +# .katalyst/bases/local.yaml type: filesystem root: . 
collections: diff --git a/docs/content/how-to/profile-an-existing-wiki-by-hand.md b/docs/content/how-to/profile-an-existing-wiki-by-hand.md index 033a7b87..4a8b8a4b 100644 --- a/docs/content/how-to/profile-an-existing-wiki-by-hand.md +++ b/docs/content/how-to/profile-an-existing-wiki-by-hand.md @@ -14,14 +14,14 @@ agent]({{< relref "profile-an-existing-wiki-with-an-agent.md" >}}). `inspect` reports **evidence**, counts and distributions, never recommendations. Reading the evidence and deciding the schema is your call. It -runs in **two layers**: point it at a **directory** to profile a raw store +runs in **two layers**: point it at a **directory** to profile a raw base (no project needed), or at a configured **collection** to profile its items. The onboarding loop uses both. -## 1. Survey the directory (raw-source layer) +## 1. Survey the directory (raw base layer) Point `inspect` at the directory. With no `.katalyst/` project it runs the -raw-source inspectors: +raw base inspectors: ```bash katalyst inspect ./wiki @@ -44,7 +44,7 @@ Point a collection at the directory so the field-level layer can run. Minimal config: ```yaml -# .katalyst/storage/local.yaml +# .katalyst/bases/local.yaml type: filesystem root: . collections: @@ -105,7 +105,7 @@ properties: ``` ```yaml -# .katalyst/storage/local.yaml (extend the collection from step 2) +# .katalyst/bases/local.yaml (extend the collection from step 2) type: filesystem root: . collections: diff --git a/docs/content/how-to/profile-an-existing-wiki-with-an-agent.md b/docs/content/how-to/profile-an-existing-wiki-with-an-agent.md index 65be479e..a5ae0c70 100644 --- a/docs/content/how-to/profile-an-existing-wiki-with-an-agent.md +++ b/docs/content/how-to/profile-an-existing-wiki-with-an-agent.md @@ -16,7 +16,7 @@ deciding that a field present in 94% of files should be `required`, or that a directory should be a collection, is the agent's call. Keep that division and the loop stays debuggable. -## 1. 
Give the agent the raw-store evidence +## 1. Give the agent the raw base evidence Run `inspect` on the directory with `--json` so the agent gets structured records: one per inspector, each carrying the unit count `n` as the @@ -26,8 +26,9 @@ denominator: katalyst inspect ./wiki --json ``` -With no project this runs the **raw-source** layer: `file_tree` maps the store -and `file_content_shape` summarizes selected-file content structure. Feed the +With no project this runs the **raw base** layer. The useful records are +`file_tree`, which shows how the directory is laid out, and +`file_content_shape`, which reports shared structure in selected files. Feed the output to the agent. Tell it the contract: every record is *evidence*, not a recommendation; it must choose its own thresholds and justify them. @@ -35,10 +36,10 @@ recommendation; it must choose its own thresholds and justify them. A capable agent then: -1. **Chooses collection boundaries** from the raw-source evidence. `file_tree` - shows the directory and naming map; `file_content_shape` shows whether an - explicit slice shares frontmatter and body conventions. The agent names the - collection and drafts `.katalyst/storage/*` pointing it at the chosen path. +1. **Drafts candidate collections** from the raw base evidence. `inspect` shows + the directory layout and the shared content structure; the agent decides which + files belong together, names the collection, and drafts `.katalyst/bases/*` + pointing each collection at its directory. 2. **Profiles the fields** by inspecting each new collection, `katalyst inspect --json` runs the collection layer, whose `object_fields` record is the per-field data dictionary (presence, types, values). 
diff --git a/docs/content/how-to/validate-in-ci.md b/docs/content/how-to/validate-in-ci.md index f290a1e4..979b7491 100644 --- a/docs/content/how-to/validate-in-ci.md +++ b/docs/content/how-to/validate-in-ci.md @@ -58,7 +58,7 @@ It prints one line per non-canonical item and exits 1, writing nothing. Here The `check` step enforces schema and structural checks; the `fix --check` step enforces canonical frontmatter without modifying files. See -[Frontmatter and fix]({{< relref "../deep-dives/formatting.md" >}}) for why +[Fix]({{< relref "../deep-dives/domain-model/fix.md" >}}) for why `fix` is opinionated and non-destructive in this mode. ## See also diff --git a/docs/content/reference/_index.md b/docs/content/reference/_index.md index d660b675..47ed614a 100644 --- a/docs/content/reference/_index.md +++ b/docs/content/reference/_index.md @@ -7,9 +7,11 @@ bookCollapseSection = true # Reference Information-oriented descriptions of installation methods, the configuration -surface, check types, and the project vocabulary. Reference pages describe -*what is*, not *how to:* they are looked up, not read front to back. Check-type -pages under +surface, data surfaces, check types, and the project vocabulary. Reference pages +describe *what is*, not *how to:* they are looked up, not read front to back. + +[Data surfaces]({{< relref "data-surfaces/_index.md" >}}) describe the +representations checks and inspectors read from content. Check-type pages under [Check types]({{< relref "check-types/_index.md" >}}) are generated from the checks registry, so they never drift from the code. 
Many of these pages carry a **Worked example**: a small input corpus, a real `katalyst` command, and its diff --git a/docs/content/reference/check-types/_index.md b/docs/content/reference/check-types/_index.md index a68e361c..76c54c2d 100644 --- a/docs/content/reference/check-types/_index.md +++ b/docs/content/reference/check-types/_index.md @@ -11,7 +11,7 @@ aliases = ["/reference/rules/"] The check types `katalyst` runs against each item, grouped by family. These pages are generated from the checks registry, so they always match the shipped binary. -## Structured object check types +## Structured object Structured-object check types validate structured frontmatter fields using schema-backed checks. @@ -24,7 +24,7 @@ Structured-object check types validate structured frontmatter fields using schem - [Unique field]({{< relref "structured-object/unique-field.md" >}}): Require that no two items share a value for a frontmatter field. - [Object validation]({{< relref "structured-object/object.md" >}}): Validate frontmatter metadata against a named JSON Schema from schemas:. -## Markdown body text check types +## Markdown body text Markdown body-text check types validate relationships between frontmatter metadata and markdown body content. @@ -36,7 +36,7 @@ Markdown body-text check types validate relationships between frontmatter metada - [Title matches H1]({{< relref "markdown-body-text/title-matches-h1.md" >}}): Require a frontmatter field to match the first H1 heading in the body. - [Writing tells]({{< relref "markdown-body-text/writing-tells.md" >}}): Warn on likely AI-writing tells (em dashes, decorative emoji, overused words, stock phrases) for human review. -## File system check types +## File system File-system check types validate filename and path conventions for items. @@ -54,7 +54,7 @@ File-system check types validate filename and path conventions for items. 
- [Referenced files exist]({{< relref "file-system/referenced-files-exist.md" >}}): Require path-valued frontmatter fields to resolve to real files. - [Unique filename]({{< relref "file-system/unique-filename.md" >}}): Require that no two items in the collection share a basename. -## Plain text check types +## Plain text Plain-text check types validate body content as raw text, independent of markdown structure. They apply to plain-text items as well as markdown bodies. diff --git a/docs/content/reference/check-types/file-system/_index.md b/docs/content/reference/check-types/file-system/_index.md index 4f84beb3..cea037de 100644 --- a/docs/content/reference/check-types/file-system/_index.md +++ b/docs/content/reference/check-types/file-system/_index.md @@ -1,5 +1,5 @@ +++ -title = "File system check types" +title = "File system" weight = 30 bookCollapseSection = true aliases = ["/reference/rules/file-system/"] diff --git a/docs/content/reference/check-types/markdown-body-text/_index.md b/docs/content/reference/check-types/markdown-body-text/_index.md index 4ae2a7f3..0dcc011a 100644 --- a/docs/content/reference/check-types/markdown-body-text/_index.md +++ b/docs/content/reference/check-types/markdown-body-text/_index.md @@ -1,5 +1,5 @@ +++ -title = "Markdown body text check types" +title = "Markdown body text" weight = 20 bookCollapseSection = true aliases = ["/reference/rules/markdown-body-text/"] diff --git a/docs/content/reference/check-types/plain-text/_index.md b/docs/content/reference/check-types/plain-text/_index.md index c6e774eb..46c85252 100644 --- a/docs/content/reference/check-types/plain-text/_index.md +++ b/docs/content/reference/check-types/plain-text/_index.md @@ -1,5 +1,5 @@ +++ -title = "Plain text check types" +title = "Plain text" weight = 40 bookCollapseSection = true aliases = ["/reference/rules/plain-text/"] diff --git a/docs/content/reference/check-types/structured-object/_index.md 
b/docs/content/reference/check-types/structured-object/_index.md index 9e3c99c5..0ff5182b 100644 --- a/docs/content/reference/check-types/structured-object/_index.md +++ b/docs/content/reference/check-types/structured-object/_index.md @@ -1,5 +1,5 @@ +++ -title = "Structured object check types" +title = "Structured object" weight = 10 bookCollapseSection = true aliases = ["/reference/rules/structured-object/"] diff --git a/docs/content/reference/configuration.md b/docs/content/reference/configuration.md index 023f9ea4..842b3bd9 100644 --- a/docs/content/reference/configuration.md +++ b/docs/content/reference/configuration.md @@ -8,9 +8,12 @@ weight = 10 Katalyst reads a `.katalyst/` directory, found by walking upward from the current working directory to the nearest ancestor that contains one. That ancestor is the repo root; all relative paths resolve against it. +Discovery resolves symlinks on both the root and the input path, because on +macOS `$TMPDIR` lives behind the `/var` to `/private/var` symlink and relative-path +resolution would otherwise produce garbage. For *why* the config is shaped this way, see [How collections -work]({{< relref "../deep-dives/collections.md" >}}). To set one up step by +work]({{< relref "../deep-dives/domain-model/collections.md" >}}). To set one up step by step, see [Configure checks for a collection]({{< relref "../how-to/configure-rules.md" >}}). @@ -21,20 +24,27 @@ collection]({{< relref "../how-to/configure-rules.md" >}}). 
config.yaml # optional: listing defaults and discovery settings schemas/ # one JSON Schema file per named schema book.json - storage/ # one file per storage instance - local.yaml # an instance + the collections it declares + bases/ # one file per base + local.yaml # a base + the collections it declares local/ # optional: one file per collection (escape hatch) books.yaml ``` -By default, schemas and storage instances are discovered by **convention**: +By default, schemas and bases are discovered by **convention**: every file under `schemas/` is a schema whose name is its filename stem -(`book.json` → `book`), and every file under `storage/` is a -[storage instance](#storage-instances) named for its filename stem -(`local.yaml` → `local`). `config.yaml` is optional; it carries `listing:` +(`book.json` → `book`), and every file under `bases/` is a +[base](#bases) named for its filename stem (`local.yaml` → `local`). +`config.yaml` is optional; it carries `listing:` defaults and can switch a kind to **explicit** discovery, listing definitions inline instead of as files. +`config.yaml` is YAML; schema and base files default to YAML/JSON, and the +accepted format is set per kind there. + +Legacy projects that still use `storage:` in `config.yaml` or +`.katalyst/storage/` continue to load. Do not mix legacy and new forms in the +same project; move legacy base files to `.katalyst/bases/` when you edit them. + ## Schemas Each file under `.katalyst/schemas/` is a JSON Schema. Its **name**, the @@ -45,22 +55,22 @@ collection's `schema:` shorthand. The path can move; the name should not. Schemas are stored flat; the check library that compiles a schema is determined by the referencing check type's `kind` (the `object` check uses JSON Schema). -## Storage instances +## Bases -A **storage instance** is one configured backend store, plus the collections it -maps onto the domain model. Each file under -`.katalyst/storage/` is one instance, named for its filename stem. 
There is no -implicit instance; `katalyst init` writes a default `local` one. +A **base** is one configured backend store, plus the collections it maps onto +the domain model. Each file under +`.katalyst/bases/` is one base, named for its filename stem. There is no +implicit base; `katalyst init` writes a default `local` one. | Key | Required | Default | Meaning | |---|---|---|---| | `type` | no | `filesystem` | Backend kind: `filesystem` or `sqlite`. | -| `root` | no | `.` | Filesystem instance root directory, relative to the repo root. Collection paths resolve against it. | -| `path` | for `sqlite` | - | SQLite database path, relative to the repo root. Alias for `root` on SQLite instances. | +| `root` | no | `.` | Base root directory, relative to the repo root. Collection paths resolve against it. | +| `path` | for `sqlite` | - | SQLite database path, relative to the repo root. Alias for `root` on SQLite bases. | | `collections` | no | - | Map of collection name → definition (see below). | ```yaml -# .katalyst/storage/local.yaml +# .katalyst/bases/local.yaml type: filesystem root: . collections: @@ -72,12 +82,12 @@ collections: ``` Collection names are unique across the whole project (selectors are -`/`, with no instance qualifier). +`/`, with no base qualifier). SQLite instances use one table per collection. Each row is one item: ```yaml -# .katalyst/storage/db.yaml +# .katalyst/bases/db.yaml type: sqlite path: content.sqlite collections: @@ -102,11 +112,11 @@ collections: ## Collections A **collection** is a directory of items plus the checks every item must pass. -Collections are declared inside their storage instance, under `collections:`. +Collections are declared inside their base, under `collections:`. | Key | Required | Default | Meaning | |---|---|---|---| -| `path` | no | the collection name | Directory, relative to the instance `root`. | +| `path` | no | the collection name | Directory, relative to the base `root`. 
| | `pattern` | no | `*.md` | Filename glob selecting items in the directory. | | `table` | for `sqlite` | - | SQLite table backing the collection. | | `id` | for `sqlite` | - | SQLite column that provides item identity. | @@ -141,13 +151,13 @@ inside the `author` attribute object. ### Per-collection files -An instance whose `collections:` block grows unwieldy may split collections into -one file each under `.katalyst/storage//.yaml`, named for +A base whose `collections:` block grows unwieldy may split collections into +one file each under `.katalyst/bases//.yaml`, named for its filename stem. Inline and per-file collections coexist; a name declared both inline and in a file is an error. ```yaml -# .katalyst/storage/local/books.yaml +# .katalyst/bases/local/books.yaml path: notes/books schema: book ``` @@ -242,7 +252,7 @@ failure (`matches no variant`), so every item is provably accounted for. Discrimination is by metadata only; selecting items by path or filename is not supported yet (a page type distinguishable only by location needs a frontmatter marker). `pattern` still governs collection **membership** and which files are -reported as [unmatched]({{< relref "../deep-dives/domain-model.md" >}}#invariants); +reported as [unmatched]({{< relref "../deep-dives/domain-model/_index.md" >}}#invariants); variants only route checks. ## `listing` @@ -264,7 +274,7 @@ listing: ``` ```yaml -# under a storage instance's collections: — override for one collection +# under a base's collections: override for one collection books: path: notes/books schema: book @@ -293,8 +303,8 @@ variant), even when `--schema` is used. ## See also - [Check types reference]({{< relref "check-types/_index.md" >}}), every check type. -- [Storage layer]({{< relref "../deep-dives/storage.md" >}}), the storage - instance / collection-definition model and its lineage. 
-- [Collections]({{< relref "../deep-dives/collections.md" >}}), the +- [Bases]({{< relref "../deep-dives/domain-model/base.md" >}}), the base / + collection-mapping model and its lineage. +- [Collections]({{< relref "../deep-dives/domain-model/collections.md" >}}), the config/collection model and rationale: schema resolution, variants, unmatched-as-error. diff --git a/docs/content/reference/data-surfaces/_index.md b/docs/content/reference/data-surfaces/_index.md new file mode 100644 index 00000000..3a27888d --- /dev/null +++ b/docs/content/reference/data-surfaces/_index.md @@ -0,0 +1,25 @@ ++++ +title = "Data surfaces" +weight = 35 +bookCollapseSection = true ++++ + +# Data surfaces + +Data surfaces are the representations Katalyst operations read from content. +Checks, inspectors, and `fix` do not all need the same surface: one check may +read structured metadata, another may scan body text, and another may inspect a +path. Naming those surfaces keeps the reference precise without turning every +representation into a codec. + +Today, Katalyst exposes four data surfaces: + +| Surface | Meaning | +|---|---| +| [File metadata]({{< relref "file-metadata.md" >}}) | Filename, extension, parent directory, path depth, and other attributes derived from the item's reference. | +| [Plain text]({{< relref "plain-text.md" >}}) | Body content read as raw text, independent of markdown structure. | +| [Markdown body text]({{< relref "markdown-body-text.md" >}}) | A parsed markdown document with optional frontmatter metadata, body bytes, source format, and source-line lookup. | +| [Structured object]({{< relref "structured-object.md" >}}) | Metadata normalized to a `map[string]any`, used by object and schema-backed checks. | + +Only Markdown body text is backed by a dedicated codec package today. The other +surfaces are projections over parsed data or derived references. 
diff --git a/docs/content/reference/data-surfaces/file-metadata.md b/docs/content/reference/data-surfaces/file-metadata.md new file mode 100644 index 00000000..9be2ab20 --- /dev/null +++ b/docs/content/reference/data-surfaces/file-metadata.md @@ -0,0 +1,49 @@ ++++ +title = "File metadata" +weight = 10 ++++ + +# File metadata + +File metadata is the data surface derived from an item's filesystem reference. It +does not parse item content; it reads names, extensions, parent directories, +path segments, and path depth from where the item lives. + +## Terms + +| Term | Meaning | +|---|---| +| **File metadata** | Attributes derived from an item's path or filesystem reference. | +| **Filename** | The basename of the item path. | +| **Extension** | The suffix used to classify the file's format, such as `.md` or `.txt`. | +| **Parent directory** | The directory immediately containing the item. | +| **Path depth** | The number of directory levels between the collection root and the item. | + +## Model + +File metadata belongs to the item because the item's reference can carry +meaning: a file's name may need to match a field, an extension may need to be +allowed, or a collection may require one index file per directory. + +This view backs file-system check types. It also feeds raw base inspectors such +as `file_tree`, where file names and paths help profile a base before +collections are configured. + +Unlike [Markdown body text]({{< relref "markdown-body-text.md" >}}), file +metadata is not a codec. It is derived from the reference the base already uses +to address the item. + +## Invariants + +1. **File metadata is derived from references.** It does not require reading or + parsing the item body. +2. **Path targets are explicit.** Checks choose the path slice they inspect: + filename, filename with extension, parent directory, or path segments. +3. 
**It is still a data surface.** Checks and inspectors can reason about path + attributes alongside structured fields and body text. + +## See also + +- [File system check types]({{< relref "../check-types/file-system/_index.md" >}}) +- [File tree inspector]({{< relref "../inspectors/source/file-tree.md" >}}) +- [File content shape inspector]({{< relref "../inspectors/source/file-content-shape.md" >}}) diff --git a/docs/content/reference/data-surfaces/markdown-body-text.md b/docs/content/reference/data-surfaces/markdown-body-text.md new file mode 100644 index 00000000..c3ad6bbc --- /dev/null +++ b/docs/content/reference/data-surfaces/markdown-body-text.md @@ -0,0 +1,86 @@ ++++ +title = "Markdown body text" +weight = 30 ++++ + +# Markdown body text + +Markdown body text is the data surface produced from a markdown-like file with +optional structured frontmatter. The codec lives in +`internal/codec/markdownbodytext`; it turns bytes on disk into structured +metadata plus body bytes that checks, inspectors, and +[`fix`]({{< relref "../../deep-dives/domain-model/fix.md" >}}) can share. + +## Terms + +| Term | Meaning | +|---|---| +| **Markdown body text** | The parsed markdown file-form exposed as a data surface: optional structured frontmatter, body bytes, source format, and source-line lookup. | +| **Frontmatter** | The structured metadata block at the top of a markdown file, in YAML, TOML, or JSON. | +| **Body** | Everything after the closing frontmatter fence. If there is no frontmatter, the whole file is the body. | +| **Document** | The in-memory representation returned by `markdownbodytext.Parse`. | +| **Metadata** | The parsed frontmatter shape, normalized to `map[string]any`. | +| **Source line map** | A JSON-pointer-path to 1-indexed source line lookup used for locatable violations. 
| + +## Model + +The unit of work is a file on disk with two possible regions: + +| Region | Meaning | +|---|---| +| Frontmatter | An optional structured block at the very top of the file. | +| Body | Everything after the closing frontmatter fence, or the whole file when no frontmatter is present. | + +Katalyst recognizes the three frontmatter formats emitted by Hugo, Obsidian, +and Jekyll: + +| Format | Fence | Example sources | +|---|---|---| +| YAML | `---` | Jekyll, Obsidian, Hugo | +| TOML | `+++` | Hugo, Obsidian, Jekyll | +| JSON | `{` ... `}` | Hugo | + +Whatever the source format, parsed metadata has the same shape: +`map[string]any`. Checks and inspectors can read fields without branching on +YAML, TOML, or JSON. `Document.Format` records the detected syntax so writers +can re-emit a file in its own format rather than rewriting TOML as YAML. + +When parsed, a markdown document becomes a `markdownbodytext.Document`: + +| Field | Meaning | +|---|---| +| `HasFrontmatter` | Did the file open with a recognized frontmatter fence? | +| `Format` | Detected syntax: `KindYAML`, `KindTOML`, or `KindJSON`. | +| `Meta` | Parsed frontmatter, normalized to `map[string]any`. | +| `Body` | Bytes after the closing fence, or the entire file when there is no frontmatter. | +| `BodyLine` | The 1-indexed source line where the body begins. | +| `Lines` | JSON-pointer-path to 1-indexed source line. | +| `Frontmatter` | Raw frontmatter bytes, used by text search and diagnostics. | + +The `Lines` index is what makes structured-object violations locatable. It +accounts for the opening fence offset, so `Lines["/title"] = 2` means the +`title` key is on line 2 of the original file. + +Line tracking is full for YAML only. For TOML and JSON, `Lines` is empty today; +checks degrade gracefully by emitting the violation without a line number. + +## Invariants + +1. **Readers see one metadata shape.** YAML, TOML, and JSON all parse to + `map[string]any`. +2. 
**Body bytes remain the body view.** The body is available to markdown and + plain-text checks without requiring callers to understand the frontmatter + syntax. +3. **Format detection is preserved for writers.** Readers expose normalized + metadata but retain the original syntax so `fix` can emit the same format. +4. **Line numbers are file-relative and 1-indexed.** The opening fence is line + 1, so the first metadata key is typically line 2 when line data is available. + +## See also + +- [Markdown body text check types]({{< relref "../check-types/markdown-body-text/_index.md" >}}) +- [Plain text]({{< relref "plain-text.md" >}}), the raw body-text surface over + the same body bytes. +- [Fix]({{< relref "../../deep-dives/domain-model/fix.md" >}}), which consumes + parsed documents when it rewrites frontmatter. +- `go doc ./internal/codec/markdownbodytext` for the code-level codec contract. diff --git a/docs/content/reference/data-surfaces/plain-text.md b/docs/content/reference/data-surfaces/plain-text.md new file mode 100644 index 00000000..7bab5025 --- /dev/null +++ b/docs/content/reference/data-surfaces/plain-text.md @@ -0,0 +1,48 @@ ++++ +title = "Plain text" +weight = 20 ++++ + +# Plain text + +Plain text is the body content read as raw text. It ignores markdown structure +and treats the selected span as text to match with regular expressions or +literal denylist entries. + +## Terms + +| Term | Meaning | +|---|---| +| **Plain text** | The body surface interpreted as raw text. | +| **Body** | The content being searched. For markdown files, this excludes frontmatter. | +| **Span** | The slice of body text a text check evaluates: the whole body, each line, the first line, or matched lines. | +| **Target** | The configured span selector for a text check. | + +## Model + +Plain-text checks run against body text, not structured metadata and not +markdown syntax trees. 
For markdown items, the body comes from the +[Markdown body text]({{< relref "markdown-body-text.md" >}}) view after +frontmatter has been separated. For plain-text items, the whole item body is the +text surface. + +This view backs the `text_requires`, `text_forbids`, and `text_denylist` check +types. Those checks answer content questions such as "must contain this +pattern", "must not contain this pattern", or "must not contain any of these +literal strings." + +## Invariants + +1. **Frontmatter is outside the body.** Text checks over markdown files do not + match metadata unless a check explicitly inspects raw frontmatter elsewhere. +2. **Markdown structure is not parsed.** Headings, links, and code fences are + just text to this view. +3. **The configured span controls matching.** A check may evaluate the whole + body or smaller slices such as individual lines. + +## See also + +- [Plain text check types]({{< relref "../check-types/plain-text/_index.md" >}}) +- [Markdown body text]({{< relref "markdown-body-text.md" >}}) +- [Configuration]({{< relref "../configuration.md#text-rules" >}}) for text + rule configuration. diff --git a/docs/content/reference/data-surfaces/structured-object.md b/docs/content/reference/data-surfaces/structured-object.md new file mode 100644 index 00000000..6828db30 --- /dev/null +++ b/docs/content/reference/data-surfaces/structured-object.md @@ -0,0 +1,51 @@ ++++ +title = "Structured object" +weight = 40 ++++ + +# Structured object + +Structured object is the data surface that exposes named fields and values. Today, +for filesystem markdown collections, it comes from parsed frontmatter metadata. +The domain-model term is broader on purpose: future bases may provide rows, +documents, or API resources directly as structured objects. + +## Terms + +| Term | Meaning | +|---|---| +| **Structured object** | A map-like representation of an item's structured data. | +| **Field** | A key in the structured object. 
A field is an attribute; not every attribute is a field. | +| **Metadata** | The parsed markdown frontmatter shape used as the structured object today. | +| **Schema directive** | The inline `schema:` key that opts an item into a named schema before validation. | + +## Model + +In the current filesystem backend, the structured-object surface is +`Document.Meta` from [Markdown body text]({{< relref "markdown-body-text.md" >}}). +It is normalized to `map[string]any` no matter whether the source frontmatter +was YAML, TOML, or JSON. + +Structured-object checks validate fields and schema-backed object shape. They +are the right fit when a check needs to ask about named values: required fields, +field type, field length, enum membership, uniqueness, sentence case, or JSON +Schema validation. + +The `schema:` directive is Katalyst metadata. It selects a configured schema for +the item and is removed before the item is validated against that schema. + +## Invariants + +1. **Field checks read normalized metadata.** They do not branch on YAML, TOML, + or JSON syntax. +2. **A field is narrower than an attribute.** Filenames and path segments are + attributes, but they are not structured-object fields. +3. **Schema selection is separate from validation.** The directive chooses the + schema; the object check validates the resulting structured object. + +## See also + +- [Structured object check types]({{< relref "../check-types/structured-object/_index.md" >}}) +- [Markdown body text]({{< relref "markdown-body-text.md" >}}) +- [Configuration]({{< relref "../configuration.md#object-schema-resolution-precedence" >}}) +- [Collections]({{< relref "../../deep-dives/domain-model/collections.md" >}}) diff --git a/docs/content/reference/glossary.md b/docs/content/reference/glossary.md index 92f0b9ae..86a6059b 100644 --- a/docs/content/reference/glossary.md +++ b/docs/content/reference/glossary.md @@ -7,45 +7,48 @@ weight = 50 The canonical vocabulary for Katalyst. 
Use these terms consistently in code, docs, and user-facing copy. The general, backend-agnostic vocabulary is -introduced in [core concepts]({{< relref "../deep-dives/core-concepts.md" >}}); +introduced in the [domain model]({{< relref "../deep-dives/domain-model/_index.md" >}}); how each term maps onto today's code is documented in the per-package `AGENTS.md` files under `internal/`. This page is the quick lookup. | Term | Meaning | |---|---| | **Aggregate** | The descriptive operation an inspector realizes: measuring a distribution across a collection's items rather than fetching or asserting. See **Inspector**. | -| **Attribute** | A named characteristic of an item: a frontmatter key, but also its filename, path, or extension. The general term; a key in the structured object specifically is a **Field**. | +| **Attribute** | A named characteristic of an item: a column, a frontmatter key, a response field, its filename, its path, or another backend-derived property. A key in a structured object specifically is a **Field**. | +| **Base** | One configured backend source plus the operations Katalyst can perform on its content. A raw base gives Katalyst base-native access; a collectionized base adds collection mappings. See [Bases]({{< relref "../deep-dives/domain-model/base.md" >}}). | +| **BaseInstance** | A configured instance of a BaseType plus how to reach it (for `filesystem`, a root directory). Declared under `.katalyst/bases/`; it embeds the collections it maps. | +| **BaseType** | A known backend kind capable of holding content Katalyst can operate on (`filesystem` and `sqlite` today; `postgresql`, `mongodb`, and others later). | | **Body** | Everything after the closing frontmatter fence. Preserved verbatim except by `fix`. | | **Check** | Shorthand for a check instance when context is unambiguous. | | **Check instance** | One configured check attached to a collection: a check type plus its arguments (one YAML object under `checks:`). 
It runs against each item (object, markdown, or filesystem family). | | **Check type** | The reusable definition of a constraint: one entry in katalyst's check registry (`object_required_field`, `markdown_single_h1`, ...), selected by its `kind:` id. `katalyst check-types list` lists them. | | **CheckLibrary** | The provider behind a check type. Native libraries (`filesystem`, `plaintext`, `markdownbodytext`, `structuredobject`) wrap hand-written checks; schema-backed libraries (`json-schema`, Vale next) compile a named schema and run items against it, and report their own availability. A library is provenance, orthogonal to the source-data family (`structuredObject`, `markdownBodyText`, `fileSystem`, `plainText`) the check reads. | -| **Collection** | A named entry in `collections:`: a directory, a filename `pattern`, and the checks its items must pass. | +| **Collection** | A group of items that share structure: a directory of similar files, a relational table, a Mongo collection, or a family of API resources. Collections own checks and are addressed by name. | | **Collection layer** | Inspectors that profile a configured collection's items, addressed by domain identity (collection + item id) and probing through the same substrate the checks use. | | **Collection-scoped check** | A check type that runs once per collection over all its items (e.g. `filesystem_unique_filename`), rather than per item. It re-scans the full collection even under a single-item selector. | -| **CollectionDefinition** | The two-way mapping from a StorageInstance's contents to collections and items. Yields one or more collections; the filesystem is the only backend today. See [storage layer]({{< relref "../deep-dives/storage.md" >}}). | -| **Config** | A **Project**'s configuration: the schemas, storage instances, and collection definitions that declare what the project contains and how its items are checked. 
Katalyst's config is the `.katalyst/` directory; it is loaded by the `project` package's loader (`internal/project/loader.go`). Each object type owns the parse of its own config — the storage registry validates a declared `type`, and a collection parses its own block in `storage/collection`. | +| **Collection mapping** | The two-way mapping from a base instance's contents to collections and items. Yields one or more collections; filesystem and SQLite mappings are implemented today. Implemented by `CollectionDefinition` in code. | +| **Config** | A **Project**'s configuration: the schemas, bases, and collection mappings that declare what the project contains and how its items are checked. Katalyst's config is the `.katalyst/` directory; it is loaded by the `project` package's loader (`internal/project/loader.go`). Each object type owns the parse of its own config: the base registry validates a declared `type`, and a collection parses its own block in `storage/collection`. | | **Discriminator** | The `when` predicate that selects a variant: a list of `item list --filter` expressions over an item's metadata, ANDed together. | | **Document** | The markdown file-form of an **Item**: a parsed markdown file (frontmatter metadata + body + a line map). Use it where parsing or the on-disk file is the subject; elsewhere prefer **Item**. | | **Evidence** | The structured result of one inspector: counts and distributions with the unit count `n` as denominator. Never a recommendation or verdict. | | **Field** | A key in an item's structured object (its frontmatter map). A field is an **Attribute**; a filename is an attribute but not a field. The term used wherever object or frontmatter keys are meant (`object_field_type`, `name_matches_field`). | | **Frontmatter** | The on-disk metadata block at the top of a markdown file, in YAML (`---`), TOML (`+++`), or JSON (`{ … }`). | -| **Granularity** | The level, item vs. 
collection, at which a StorageType attaches a store's units to the domain model (a markdown file is an item; a SQL table is a collection). | | **Inspector** | A read-only operation that measures content and returns evidence. The descriptive dual of a check: a check asserts a predicate, an inspector reports the distribution. Inspectors come in two layers. | | **Item** | The unit of data in a collection, addressed by a selector and operated on by `check`, `fix`, and the `item` subcommands. In the filesystem backend an item is one file matching the collection's pattern, its id the filename stem; its markdown file-form is a **Document**. | +| **Data surface** | A representation Katalyst exposes for checks, inspectors, or `fix` to read from content: markdown body text, plain text, structured object, or file metadata. See [Data surfaces]({{< relref "data-surfaces/_index.md" >}}). | | **Measurement primitive** | A reusable building block the inspectors are built from: `object_fields` (a data dictionary over object maps), `markdown_body` (body structure), and file-metadata. | | **Metadata** | The parsed, in-memory structure of the frontmatter (a `map[string]any`). | -| **Operation** | Something a storage backend lets you do with its data: read, list, query, aggregate, write. Each has a scope (item, collection, across collections) and structural requirements the backend must satisfy. See [progressive operations]({{< relref "../deep-dives/progressive-operations.md" >}}). | -| **Project** | The whole katalyst workspace: a repo root with a `.katalyst/` **Config** that declares the storage instances, collections, and checks katalyst operates over. The top-level scope an empty selector addresses, and what `katalyst init` creates. Collections (and the query operations scoped to them) live within a project; the `project` package (`internal/project`) is its code home — it holds the `.katalyst/` loader, and the `collection` layer lives under `storage/`. 
| -| **Raw-source layer** | Inspectors that profile a backend store directly, before any collection configuration, addressed by backend-native reference (a path today). The onboarding case: "what's in this store?" | +| **Operation** | Something a base lets you do with its data: read, list, query, aggregate, write. Each has a scope (item, collection, across collections) and structural requirements the backend must satisfy. See [progressive operations]({{< relref "../deep-dives/progressive-operations.md" >}}). | +| **Profile class** | A group of near-identical profiles the summarizer collapses together, so output is proportional to the number of distinct profiles, not directories. | +| **Project** | The whole katalyst workspace: a repo root with a `.katalyst/` **Config** that declares the bases, collections, and checks katalyst operates over. The top-level scope an empty selector addresses, and what `katalyst init` creates. Collections live within a project; the `project` package (`internal/project`) is its code home, holding the `.katalyst/` loader while the collection implementation lives under `storage/`. | +| **Raw base layer** | Inspectors that profile a base directly, before any collection configuration, addressed by base-native reference (a path today). The onboarding case: "what's in this base?" | | **Repo root** | The directory containing the `.katalyst/` config directory; the base for all path resolution. | | **Resolver** | The runtime object that decides which object schema applies to an item and caches compiled schemas per `(library, path)`. | | **Schema** | The definition of a collection's shape, expressed in a CheckLibrary's format (JSON Schema today; a Vale style config later). Named in `schemas:`; located by path. The katalyst concept, not the JSON Schema document specifically. | | **Schema directive** | The inline `schema:` key inside a document's frontmatter, opting it into a named schema. 
| | **Selector** | How a command names what to operate on: nothing (whole project), ``, or `/`. | +| **Scope** | The level an operation or backend mapping applies to: item, collection, project, or across collections. In a base, scope answers whether one matched source unit becomes an item or a collection. | | **Span** | The slice of body text a text rule is evaluated against, chosen by its `target`: the whole `body`, each `line`, the `first-line`, or `matched-lines` (lines matching a `select` regex). | -| **StorageInstance** | A configured instance of a StorageType plus how to reach it (for `filesystem`, a root directory). Declared under `.katalyst/storage/`; it embeds the collections it maps. | -| **StorageType** | A known backend kind capable of holding collections and items (`filesystem` today; `sqlite`, `postgresql`, `mongodb` later). | | **Target** | The slice of a path a filesystem name/path check type tests: `filename`, `filename-ext`, `parent-dir`, or `path-segments` (every directory segment plus the basename). For a text rule, the slice of body it tests, see Span. | | **Text rule** | A `text_*` check (`text_requires`, `text_forbids`, `text_denylist`) that tests the body as raw text, a regex or a literal denylist, independent of markdown structure. Applies to plain-text items too. | | **Validation result** | The product of running an item's checks: either `path: OK`, or a flat list of violations. | diff --git a/docs/content/reference/inspectors/_index.md b/docs/content/reference/inspectors/_index.md index ba4b3397..472ae520 100644 --- a/docs/content/reference/inspectors/_index.md +++ b/docs/content/reference/inspectors/_index.md @@ -8,11 +8,11 @@ bookCollapseSection = true # Inspectors reference -Inspectors describe the shape of content and return evidence: counts and distributions, never recommendations. They are the descriptive dual of [check types]({{< relref "../check-types/_index.md" >}}) and drive the [`inspect`]({{< relref "../cli.md" >}}) command. 
They come in two layers: raw-source inspectors profile a store before configuration, collection inspectors profile a configured collection. These pages are generated from the inspector registry, so they always match the shipped binary. +Inspectors describe the shape of content and return evidence: counts and distributions, never recommendations. They are the descriptive dual of [check types]({{< relref "../check-types/_index.md" >}}) and drive the [`inspect`]({{< relref "../cli.md" >}}) command. They come in two layers: raw base inspectors profile a base before configuration, collection inspectors profile a configured collection. These pages are generated from the inspector registry, so they always match the shipped binary. -## Raw-source inspectors +## Raw base inspectors -Raw-source inspectors profile a backend store directly, before any collection configuration: what files are present, how they parse, and how they are named. +Raw base inspectors profile a base directly, before any collection configuration: what files are present, how they parse, and how they are named. - [File tree]({{< relref "source/file-tree.md" >}}): Map files, directories, extensions, regions, and filename conventions, opening no files. - [File content shape]({{< relref "source/file-content-shape.md" >}}): Profile selected files by text, tabular, and tree content structure. diff --git a/docs/content/reference/inspectors/source/_index.md b/docs/content/reference/inspectors/source/_index.md index 5c9e706a..7c1c5f6d 100644 --- a/docs/content/reference/inspectors/source/_index.md +++ b/docs/content/reference/inspectors/source/_index.md @@ -1,12 +1,12 @@ +++ -title = "Raw-source inspectors" +title = "Raw base inspectors" weight = 10 bookCollapseSection = true +++ -Raw-source inspectors profile a backend store directly, before any collection configuration: what files are present, how they parse, and how they are named. 
+Raw base inspectors profile a base directly, before any collection configuration: what files are present, how they parse, and how they are named. Inspectors in this layer: diff --git a/docs/content/welcome.md b/docs/content/welcome.md index afa2a116..d680df20 100644 --- a/docs/content/welcome.md +++ b/docs/content/welcome.md @@ -65,7 +65,7 @@ Common updates include: - *Rules*: add or change checks. - *Content shape*: change the structure of your content. -- *Storage*: change your storage layer. +- *Bases*: change where content lives. ## Design principles diff --git a/docs/generated/examples/check-collection-rules.full.md b/docs/generated/examples/check-collection-rules.full.md index a870187d..e29f3318 100644 --- a/docs/generated/examples/check-collection-rules.full.md +++ b/docs/generated/examples/check-collection-rules.full.md @@ -20,7 +20,7 @@ title: Bad title # A different heading ``` -`.katalyst/storage/my_directory.yaml` +`.katalyst/bases/my_directory.yaml` ```yaml type: filesystem diff --git a/docs/generated/examples/check-schema-missing-field.full.md b/docs/generated/examples/check-schema-missing-field.full.md index c1dac4d6..f2af4eb2 100644 --- a/docs/generated/examples/check-schema-missing-field.full.md +++ b/docs/generated/examples/check-schema-missing-field.full.md @@ -21,7 +21,7 @@ title: Foundation # Foundation ``` -`.katalyst/storage/my_directory.yaml` +`.katalyst/bases/my_directory.yaml` ```yaml type: filesystem diff --git a/docs/generated/examples/check-title-h1-mismatch.full.md b/docs/generated/examples/check-title-h1-mismatch.full.md index a56a047b..be7825d9 100644 --- a/docs/generated/examples/check-title-h1-mismatch.full.md +++ b/docs/generated/examples/check-title-h1-mismatch.full.md @@ -11,7 +11,7 @@ title: Dune # Children of Dune ``` -`.katalyst/storage/my_directory.yaml` +`.katalyst/bases/my_directory.yaml` ```yaml type: filesystem diff --git a/docs/generated/examples/check-type-error.full.md 
b/docs/generated/examples/check-type-error.full.md index 0c509f3d..24bc60c6 100644 --- a/docs/generated/examples/check-type-error.full.md +++ b/docs/generated/examples/check-type-error.full.md @@ -12,7 +12,7 @@ year: "not a number" # Dune ``` -`.katalyst/storage/my_directory.yaml` +`.katalyst/bases/my_directory.yaml` ```yaml type: filesystem diff --git a/docs/generated/examples/check-valid-item.full.md b/docs/generated/examples/check-valid-item.full.md index f0aeabd5..8d0a02a0 100644 --- a/docs/generated/examples/check-valid-item.full.md +++ b/docs/generated/examples/check-valid-item.full.md @@ -12,7 +12,7 @@ year: 1965 # Dune ``` -`.katalyst/storage/my_directory.yaml` +`.katalyst/bases/my_directory.yaml` ```yaml type: filesystem diff --git a/docs/generated/examples/ci-check-fails.full.md b/docs/generated/examples/ci-check-fails.full.md index 125ef661..ff612a4f 100644 --- a/docs/generated/examples/ci-check-fails.full.md +++ b/docs/generated/examples/ci-check-fails.full.md @@ -20,7 +20,7 @@ title: Draft No heading here. 
``` -`.katalyst/storage/my_directory.yaml` +`.katalyst/bases/my_directory.yaml` ```yaml type: filesystem diff --git a/docs/generated/examples/ci-fix-check.full.md b/docs/generated/examples/ci-fix-check.full.md index 8bd68d97..ed2744c9 100644 --- a/docs/generated/examples/ci-fix-check.full.md +++ b/docs/generated/examples/ci-fix-check.full.md @@ -21,7 +21,7 @@ author: Ada # Messy ``` -`.katalyst/storage/my_directory.yaml` +`.katalyst/bases/my_directory.yaml` ```yaml type: filesystem diff --git a/docs/generated/examples/fix-normalize-frontmatter.full.md b/docs/generated/examples/fix-normalize-frontmatter.full.md index 62295396..885766df 100644 --- a/docs/generated/examples/fix-normalize-frontmatter.full.md +++ b/docs/generated/examples/fix-normalize-frontmatter.full.md @@ -13,7 +13,7 @@ apple: 2 verbatim ``` -`.katalyst/storage/my_directory.yaml` +`.katalyst/bases/my_directory.yaml` ```yaml type: filesystem diff --git a/docs/generated/examples/fix-text-forbids.full.md b/docs/generated/examples/fix-text-forbids.full.md index 62a0f68d..43c86335 100644 --- a/docs/generated/examples/fix-text-forbids.full.md +++ b/docs/generated/examples/fix-text-forbids.full.md @@ -12,7 +12,7 @@ t: 1 keep this. 
``` -`.katalyst/storage/my_directory.yaml` +`.katalyst/bases/my_directory.yaml` ```yaml type: filesystem diff --git a/docs/generated/examples/inspect-collection-fields.full.md b/docs/generated/examples/inspect-collection-fields.full.md index 6202bb21..0bbf531f 100644 --- a/docs/generated/examples/inspect-collection-fields.full.md +++ b/docs/generated/examples/inspect-collection-fields.full.md @@ -65,7 +65,7 @@ status: read # Dune Messiah ``` -`.katalyst/storage/my_directory.yaml` +`.katalyst/bases/my_directory.yaml` ```yaml type: filesystem diff --git a/docs/generated/examples/inspect-source-shape.full.md b/docs/generated/examples/inspect-source-shape.full.md index 5d22bde1..f1048491 100644 --- a/docs/generated/examples/inspect-source-shape.full.md +++ b/docs/generated/examples/inspect-source-shape.full.md @@ -1,4 +1,4 @@ -Pointed at a bare directory (no project), `inspect` runs the raw-source inspectors. `file_content_shape` opens a selected slice and reports the common text, tabular, or tree structure without proposing collections. +Pointed at a bare directory (no project), `inspect` runs the raw base inspectors. `file_content_shape` opens a selected slice and reports the common text, tabular, or tree structure without proposing collections. ### Input diff --git a/docs/layouts/partials/docs/inject/toc-before.html b/docs/layouts/partials/docs/inject/toc-before.html new file mode 100644 index 00000000..a4bb15b9 --- /dev/null +++ b/docs/layouts/partials/docs/inject/toc-before.html @@ -0,0 +1,10 @@ +{{- $stale := .Params.stale -}} +{{- $staleMessage := .Params.stale_message -}} +{{- if or $stale $staleMessage -}} +
+

+ ⚠︎ This page may be incomplete, stale or inaccurate. + {{- if $staleMessage }} {{ $staleMessage }}{{- else }}{{- end }} +

+
+{{- end -}} diff --git a/docs/static/images/domain-model-core-concepts.png b/docs/static/images/domain-model-core-concepts.png new file mode 100644 index 00000000..a48196a1 Binary files /dev/null and b/docs/static/images/domain-model-core-concepts.png differ diff --git a/internal/checks/AGENTS.md b/internal/checks/AGENTS.md index 5526f6be..0b2a61b6 100644 --- a/internal/checks/AGENTS.md +++ b/internal/checks/AGENTS.md @@ -4,9 +4,9 @@ The check engine: the check types Katalyst ships, the libraries that run them, and the violations they produce. **Architecture and design rationale** - the model (check type vs. instance, -family vs. library, granularity), check libraries, how a check runs, and the +family vs. library, scope), check libraries, how a check runs, and the trade-offs - live in the -[How checks work](../../docs/content/deep-dives/checks.md) deep-dive, which is +[How checks work](../../docs/content/deep-dives/domain-model/checks.md) deep-dive, which is the source of truth. The per-type catalog is the generated [check-types reference](../../docs/content/reference/check-types/), and the code-level contract is `go doc ./internal/checks`. This file keeps only the diff --git a/internal/checks/registry.go b/internal/checks/registry.go index 8976b0de..09c32016 100644 --- a/internal/checks/registry.go +++ b/internal/checks/registry.go @@ -40,9 +40,9 @@ type Descriptor struct { // is provenance, orthogonal to Family (source-data kind). Library string `json:"library,omitempty"` // Family groups the check type by source-data kind: "structuredObject", - // "markdownBodyText", "fileSystem", or "plainText". Family and granularity - // are orthogonal, a collection-scoped check is grouped by the data it - // reads, not by its scope (e.g. unique_field is structuredObject). + // "markdownBodyText", "fileSystem", or "plainText". Family and scope are + // orthogonal; a collection-scoped check is grouped by the data it reads, + // not by its scope (e.g. 
unique_field is structuredObject). Family string `json:"family"` // Slug is the page basename under the family directory. Slug string `json:"slug"` @@ -81,25 +81,25 @@ func Families() []Family { { ID: "structuredObject", Slug: "structured-object", - Title: "Structured object check types", + Title: "Structured object", Intro: "Structured-object check types validate structured frontmatter fields using schema-backed checks.", }, { ID: "markdownBodyText", Slug: "markdown-body-text", - Title: "Markdown body text check types", + Title: "Markdown body text", Intro: "Markdown body-text check types validate relationships between frontmatter metadata and markdown body content.", }, { ID: "fileSystem", Slug: "file-system", - Title: "File system check types", + Title: "File system", Intro: "File-system check types validate filename and path conventions for items.", }, { ID: "plainText", Slug: "plain-text", - Title: "Plain text check types", + Title: "Plain text", Intro: "Plain-text check types validate body content as raw text, independent of markdown structure. They apply to plain-text items as well as markdown bodies.", }, } diff --git a/internal/checks/structuredobject/unique_field.go b/internal/checks/structuredobject/unique_field.go index 20eb3cb0..39a9c08d 100644 --- a/internal/checks/structuredobject/unique_field.go +++ b/internal/checks/structuredobject/unique_field.go @@ -14,8 +14,8 @@ type uniqueFieldArgs struct { // UniqueField requires that no two items share a value for Field. It is // collection-scoped (it reasons across siblings) but belongs to the -// structuredObject family because it reads a frontmatter field, granularity -// and family are orthogonal. Its kind keeps the historical filesystem_ prefix. +// structuredObject family because it reads a frontmatter field; scope and +// family are orthogonal. Its kind keeps the historical filesystem_ prefix. 
type UniqueField struct { Field string } diff --git a/internal/codec/markdownbodytext/encode.go b/internal/codec/markdownbodytext/encode.go index e2369ddd..3d634f8f 100644 --- a/internal/codec/markdownbodytext/encode.go +++ b/internal/codec/markdownbodytext/encode.go @@ -21,7 +21,7 @@ import ( // // It assumes doc.HasFrontmatter; the no-frontmatter passthrough is the caller's // policy (see internal/fix). Why this canonical form is intentionally inflexible -// is documented in the formatting deep-dive. +// is documented in the fix deep-dive. func Encode(doc *Document) ([]byte, error) { open, block, closeFence, err := marshalBlock(doc.Format, doc.Meta) if err != nil { diff --git a/internal/examples/AGENTS.md b/internal/examples/AGENTS.md index 8a5c68dc..e217e1fc 100644 --- a/internal/examples/AGENTS.md +++ b/internal/examples/AGENTS.md @@ -27,8 +27,8 @@ for where this fits the wider testing strategy. ## Corpus house style -- Data files first, then the storage config, then a schema file if one is kept. -- Name the storage config `.katalyst/storage/my_directory.yaml`. +- Data files first, then the base config, then a schema file if one is kept. +- Name the base config `.katalyst/bases/my_directory.yaml`. - Prefer inline `checks:` over a schema file; keep a schema only when the example is specifically about schema binding. diff --git a/internal/examples/examples.go b/internal/examples/examples.go index 9f9752a1..78d57c85 100644 --- a/internal/examples/examples.go +++ b/internal/examples/examples.go @@ -52,8 +52,8 @@ properties: year: { type: integer } ` -// notesStorage declares a single `notes` collection bound to the book schema. -const notesStorage = `type: filesystem +// notesBase declares a single `notes` collection bound to the book schema. +const notesBase = `type: filesystem root: . 
collections: notes: @@ -61,10 +61,10 @@ collections: schema: book ` -// notesFieldTypeStorage declares the `notes` collection with an inline +// notesFieldTypeBase declares the `notes` collection with an inline // object_field_type check instead of a schema, so the type-error example needs // no separate schema file and matches the field-type reference page it sits on. -const notesFieldTypeStorage = `type: filesystem +const notesFieldTypeBase = `type: filesystem root: . collections: notes: @@ -85,8 +85,8 @@ properties: status: { enum: [read, reading, to-read] } ` -// wikiStorage binds a `books` collection over the wiki/ tree. -const wikiStorage = `type: filesystem +// wikiBase binds a `books` collection over the wiki/ tree. +const wikiBase = `type: filesystem root: . collections: books: @@ -94,9 +94,9 @@ collections: schema: book ` -// postsRulesStorage is the `posts` collection from the configure-rules how-to: +// postsRulesBase is the `posts` collection from the configure-rules how-to: // the three structural/markdown/filesystem checks that guide attaches. -const postsRulesStorage = `type: filesystem +const postsRulesBase = `type: filesystem root: . collections: posts: @@ -122,9 +122,9 @@ properties: year: { type: integer, minimum: 0 } ` -// booksAtNotesStorage binds the `book` schema to a `books` collection at +// booksAtNotesBase binds the `book` schema to a `books` collection at // notes/books, matching the add-a-schema how-to. -const booksAtNotesStorage = `type: filesystem +const booksAtNotesBase = `type: filesystem root: . collections: books: @@ -132,10 +132,10 @@ collections: schema: book ` -// ciStorage is the small project the validate-in-ci how-to gates: a `notes` +// ciBase is the small project the validate-in-ci how-to gates: a `notes` // collection that only requires an H1, so the failing item fails on structure // alone and the canonical-frontmatter gate is easy to read. -const ciStorage = `type: filesystem +const ciBase = `type: filesystem root: . 
collections: notes: @@ -156,12 +156,12 @@ var wikiCorpus = []File{ } // withWikiProject appends the .katalyst project files to the wiki corpus so the -// data files lead and the storage config (then the schema) trail in the +// data files lead and the base config (then the schema) trail in the // rendered input. func withWikiProject() []File { out := append([]File{}, wikiCorpus...) return append(out, - File{Path: ".katalyst/storage/my_directory.yaml", Content: wikiStorage}, + File{Path: ".katalyst/bases/my_directory.yaml", Content: wikiBase}, File{Path: ".katalyst/schemas/book.yaml", Content: wikiBookSchema}, ) } @@ -177,7 +177,7 @@ func All() []Example { Weight: 10, Files: []File{ {Path: "notes/dune.md", Content: "---\ntitle: Dune\nyear: 1965\n---\n# Dune\n"}, - {Path: ".katalyst/storage/my_directory.yaml", Content: notesStorage}, + {Path: ".katalyst/bases/my_directory.yaml", Content: notesBase}, {Path: ".katalyst/schemas/book.yaml", Content: bookSchema}, }, Args: []string{"check", "notes/dune"}, @@ -190,7 +190,7 @@ func All() []Example { Weight: 20, Files: []File{ {Path: "notes/dune.md", Content: "---\ntitle: Dune\nyear: \"not a number\"\n---\n# Dune\n"}, - {Path: ".katalyst/storage/my_directory.yaml", Content: notesFieldTypeStorage}, + {Path: ".katalyst/bases/my_directory.yaml", Content: notesFieldTypeBase}, }, Args: []string{"check", "notes/dune"}, }, @@ -202,7 +202,7 @@ func All() []Example { Weight: 30, Files: []File{ {Path: "notes/dune.md", Content: "---\ntitle: Dune\n---\n# Children of Dune\n"}, - {Path: ".katalyst/storage/my_directory.yaml", Content: "type: filesystem\nroot: .\ncollections:\n notes:\n path: notes\n checks:\n - kind: markdown_title_matches_h1\n field: title\n"}, + {Path: ".katalyst/bases/my_directory.yaml", Content: "type: filesystem\nroot: .\ncollections:\n notes:\n path: notes\n checks:\n - kind: markdown_title_matches_h1\n field: title\n"}, }, Args: []string{"check", "notes/dune"}, }, @@ -215,7 +215,7 @@ func All() []Example { 
ResultFiles: []string{"notes/doc.md"}, Files: []File{ {Path: "notes/doc.md", Content: "---\nzebra: 1\napple: 2\n---\n# Body\nverbatim\n"}, - {Path: ".katalyst/storage/my_directory.yaml", Content: "type: filesystem\nroot: .\ncollections:\n notes:\n path: notes\n checks:\n - kind: markdown_requires_h1\n"}, + {Path: ".katalyst/bases/my_directory.yaml", Content: "type: filesystem\nroot: .\ncollections:\n notes:\n path: notes\n checks:\n - kind: markdown_requires_h1\n"}, }, Args: []string{"fix", "notes/doc"}, }, @@ -228,15 +228,15 @@ func All() []Example { ResultFiles: []string{"notes/doc.md"}, Files: []File{ {Path: "notes/doc.md", Content: "---\nt: 1\n---\n# Title.\nkeep this.\n"}, - {Path: ".katalyst/storage/my_directory.yaml", Content: "type: filesystem\nroot: .\ncollections:\n notes:\n path: notes\n checks:\n - kind: text_forbids\n target: first-line\n pattern: '\\.(\\s*)$'\n fix: '$1'\n"}, + {Path: ".katalyst/bases/my_directory.yaml", Content: "type: filesystem\nroot: .\ncollections:\n notes:\n path: notes\n checks:\n - kind: text_forbids\n target: first-line\n pattern: '\\.(\\s*)$'\n fix: '$1'\n"}, }, Args: []string{"fix", "notes/doc"}, }, { ID: "inspect-source-shape", Title: "Profile selected raw files by content shape", - Summary: "The raw-source file_content_shape inspector profiles a selected slice of files.", - Doc: "Pointed at a bare directory (no project), `inspect` runs the raw-source inspectors. `file_content_shape` opens a selected slice and reports the common text, tabular, or tree structure without proposing collections.", + Summary: "The raw base file_content_shape inspector profiles a selected slice of files.", + Doc: "Pointed at a bare directory (no project), `inspect` runs the raw base inspectors. 
`file_content_shape` opens a selected slice and reports the common text, tabular, or tree structure without proposing collections.", Weight: 60, Files: wikiCorpus, Args: []string{"inspect", "./wiki", "--inspector", "file_content_shape", "--select", `ext = ".md"`}, @@ -259,7 +259,7 @@ func All() []Example { Files: []File{ {Path: "notes/books/dune.md", Content: "---\ntitle: Dune\nyear: 1965\n---\n# Dune\n"}, {Path: "notes/books/foundation.md", Content: "---\ntitle: Foundation\n---\n# Foundation\n"}, - {Path: ".katalyst/storage/my_directory.yaml", Content: booksAtNotesStorage}, + {Path: ".katalyst/bases/my_directory.yaml", Content: booksAtNotesBase}, {Path: ".katalyst/schemas/book.yaml", Content: bookConstrainedSchema}, }, Args: []string{"check", "books"}, @@ -273,7 +273,7 @@ func All() []Example { Files: []File{ {Path: "content/posts/hello-world.md", Content: "---\ntitle: Hello world\n---\n# Hello world\n"}, {Path: "content/posts/Bad_Title.md", Content: "---\ntitle: Bad title\n---\n# A different heading\n"}, - {Path: ".katalyst/storage/my_directory.yaml", Content: postsRulesStorage}, + {Path: ".katalyst/bases/my_directory.yaml", Content: postsRulesBase}, }, Args: []string{"check", "posts"}, }, @@ -286,7 +286,7 @@ func All() []Example { Files: []File{ {Path: "notes/intro.md", Content: "---\ntitle: Intro\n---\n# Intro\n"}, {Path: "notes/draft.md", Content: "---\ntitle: Draft\n---\nNo heading here.\n"}, - {Path: ".katalyst/storage/my_directory.yaml", Content: ciStorage}, + {Path: ".katalyst/bases/my_directory.yaml", Content: ciBase}, }, Args: []string{"check"}, }, @@ -299,7 +299,7 @@ func All() []Example { Files: []File{ {Path: "notes/tidy.md", Content: "---\ntitle: Tidy\n---\n# Tidy\n"}, {Path: "notes/messy.md", Content: "---\ntitle: Messy\nauthor: Ada\n---\n# Messy\n"}, - {Path: ".katalyst/storage/my_directory.yaml", Content: ciStorage}, + {Path: ".katalyst/bases/my_directory.yaml", Content: ciBase}, }, Args: []string{"fix", "--check"}, }, diff --git 
a/internal/examples/testdata/check-collection-rules.md b/internal/examples/testdata/check-collection-rules.md index 864a7ee4..06b84140 100644 --- a/internal/examples/testdata/check-collection-rules.md +++ b/internal/examples/testdata/check-collection-rules.md @@ -20,7 +20,7 @@ title: Bad title # A different heading ``` -`.katalyst/storage/my_directory.yaml` +`.katalyst/bases/my_directory.yaml` ```yaml type: filesystem diff --git a/internal/examples/testdata/check-schema-missing-field.md b/internal/examples/testdata/check-schema-missing-field.md index b4577844..297d0f97 100644 --- a/internal/examples/testdata/check-schema-missing-field.md +++ b/internal/examples/testdata/check-schema-missing-field.md @@ -21,7 +21,7 @@ title: Foundation # Foundation ``` -`.katalyst/storage/my_directory.yaml` +`.katalyst/bases/my_directory.yaml` ```yaml type: filesystem diff --git a/internal/examples/testdata/check-title-h1-mismatch.md b/internal/examples/testdata/check-title-h1-mismatch.md index fbfda9cc..328ee6eb 100644 --- a/internal/examples/testdata/check-title-h1-mismatch.md +++ b/internal/examples/testdata/check-title-h1-mismatch.md @@ -11,7 +11,7 @@ title: Dune # Children of Dune ``` -`.katalyst/storage/my_directory.yaml` +`.katalyst/bases/my_directory.yaml` ```yaml type: filesystem diff --git a/internal/examples/testdata/check-type-error.md b/internal/examples/testdata/check-type-error.md index 60d4f9a0..676d9529 100644 --- a/internal/examples/testdata/check-type-error.md +++ b/internal/examples/testdata/check-type-error.md @@ -12,7 +12,7 @@ year: "not a number" # Dune ``` -`.katalyst/storage/my_directory.yaml` +`.katalyst/bases/my_directory.yaml` ```yaml type: filesystem diff --git a/internal/examples/testdata/check-valid-item.md b/internal/examples/testdata/check-valid-item.md index a53b53ff..5f20a45c 100644 --- a/internal/examples/testdata/check-valid-item.md +++ b/internal/examples/testdata/check-valid-item.md @@ -12,7 +12,7 @@ year: 1965 # Dune ``` 
-`.katalyst/storage/my_directory.yaml` +`.katalyst/bases/my_directory.yaml` ```yaml type: filesystem diff --git a/internal/examples/testdata/ci-check-fails.md b/internal/examples/testdata/ci-check-fails.md index 016bc081..71d2bec3 100644 --- a/internal/examples/testdata/ci-check-fails.md +++ b/internal/examples/testdata/ci-check-fails.md @@ -20,7 +20,7 @@ title: Draft No heading here. ``` -`.katalyst/storage/my_directory.yaml` +`.katalyst/bases/my_directory.yaml` ```yaml type: filesystem diff --git a/internal/examples/testdata/ci-fix-check.md b/internal/examples/testdata/ci-fix-check.md index 1d7170f8..8c138cfe 100644 --- a/internal/examples/testdata/ci-fix-check.md +++ b/internal/examples/testdata/ci-fix-check.md @@ -21,7 +21,7 @@ author: Ada # Messy ``` -`.katalyst/storage/my_directory.yaml` +`.katalyst/bases/my_directory.yaml` ```yaml type: filesystem diff --git a/internal/examples/testdata/fix-normalize-frontmatter.md b/internal/examples/testdata/fix-normalize-frontmatter.md index d24a4a38..e67d1b23 100644 --- a/internal/examples/testdata/fix-normalize-frontmatter.md +++ b/internal/examples/testdata/fix-normalize-frontmatter.md @@ -13,7 +13,7 @@ apple: 2 verbatim ``` -`.katalyst/storage/my_directory.yaml` +`.katalyst/bases/my_directory.yaml` ```yaml type: filesystem diff --git a/internal/examples/testdata/fix-text-forbids.md b/internal/examples/testdata/fix-text-forbids.md index 8d42880e..e600f7ec 100644 --- a/internal/examples/testdata/fix-text-forbids.md +++ b/internal/examples/testdata/fix-text-forbids.md @@ -12,7 +12,7 @@ t: 1 keep this. 
``` -`.katalyst/storage/my_directory.yaml` +`.katalyst/bases/my_directory.yaml` ```yaml type: filesystem diff --git a/internal/examples/testdata/inspect-collection-fields.md b/internal/examples/testdata/inspect-collection-fields.md index e3895527..7c7adb03 100644 --- a/internal/examples/testdata/inspect-collection-fields.md +++ b/internal/examples/testdata/inspect-collection-fields.md @@ -65,7 +65,7 @@ status: read # Dune Messiah ``` -`.katalyst/storage/my_directory.yaml` +`.katalyst/bases/my_directory.yaml` ```yaml type: filesystem diff --git a/internal/examples/testdata/inspect-source-shape.md b/internal/examples/testdata/inspect-source-shape.md index 2e12e50d..9222dcf9 100644 --- a/internal/examples/testdata/inspect-source-shape.md +++ b/internal/examples/testdata/inspect-source-shape.md @@ -1,4 +1,4 @@ -Pointed at a bare directory (no project), `inspect` runs the raw-source inspectors. `file_content_shape` opens a selected slice and reports the common text, tabular, or tree structure without proposing collections. +Pointed at a bare directory (no project), `inspect` runs the raw base inspectors. `file_content_shape` opens a selected slice and reports the common text, tabular, or tree structure without proposing collections. ## Input diff --git a/internal/fix/AGENTS.md b/internal/fix/AGENTS.md index 1f83e67b..8f8a34de 100644 --- a/internal/fix/AGENTS.md +++ b/internal/fix/AGENTS.md @@ -7,7 +7,7 @@ job (`storage/collection/filesystem.Write`). Why the canonical form is deliberately inflexible, and why `fix` never injects missing values, lives in the -[Frontmatter and fix](../../docs/content/deep-dives/formatting.md) deep-dive. +[Fix](../../docs/content/deep-dives/domain-model/fix.md) deep-dive. ## Conventions diff --git a/internal/inspect/AGENTS.md b/internal/inspect/AGENTS.md index e6b48ead..b9ce2599 100644 --- a/internal/inspect/AGENTS.md +++ b/internal/inspect/AGENTS.md @@ -5,7 +5,7 @@ descriptive dual of `internal/checks`. 
**Architecture and design rationale** - the two layers, the measurement primitives, evidence-not-recommendations, the determinism dividing line - live -in the [How inspectors work](../../docs/content/deep-dives/inspectors.md) +in the [How inspectors work](../../docs/content/deep-dives/domain-model/inspectors.md) deep-dive (also summarized in `go doc ./internal/inspect`), which is the source of truth. This file keeps only the local code conventions. diff --git a/internal/inspect/collection_test.go b/internal/inspect/collection_test.go index 77b35fd7..2940bcba 100644 --- a/internal/inspect/collection_test.go +++ b/internal/inspect/collection_test.go @@ -12,7 +12,7 @@ import ( func TestCollectionView_objectFieldsAndMarkdownBody(t *testing.T) { dir := t.TempDir() projecttest.WriteProject(t, dir, map[string]string{ - "storage/local.yaml": projecttest.LocalStorage(map[string]string{ + "bases/local.yaml": projecttest.LocalBase(map[string]string{ "notes": "path: notes\nchecks:\n - kind: markdown_requires_h1\n", }), }) diff --git a/internal/inspect/doc.go b/internal/inspect/doc.go index c4294df2..6a69e997 100644 --- a/internal/inspect/doc.go +++ b/internal/inspect/doc.go @@ -1,11 +1,12 @@ // Package inspect profiles content and returns evidence, the descriptive dual // of internal/checks: a check asserts a predicate; an inspector reports the // distribution that predicate would be tested against. Inspectors come in two -// layers (raw-source and collection) and are built from a few reusable +// layers (raw base and collection) and are built from a few reusable // measurement primitives. They report counts and distributions only, never // recommendations. // // The full architecture and design rationale (the two layers, the primitives, // evidence-not-recommendations, and the determinism dividing line) live in the -// "How inspectors work" deep-dive at docs/content/deep-dives/inspectors.md. 
+// "How inspectors work" deep-dive at +// docs/content/deep-dives/domain-model/inspectors.md. package inspect diff --git a/internal/inspect/filecontentshape.go b/internal/inspect/filecontentshape.go index 137eb86a..d87b4097 100644 --- a/internal/inspect/filecontentshape.go +++ b/internal/inspect/filecontentshape.go @@ -23,7 +23,7 @@ type FileContentShape struct{} func (FileContentShape) Name() string { return "file_content_shape" } -func (FileContentShape) AppliesTo(t storage.StorageType) bool { return t == storage.Filesystem } +func (FileContentShape) AppliesTo(t storage.BaseType) bool { return t == storage.Filesystem } func (FileContentShape) Inspect(v SourceView, p Params) Evidence { data := buildFileContentShape(v, p.Selection) diff --git a/internal/inspect/inspect.go b/internal/inspect/inspect.go index 75113723..55f0f633 100644 --- a/internal/inspect/inspect.go +++ b/internal/inspect/inspect.go @@ -20,21 +20,21 @@ type Evidence struct { // CollectionInspector measures a configured collection, addressed by domain // identity (Collection + Item.ID) through a CollectionView rather than by raw // path. It is the collection-layer half of the two-layer inspector model; the -// raw-source half is SourceInspector. Params carries the collapse tolerance for +// raw base half is SourceInspector. Params carries the collapse tolerance for // summarizing inspectors and is ignored by those that don't summarize. type CollectionInspector interface { Name() string Inspect(CollectionView, Params) Evidence } -// SourceInspector measures a raw backend store before any collection +// SourceInspector measures a raw base before any collection // configuration, addressed by backend-native reference (a path today) through a // SourceView. AppliesTo gates backend-specific inspectors: one returns false for -// a StorageType it cannot describe, so it is simply absent there. 
It is the -// raw-source half of the two-layer model; the collection half is +// a BaseType it cannot describe, so it is simply absent there. It is the +// raw base half of the two-layer model; the collection half is // CollectionInspector. type SourceInspector interface { Name() string - AppliesTo(storage.StorageType) bool + AppliesTo(storage.BaseType) bool Inspect(SourceView, Params) Evidence } diff --git a/internal/inspect/inspectors_source.go b/internal/inspect/inspectors_source.go index f41a082b..d35e4250 100644 --- a/internal/inspect/inspectors_source.go +++ b/internal/inspect/inspectors_source.go @@ -2,14 +2,14 @@ package inspect import "github.com/abegong/katalyst/internal/storage" -// FileTree is the shallow, cheap raw-source inspector: a deterministic +// FileTree is the shallow, cheap raw base inspector: a deterministic // filesystem map from path metadata. It opens no files. Filesystem-specific. // Subsumes the former filesystem_naming. type FileTree struct{} func (FileTree) Name() string { return "file_tree" } -func (FileTree) AppliesTo(t storage.StorageType) bool { return t == storage.Filesystem } +func (FileTree) AppliesTo(t storage.BaseType) bool { return t == storage.Filesystem } func (FileTree) Inspect(v SourceView, p Params) Evidence { return Evidence{Inspector: "file_tree", Scope: v.root, N: v.N(), Data: buildFileTreeSummary(v)} diff --git a/internal/inspect/registry.go b/internal/inspect/registry.go index 1ce50b5e..40cac528 100644 --- a/internal/inspect/registry.go +++ b/internal/inspect/registry.go @@ -6,9 +6,9 @@ package inspect // parity per layer. A new inspector cannot ship undocumented, mirroring the // checks registry (internal/checks/registry.go). -// Layer groups inspectors by the data they measure: a raw backend store -// (source) or a configured collection (collection). It is the primary grouping -// for display and docs. Order is significant. 
+// Layer groups inspectors by the data they measure: a raw base (source) or a +// configured collection (collection). It is the primary grouping for display +// and docs. Order is significant. type Layer struct { ID string Title string @@ -20,8 +20,8 @@ func Layers() []Layer { return []Layer{ { ID: "source", - Title: "Raw-source inspectors", - Intro: "Raw-source inspectors profile a backend store directly, before any collection configuration: what files are present, how they parse, and how they are named.", + Title: "Raw base inspectors", + Intro: "Raw base inspectors profile a base directly, before any collection configuration: what files are present, how they parse, and how they are named.", }, { ID: "collection", @@ -107,7 +107,7 @@ func Descriptors() []Descriptor { } } -// SourceInspectors returns every raw-source inspector instance in display order. +// SourceInspectors returns every raw base inspector instance in display order. func SourceInspectors() []SourceInspector { return []SourceInspector{ FileTree{}, diff --git a/internal/inspect/source.go b/internal/inspect/source.go index 094b18de..05d44d7f 100644 --- a/internal/inspect/source.go +++ b/internal/inspect/source.go @@ -22,11 +22,11 @@ type readCounter struct { count int } -// SourceView is the raw-source layer's addressing surface: a filesystem tree +// SourceView is the raw base layer's addressing surface: a filesystem tree // walked once into per-file metadata, addressed by backend-native reference // (the relative path). Path-level inspectors (file_tree) read only this -// metadata and open no files; content inspectors read selected files explicitly. -// Filesystem-only for now; generalizing the walk into the storage layer is +// metadata and open no files; content inspectors trigger a one-time markdown +// parse. Filesystem-only for now; generalizing the walk across base types is // future work. 
type SourceView struct { root string diff --git a/internal/project/AGENTS.md b/internal/project/AGENTS.md index 789af5e5..859760ad 100644 --- a/internal/project/AGENTS.md +++ b/internal/project/AGENTS.md @@ -1,21 +1,20 @@ # internal/project -The project domain layer: finds `.katalyst/`, loads schemas and storage -instances, exposes collections, resolves selectors, and enumerates concrete -items for the CLI. +The project domain layer: finds `.katalyst/`, loads schemas and bases, exposes +collections, resolves selectors, and enumerates concrete items for the CLI. Architecture and rationale live in the -[domain model](../../docs/content/deep-dives/domain-model.md), +[domain model](../../docs/content/deep-dives/domain-model/_index.md), [configuration](../../docs/content/reference/configuration.md), and -[storage](../../docs/content/deep-dives/storage.md) docs. This file keeps only +[Bases](../../docs/content/deep-dives/domain-model/storage.md) docs. This file keeps only local code conventions. ## Conventions - The loader owns the `.katalyst/` vocabulary: discovery mode, config format, - schema names, storage instance names, collection uniqueness, and selector + schema names, base names, collection uniqueness, and selector parsing. Do not duplicate that parsing in `cmd/`. -- Storage and collection details stay below the storage boundary. This package +- Base and collection details stay below the storage boundary. This package assembles `storage/collection.Collection` values and calls a `CollectionDefinition`; it should not inline globbing, path joins, or filename-as-id assumptions. diff --git a/internal/project/loader.go b/internal/project/loader.go index 3bd7cbde..db71ef1c 100644 --- a/internal/project/loader.go +++ b/internal/project/loader.go @@ -2,16 +2,16 @@ // and answers two questions: // // 1. Which schemas exist (by name → absolute file path)? -// 2. Which storage instances exist, what collections does each declare, and +// 2. 
Which bases exist, what collections does each declare, and // what checks does each collection run? // // A project is the nearest ancestor directory that contains a .katalyst/ // subdirectory. Schemas are defined one named file per definition under -// .katalyst/schemas/; storage instances one named file per instance under -// .katalyst/storage/ (discovery: convention, the default), or listed -// explicitly in .katalyst/config.yaml (discovery: explicit). A storage -// instance embeds the collections it maps. The file format (yaml, json, or -// both) is set per kind in config.yaml. See +// .katalyst/schemas/; bases are defined one named file per definition under +// .katalyst/bases/ (discovery: convention, the default), or listed explicitly +// in .katalyst/config.yaml (discovery: explicit). A base embeds the collections +// it maps. Legacy projects may still use storage: and .katalyst/storage/. The +// file format (yaml, json, or both) is set per kind in config.yaml. See // docs/content/reference/configuration.md. package project @@ -38,6 +38,7 @@ const configFile = "config.yaml" // Subdirectories of Dir holding one named file per definition. const ( schemasSubdir = "schemas" + basesSubdir = "bases" storageSubdir = "storage" ) @@ -62,28 +63,28 @@ type Config struct { Root string // Schemas is name → absolute path. Schemas map[string]string - // Storage holds the configured storage instances, in name order. Each - // instance declares its own collections. - Storage []StorageInstance - // Collections is the flattened view across all instances, in name order. - // Collection names are unique project-wide (selectors carry no instance + // Bases holds the configured bases, in name order. Each base declares its + // own collections. + Bases []BaseInstance + // Collections is the flattened view across all bases, in name order. + // Collection names are unique project-wide (selectors carry no base // qualifier), so this is the canonical lookup most callers use. 
Collections []Collection } -// StorageInstance is one configured backend store plus the collections it maps -// onto the domain model. For StorageType filesystem, Root is a directory. -type StorageInstance struct { - // Name is the public handle (filename stem under .katalyst/storage/, or - // the key in the inline `storage.defs` map). +// BaseInstance is one configured backend store plus the collections it maps +// onto the domain model. For BaseType filesystem, Root is a directory. +type BaseInstance struct { + // Name is the public handle (filename stem under .katalyst/bases/, or + // the key in the inline `bases.defs` map). Name string // Type is the backend kind, validated against the storage registry // (storage.Known). Type string - // Root is the absolute, resolved instance root. Relative roots in the + // Root is the absolute, resolved base root. Relative roots in the // source resolve against the repo Root. Root string - // Collections this instance declares, in name order. + // Collections this base declares, in name order. Collections []Collection } @@ -103,7 +104,8 @@ type ( // default YAML format. type rawConfig struct { Schemas rawSchemaKind `yaml:"schemas"` - Storage rawStorageKind `yaml:"storage"` + Bases *rawBaseKind `yaml:"bases"` + Storage *rawBaseKind `yaml:"storage"` Listing *collection.RawListingDefaults `yaml:"listing"` Query *collection.RawListingDefaults `yaml:"query"` } @@ -116,18 +118,18 @@ type rawSchemaKind struct { Defs map[string]string `yaml:"defs"` } -// rawStorageKind configures how storage instances are discovered. Defs is -// consulted only when Discovery is "explicit" (name → instance). -type rawStorageKind struct { - Discovery string `yaml:"discovery"` - Format string `yaml:"format"` - Defs map[string]rawStorageInstance `yaml:"defs"` +// rawBaseKind configures how bases are discovered. Defs is consulted only when +// Discovery is "explicit" (name → base). 
+type rawBaseKind struct { + Discovery string `yaml:"discovery"` + Format string `yaml:"format"` + Defs map[string]rawBaseInstance `yaml:"defs"` } -// rawStorageInstance mirrors one storage instance: its backend type, its root, -// and the collections it declares (name → definition). The collection mirror -// lives with the Collection type in internal/storage/collection. -type rawStorageInstance struct { +// rawBaseInstance mirrors one base: its backend type, its root, and the +// collections it declares (name → definition). The collection mirror lives with +// the Collection type in internal/storage/collection. +type rawBaseInstance struct { Type string `yaml:"type"` Root string `yaml:"root"` Path string `yaml:"path"` @@ -162,7 +164,7 @@ func Load(start string) (*Config, error) { if raw.Query != nil { return nil, errors.New("config: query is no longer a config block; use listing") } - if err := cfg.loadStorage(raw.Storage, raw.Listing); err != nil { + if err := cfg.loadBases(raw.Bases, raw.Storage, raw.Listing); err != nil { return nil, err } return cfg, nil @@ -216,39 +218,57 @@ func (c *Config) loadSchemas(k rawSchemaKind) error { return nil } -// loadStorage populates c.Storage and the flattened c.Collections (both sorted -// by name) from either the storage directory (convention: one file per -// instance) or an explicit defs map in config.yaml. Collection names are -// validated unique across every instance. -func (c *Config) loadStorage(k rawStorageKind, projectListing *collection.RawListingDefaults) error { +// loadBases populates c.Bases and the flattened c.Collections (both sorted by +// name) from either the bases directory (convention: one file per base) or an +// explicit defs map in config.yaml. Collection names are validated unique +// across every base. The legacy storage block and directory stay readable, but +// cannot be mixed with the new bases form. 
+func (c *Config) loadBases(bases, legacy *rawBaseKind, projectListing *collection.RawListingDefaults) error { + if bases != nil && legacy != nil { + return errors.New("config: use bases, not both bases and storage") + } + label := "bases" + k := rawBaseKind{} + if bases != nil { + k = *bases + } else if legacy != nil { + k = *legacy + label = "storage" + } + discovery, err := normDiscovery(k.Discovery) if err != nil { - return fmt.Errorf("storage: %w", err) + return fmt.Errorf("%s: %w", label, err) } exts, err := formatExts(k.Format) if err != nil { - return fmt.Errorf("storage: %w", err) + return fmt.Errorf("%s: %w", label, err) + } + + baseSubdir, err := c.baseSubdir(label) + if err != nil { + return err } - defs := map[string]rawStorageInstance{} + defs := map[string]rawBaseInstance{} if discovery == discoveryExplicit { if len(k.Defs) == 0 { - return errors.New(`storage: discovery "explicit" requires a non-empty "defs" map`) + return fmt.Errorf(`%s: discovery "explicit" requires a non-empty "defs" map`, label) } defs = k.Defs } else { - found, err := scanKindDir(filepath.Join(c.Root, Dir, storageSubdir), exts) + found, err := scanKindDir(filepath.Join(c.Root, Dir, baseSubdir), exts) if err != nil { - return fmt.Errorf("storage: %w", err) + return fmt.Errorf("%s: %w", label, err) } for name, path := range found { src, err := os.ReadFile(path) if err != nil { - return fmt.Errorf("storage %q: %w", name, err) + return fmt.Errorf("%s %q: %w", label, name, err) } - var ri rawStorageInstance + var ri rawBaseInstance if err := yaml.Unmarshal(src, &ri); err != nil { - return fmt.Errorf("storage %q: %w", name, err) + return fmt.Errorf("%s %q: %w", label, name, err) } defs[name] = ri } @@ -260,22 +280,22 @@ func (c *Config) loadStorage(k rawStorageKind, projectListing *collection.RawLis } sort.Strings(names) - // instanceOf records which instance first claimed a collection name, so a - // collision across instances is reported with both sides. 
- instanceOf := map[string]string{} + // baseOf records which base first claimed a collection name, so a collision + // across bases is reported with both sides. + baseOf := map[string]string{} for _, name := range names { - inst, err := c.buildInstance(name, defs[name], exts, projectListing) + inst, err := c.buildInstance(name, defs[name], exts, projectListing, baseSubdir, label) if err != nil { return err } for _, col := range inst.Collections { - if prev, dup := instanceOf[col.Name]; dup { - return fmt.Errorf("collection %q is declared by two storage instances (%q and %q); collection names must be unique across the project", col.Name, prev, name) + if prev, dup := baseOf[col.Name]; dup { + return fmt.Errorf("collection %q is declared by two bases (%q and %q); collection names must be unique across the project", col.Name, prev, name) } - instanceOf[col.Name] = name + baseOf[col.Name] = name c.Collections = append(c.Collections, col) } - c.Storage = append(c.Storage, inst) + c.Bases = append(c.Bases, inst) } sort.Slice(c.Collections, func(i, j int) bool { return c.Collections[i].Name < c.Collections[j].Name @@ -283,20 +303,46 @@ func (c *Config) loadStorage(k rawStorageKind, projectListing *collection.RawLis return nil } -// buildInstance turns one raw storage instance into a validated -// StorageInstance, building each of its collections against the instance root. -// Collections come from the instance's inline `collections:` block and, as an -// escape hatch for instances that outgrow inline, from one file per collection -// under .katalyst/storage//. A name declared in both places is an error. -// The instance name comes from the source (filename stem or map key), never the -// body. -func (c *Config) buildInstance(name string, ri rawStorageInstance, exts []string, projectListing *collection.RawListingDefaults) (StorageInstance, error) { +// baseSubdir chooses the directory that holds base definition files. New config +// uses .katalyst/bases/. 
Legacy .katalyst/storage/ remains readable, but the +// two directories cannot be mixed. +func (c *Config) baseSubdir(label string) (string, error) { + hasBases, err := dirExists(filepath.Join(c.Root, Dir, basesSubdir)) + if err != nil { + return "", fmt.Errorf("bases: %w", err) + } + hasStorage, err := dirExists(filepath.Join(c.Root, Dir, storageSubdir)) + if err != nil { + return "", fmt.Errorf("storage: %w", err) + } + if hasBases && hasStorage { + return "", errors.New("config: use .katalyst/bases, not both .katalyst/bases and .katalyst/storage") + } + if hasBases { + return basesSubdir, nil + } + if hasStorage { + return storageSubdir, nil + } + if label == "storage" { + return storageSubdir, nil + } + return basesSubdir, nil +} + +// buildInstance turns one raw base into a validated +// BaseInstance, building each of its collections against the base root. +// Collections come from the base's inline `collections:` block and, as an +// escape hatch for bases that outgrow inline, from one file per collection +// under .katalyst/bases//. A name declared in both places is an error. +// The base name comes from the source (filename stem or map key), never the body. 
+func (c *Config) buildInstance(name string, ri rawBaseInstance, exts []string, projectListing *collection.RawListingDefaults, baseSubdir, label string) (BaseInstance, error) { typ := ri.Type if typ == "" { typ = string(storage.Filesystem) } - if !storage.Known(storage.StorageType(typ)) { - return StorageInstance{}, fmt.Errorf("storage %q: unknown type %q", name, ri.Type) + if !storage.Known(storage.BaseType(typ)) { + return BaseInstance{}, fmt.Errorf("%s %q: unknown type %q", label, name, ri.Type) } rootRel := ri.Root @@ -313,22 +359,22 @@ func (c *Config) buildInstance(name string, ri rawStorageInstance, exts []string for cn, rc := range ri.Collections { raws[cn] = rc } - instDir := filepath.Join(c.Root, Dir, storageSubdir, name) + instDir := filepath.Join(c.Root, Dir, baseSubdir, name) found, err := scanKindDir(instDir, exts) if err != nil { - return StorageInstance{}, fmt.Errorf("storage %q: %w", name, err) + return BaseInstance{}, fmt.Errorf("%s %q: %w", label, name, err) } for cn, path := range found { if _, dup := raws[cn]; dup { - return StorageInstance{}, fmt.Errorf("storage %q: collection %q is declared both inline and in a file", name, cn) + return BaseInstance{}, fmt.Errorf("%s %q: collection %q is declared both inline and in a file", label, name, cn) } src, err := os.ReadFile(path) if err != nil { - return StorageInstance{}, fmt.Errorf("storage %q: collection %q: %w", name, cn, err) + return BaseInstance{}, fmt.Errorf("%s %q: collection %q: %w", label, name, cn, err) } var rc collection.RawCollection if err := yaml.Unmarshal(src, &rc); err != nil { - return StorageInstance{}, fmt.Errorf("storage %q: collection %q: %w", name, cn, err) + return BaseInstance{}, fmt.Errorf("%s %q: collection %q: %w", label, name, cn, err) } raws[cn] = rc } @@ -344,18 +390,29 @@ func (c *Config) buildInstance(name string, ri rawStorageInstance, exts []string col, err := collection.Build(collection.BuildInput{ Name: cn, Raw: raws[cn], - InstRoot: instRoot, - InstName: name, 
StorageType: typ, + BaseRoot: instRoot, + BaseName: name, ProjectListing: projectListing, SchemaKnown: c.schemaKnown, }) if err != nil { - return StorageInstance{}, err + return BaseInstance{}, err } cols = append(cols, col) } - return StorageInstance{Name: name, Type: typ, Root: instRoot, Collections: cols}, nil + return BaseInstance{Name: name, Type: typ, Root: instRoot, Collections: cols}, nil +} + +func dirExists(dir string) (bool, error) { + info, err := os.Stat(dir) + if errors.Is(err, os.ErrNotExist) { + return false, nil + } + if err != nil { + return false, err + } + return info.IsDir(), nil } // schemaKnown reports whether a schema name is defined. The collection builder diff --git a/internal/project/loader_test.go b/internal/project/loader_test.go index 53fa3a5a..1c4f679d 100644 --- a/internal/project/loader_test.go +++ b/internal/project/loader_test.go @@ -27,7 +27,7 @@ func TestLoad_convention_discoversSchemasAndCollections(t *testing.T) { projecttest.WriteProject(t, dir, map[string]string{ "schemas/book.yaml": projecttest.MinimalSchema, "schemas/person.yaml": projecttest.MinimalSchema, - "storage/local.yaml": projecttest.LocalStorage(map[string]string{ + "bases/local.yaml": projecttest.LocalBase(map[string]string{ "books": "path: notes/books\nschema: book\n", "people": "path: notes/people\npattern: \"*.markdown\"\nschema: person\n", }), @@ -49,12 +49,12 @@ func TestLoad_convention_discoversSchemasAndCollections(t *testing.T) { t.Errorf("SchemaPath(book) = %q, want %q", got, want) } - // One filesystem instance named "local". - if len(cfg.Storage) != 1 || cfg.Storage[0].Name != "local" || cfg.Storage[0].Type != "filesystem" { - t.Fatalf("expected one filesystem instance 'local', got %+v", cfg.Storage) + // One filesystem base named "local". 
+ if len(cfg.Bases) != 1 || cfg.Bases[0].Name != "local" || cfg.Bases[0].Type != "filesystem" { + t.Fatalf("expected one filesystem base 'local', got %+v", cfg.Bases) } - if cfg.Storage[0].Root != wantRoot { - t.Errorf("instance Root = %q, want %q", cfg.Storage[0].Root, wantRoot) + if cfg.Bases[0].Root != wantRoot { + t.Errorf("base Root = %q, want %q", cfg.Bases[0].Root, wantRoot) } // Collections are flattened and sorted by name: books, people. @@ -69,8 +69,8 @@ func TestLoad_convention_discoversSchemasAndCollections(t *testing.T) { if books.Schema != "book" { t.Errorf("books.Schema = %q, want book", books.Schema) } - if books.Storage != "local" { - t.Errorf("books.Storage = %q, want local", books.Storage) + if books.Base != "local" { + t.Errorf("books.Base = %q, want local", books.Base) } if books.Pattern != "*.md" { t.Errorf("books.Pattern = %q, want default *.md", books.Pattern) @@ -94,8 +94,8 @@ func TestLoad_convention_discoversSchemasAndCollections(t *testing.T) { func TestLoad_defaultsPathToCollectionName(t *testing.T) { dir := t.TempDir() projecttest.WriteProject(t, dir, map[string]string{ - "schemas/book.yaml": projecttest.MinimalSchema, - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"notes": "schema: book\n"}), + "schemas/book.yaml": projecttest.MinimalSchema, + "bases/local.yaml": projecttest.LocalBase(map[string]string{"notes": "schema: book\n"}), }) cfg, err := project.Load(dir) if err != nil { @@ -108,11 +108,11 @@ func TestLoad_defaultsPathToCollectionName(t *testing.T) { } func TestLoad_instanceRoot_resolvesCollectionDirs(t *testing.T) { - // A non-default instance root is the base for its collections' Dir. + // A non-default base root is the base for its collections' Dir. 
dir := t.TempDir() projecttest.WriteProject(t, dir, map[string]string{ "schemas/book.yaml": projecttest.MinimalSchema, - "storage/vault.yaml": "type: filesystem\nroot: content\ncollections:\n" + + "bases/vault.yaml": "type: filesystem\nroot: content\ncollections:\n" + " notes:\n path: notes\n schema: book\n", }) cfg, err := project.Load(dir) @@ -121,19 +121,19 @@ func TestLoad_instanceRoot_resolvesCollectionDirs(t *testing.T) { } notes, _ := cfg.Collection("notes") if want := filepath.Join(projecttest.RealPath(t, dir), "content/notes"); notes.Dir != want { - t.Errorf("notes.Dir = %q, want %q (resolved against instance root)", notes.Dir, want) + t.Errorf("notes.Dir = %q, want %q (resolved against base root)", notes.Dir, want) } } func TestLoad_perCollectionFiles_inInstanceDir(t *testing.T) { - // A collection may live in its own file under storage//, the - // escape hatch for instances that outgrow an inline block. + // A collection may live in its own file under bases//, the escape + // hatch for bases that outgrow an inline block. 
dir := t.TempDir() projecttest.WriteProject(t, dir, map[string]string{ - "schemas/book.yaml": projecttest.MinimalSchema, - "storage/local.yaml": "type: filesystem\nroot: .\ncollections: {}\n", - "storage/local/books.yaml": "path: notes/books\nschema: book\n", - "storage/local/people.yaml": "path: notes/people\nschema: book\n", + "schemas/book.yaml": projecttest.MinimalSchema, + "bases/local.yaml": "type: filesystem\nroot: .\ncollections: {}\n", + "bases/local/books.yaml": "path: notes/books\nschema: book\n", + "bases/local/people.yaml": "path: notes/people\nschema: book\n", }) cfg, err := project.Load(dir) if err != nil { @@ -143,17 +143,17 @@ func TestLoad_perCollectionFiles_inInstanceDir(t *testing.T) { t.Fatalf("CollectionNames = %v, want [books people]", got) } books, _ := cfg.Collection("books") - if books.Storage != "local" { - t.Errorf("books.Storage = %q, want local", books.Storage) + if books.Base != "local" { + t.Errorf("books.Base = %q, want local", books.Base) } } func TestLoad_perCollectionFiles_coexistWithInline(t *testing.T) { dir := t.TempDir() projecttest.WriteProject(t, dir, map[string]string{ - "schemas/book.yaml": projecttest.MinimalSchema, - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"books": "path: notes/books\nschema: book\n"}), - "storage/local/notes.yaml": "path: notes\nschema: book\n", + "schemas/book.yaml": projecttest.MinimalSchema, + "bases/local.yaml": projecttest.LocalBase(map[string]string{"books": "path: notes/books\nschema: book\n"}), + "bases/local/notes.yaml": "path: notes\nschema: book\n", }) cfg, err := project.Load(dir) if err != nil { @@ -167,9 +167,9 @@ func TestLoad_perCollectionFiles_coexistWithInline(t *testing.T) { func TestLoad_perCollectionFiles_rejectInlineCollision(t *testing.T) { dir := t.TempDir() projecttest.WriteProject(t, dir, map[string]string{ - "schemas/book.yaml": projecttest.MinimalSchema, - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"notes": "path: notes\nschema: 
book\n"}), - "storage/local/notes.yaml": "path: other\nschema: book\n", + "schemas/book.yaml": projecttest.MinimalSchema, + "bases/local.yaml": projecttest.LocalBase(map[string]string{"notes": "path: notes\nschema: book\n"}), + "bases/local/notes.yaml": "path: other\nschema: book\n", }) _, err := project.Load(dir) if err == nil || !strings.Contains(err.Error(), "both inline and in a file") { @@ -195,9 +195,9 @@ func TestLoad_ascendsToFindProject(t *testing.T) { } } -func TestLoad_noStorage_isEmptyButValid(t *testing.T) { - // A project with schemas but no storage instances loads with zero - // collections. There is no implicit instance synthesized. +func TestLoad_noBases_isEmptyButValid(t *testing.T) { + // A project with schemas but no bases loads with zero + // collections. There is no implicit base synthesized. dir := t.TempDir() projecttest.WriteProject(t, dir, map[string]string{ "schemas/book.yaml": projecttest.MinimalSchema, @@ -206,8 +206,8 @@ func TestLoad_noStorage_isEmptyButValid(t *testing.T) { if err != nil { t.Fatalf("Load: %v", err) } - if len(cfg.Storage) != 0 { - t.Errorf("expected no storage instances, got %d", len(cfg.Storage)) + if len(cfg.Bases) != 0 { + t.Errorf("expected no bases, got %d", len(cfg.Bases)) } if len(cfg.Collections) != 0 { t.Errorf("expected no collections, got %d", len(cfg.Collections)) @@ -219,8 +219,8 @@ func TestLoad_noConfigFile_usesConventionDefaults(t *testing.T) { // default convention + yaml discovery. 
dir := t.TempDir() projecttest.WriteProject(t, dir, map[string]string{ - "schemas/book.yaml": projecttest.MinimalSchema, - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"notes": "schema: book\n"}), + "schemas/book.yaml": projecttest.MinimalSchema, + "bases/local.yaml": projecttest.LocalBase(map[string]string{"notes": "schema: book\n"}), }) cfg, err := project.Load(dir) if err != nil { @@ -239,10 +239,10 @@ func TestLoad_notFound(t *testing.T) { } } -func TestLoad_rejectsUnknownStorageType(t *testing.T) { +func TestLoad_rejectsUnknownBaseType(t *testing.T) { dir := t.TempDir() projecttest.WriteProject(t, dir, map[string]string{ - "storage/db.yaml": "type: postgres\ncollections:\n notes:\n path: notes\n checks:\n - kind: markdown_requires_h1\n", + "bases/db.yaml": "type: postgres\ncollections:\n notes:\n path: notes\n checks:\n - kind: markdown_requires_h1\n", }) _, err := project.Load(dir) if err == nil || !strings.Contains(err.Error(), "unknown type") { @@ -253,8 +253,8 @@ func TestLoad_rejectsUnknownStorageType(t *testing.T) { func TestLoad_rejectsDuplicateCollectionAcrossInstances(t *testing.T) { dir := t.TempDir() projecttest.WriteProject(t, dir, map[string]string{ - "storage/a.yaml": "type: filesystem\ncollections:\n notes:\n path: a\n checks:\n - kind: markdown_requires_h1\n", - "storage/b.yaml": "type: filesystem\ncollections:\n notes:\n path: b\n checks:\n - kind: markdown_requires_h1\n", + "bases/a.yaml": "type: filesystem\ncollections:\n notes:\n path: a\n checks:\n - kind: markdown_requires_h1\n", + "bases/b.yaml": "type: filesystem\ncollections:\n notes:\n path: b\n checks:\n - kind: markdown_requires_h1\n", }) _, err := project.Load(dir) if err == nil || !strings.Contains(err.Error(), "unique") { @@ -265,8 +265,8 @@ func TestLoad_rejectsDuplicateCollectionAcrossInstances(t *testing.T) { func TestLoad_rejectsUnknownSchemaInCollection(t *testing.T) { dir := t.TempDir() projecttest.WriteProject(t, dir, map[string]string{ - 
"schemas/book.yaml": projecttest.MinimalSchema, - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"notes": "path: notes\nschema: nonexistent\n"}), + "schemas/book.yaml": projecttest.MinimalSchema, + "bases/local.yaml": projecttest.LocalBase(map[string]string{"notes": "path: notes\nschema: nonexistent\n"}), }) _, err := project.Load(dir) if err == nil { @@ -280,7 +280,7 @@ func TestLoad_rejectsUnknownSchemaInCollection(t *testing.T) { func TestLoad_rejectsCollectionWithNoChecks(t *testing.T) { dir := t.TempDir() projecttest.WriteProject(t, dir, map[string]string{ - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"notes": "path: notes\n"}), + "bases/local.yaml": projecttest.LocalBase(map[string]string{"notes": "path: notes\n"}), }) _, err := project.Load(dir) if err == nil { @@ -306,7 +306,7 @@ func TestLoad_variantsParsed(t *testing.T) { "schemas/page.yaml": projecttest.MinimalSchema, "schemas/section.yaml": projecttest.MinimalSchema, "schemas/content.yaml": projecttest.MinimalSchema, - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"pages": body}), + "bases/local.yaml": projecttest.LocalBase(map[string]string{"pages": body}), }) cfg, err := project.Load(dir) @@ -358,8 +358,8 @@ func TestLoad_whenShorthandDesugars(t *testing.T) { " checks:\n" + " - kind: markdown_requires_h1\n" projecttest.WriteProject(t, dir, map[string]string{ - "schemas/page.yaml": projecttest.MinimalSchema, - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"pages": body}), + "schemas/page.yaml": projecttest.MinimalSchema, + "bases/local.yaml": projecttest.LocalBase(map[string]string{"pages": body}), }) cfg, err := project.Load(dir) if err != nil { @@ -386,7 +386,7 @@ func TestLoad_variantOnlyCollectionIsValid(t *testing.T) { " checks:\n" + " - kind: markdown_requires_h1\n" projecttest.WriteProject(t, dir, map[string]string{ - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"pages": body}), + "bases/local.yaml": 
projecttest.LocalBase(map[string]string{"pages": body}), }) if _, err := project.Load(dir); err != nil { t.Fatalf("variant-only collection should load: %v", err) @@ -399,8 +399,8 @@ func TestLoad_rejectsInvalidVariantPredicate(t *testing.T) { "variants:\n" + " - when: \"=nofield\"\n" projecttest.WriteProject(t, dir, map[string]string{ - "schemas/page.yaml": projecttest.MinimalSchema, - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"pages": body}), + "schemas/page.yaml": projecttest.MinimalSchema, + "bases/local.yaml": projecttest.LocalBase(map[string]string{"pages": body}), }) _, err := project.Load(dir) if err == nil || !strings.Contains(err.Error(), "variants[0]") { @@ -415,8 +415,8 @@ func TestLoad_rejectsUnknownVariantSchema(t *testing.T) { " - when: \"kind=section\"\n" + " schema: nonexistent\n" projecttest.WriteProject(t, dir, map[string]string{ - "schemas/page.yaml": projecttest.MinimalSchema, - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"pages": body}), + "schemas/page.yaml": projecttest.MinimalSchema, + "bases/local.yaml": projecttest.LocalBase(map[string]string{"pages": body}), }) _, err := project.Load(dir) if err == nil || !strings.Contains(err.Error(), "nonexistent") { @@ -435,8 +435,8 @@ func TestLoad_rejectsEmptyWhen(t *testing.T) { " checks:\n" + " - kind: markdown_requires_h1\n" projecttest.WriteProject(t, dir, map[string]string{ - "schemas/page.yaml": projecttest.MinimalSchema, - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"pages": body}), + "schemas/page.yaml": projecttest.MinimalSchema, + "bases/local.yaml": projecttest.LocalBase(map[string]string{"pages": body}), }) _, err := project.Load(dir) if err == nil || !strings.Contains(err.Error(), "at least one predicate") { @@ -447,8 +447,8 @@ func TestLoad_rejectsEmptyWhen(t *testing.T) { func TestLoad_useExhaustiveVariantsDefaultsFalse(t *testing.T) { dir := t.TempDir() projecttest.WriteProject(t, dir, map[string]string{ - 
"schemas/book.yaml": projecttest.MinimalSchema, - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"notes": "path: notes\nschema: book\n"}), + "schemas/book.yaml": projecttest.MinimalSchema, + "bases/local.yaml": projecttest.LocalBase(map[string]string{"notes": "path: notes\nschema: book\n"}), }) cfg, err := project.Load(dir) if err != nil { @@ -467,7 +467,7 @@ func TestLoad_parsesChecks(t *testing.T) { dir := t.TempDir() projecttest.WriteProject(t, dir, map[string]string{ "schemas/book.yaml": projecttest.MinimalSchema, - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"notes": `path: notes + "bases/local.yaml": projecttest.LocalBase(map[string]string{"notes": `path: notes checks: - kind: object schema: book @@ -540,7 +540,7 @@ func TestLoad_rejectsUnknownCheckType(t *testing.T) { dir := t.TempDir() projecttest.WriteProject(t, dir, map[string]string{ "schemas/book.yaml": projecttest.MinimalSchema, - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"notes": `path: notes + "bases/local.yaml": projecttest.LocalBase(map[string]string{"notes": `path: notes checks: - kind: not-real `}), @@ -557,7 +557,7 @@ checks: func TestLoad_rejectsUnknownCheckKey(t *testing.T) { dir := t.TempDir() projecttest.WriteProject(t, dir, map[string]string{ - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"notes": `path: notes + "bases/local.yaml": projecttest.LocalBase(map[string]string{"notes": `path: notes checks: - kind: markdown_requires_h1 typo: true @@ -576,7 +576,7 @@ func TestLoad_rejectsMalformedCheckPayload(t *testing.T) { dir := t.TempDir() projecttest.WriteProject(t, dir, map[string]string{ "schemas/book.yaml": projecttest.MinimalSchema, - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"notes": `path: notes + "bases/local.yaml": projecttest.LocalBase(map[string]string{"notes": `path: notes checks: - kind: object `}), @@ -594,7 +594,7 @@ func TestLoad_rejectsObjectCheckField(t *testing.T) { dir := 
t.TempDir() projecttest.WriteProject(t, dir, map[string]string{ "schemas/book.yaml": projecttest.MinimalSchema, - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"notes": `path: notes + "bases/local.yaml": projecttest.LocalBase(map[string]string{"notes": `path: notes checks: - kind: object schema: book @@ -647,7 +647,7 @@ func TestLoad_rejectsInvalidFilesystemCheckConfig(t *testing.T) { t.Run(name, func(t *testing.T) { dir := t.TempDir() projecttest.WriteProject(t, dir, map[string]string{ - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"notes": "path: notes\nchecks:\n" + tc.checks}), + "bases/local.yaml": projecttest.LocalBase(map[string]string{"notes": "path: notes\nchecks:\n" + tc.checks}), }) _, err := project.Load(dir) if err == nil { @@ -663,7 +663,7 @@ func TestLoad_rejectsInvalidFilesystemCheckConfig(t *testing.T) { func TestLoad_parsesTextChecks(t *testing.T) { dir := t.TempDir() projecttest.WriteProject(t, dir, map[string]string{ - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"notes": `path: notes + "bases/local.yaml": projecttest.LocalBase(map[string]string{"notes": `path: notes checks: - kind: text_requires pattern: Sources @@ -746,7 +746,7 @@ func TestLoad_rejectsInvalidTextCheckConfig(t *testing.T) { t.Run(name, func(t *testing.T) { dir := t.TempDir() projecttest.WriteProject(t, dir, map[string]string{ - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"notes": "path: notes\nchecks:\n" + tc.checks}), + "bases/local.yaml": projecttest.LocalBase(map[string]string{"notes": "path: notes\nchecks:\n" + tc.checks}), }) _, err := project.Load(dir) if err == nil { @@ -769,7 +769,7 @@ func TestLoad_explicitDiscovery_readsDefs(t *testing.T) { discovery: explicit defs: book: ./.katalyst/my-schemas/book.yaml -storage: +bases: discovery: explicit defs: local: @@ -781,8 +781,8 @@ storage: schema: book `, // Stray files in the convention dirs must be ignored. 
- "schemas/ignored.yaml": projecttest.MinimalSchema, - "storage/ignored-inst.yaml": "type: filesystem\ncollections: {}\n", + "schemas/ignored.yaml": projecttest.MinimalSchema, + "bases/ignored-inst.yaml": "type: filesystem\ncollections: {}\n", }) cfg, err := project.Load(dir) if err != nil { @@ -791,9 +791,9 @@ storage: if _, ok := cfg.Schemas["ignored"]; ok { t.Errorf("explicit discovery must ignore the schemas/ dir scan") } - for _, inst := range cfg.Storage { + for _, inst := range cfg.Bases { if inst.Name != "local" { - t.Errorf("explicit discovery must ignore the storage/ dir scan, saw instance %q", inst.Name) + t.Errorf("explicit discovery must ignore the bases/ dir scan, saw base %q", inst.Name) } } wantRoot := projecttest.RealPath(t, dir) @@ -805,10 +805,75 @@ storage: } } +func TestLoad_legacyStorageBlock_readsDefs(t *testing.T) { + dir := t.TempDir() + projecttest.WriteProject(t, dir, map[string]string{ + "schemas/book.yaml": projecttest.MinimalSchema, + "config.yaml": `storage: + discovery: explicit + defs: + local: + type: filesystem + root: . 
+ collections: + notes: + schema: book +`, + }) + cfg, err := project.Load(dir) + if err != nil { + t.Fatalf("Load: %v", err) + } + if len(cfg.Bases) != 1 || cfg.Bases[0].Name != "local" { + t.Fatalf("expected legacy storage block to load one base, got %+v", cfg.Bases) + } + if _, ok := cfg.Collection("notes"); !ok { + t.Errorf("expected notes collection from legacy storage block") + } +} + +func TestLoad_legacyStorageDir_readsConventionFiles(t *testing.T) { + dir := t.TempDir() + projecttest.WriteProject(t, dir, map[string]string{ + "schemas/book.yaml": projecttest.MinimalSchema, + "storage/local.yaml": projecttest.LocalBase(map[string]string{"notes": "schema: book\n"}), + }) + cfg, err := project.Load(dir) + if err != nil { + t.Fatalf("Load: %v", err) + } + if len(cfg.Bases) != 1 || cfg.Bases[0].Name != "local" { + t.Fatalf("expected legacy storage dir to load one base, got %+v", cfg.Bases) + } +} + +func TestLoad_rejectsBasesAndStorageBlocks(t *testing.T) { + dir := t.TempDir() + projecttest.WriteProject(t, dir, map[string]string{ + "config.yaml": "bases:\n discovery: convention\nstorage:\n discovery: convention\n", + }) + _, err := project.Load(dir) + if err == nil || !strings.Contains(err.Error(), "both bases and storage") { + t.Fatalf("expected mixed config block error, got: %v", err) + } +} + +func TestLoad_rejectsBasesAndStorageDirs(t *testing.T) { + dir := t.TempDir() + projecttest.WriteProject(t, dir, map[string]string{ + "bases/local.yaml": "type: filesystem\ncollections: {}\n", + "storage/local.yaml": "type: filesystem\ncollections: {}\n", + }) + _, err := project.Load(dir) + if err == nil || !strings.Contains(err.Error(), ".katalyst/bases") || !strings.Contains(err.Error(), ".katalyst/storage") { + t.Fatalf("expected mixed config dir error, got: %v", err) + } +} + func TestLoad_explicitDiscovery_requiresDefs(t *testing.T) { dir := t.TempDir() projecttest.WriteProject(t, dir, map[string]string{ - "config.yaml": "storage:\n discovery: explicit\n", + 
"config.yaml": "bases:\n discovery: explicit\n", }) _, err := project.Load(dir) if err == nil || !strings.Contains(err.Error(), "defs") { @@ -819,9 +884,9 @@ func TestLoad_explicitDiscovery_requiresDefs(t *testing.T) { func TestLoad_formatJSON_scansJSONFiles(t *testing.T) { dir := t.TempDir() projecttest.WriteProject(t, dir, map[string]string{ - "schemas/book.json": `{"type":"object"}`, - "config.yaml": "schemas:\n format: json\n", - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"notes": "schema: book\n"}), + "schemas/book.json": `{"type":"object"}`, + "config.yaml": "schemas:\n format: json\n", + "bases/local.yaml": projecttest.LocalBase(map[string]string{"notes": "schema: book\n"}), }) cfg, err := project.Load(dir) if err != nil { @@ -856,7 +921,7 @@ func TestLoad_perKindIndependence(t *testing.T) { defs: book: ./.katalyst/schemas/book.json `, - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"notes": "schema: book\n"}), + "bases/local.yaml": projecttest.LocalBase(map[string]string{"notes": "schema: book\n"}), }) cfg, err := project.Load(dir) if err != nil { @@ -884,8 +949,8 @@ func TestLoad_rejectsBadDiscovery(t *testing.T) { func TestLoad_listingDefaults_whenUnset(t *testing.T) { dir := t.TempDir() projecttest.WriteProject(t, dir, map[string]string{ - "schemas/book.yaml": projecttest.MinimalSchema, - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"notes": "schema: book\n"}), + "schemas/book.yaml": projecttest.MinimalSchema, + "bases/local.yaml": projecttest.LocalBase(map[string]string{"notes": "schema: book\n"}), }) cfg, err := project.Load(dir) if err != nil { @@ -908,7 +973,7 @@ func TestLoad_listing_projectDefaultApplies(t *testing.T) { filterTypeMismatch: error sortMissing: lowest `, - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"notes": "schema: book\n"}), + "bases/local.yaml": projecttest.LocalBase(map[string]string{"notes": "schema: book\n"}), }) cfg, err := project.Load(dir) if 
err != nil { @@ -932,7 +997,7 @@ func TestLoad_listing_collectionOverridesPerKey(t *testing.T) { "config.yaml": `listing: sortMissing: lowest `, - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"notes": `schema: book + "bases/local.yaml": projecttest.LocalBase(map[string]string{"notes": `schema: book listing: filterTypeMismatch: error `}), @@ -954,7 +1019,7 @@ func TestLoad_listing_rejectsUnknownValue(t *testing.T) { dir := t.TempDir() projecttest.WriteProject(t, dir, map[string]string{ "schemas/book.yaml": projecttest.MinimalSchema, - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"notes": `schema: book + "bases/local.yaml": projecttest.LocalBase(map[string]string{"notes": `schema: book listing: filterTypeMismatch: bogus `}), @@ -972,7 +1037,7 @@ func TestLoad_rejectsProjectQueryConfigBlock(t *testing.T) { "config.yaml": `query: sortMissing: lowest `, - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"notes": "schema: book\n"}), + "bases/local.yaml": projecttest.LocalBase(map[string]string{"notes": "schema: book\n"}), }) _, err := project.Load(dir) if err == nil || !strings.Contains(err.Error(), "query is no longer a config block; use listing") { @@ -984,7 +1049,7 @@ func TestLoad_rejectsCollectionQueryConfigBlock(t *testing.T) { dir := t.TempDir() projecttest.WriteProject(t, dir, map[string]string{ "schemas/book.yaml": projecttest.MinimalSchema, - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"notes": `schema: book + "bases/local.yaml": projecttest.LocalBase(map[string]string{"notes": `schema: book query: sortMissing: lowest `}), @@ -998,8 +1063,8 @@ query: func TestCollection_unknownReturnsFalse(t *testing.T) { dir := t.TempDir() projecttest.WriteProject(t, dir, map[string]string{ - "schemas/book.yaml": projecttest.MinimalSchema, - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"notes": "schema: book\n"}), + "schemas/book.yaml": projecttest.MinimalSchema, + 
"bases/local.yaml": projecttest.LocalBase(map[string]string{"notes": "schema: book\n"}), }) cfg, err := project.Load(dir) if err != nil { @@ -1031,7 +1096,7 @@ func TestSchemaNames_returnsSortedNames(t *testing.T) { func TestLoad_parsesWritingTells(t *testing.T) { dir := t.TempDir() projecttest.WriteProject(t, dir, map[string]string{ - "storage/local.yaml": projecttest.LocalStorage(map[string]string{"notes": "path: notes\nchecks:\n - kind: markdown_writing_tells\n"}), + "bases/local.yaml": projecttest.LocalBase(map[string]string{"notes": "path: notes\nchecks:\n - kind: markdown_writing_tells\n"}), }) cfg, err := project.Load(dir) if err != nil { diff --git a/internal/project/project.go b/internal/project/project.go index ef301a04..ecb6a404 100644 --- a/internal/project/project.go +++ b/internal/project/project.go @@ -3,8 +3,8 @@ // enumeration, and reverse id→path resolution on top of it. The path↔item- // identity mapping itself lives behind the internal/storage seam; this package // selects the right CollectionDefinition and orchestrates it. See -// docs/content/deep-dives/domain-model.md (selectors, collections, items) and -// docs/content/deep-dives/storage.md (the seam). +// docs/content/deep-dives/domain-model/_index.md (selectors, collections, items) +// and docs/content/deep-dives/domain-model/storage.md (the seam). 
package project import ( @@ -40,27 +40,27 @@ type ItemContent struct { Doc *markdownbodytext.Document } -func (p *Project) storageInstance(name string) (StorageInstance, bool) { - for _, inst := range p.cfg.Storage { +func (p *Project) baseInstance(name string) (BaseInstance, bool) { + for _, inst := range p.cfg.Bases { if inst.Name == name { return inst, true } } - return StorageInstance{}, false + return BaseInstance{}, false } func (p *Project) def(c Collection) (collection.CollectionDefinition, error) { - inst, ok := p.storageInstance(c.Storage) + inst, ok := p.baseInstance(c.Base) if !ok { - return nil, fmt.Errorf("collection %q: unknown storage instance %q", c.Name, c.Storage) + return nil, fmt.Errorf("collection %q: unknown base %q", c.Name, c.Base) } - switch storage.StorageType(inst.Type) { + switch storage.BaseType(inst.Type) { case storage.Filesystem: return filesystem.New(inst.Root, inst.Collections), nil case storage.SQLite: return sqlitestore.New(inst.Root, inst.Collections), nil default: - return nil, fmt.Errorf("collection %q: unsupported storage type %q", c.Name, inst.Type) + return nil, fmt.Errorf("collection %q: unsupported base type %q", c.Name, inst.Type) } } @@ -162,7 +162,7 @@ func (p *Project) Reference(c Collection, id string) (string, error) { // ReadItem reads and decodes an item through its storage backend. func (p *Project) ReadItem(item Item) (ItemContent, error) { - switch storage.StorageType(item.Collection.StorageType) { + switch storage.BaseType(item.Collection.StorageType) { case storage.SQLite: def, err := p.def(item.Collection) if err != nil { @@ -185,7 +185,7 @@ func (p *Project) ReadItem(item Item) (ItemContent, error) { // ItemExists reports whether id already exists in c. 
func (p *Project) ItemExists(c Collection, id string) (bool, error) { - switch storage.StorageType(c.StorageType) { + switch storage.BaseType(c.StorageType) { case storage.SQLite: def, err := p.def(c) if err != nil { @@ -204,7 +204,7 @@ func (p *Project) ItemExists(c Collection, id string) (bool, error) { // AddItem creates a new item in c. func (p *Project) AddItem(c Collection, id string, meta map[string]any, body []byte) error { - switch storage.StorageType(c.StorageType) { + switch storage.BaseType(c.StorageType) { case storage.SQLite: def, err := p.def(c) if err != nil { @@ -218,7 +218,7 @@ func (p *Project) AddItem(c Collection, id string, meta map[string]any, body []b // UpdateItem updates an existing item in c. func (p *Project) UpdateItem(c Collection, id string, meta map[string]any, body []byte) error { - switch storage.StorageType(c.StorageType) { + switch storage.BaseType(c.StorageType) { case storage.SQLite: def, err := p.def(c) if err != nil { @@ -232,7 +232,7 @@ func (p *Project) UpdateItem(c Collection, id string, meta map[string]any, body // DeleteItem deletes an existing item. 
func (p *Project) DeleteItem(item Item) error { - switch storage.StorageType(item.Collection.StorageType) { + switch storage.BaseType(item.Collection.StorageType) { case storage.SQLite: def, err := p.def(item.Collection) if err != nil { diff --git a/internal/project/project_test.go b/internal/project/project_test.go index e1e42d0a..f269a30e 100644 --- a/internal/project/project_test.go +++ b/internal/project/project_test.go @@ -28,7 +28,7 @@ func setup(t *testing.T) *project.Project { } projecttest.WriteProject(t, dir, map[string]string{ - "storage/local.yaml": projecttest.LocalStorage(map[string]string{ + "bases/local.yaml": projecttest.LocalBase(map[string]string{ "notes": "path: notes\nchecks:\n - kind: markdown_requires_h1\n", "people": "path: people\nchecks:\n - kind: markdown_requires_h1\n", }), diff --git a/internal/project/projecttest/projecttest.go b/internal/project/projecttest/projecttest.go index 1f669500..5e9d004e 100644 --- a/internal/project/projecttest/projecttest.go +++ b/internal/project/projecttest/projecttest.go @@ -32,9 +32,9 @@ func WriteProject(t *testing.T, dir string, files map[string]string) { } } -// LocalStorage builds a .katalyst/storage/local.yaml body with a filesystem -// instance rooted at the project and the given collection YAML bodies. -func LocalStorage(collections map[string]string) string { +// LocalBase builds a .katalyst/bases/local.yaml body with a filesystem base +// rooted at the project and the given collection YAML bodies. 
+func LocalBase(collections map[string]string) string { var b strings.Builder b.WriteString("type: filesystem\nroot: .\ncollections:\n") names := make([]string, 0, len(collections)) diff --git a/internal/project/selector.go b/internal/project/selector.go index 3fd082d0..c6f9a9ee 100644 --- a/internal/project/selector.go +++ b/internal/project/selector.go @@ -11,7 +11,7 @@ type UsageError struct{ Msg string } func (e *UsageError) Error() string { return e.Msg } -// Selector identifies a target by depth (see docs/content/deep-dives/domain-model.md): +// Selector identifies a target by depth (see docs/content/deep-dives/domain-model/_index.md): // // → one collection (Item == "") // / → one item diff --git a/internal/storage/AGENTS.md b/internal/storage/AGENTS.md index 4d3460eb..03e399a5 100644 --- a/internal/storage/AGENTS.md +++ b/internal/storage/AGENTS.md @@ -1,11 +1,11 @@ # internal/storage -The backend boundary. This package names storage backend kinds and keeps the +The backend boundary. This package names base backend kinds and keeps the small registry of implemented backends; `collection/` holds the mapping from a backend store to Katalyst collections and items. Architecture and rationale live in the -[storage deep-dive](../../docs/content/deep-dives/storage.md). The collection +[Bases deep-dive](../../docs/content/deep-dives/domain-model/storage.md). The collection read stack has its own local guide in [`collection/AGENTS.md`](collection/AGENTS.md). @@ -13,10 +13,10 @@ read stack has its own local guide in - Add a backend kind here only when its `CollectionDefinition` implementation exists. `Known` is the source of truth the project loader uses to validate - configured storage types. `filesystem` and `sqlite` are implemented. + configured base types. `filesystem` and `sqlite` are implemented. - `Reference` is opaque. Treat it as a backend-native locator, not always a filesystem path; filesystem interpretation belongs in `collection/filesystem`. 
-- Granularity is a property of the storage type, not user configuration. Keep - that decision in code so collection/item roles stay portable across backends. +- Scope is a property of the base type, not user configuration. Keep that + decision in code so collection/item roles stay portable across backends. - Keep this package small and dependency-light. Backend-specific parsing, discovery, IO, and persistence belong under `collection//`. diff --git a/internal/storage/collection/AGENTS.md b/internal/storage/collection/AGENTS.md index aa1265f0..1a8e80d3 100644 --- a/internal/storage/collection/AGENTS.md +++ b/internal/storage/collection/AGENTS.md @@ -12,8 +12,8 @@ decode and encode with; `predicate` is the attribute/object predicate grammar; Architecture and rationale — why a collection owns the read, why items are thin, and how a backend attaches — live in the -[storage layer](../../../docs/content/deep-dives/storage.md) and -[collections](../../../docs/content/deep-dives/collections.md) deep-dives. +[Bases](../../../docs/content/deep-dives/domain-model/storage.md) and +[collections](../../../docs/content/deep-dives/domain-model/collections.md) deep-dives. ## Conventions diff --git a/internal/storage/collection/collection.go b/internal/storage/collection/collection.go index c4ea0ab8..30a43f76 100644 --- a/internal/storage/collection/collection.go +++ b/internal/storage/collection/collection.go @@ -29,8 +29,8 @@ type Item struct { // (Collections, Items, Unmatched); the reverse direction reconstructs a backend // locator from an item identity (Reference). Both directions are mandatory. type CollectionDefinition interface { - // Granularity reports how this backend's units attach to the model. - Granularity() storage.Granularity + // Scope reports the scope where this backend's units attach to the model. + Scope() storage.Scope // Collections returns the collections this definition maps. One definition // may yield more than one collection. 
diff --git a/internal/storage/collection/filesystem/collection.go b/internal/storage/collection/filesystem/collection.go index 450646b5..6bffecbf 100644 --- a/internal/storage/collection/filesystem/collection.go +++ b/internal/storage/collection/filesystem/collection.go @@ -19,12 +19,12 @@ import ( // Definition maps a directory tree onto collections of markdown files: one file // is one item, its id is the filename stem. It is the CollectionDefinition for -// StorageType filesystem. +// BaseType filesystem. // // The per-collection methods operate on the absolute Dir already resolved on // each collection.Collection, so root is unused today; it is retained because a -// filesystem instance is identified by its root and Phase 2's BuildInstance -// resolves collection directories against it. +// filesystem base is identified by its root and the project loader resolves +// collection directories against it. type Definition struct { root string collections []collection.Collection @@ -35,8 +35,8 @@ func New(root string, collections []collection.Collection) *Definition { return &Definition{root: root, collections: collections} } -// Granularity is FileIsItem for the markdown filesystem. -func (f *Definition) Granularity() storage.Granularity { return storage.FileIsItem } +// Scope reports item scope for the markdown filesystem. +func (f *Definition) Scope() storage.Scope { return storage.FileIsItem } // Collections returns the collections this definition maps. 
func (f *Definition) Collections() []collection.Collection { return f.collections } diff --git a/internal/storage/collection/filesystem/collection_test.go b/internal/storage/collection/filesystem/collection_test.go index f50d3fe9..858537d6 100644 --- a/internal/storage/collection/filesystem/collection_test.go +++ b/internal/storage/collection/filesystem/collection_test.go @@ -89,9 +89,9 @@ func TestFilesystem_Reference_reverseResolution(t *testing.T) { } } -func TestFilesystem_Granularity_fileIsItem(t *testing.T) { - if g := filesystem.New("", nil).Granularity(); g != storage.FileIsItem { - t.Fatalf("Granularity = %v, want FileIsItem", g) +func TestFilesystem_Scope_fileIsItem(t *testing.T) { + if g := filesystem.New("", nil).Scope(); g != storage.FileIsItem { + t.Fatalf("Scope = %v, want FileIsItem", g) } } diff --git a/internal/storage/collection/parse.go b/internal/storage/collection/parse.go index 1f21133d..92e15721 100644 --- a/internal/storage/collection/parse.go +++ b/internal/storage/collection/parse.go @@ -41,9 +41,9 @@ type Collection struct { // ListingDefaults holds the resolved `item list` behavior for this // collection (collection config over project config over defaults). ListingDefaults ListingDefaults - // Storage is the name of the storage instance that declares this + // Base is the name of the base that declares this // collection. - Storage string + Base string // Variants are discriminated check groups: an item runs the first // variant (in order) whose Where predicates it all satisfies, in // addition to the base Checks. Empty for a collection without variants. @@ -132,7 +132,7 @@ const ( ) // RawCollection mirrors one collection definition in YAML. The loader -// unmarshals it (inline under a storage instance, or one file per collection) +// unmarshals it (inline under a base, or one file per collection) // and hands it to Build. 
type RawCollection struct { Path string `yaml:"path"` @@ -269,21 +269,21 @@ func (rc *RawCheck) UnmarshalYAML(value *yaml.Node) error { } // BuildInput carries everything Build needs to validate and resolve one -// collection: its raw definition and name, the owning storage instance's root +// collection: its raw definition and name, the owning base's root // and name, the project-level listing defaults, and a predicate that reports // whether a schema name is defined (schema resolution belongs to the loader). type BuildInput struct { Name string Raw RawCollection - InstRoot string - InstName string StorageType string + BaseRoot string + BaseName string ProjectListing *RawListingDefaults SchemaKnown func(string) bool } // Build turns one raw collection definition into a validated Collection, -// resolving its directory against the owning instance's root. The name comes +// resolving its directory against the owning base's root. The name comes // from the source (map key), never the file body. 
func Build(in BuildInput) (Collection, error) { storageType := in.StorageType @@ -355,7 +355,7 @@ func Build(in BuildInput) (Collection, error) { return Collection{ Name: in.Name, Path: dirRel, - Dir: resolveDir(in.InstRoot, dirRel), + Dir: resolveDir(in.BaseRoot, dirRel), StorageType: storageType, Table: in.Raw.Table, IDColumn: in.Raw.ID, @@ -366,7 +366,7 @@ func Build(in BuildInput) (Collection, error) { Schema: schemaName, Checks: cks, ListingDefaults: ld, - Storage: in.InstName, + Base: in.BaseName, Variants: variants, UseExhaustiveVariants: in.Raw.UseExhaustiveVariants, }, nil diff --git a/internal/storage/collection/sqlite/collection.go b/internal/storage/collection/sqlite/collection.go index 50f456f0..24228db1 100644 --- a/internal/storage/collection/sqlite/collection.go +++ b/internal/storage/collection/sqlite/collection.go @@ -32,8 +32,8 @@ func New(path string, collections []collection.Collection) *Definition { return &Definition{path: path, collections: collections} } -// Granularity is UnitIsCollection: one table is a collection and rows are items. -func (d *Definition) Granularity() storage.Granularity { return storage.UnitIsCollection } +// Scope reports collection scope for SQLite tables. +func (d *Definition) Scope() storage.Scope { return storage.UnitIsCollection } // Collections returns the collections this definition maps. 
func (d *Definition) Collections() []collection.Collection { return d.collections } diff --git a/internal/storage/collection/sqlite/collection_test.go b/internal/storage/collection/sqlite/collection_test.go index 93bfe9ca..bf6a269b 100644 --- a/internal/storage/collection/sqlite/collection_test.go +++ b/internal/storage/collection/sqlite/collection_test.go @@ -193,9 +193,9 @@ func TestDefinition_Read_validatesConfiguredColumns(t *testing.T) { } } -func TestDefinition_Granularity_unitIsCollection(t *testing.T) { - if g := sqlitestore.New("", nil).Granularity(); g != storage.UnitIsCollection { - t.Fatalf("Granularity = %v, want UnitIsCollection", g) +func TestDefinition_Scope_unitIsCollection(t *testing.T) { + if g := sqlitestore.New("", nil).Scope(); g != storage.UnitIsCollection { + t.Fatalf("Scope = %v, want UnitIsCollection", g) } } diff --git a/internal/storage/doc.go b/internal/storage/doc.go index 961a40b8..b4e95ee7 100644 --- a/internal/storage/doc.go +++ b/internal/storage/doc.go @@ -3,12 +3,13 @@ // // # Three concepts // -// - StorageType: a known backend kind (filesystem today; sqlite, postgresql, +// - BaseType: a known backend kind (filesystem and sqlite today; postgresql, // mongodb later). The registry here is the extension point. -// - StorageInstance (assembled by the internal/project loader): one configured +// - BaseInstance (assembled by the internal/project loader): one configured // store of a type plus how to reach it, embedding the collections it maps. // - CollectionDefinition: the two-way mapping from a store's contents to -// collections and items. FilesystemCollectionDefinition is the first. +// collections and items. Filesystem and SQLite definitions are implemented +// today. // // # The two-way contract // @@ -20,19 +21,10 @@ // filename stem) and Reference is Join(dir, id+ext); richer layouts grow into // multi-coordinate templates. 
// -// # Granularity +// # Scope // // Whether a matched store unit becomes an Item or a Collection is a property of -// the StorageType (Granularity), not user configuration. A markdown file is an -// Item (FileIsItem); a SQL table would be a Collection (UnitIsCollection). Item -// and Collection are therefore roles, not file counts. -// -// # Lineage -// -// The design adapts Great Expectations' V3 DataConnector layer: its Datasource -// vs. DataConnector split is this package's StorageInstance vs. -// CollectionDefinition split. Corrections carried from GX's own TODOs: prefer a -// two-way template over inverting a regex, let the pattern own the file -// extension, and keep collection identity separate from within-collection -// coordinates. See docs/content/deep-dives/storage.md. +// the BaseType, not user configuration. A markdown file is an Item; a SQL +// table is a Collection. Item and Collection are therefore roles, not file +// counts. See docs/content/deep-dives/domain-model/storage.md. package storage diff --git a/internal/storage/storage.go b/internal/storage/storage.go index 10cc9cdd..ac54df55 100644 --- a/internal/storage/storage.go +++ b/internal/storage/storage.go @@ -1,39 +1,39 @@ package storage -// StorageType is a known backend kind capable of holding collections and items. -type StorageType string +// BaseType is a known backend kind capable of holding collections and items. +type BaseType string const ( // Filesystem stores each item as one file. - Filesystem StorageType = "filesystem" + Filesystem BaseType = "filesystem" // SQLite stores each collection in one table, with each row as one item. - SQLite StorageType = "sqlite" + SQLite BaseType = "sqlite" ) // registered is the set of backend kinds with an implementation. It is the -// extension point: a new StorageType is added here when its +// extension point: a new BaseType is added here when its // CollectionDefinition lands. 
-var registered = map[StorageType]bool{ +var registered = map[BaseType]bool{ Filesystem: true, SQLite: true, } -// Known reports whether a StorageType has an implementation. The project loader +// Known reports whether a BaseType has an implementation. The project loader // carries the type as a plain string and leaves this validation to the storage // layer, so the storage registry remains the source of truth for backend kinds. -func Known(t StorageType) bool { return registered[t] } +func Known(t BaseType) bool { return registered[t] } -// Granularity is the level at which a backend's matched units attach to the -// domain model. It is a property of the StorageType, not user configuration: a -// markdown filesystem makes each file an Item, while a tabular backend would -// make each table a Collection and each row an Item. -type Granularity int +// Scope records the scope at which a backend's matched units attach to +// the domain model. It is a property of the BaseType, not user +// configuration: a markdown filesystem makes each file an Item, while a tabular +// backend makes each table a Collection and each row an Item. +type Scope int const ( // FileIsItem: one file is one Item; a directory of files is a Collection. - FileIsItem Granularity = iota + FileIsItem Scope = iota // UnitIsCollection: one store unit (a table/file) is a Collection; its - // rows are Items. Reserved for future tabular backends. + // rows are Items. UnitIsCollection ) diff --git a/product/specs/codec-layer-spec.md b/product/specs/codec-layer-spec.md index 717f363a..04907792 100644 --- a/product/specs/codec-layer-spec.md +++ b/product/specs/codec-layer-spec.md @@ -240,12 +240,12 @@ Update package comments to describe the codec role, not storage placement. lives under collection; point readers to `internal/codec/markdownbodytext`. - `internal/checks/AGENTS.md`: note that check contexts use codec-owned content shapes. 
-- `docs/content/deep-dives/formatting.md`: update package references from - `internal/storage/collection/document` to - `internal/codec/markdownbodytext`. -- `docs/content/deep-dives/storage.md`: mention that storage readers use codecs +- `docs/content/deep-dives/domain-model/frontmatter.md` and + `docs/content/deep-dives/domain-model/fix.md`: update package references from + `internal/storage/collection/document` to `internal/codec/markdownbodytext`. +- `docs/content/deep-dives/domain-model/storage.md`: mention that storage readers use codecs for content decoding, but codecs are not storage backends. -- `docs/content/deep-dives/collections.md`: update any reference that treats the +- `docs/content/deep-dives/domain-model/collections.md`: update any reference that treats the markdown codec as a collection subpackage. - `product/specs/collection-reorg-spec.md`: add a short supersession note or leave it as historical context and reference this spec from the implementation diff --git a/product/specs/collection-reorg-plan.md b/product/specs/collection-reorg-plan.md index 83e04eb5..b9f937fc 100644 --- a/product/specs/collection-reorg-plan.md +++ b/product/specs/collection-reorg-plan.md @@ -73,7 +73,8 @@ temporarily, for Phase 5 to claim. **Green check.** ### Phase 6 — Docs and final sweep Root `AGENTS.md` layout tree; new `AGENTS.md` for `storage/collection` and -`internal/fix`; delete `frontmatter/AGENTS.md`; update `formatting.md`, +`internal/fix`; delete `frontmatter/AGENTS.md`; update `frontmatter.md`, +`fix.md`, `storage.md`, `collections.md`; refresh the terminology matrix's Internal-code column. Confirm `make all` + `make docs-gen-check`. (Glossary needs no new terms.) 
diff --git a/product/specs/collection-reorg-spec.md b/product/specs/collection-reorg-spec.md index ab4fb873..e731ce82 100644 --- a/product/specs/collection-reorg-spec.md +++ b/product/specs/collection-reorg-spec.md @@ -213,12 +213,13 @@ _None._ Both prior questions are resolved: shape and `internal/fix`; remove the `internal/frontmatter` and `internal/project/collection` lines. - **`internal/frontmatter/AGENTS.md`** — deleted; new `AGENTS.md` files for - `storage/collection/` and `internal/fix` pointing at the formatting deep-dive's - document and fix sections. -- **`docs/content/deep-dives/formatting.md`** ("Frontmatter and fix") — update - the "parsing and formatting live in `internal/frontmatter`" line to the + `storage/collection/` and `internal/fix` pointing at the frontmatter and fix + deep dives. +- **`docs/content/deep-dives/domain-model/frontmatter.md`** and + **`docs/content/deep-dives/domain-model/fix.md`** — update the "parsing and + formatting live in `internal/frontmatter`" line to the `collection/document` + `internal/fix` split. -- **`docs/content/deep-dives/storage.md`, `collections.md`** — align to the new +- **`docs/content/deep-dives/domain-model/storage.md`, `collections.md`** — align to the new module homes (storage = backend registry; collection = read stack). - **`docs/content/reference/glossary.md`** — confirm Document, Item, and the (existing) fix wording point at the new packages; no new terms. diff --git a/product/specs/config-distribution-plan.md b/product/specs/config-distribution-plan.md index 4eb6a23e..4b3164a7 100644 --- a/product/specs/config-distribution-plan.md +++ b/product/specs/config-distribution-plan.md @@ -198,7 +198,7 @@ loader in Phase 6 rather than moving twice. a check is a kind constant plus one family file (args + parse + build + Descriptor + registration), no central-config step. Its getting shorter is the proof the change worked. -4. **Files:** `docs/content/deep-dives/collections.md` + +4. 
**Files:** `docs/content/deep-dives/domain-model/collections.md` + `docs/content/reference/glossary.md`. Updated to "the project loader" and the object-owns-its-config model. 5. **File:** `product/specs/domain-model-terminology-matrix.md`. Updated the diff --git a/product/specs/dogfood-docs-spec.md b/product/specs/dogfood-docs-spec.md index 93836bd7..f18fa699 100644 --- a/product/specs/dogfood-docs-spec.md +++ b/product/specs/dogfood-docs-spec.md @@ -91,10 +91,10 @@ A survey of `docs/content/` found: `docs/content/explanation/general-model.md` and `.../technical-spec.md`, neither of which exists. - **domain-model vs core-concepts overlap.** `explanation/domain-model.md` - (katalyst-specific architecture) and `deep-dives/core-concepts.md` (general + (katalyst-specific architecture) and `deep-dives/domain-model/_index.md` (general theory, marked *work in progress*) cover adjacent ground; #29 calls for one canonical home per topic. -- **WIP / future pages shipping unmarked-as-draft.** `deep-dives/core-concepts.md` +- **WIP / future pages shipping unmarked-as-draft.** `deep-dives/domain-model/_index.md` ("work in progress") and `deep-dives/connectors.md` ("future — not shipped") ship in the build with status only in body prose. diff --git a/product/specs/domain-model-cleanup-plan.md b/product/specs/domain-model-cleanup-plan.md index 9e95f814..8143133d 100644 --- a/product/specs/domain-model-cleanup-plan.md +++ b/product/specs/domain-model-cleanup-plan.md @@ -12,17 +12,17 @@ code already matches the settled terms (`frontmatter.Document`, `field`, - `docs/content/reference/glossary.md` — meant to be canonical but missing *attribute*, *field*, *operation*, *aggregate*, *validation result*, and a uses "Raw-source layer." Family is intentionally not a standalone term. 
-- `docs/content/deep-dives/core-concepts.md` (170 lines) — encyclopedic: a "Data +- `docs/content/deep-dives/domain-model/_index.md` (170 lines) — encyclopedic: a "Data interface" concept, a structured/unstructured "Goal," full definitions of item/collection/attribute/check/inspector, and an "Implications" section that states the operations thesis. Weight `20`. - `docs/content/deep-dives/progressive-operations.md` — the tiered model; lede "How data interfaces evolve." Weight `30`. -- `docs/content/deep-dives/domain-model.md` (112 lines) — the katalyst hub #73 +- `docs/content/deep-dives/domain-model/_index.md` (112 lines) — the katalyst hub #73 built; indexes the subsystem pages. Uses "Markdown document." -- `docs/content/deep-dives/collections.md`, `inspectors.md` — new in #73; own the +- `docs/content/deep-dives/domain-model/collections.md`, `inspectors.md` — new in #73; own the detail. `inspectors.md` uses "raw-source layer." -- `docs/content/deep-dives/storage.md` — calls itself the realization of "the +- `docs/content/deep-dives/domain-model/storage.md` — calls itself the realization of "the data interface concept." - `docs/content/deep-dives/command-organization.md` ("How the core commands are organized") — referenced only from `cmd/AGENTS.md:9` and @@ -75,15 +75,15 @@ pairs and their relationship explicit. **Goal:** core-concepts becomes a general-altitude hub mirroring domain-model; the operations thesis moves to where it is demonstrated. -1. **File:** `docs/content/deep-dives/core-concepts.md` — delete the "Goal" +1. **File:** `docs/content/deep-dives/domain-model/_index.md` — delete the "Goal" structured/unstructured dichotomy (duplicates `vision.md`); keep one sentence motivating a shared cross-backend vocabulary. -2. **File:** `docs/content/deep-dives/core-concepts.md` — collapse each concept +2. 
**File:** `docs/content/deep-dives/domain-model/_index.md` — collapse each concept (item, collection, attribute, operation, check, inspector) to a one-line intro that links to the glossary (definition) and to its discussion page; rename the "Data interface" concept to **Storage** and drop "data interface" from the prose and the examples table header. -3. **File:** `docs/content/deep-dives/core-concepts.md` — fix the attribute/field +3. **File:** `docs/content/deep-dives/domain-model/_index.md` — fix the attribute/field synonym line (currently "a named characteristic or field") to state the specialization instead; remove the "Implications" section, leaving a one-line pointer to progressive-operations. @@ -92,30 +92,30 @@ the operations thesis moves to where it is demonstrated. supported; checks are the means) as the page's opening thesis; reword the "data interfaces evolve" lede to "storage backends." 5. **File:** `docs/content/deep-dives/progressive-operations.md` — set - `weight = 20`. **File:** `docs/content/deep-dives/core-concepts.md` — set + `weight = 20`. **File:** `docs/content/deep-dives/domain-model/_index.md` — set `weight = 30`. Order becomes vision → progressive operations → core concepts. ### Phase 3 — Align the hub and subsystem pages **Goal:** the katalyst-altitude pages use the settled terms consistently. -1. **File:** `docs/content/deep-dives/domain-model.md` — sharpen the one-line +1. **File:** `docs/content/deep-dives/domain-model/_index.md` — sharpen the one-line statement of how it differs from core-concepts (specific map vs general map); apply item/document usage; confirm "raw-source" wording matches the glossary. -2. **File:** `docs/content/deep-dives/storage.md` — reword "the data interface +2. **File:** `docs/content/deep-dives/domain-model/storage.md` — reword "the data interface concept" to name the deprecation explicitly or drop it; the storage vocabulary stands on its own. -3. 
**File:** `docs/content/deep-dives/collections.md`, - `docs/content/deep-dives/inspectors.md` — apply item/document where the form +3. **File:** `docs/content/deep-dives/domain-model/collections.md`, + `docs/content/deep-dives/domain-model/inspectors.md` — apply item/document where the form is the subject; keep "raw-source layer" (consistency check only). ### Phase 4 — Retitle pages, relocate command-organization **Goal:** plain key-term titles, and CLI-org rationale lives next to `cmd/`. -1. **File:** `docs/content/deep-dives/collections.md` → title "Collections"; - `docs/content/deep-dives/checks.md` → "Checks"; - `docs/content/deep-dives/inspectors.md` → "Inspectors". Title-only; +1. **File:** `docs/content/deep-dives/domain-model/collections.md` → title "Collections"; + `docs/content/deep-dives/domain-model/checks.md` → "Checks"; + `docs/content/deep-dives/domain-model/inspectors.md` → "Inspectors". Title-only; filenames and `relref` links are unchanged. 2. **File:** `cmd/organization.md` (new) — move the body of `command-organization.md` here as plain markdown: strip the Hugo `+++` @@ -144,13 +144,13 @@ reference is unchanged. | File | Role | |---|---| | `docs/content/reference/glossary.md` | Canonical definitions; gains attribute/field/operation/aggregate/validation-result/family + the general/specific rule. | -| `docs/content/deep-dives/core-concepts.md` | Slimmed general hub; loses Goal + Implications; reweighted `30`. | +| `docs/content/deep-dives/domain-model/_index.md` | Slimmed general hub; loses Goal + Implications; reweighted `30`. | | `docs/content/deep-dives/progressive-operations.md` | Gains the operations thesis; reweighted `20`. | -| `docs/content/deep-dives/domain-model.md` | Katalyst hub; terminology-aligned. | -| `docs/content/deep-dives/storage.md` | Drops the "data interface" framing. | -| `docs/content/deep-dives/collections.md` | Retitled "Collections"; item/document aligned. | -| `docs/content/deep-dives/checks.md` | Retitled "Checks". 
| -| `docs/content/deep-dives/inspectors.md` | Retitled "Inspectors"; item/document aligned. | +| `docs/content/deep-dives/domain-model/_index.md` | Katalyst hub; terminology-aligned. | +| `docs/content/deep-dives/domain-model/storage.md` | Drops the "data interface" framing. | +| `docs/content/deep-dives/domain-model/collections.md` | Retitled "Collections"; item/document aligned. | +| `docs/content/deep-dives/domain-model/checks.md` | Retitled "Checks". | +| `docs/content/deep-dives/domain-model/inspectors.md` | Retitled "Inspectors"; item/document aligned. | | `docs/content/deep-dives/_index.md` | Drops the command-organization clause. | | `cmd/organization.md` (new) | CLI command-grammar rationale, moved from the deep-dive. | | `docs/content/deep-dives/command-organization.md` | Deleted. | diff --git a/product/specs/domain-model-cleanup-spec.md b/product/specs/domain-model-cleanup-spec.md index 3dfbe03c..2eaab52f 100644 --- a/product/specs/domain-model-cleanup-spec.md +++ b/product/specs/domain-model-cleanup-spec.md @@ -29,9 +29,9 @@ from reappearing. Three docs define overlapping vocabulary at different altitudes, and neither the code nor the CLI fully agrees with them: -- `docs/content/deep-dives/core-concepts.md` is the tool-agnostic model +- `docs/content/deep-dives/domain-model/_index.md` is the tool-agnostic model (data interface, item, collection, attribute, operation, check, inspector). -- `docs/content/deep-dives/domain-model.md` is the katalyst-specific model +- `docs/content/deep-dives/domain-model/_index.md` is the katalyst-specific model (markdown document, schema, config, resolver, the check families, invariants). - `docs/content/reference/glossary.md` is meant to be the quick-lookup source of truth, but it omits some core-concepts terms (*attribute*, *operation*, @@ -120,12 +120,12 @@ The glossary is canonical; the deep-dives narrate. 
term defined exactly once, here, including the specialization links (document = markdown item; field = object attribute). Both deep-dives link to it for definitions instead of restating them. -- **Core concepts** (`deep-dives/core-concepts.md`) is the general, +- **Core concepts** (`deep-dives/domain-model/_index.md`) is the general, backend-neutral model, written entirely in general terms (item, attribute, collection, storage, operation, check, inspector). It explains *why* the abstractions exist and what they would mean for Postgres or Mongo. It narrates and links to the glossary; it does not define terms. -- **Domain model** (`deep-dives/domain-model.md`) is katalyst's concrete +- **Domain model** (`deep-dives/domain-model/_index.md`) is katalyst's concrete instantiation: the filesystem backend, markdown documents, object fields, JSON Schema, the Go types, the `check`/`fix` lifecycles, and the invariants. It uses general terms by default and the specific terms where the concrete form is the @@ -177,7 +177,7 @@ terminology-align the new pages. the specialization link), *operation*, *aggregate*, and *validation result*; keep *family* defined in the CheckLibrary row and checks.md, not its own entry; apply the storage / source / item-document decisions. -- **`docs/content/deep-dives/core-concepts.md`** — the primary doc target now: +- **`docs/content/deep-dives/domain-model/_index.md`** — the primary doc target now: slim from the encyclopedic definitions into a general-altitude hub mirroring `domain-model.md`. Define each general term in a line, link to the glossary for the definition and to where the general idea is discussed (e.g. *operation* → @@ -190,13 +190,13 @@ terminology-align the new pages. - **`docs/content/deep-dives/progressive-operations.md`** — gains the operations-thesis sentence relocated from core-concepts' Implications as its lead, since the page already demonstrates the claim tier by tier. 
-- **`docs/content/deep-dives/domain-model.md`** — **kept** as the katalyst hub +- **`docs/content/deep-dives/domain-model/_index.md`** — **kept** as the katalyst hub #73 built. No structural change; terminology-align it (item/document, raw-source consistency) and sharpen its one-line statement of how it differs from core-concepts. -- **`docs/content/deep-dives/collections.md`, `inspectors.md`** — new in #73 and +- **`docs/content/deep-dives/domain-model/collections.md`, `inspectors.md`** — new in #73 and the homes for the former domain-model detail (resolver table, `check` lifecycle, invariants, inspector layers). Apply the item/document decisions here; keep "raw-source" consistent. -- **`docs/content/deep-dives/storage.md`** — confirm wording now that "data +- **`docs/content/deep-dives/domain-model/storage.md`** — confirm wording now that "data interface" is deprecated in favor of the storage vocabulary. - **Deep-dive titles** — rename to match the key terms, not "How X work": `collections.md` → "Collections", `checks.md` → "Checks", `inspectors.md` → diff --git a/product/specs/domain-model-terminology-matrix.md b/product/specs/domain-model-terminology-matrix.md index 3bf27c3c..1b5bb763 100644 --- a/product/specs/domain-model-terminology-matrix.md +++ b/product/specs/domain-model-terminology-matrix.md @@ -16,8 +16,8 @@ hand when the code or docs move. |---|---| | **Internal code** | Package names and exported Go identifiers under `internal/`. | | **CLI** | Command/subcommand names and the user-facing nouns in their help text (`cmd/`). | -| **Domain model** | Terms as used in `docs/content/deep-dives/domain-model.md` (katalyst-specific). | -| **Core concepts** | Terms as used in `docs/content/deep-dives/core-concepts.md` (tool-agnostic). | +| **Domain model** | Terms as used in `docs/content/deep-dives/domain-model/_index.md` (katalyst-specific). | +| **Core concepts** | Terms as used in `docs/content/deep-dives/domain-model/_index.md` (tool-agnostic). 
| | **Glossary** | Entries in `docs/content/reference/glossary.md` (the intended single source of truth). | `—` means the source has no term for that concept. **Bold** marks a term that diff --git a/product/specs/listing-predicate-plan.md b/product/specs/listing-predicate-plan.md index 570b2c3f..7dc1435c 100644 --- a/product/specs/listing-predicate-plan.md +++ b/product/specs/listing-predicate-plan.md @@ -217,17 +217,17 @@ Goal: Update callers and docs to the new names. Rename the `query` section to `listing`, show `listing:` examples, and note that `query:` has been replaced. - **File:** `docs/content/deep-dives/collections.md` + **File:** `docs/content/deep-dives/domain-model/collections.md` Replace references to the query package with the metadata predicate grammar. - **File:** `docs/content/deep-dives/domain-model.md` + **File:** `docs/content/deep-dives/domain-model/_index.md` Replace the "Query" out-of-scope note with the explicit split: listing filters and sort keys are shipped for one collection; first-class Query is planned. - **File:** `docs/content/deep-dives/core-concepts.md` + **File:** `docs/content/deep-dives/domain-model/_index.md` Mark **Query** as planned rather than shipped. Keep listing filters out of the operation list unless they are named as part of Listing. @@ -300,9 +300,9 @@ Goal: Verify the rename is complete and behavior stayed stable. 
| `cmd/engine.go` | Variant predicate evaluation | | `internal/storage/collection/AGENTS.md` | Package conventions | | `docs/content/reference/configuration.md` | User-facing config reference | -| `docs/content/deep-dives/collections.md` | Variant terminology | -| `docs/content/deep-dives/domain-model.md` | Query/listing vocabulary | -| `docs/content/deep-dives/core-concepts.md` | Query operation vocabulary | +| `docs/content/deep-dives/domain-model/collections.md` | Variant terminology | +| `docs/content/deep-dives/domain-model/_index.md` | Query/listing vocabulary | +| `docs/content/deep-dives/domain-model/_index.md` | Query operation vocabulary | | `product/specs/domain-model-terminology-matrix.md` | Naming matrix | | GitHub issue #76 | Terminology contradiction this plan resolves | @@ -326,11 +326,11 @@ Documentation ships in Phase 4. - `internal/storage/collection/listing/doc.go`: add listing package docs. - `docs/content/reference/configuration.md`: rename the `query` section to `listing` and document the migration error. -- `docs/content/deep-dives/collections.md`: describe variants as using metadata +- `docs/content/deep-dives/domain-model/collections.md`: describe variants as using metadata predicates. -- `docs/content/deep-dives/domain-model.md`: distinguish shipped listing filters +- `docs/content/deep-dives/domain-model/_index.md`: distinguish shipped listing filters from planned Query. -- `docs/content/deep-dives/core-concepts.md`: mark Query as planned. +- `docs/content/deep-dives/domain-model/_index.md`: mark Query as planned. - `product/specs/domain-model-terminology-matrix.md`: update the Query/filter row. diff --git a/product/specs/listing-predicate-spec.md b/product/specs/listing-predicate-spec.md index fe6753f0..1dc6e53f 100644 --- a/product/specs/listing-predicate-spec.md +++ b/product/specs/listing-predicate-spec.md @@ -265,12 +265,12 @@ Resolved: item-list pipeline. 
- `docs/content/reference/configuration.md`: rename the `query:` section to `listing:` and document the config migration. -- `docs/content/deep-dives/collections.md`: update variants to say they use the +- `docs/content/deep-dives/domain-model/collections.md`: update variants to say they use the metadata predicate grammar, not the query package. -- `docs/content/deep-dives/domain-model.md`: replace the current "Query" out-of- +- `docs/content/deep-dives/domain-model/_index.md`: replace the current "Query" out-of- scope note with a precise distinction: listing filters are shipped; a first-class Query operation is planned. -- `docs/content/deep-dives/core-concepts.md`: mark **Query** as a planned +- `docs/content/deep-dives/domain-model/_index.md`: mark **Query** as a planned operation, not a currently shipped one. - GitHub issue #76: close once the docs and code use the new terminology. - `product/specs/domain-model-terminology-matrix.md`: update the Query/filter