From 823163a6326ac89a10d6b369e15884ec68409071 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Fri, 13 Mar 2026 18:32:47 +0000 Subject: [PATCH 1/6] Initial plan From ca1cb00173d0975413406da843999b5f7c3ca7da Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Fri, 13 Mar 2026 18:50:08 +0000 Subject: [PATCH 2/6] docs: restore CORTEX architecture alignment — fix Metroid/medoid/centroid conflation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Rewrite DESIGN.md v1.2: add MetroidBuilder, dialectical search, knowledge gap detection, P2P curiosity, fix all 'Metroid neighbor graph' naming drift - Update PLAN.md v1.2: add MetroidBuilder/KnowledgeGapDetector/DialecticalSearch modules, fix naming errors, correct Phase 2 description - Rewrite TODO.md: add P0-X (naming fix tasks), P1-M (MetroidBuilder), P1-N (knowledge gap), update P1-C/E/F/P2-C/G with correct terminology - Create ARCHITECTURE-REVIEW.md: 15-divergence catalog with file/component/current/intended/correction/todo-task for each issue - Update README.md and copilot-instructions.md with correct Cortex description No code changes in this pass. 
Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com> --- .github/copilot-instructions.md | 2 +- ARCHITECTURE-REVIEW.md | 267 ++++++++++++++++++++++++++++++++ DESIGN.md | 212 +++++++++++++++++++++---- PLAN.md | 62 +++++--- README.md | 7 +- TODO.md | 224 +++++++++++++++++++++------ 6 files changed, 668 insertions(+), 106 deletions(-) create mode 100644 ARCHITECTURE-REVIEW.md diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index cd37d3e..094079e 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -6,7 +6,7 @@ CORTEX (**C**lustered **O**ntic **R**outing **T**hrough **E**ntangled e**X**chan The engine models three biological brain regions: - **Hippocampus** — Fast associative encoding (WebGPU multi-prototype lookups, Matryoshka embeddings, Hebbian edge creation, OPFS-backed vector storage) -- **Cortex** — Intelligent routing & coherence (parallel WebGPU scoops, IndexedDB sub-graph retrieval, closed-loop Hebbian path tracing) +- **Cortex** — Intelligent routing & coherence (Metroid construction `{ m1, m2, c }`, dialectical search, Matryoshka dimensional unwinding, knowledge gap detection, P2P curiosity broadcasting, parallel WebGPU scoops, IndexedDB sub-graph retrieval, closed-loop Hebbian path tracing) - **Daydreamer** — Background consolidation Web Worker (LTP/LTD, pruning, medoid recomputation, experience replay) ## Key Documentation Files diff --git a/ARCHITECTURE-REVIEW.md b/ARCHITECTURE-REVIEW.md new file mode 100644 index 0000000..f8a40ef --- /dev/null +++ b/ARCHITECTURE-REVIEW.md @@ -0,0 +1,267 @@ +# CORTEX Architecture Review — Naming Drift Report + +**Date:** 2026-03-13 +**Scope:** Full repository audit against corrected DESIGN.md (v1.2) +**Status:** Documentation-only pass; no code changes made in this review + +--- + +## Executive Summary + +The repository has drifted from the intended CORTEX architecture due to an early conceptual collapse between **medoids** and **Metroids**. 
This caused the term "Metroid" to be applied throughout the codebase and documentation to describe the sparse proximity/neighbor graph connecting pages — a fundamentally different concept. + +The correct meaning of each term is: + +| Term | Correct Meaning | +|------|----------------| +| **Medoid** | An existing memory node selected as a cluster representative via the medoid statistic | +| **Centroid** | A mathematical average of vectors — a computed point, never a stored node | +| **Metroid** | A structured dialectical search probe: `{ m1, m2, c }` — ephemeral, constructed at query time | + +The sparse proximity graph connecting pages with high cosine similarity is **not** a Metroid. It is the **semantic neighbor graph**. The entire MetroidBuilder component — the heart of CORTEX's epistemic search capability — does not yet exist in the codebase. + +This report catalogs every divergence found and maps each to a correction task in TODO.md. + +--- + +## Divergence Catalog + +### D1 — `core/types.ts`: `MetroidNeighbor` interface + +| Field | Value | +|-------|-------| +| **File** | `core/types.ts` | +| **Line** | ~70 | +| **Component** | `MetroidNeighbor` interface | +| **Current behavior** | Defines a sparse proximity graph edge with `neighborPageId`, `cosineSimilarity`, and `distance`. Named as if it represents a "Metroid" concept. | +| **Intended behavior** | This is a proximity edge in the semantic neighbor graph. It has nothing to do with the `Metroid = { m1, m2, c }` dialectical probe. Should be named `SemanticNeighbor`. | +| **Required correction** | Rename `MetroidNeighbor` → `SemanticNeighbor`. Update all references. | +| **TODO task** | P0-X1 | + +--- + +### D2 — `core/types.ts`: `MetroidSubgraph` interface + +| Field | Value | +|-------|-------| +| **File** | `core/types.ts` | +| **Line** | ~76 | +| **Component** | `MetroidSubgraph` interface | +| **Current behavior** | Defines the induced subgraph used for BFS expansion during retrieval. 
Named "MetroidSubgraph". | +| **Intended behavior** | This is a semantic neighbor subgraph, not a Metroid. Should be named `SemanticNeighborSubgraph`. | +| **Required correction** | Rename `MetroidSubgraph` → `SemanticNeighborSubgraph`. | +| **TODO task** | P0-X2 | + +--- + +### D3 — `core/types.ts`: `MetadataStore` proximity graph methods + +| Field | Value | +|-------|-------| +| **File** | `core/types.ts` | +| **Lines** | ~178–191 | +| **Component** | `MetadataStore` interface — methods section "Metroid NN radius index" | +| **Current behavior** | Six methods use "Metroid" naming: `putMetroidNeighbors`, `getMetroidNeighbors`, `getInducedMetroidSubgraph`, `needsMetroidRecalc`, `flagVolumeForMetroidRecalc`, `clearMetroidRecalcFlag`. | +| **Intended behavior** | These methods operate on the semantic neighbor graph (a proximity graph). "Metroid" in method names implies a connection to the dialectical probe construct, which is incorrect. | +| **Required correction** | Rename all six methods: `putSemanticNeighbors`, `getSemanticNeighbors`, `getInducedNeighborSubgraph`, `needsNeighborRecalc`, `flagVolumeForNeighborRecalc`, `clearNeighborRecalcFlag`. | +| **TODO task** | P0-X3 | + +--- + +### D4 — `storage/IndexedDbMetadataStore.ts`: `metroid_neighbors` IDB store + +| Field | Value | +|-------|-------| +| **File** | `storage/IndexedDbMetadataStore.ts` | +| **Lines** | ~32–35 (DB store declarations) | +| **Component** | IndexedDB object store named `metroid_neighbors` | +| **Current behavior** | Persists proximity graph edges between pages in a store named `metroid_neighbors`. | +| **Intended behavior** | The store name should reflect that it holds semantic proximity edges, not Metroid probes. Should be `neighbor_graph`. | +| **Required correction** | Rename IDB store from `metroid_neighbors` → `neighbor_graph`. Increment `DB_VERSION`. Add migration in `applyUpgrade` to copy existing data. 
| +| **TODO task** | P0-X6 | + +--- + +### D5 — `storage/IndexedDbMetadataStore.ts`: proximity graph method implementations + +| Field | Value | +|-------|-------| +| **File** | `storage/IndexedDbMetadataStore.ts` | +| **Lines** | All methods implementing `MetadataStore` proximity graph interface | +| **Component** | `putMetroidNeighbors`, `getMetroidNeighbors`, `getInducedMetroidSubgraph`, `needsMetroidRecalc`, `flagVolumeForMetroidRecalc`, `clearMetroidRecalcFlag` implementations | +| **Current behavior** | Implements the six methods using `MetroidNeighbor` types and `metroid_neighbors` IDB store. | +| **Intended behavior** | Should use renamed types and store. | +| **Required correction** | After interface rename (D1–D4), update all implementations to use new names. | +| **TODO task** | P0-X1–X6 | + +--- + +### D6 — `cortex/Query.ts`: Absent MetroidBuilder + +| Field | Value | +|-------|-------| +| **File** | `cortex/Query.ts` | +| **Lines** | Entire file | +| **Component** | `query()` function | +| **Current behavior** | Embeds query, scores hotpath pages, falls back to full scan, updates PageActivity, runs promotion sweep. Returns a ranked list of pages. **No Metroid is ever constructed. No dialectical search is performed. No knowledge gap is ever detected.** | +| **Intended behavior** | The query path should: (1) select m1 (topic medoid), (2) call MetroidBuilder to construct `{ m1, m2, c }`, (3) use centroid `c` as the balanced search anchor, (4) explore thesis/antithesis/synthesis zones, (5) detect and surface knowledge gaps. | +| **Required correction** | After MetroidBuilder is implemented (P1-M), upgrade `cortex/Query.ts` to include the full dialectical pipeline (P1-E). 
| +| **TODO task** | P1-E1 | + +--- + +### D7 — `cortex/Query.ts`: `getInducedMetroidSubgraph` call + +| Field | Value | +|-------|-------| +| **File** | `cortex/Query.ts` | +| **Lines** | The subgraph expansion step (BFS, if present) | +| **Component** | Subgraph expansion via `MetadataStore` | +| **Current behavior** | If subgraph BFS is called, it uses `getInducedMetroidSubgraph`, propagating the incorrect naming. | +| **Intended behavior** | Should call `getInducedNeighborSubgraph` (after rename). | +| **Required correction** | Rename the method call after P0-X3 is complete. | +| **TODO task** | P0-X3 | + +--- + +### D8 — DESIGN.md (pre-correction): Incorrect Terminology + +| Field | Value | +|-------|-------| +| **File** | `DESIGN.md` (pre-v1.2) | +| **Component** | Terminology section | +| **Current behavior** | Defined "Metroid (canonical domain term): Sparse nearest-neighbor graph structure inspired by medoid-based clustering." This is architecturally incorrect. | +| **Intended behavior** | Metroid = dialectical probe `{ m1, m2, c }`. The sparse NN graph is the semantic neighbor graph. | +| **Required correction** | **Already corrected in DESIGN.md v1.2.** | +| **TODO task** | Resolved | + +--- + +### D9 — DESIGN.md (pre-correction): Missing MetroidBuilder, Dialectical Search, Knowledge Gap + +| Field | Value | +|-------|-------| +| **File** | `DESIGN.md` (pre-v1.2) | +| **Component** | Entire document | +| **Current behavior** | No section describing MetroidBuilder, Matryoshka dimensional unwinding, antithesis discovery, dialectical search, or knowledge gap detection. | +| **Intended behavior** | These are core architectural concepts that must be described for any engineer to implement CORTEX correctly. | +| **Required correction** | **Already corrected in DESIGN.md v1.2** — new section "Conceptual Constructs: Medoid, Centroid, and Metroid" added. 
| +| **TODO task** | Resolved | + +--- + +### D10 — PLAN.md (pre-correction): "Metroid vs medoid" note + +| Field | Value | +|-------|-------| +| **File** | `PLAN.md` (pre-v1.2) | +| **Component** | Notes section | +| **Current behavior** | Note read: "Metroid vs medoid: Use Metroid in all API surfaces and docs; medoid only in algorithmic comments." This instructs developers to use the wrong term everywhere, making MetroidBuilder impossible to introduce without collision. | +| **Intended behavior** | The note must distinguish three concepts: Metroid (dialectical probe), medoid (cluster representative), and semantic neighbor graph (proximity graph for BFS). | +| **Required correction** | **Already corrected in PLAN.md v1.2.** | +| **TODO task** | Resolved | + +--- + +### D11 — `PLAN.md` (pre-correction): Missing MetroidBuilder in CORTEX module table + +| Field | Value | +|-------|-------| +| **File** | `PLAN.md` (pre-v1.2) | +| **Component** | CORTEX module table | +| **Current behavior** | No MetroidBuilder, KnowledgeGapDetector, or DialecticalSearch pipeline listed as planned modules. | +| **Intended behavior** | These are critical CORTEX components without which the system is merely a vector search engine. | +| **Required correction** | **Already corrected in PLAN.md v1.2** — new rows added. | +| **TODO task** | Resolved | + +--- + +### D12 — `hippocampus/Ingest.ts`: Semantic neighbor insertion absent + +| Field | Value | +|-------|-------| +| **File** | `hippocampus/Ingest.ts` | +| **Lines** | Entire file | +| **Component** | `ingestText()` function | +| **Current behavior** | Chunks, embeds, persists pages, builds a book, runs promotion sweep. Does **not** insert semantic neighbor edges. | +| **Intended behavior** | After persisting pages, should call `FastNeighborInsert` to maintain the semantic neighbor graph with Williams-bounded degree. | +| **Required correction** | After `FastNeighborInsert` is implemented (P1-C), upgrade `ingestText` to call it (P1-C2). 
| **TODO task** | P1-C2 | + +--- + +### D13 — `core/types.ts`: No Metroid type defined + +| Field | Value | +|-------|-------| +| **File** | `core/types.ts` | +| **Component** | Type definitions | +| **Current behavior** | The word "Metroid" appears only as part of `MetroidNeighbor`, `MetroidSubgraph`, and `MetadataStore` method names — all of which are proximity-graph concepts. The **actual Metroid type** `{ m1, m2, c }` does not exist. | +| **Intended behavior** | `core/types.ts` should define: `interface Metroid { m1: Hash; m2: Hash \| null; centroid: Float32Array \| null; knowledgeGap: boolean }` and `interface KnowledgeGap { topicMedoidId: Hash; queryEmbedding: Float32Array; dimensionalBoundary: number; timestamp: string }`. | +| **Required correction** | Add these types to `core/types.ts` as part of MetroidBuilder implementation (P1-M). | +| **TODO task** | P1-M1 | + +--- + +### D14 — `core/ModelProfile.ts`: No `matryoshkaProtectedDim` in `ModelProfile` + +| Field | Value | +|-------|-------| +| **File** | `core/ModelProfile.ts` | +| **Component** | `ModelProfile` interface | +| **Current behavior** | No field for the protected Matryoshka dimension boundary. | +| **Intended behavior** | MetroidBuilder needs to know which lower dimensions to freeze during antithesis search. `ModelProfile` should include `matryoshkaProtectedDim: number` — the number of lower dimensions that encode invariant semantic context. | +| **Required correction** | Add `matryoshkaProtectedDim` to `ModelProfile` interface; add default value to `ModelDefaults.ts`; add per-model value to `BuiltInModelProfiles.ts`. | +| **TODO task** | P1-M1 (prerequisite) | + +--- + +### D15 — `cortex/QueryResult.ts`: No Metroid or knowledge gap fields + +| Field | Value | +|-------|-------| +| **File** | `cortex/QueryResult.ts` | +| **Component** | `QueryResult` interface | +| **Current behavior** | Contains only `pages`, `scores`, and `metadata`. No field for Metroid probe used, no knowledge gap field. 
| +| **Intended behavior** | Should include `metroid`, `knowledgeGap`, `coherencePath`, and `provenance` fields (see P1-E2). | +| **Required correction** | Upgrade `QueryResult` as part of P1-E2. | +| **TODO task** | P1-E2 | + +--- + +## Summary by Severity + +| Severity | Count | Description | +|----------|-------|-------------| +| **Critical (blocks MetroidBuilder)** | 3 | D1, D2, D3 — type/interface naming collision | +| **High (architectural gap)** | 4 | D4, D6, D13, D14 — missing types and IDB store | +| **Medium (propagated naming error)** | 4 | D5, D7, D12, D15 — implementations following wrong names | +| **Resolved by this PR** | 4 | D8, D9, D10, D11 — corrected in DESIGN.md v1.2 and PLAN.md v1.2 | + +**Total: 15 divergences** (3 + 4 + 4 + 4) + +--- + +## Components with Zero Drift + +The following components correctly implement their intended architecture and require no changes related to this review: + +- `core/HotpathPolicy.ts` — Williams Bound policy implementation; correct +- `core/SalienceEngine.ts` — Promotion/eviction lifecycle; correct +- `core/crypto/` — Hash, sign, verify; correct +- `storage/OPFSVectorStore.ts` — Append-only vector file; correct +- `storage/MemoryVectorStore.ts` — In-memory testing backend; correct +- `embeddings/` — All embedding providers; correct +- `hippocampus/Chunker.ts` — Text chunking; correct +- `hippocampus/PageBuilder.ts` — Page entity construction; correct +- All `VectorBackend` implementations — correct + +--- + +## Recommended Fix Order + +1. **P0-X1–X7** — Fix naming drift in `core/types.ts`, `storage/IndexedDbMetadataStore.ts`, `cortex/Query.ts`, and planned file names. This unblocks MetroidBuilder without risking collision. +2. **P1-M1–M3** — Add `Metroid` and `KnowledgeGap` types; implement `MetroidBuilder`. +3. **P1-N1–N4** — Implement `KnowledgeGapDetector`. +4. **P1-E1–E3** — Upgrade `cortex/Query.ts` to full dialectical orchestrator. +5. 
**P1-C1–C3** — Implement `FastNeighborInsert` (correctly named after P0-X). diff --git a/DESIGN.md b/DESIGN.md index 4ab6ef5..7ff3a2f 100644 --- a/DESIGN.md +++ b/DESIGN.md @@ -1,7 +1,7 @@ # CORTEX Design Specification -**Version:** 1.1 -**Last Updated:** 2026-03-12 +**Version:** 1.2 +**Last Updated:** 2026-03-13 ## Executive Summary @@ -53,13 +53,17 @@ The rapid-write system that turns raw experience into structured memory scaffold **Performance Target:** Single-page persist + fast neighbor update under 50ms on WebGPU hardware #### 2. Cortex — Intelligent Routing & Coherence -Returns self-consistent, coherent context chains rather than bag-of-vectors. +Returns self-consistent, coherent context chains rather than bag-of-vectors. Critically, Cortex **constructs Metroids** — structured dialectical search probes — to explore knowledge epistemically rather than merely confirming existing beliefs. **Responsibilities:** +- Construct Metroids (dialectical probes `{ m1, m2, c }`) for each query topic +- Perform Matryoshka dimensional unwinding to discover antithesis medoids - Perform parallel WebGPU "scoops" across the active universe (sub-millisecond) - Pull relevant sub-graphs from IndexedDB - Trace closed-loop paths through Hebbian connections - Return only coherent context chains +- Detect knowledge gaps when antithesis discovery fails within dimensional constraints +- Broadcast P2P curiosity requests when a knowledge gap is detected **Performance Target:** Shelf→page seed ranking under 20ms; coherence path solve under 10ms for <30 node subgraphs @@ -75,16 +79,131 @@ Idle background consolidation that prevents catastrophic forgetting. **Performance Target:** Opportunistic, interruptible, no foreground blocking -## The Williams Bound & Sublinear Growth +## Conceptual Constructs: Medoid, Centroid, and Metroid + +Three separate mathematical constructs are central to CORTEX. They must never be conflated. 
+ +| Concept | Meaning | +|---------|---------| +| **Medoid** | An actual memory node selected as the statistical representative of a cluster. A medoid is always an existing page in the graph. | +| **Centroid** | A mathematical average of vectors — a computed geometric point, never a stored memory node. | +| **Metroid** | A structured dialectical search probe: `{ m1, m2, c }`. Constructed at query time. Never stored as a graph edge or persistent entity. | + +> **Critical invariant:** These three constructs are entirely distinct. The sparse semantic neighbor graph that connects pages for subgraph expansion is **not** a Metroid. A Metroid is built from medoids, but a medoid is not a Metroid. + +--- + +### The Metroid + +A Metroid is a structured search probe used for epistemically balanced exploration of a topic. + +``` +Metroid = { m1, m2, c } +``` + +Where: +- **m1** — thesis medoid: the cluster representative most relevant to the query topic +- **m2** — antithesis medoid: a cluster representative discovered through constrained Matryoshka search to represent semantic opposition to m1 +- **c** — centroid: the geometric midpoint between m1 and m2, used as the balanced search origin + +The Metroid is constructed at query time by the `MetroidBuilder`. It is **not** a persistent graph structure. It is a transient epistemological instrument. + +--- + +### MetroidBuilder Algorithm + +1. **Select m1** — Identify the topic medoid most relevant to the query embedding. +2. **Freeze protected dimensions** — Lock the lower Matryoshka embedding dimensions that encode invariant semantic context (domain, language register, topic class). These dimensions are never searched for antithesis. +3. **Search for m2** — Within the remaining (unfrozen) upper dimensions, search for the nearest medoid that represents semantic opposition to m1. +4. **Compute centroid** — `c = (m1_vec + m2_vec) / 2` (element-wise average over the unfrozen dimensions). +5. 
**Prefer centroid as search origin** — Use `c` as the primary starting point for subgraph expansion. This prevents semantic drift toward either pole. +6. **Unwind Matryoshka layers** — Progressively free deeper embedding dimensions and repeat from step 3. Each unwinding broadens the antithesis search. +7. **Stop at the protected dimension** — The protected lower dimensions are never unwound. This preserves semantic invariants throughout all levels of search. + +**Why protect dimensions?** + +Without dimensional protection, high-dimensional similarity in unrelated vocabulary can dominate the search. Specifically, upper Matryoshka dimensions encode fine-grained distinctions that may closely match surface-level word patterns regardless of topic. Protected lower dimensions encode domain context (e.g., "food/cooking") that anchors the search. Without this anchor, a query about pizza toppings could accumulate similarity mass toward adhesive-related terms in the high dimensions — because words describing how things stick together are statistically present in both culinary and industrial glue contexts. The protected dimensions ensure the culinary domain context is never overridden by this incidental high-dimensional similarity. + +--- + +### Matryoshka Dimensional Unwinding + +CORTEX uses Matryoshka Representation Learning (MRL) models that pack semantic information into nested dimensional layers: + +- **Protected layer** (lower dimensions): invariant context — domain, topic class, language. Never searched for antithesis. +- **Exploration layers** (upper dimensions): fine-grained semantic distinctions. Progressively unwound during antithesis search. + +At each unwinding step: +1. The protected dimension boundary shifts one layer outward. +2. The antithesis search space expands into the newly freed dimensions. +3. A new `m2` candidate is evaluated against the expanded space. +4. The Metroid `{ m1, m2, c }` is recomputed with the updated `m2`. 
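
The builder steps and the unwinding loop above can be sketched in TypeScript. This is an illustration, not the planned `cortex/MetroidBuilder.ts` API: the opposition threshold, the protected-dimension agreement check (cosine ≥ 0.5), and the plain-array vectors are all placeholder assumptions.

```typescript
type Vec = number[];

interface Medoid { id: string; vec: Vec }

interface Metroid {
  m1: string;
  m2: string | null;
  centroid: Vec | null;
  knowledgeGap: boolean;
}

// Cosine similarity restricted to dimensions [from, to).
function cosineSlice(a: Vec, b: Vec, from: number, to: number): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = from; i < to; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// Sketch: unwind upper Matryoshka layers one step at a time, searching each
// widened window for the most opposed medoid; never unwind past protectedDim.
function buildMetroid(
  m1: Medoid,
  candidates: Medoid[],
  protectedDim: number,
  layerStep: number,
  oppositionThreshold = -0.2, // placeholder: how opposed m2 must be
): Metroid {
  const dim = m1.vec.length;
  for (let lo = dim - layerStep; lo >= protectedDim; lo -= layerStep) {
    let best: Medoid | null = null;
    let bestSim = Infinity;
    for (const cand of candidates) {
      if (cand.id === m1.id) continue;
      // Protected dimensions must agree: same domain context as m1.
      if (cosineSlice(cand.vec, m1.vec, 0, protectedDim) < 0.5) continue;
      const sim = cosineSlice(cand.vec, m1.vec, lo, dim);
      if (sim < bestSim) { bestSim = sim; best = cand; }
    }
    if (best && bestSim <= oppositionThreshold) {
      const m2vec = best.vec;
      // c = element-wise midpoint (simplified: the spec averages over the
      // unfrozen dimensions, but the protected dimensions agree anyway).
      const centroid = m1.vec.map((x, i) => (x + m2vec[i]) / 2);
      return { m1: m1.id, m2: best.id, centroid, knowledgeGap: false };
    }
  }
  // No antithesis found within the dimensional constraints: knowledge gap.
  return { m1: m1.id, m2: null, centroid: null, knowledgeGap: true };
}
```

The real module would draw candidates from the resident medoid set and take the frozen boundary from the model's `matryoshkaProtectedDim`; here the caller supplies both.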
+ +This produces progressively wider dialectical exploration while maintaining semantic coherence. The search terminates either when the protected dimension is reached or when a satisfactory `m2` is found. + +--- + +### Dialectical Search + +Every Metroid-driven query explores three zones: + +| Zone | Pole | Meaning | +|------|------|---------| +| Thesis zone | around m1 | Supporting ideas, corroborating evidence | +| Antithesis zone | around m2 | Opposing ideas, counterevidence, alternative perspectives | +| Synthesis zone | around c | Conceptually balanced territory between both poles | + +This three-zone exploration prevents **confirmation bias**: a system that only retrieves nearest neighbors to m1 returns documents that confirm the query's premise. By also exploring m2 and c, CORTEX surfaces contradictions, alternatives, and knowledge gaps. + +The dialectical structure is the core reason CORTEX is described as an _epistemic_ memory system, not a vector retrieval engine. + +--- + +### Knowledge Gap Detection + +If at any stage of MetroidBuilder execution no suitable antithesis medoid `m2` can be found within the constrained search space: + +``` +knowledge_gap = true +``` + +This means CORTEX does not possess sufficient knowledge to provide an epistemically balanced answer. The correct response is to acknowledge the gap rather than fill it with ungrounded content. + +**Response to a knowledge gap:** + +1. Return a `KnowledgeGap` result indicating the topic, the deepest dimensional layer reached, and the search constraints that failed. +2. Emit a P2P curiosity request containing the incomplete Metroid. + +--- + +### P2P Curiosity Requests + +When a knowledge gap is detected, CORTEX broadcasts the incomplete Metroid as a curiosity probe to connected peers: + +``` +CuriosityProbe = { m1, partialMetroid, queryContext, knowledgeBoundary } +``` + +Where `knowledgeBoundary` encodes the dimensional layer where antithesis discovery failed. 
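
As a sketch, the probe payload might be typed as follows. Only the four field names (`m1`, `partialMetroid`, `queryContext`, `knowledgeBoundary`) come from the spec above; every concrete type annotation is an illustrative assumption.

```typescript
// Hypothetical wire shape for a curiosity probe; types are assumptions.
interface PartialMetroid {
  m1: string;
  m2: null;          // the antithesis slot is empty: that is the gap
  centroid: null;
  knowledgeGap: true;
}

interface CuriosityProbe {
  m1: string;                // thesis medoid id
  partialMetroid: PartialMetroid;
  queryContext: number[];    // query embedding, serialized for transport
  knowledgeBoundary: number; // dimensional layer where antithesis search failed
}

function makeCuriosityProbe(
  m1: string,
  queryEmbedding: number[],
  knowledgeBoundary: number,
): CuriosityProbe {
  return {
    m1,
    partialMetroid: { m1, m2: null, centroid: null, knowledgeGap: true },
    queryContext: queryEmbedding,
    knowledgeBoundary,
  };
}
```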
Peers receiving this probe: + +1. Search their own memory graphs for medoids that could serve as `m2`. +2. If found, respond with the relevant graph fragment (subject to eligibility filtering; see Smart Sharing Guardrails). +3. The originating node integrates the received fragment and may retry MetroidBuilder. + +This mechanism enables **distributed learning without hallucination**: the system discovers knowledge through structured peer exchange rather than generating plausible-sounding but ungrounded content. + +--- + + ### Motivation -CORTEX applies the Williams 2025 result — S = O(√(t log t)) — as a universal sublinear growth law everywhere the system trades space against time: the resident hotpath index, per-tier hierarchy quotas, per-community graph budgets, Metroid degree limits, and Daydreamer maintenance batch sizing. This single principle ensures the system stays efficient as the memory graph scales from hundreds to millions of nodes. +CORTEX applies the Williams 2025 result — S = O(√(t log t)) — as a universal sublinear growth law everywhere the system trades space against time: the resident hotpath index, per-tier hierarchy quotas, per-community graph budgets, semantic neighbor degree limits, and Daydreamer maintenance batch sizing. This single principle ensures the system stays efficient as the memory graph scales from hundreds to millions of nodes. ### Graph Mass Definition ``` -t = |V| + |E| = total pages + (Hebbian edges + Metroid edges) +t = |V| + |E| = total pages + (Hebbian edges + semantic neighbor edges) ``` This is the canonical measure of graph complexity used in all capacity formulas. @@ -160,7 +279,7 @@ Within each tier's budget, slots are allocated proportionally across detected gr The resulting quotas sum to exactly `tier_budget` regardless of the number or sizes of communities, even when there are more communities than `tier_budget`. -where `nᵢ` is the number of pages in community Cᵢ and N is the total page count. 
Community detection runs via lightweight label propagation on the Metroid neighbor graph during Daydreamer idle passes. +where `nᵢ` is the number of pages in community Cᵢ and N is the total page count. Community detection runs via lightweight label propagation on the semantic neighbor graph during Daydreamer idle passes. This **dual constraint** — tier quota × community quota — ensures both vertical coverage across hierarchy levels and horizontal coverage across topics. @@ -288,17 +407,31 @@ interface Edge { } ``` -#### Metroid Neighbor -Sparse radius-graph edge (project term; medoid-inspired). +#### Semantic Neighbor (Proximity Edge) +Sparse radius-graph edge connecting pages with high cosine similarity. Used for subgraph expansion during retrieval. + +> **Note:** The current codebase names this type `MetroidNeighbor` — this is an architectural naming error introduced by early conceptual drift. The correct term is `SemanticNeighbor` (or equivalent). A code-level rename is tracked in the TODO. The edge is a proximity concept, not a Metroid concept. ```typescript -interface MetroidNeighbor { +interface SemanticNeighbor { neighborPageId: Hash; cosineSimilarity: number; distance: number; // 1 - cosineSimilarity (TSP-ready) } ``` +#### Semantic Neighbor Subgraph +Induced subgraph for BFS-based coherence path expansion. + +> **Note:** Currently named `MetroidSubgraph` in the codebase — same renaming correction applies. + +```typescript +interface SemanticNeighborSubgraph { + nodes: Hash[]; + edges: { from: Hash; to: Hash; distance: number }[]; +} +``` + ### Hotpath Entities #### PageActivity @@ -349,7 +482,7 @@ Structured entity storage with automatic reverse indexes. 
**Object Stores:** - `pages`, `books`, `volumes`, `shelves` - `edges_hebbian` (Hebbian weights) -- `metroid_neighbors` (sparse NN graph) +- `neighbor_graph` (sparse semantic neighbor graph — currently named `metroid_neighbors` in code; rename tracked in TODO) - `flags` (dirty-volume recalc markers) - `page_to_book`, `book_to_volume`, `volume_to_shelf` (reverse indexes) - `hotpath_index` (periodic HOT-membership checkpoint, keyed by `entityId`; loaded on startup to reconstruct the RAM resident index; written by Daydreamer each maintenance cycle) @@ -360,16 +493,18 @@ Structured entity storage with automatic reverse indexes. ### Cortex Query Path 1. **Embed Query** — Generate query embedding -2. **Score Resident Shelves** — Score query against HOT shelf prototypes in H(t) resident index -3. **Score Resident Volumes** — Score against HOT volume prototypes within top-ranked shelves -4. **Score Resident Books** — Score against HOT book medoids within top-ranked volumes -5. **Score Resident Pages** — Score against HOT page representatives within top-ranked books -6. **Spill to Warm/Cold** — If resident coverage is insufficient, expand lookup to WARM (IndexedDB) and COLD (OPFS) tiers -7. **Expand Subgraph** — BFS through Metroid neighbors using dynamic bounds (see below) -8. **Solve Coherent Path** — Open TSP with dummy-node heuristic -9. **Return Result** — Ordered memory chain + provenance metadata - -Steps 2–5 operate exclusively on the resident set of size H(t), making H(t) the primary latency-control mechanism. Spill to WARM/COLD (step 6) occurs only when the resident set does not contain sufficient coverage. +2. **Select m1** — Score resident medoids (HOT shelf/volume/book prototypes) to identify the topic medoid +3. **Build Metroid** — `MetroidBuilder` constructs `{ m1, m2, c }` via Matryoshka dimensional unwinding; if `m2` cannot be found, set `knowledge_gap = true` and emit a curiosity probe +4. 
**Score Resident Shelves** — Score query (anchored at centroid `c`) against HOT shelf prototypes in H(t) resident index +5. **Score Resident Volumes** — Score against HOT volume prototypes within top-ranked shelves +6. **Score Resident Books** — Score against HOT book medoids within top-ranked volumes +7. **Score Resident Pages** — Score against HOT page representatives within top-ranked books; explore thesis zone (m1), antithesis zone (m2), and synthesis zone (c) +8. **Spill to Warm/Cold** — If resident coverage is insufficient, expand lookup to WARM (IndexedDB) and COLD (OPFS) tiers +9. **Expand Subgraph** — BFS through semantic neighbor graph using dynamic Williams-derived bounds (see below) +10. **Solve Coherent Path** — Open TSP with dummy-node heuristic +11. **Return Result** — Ordered memory chain + provenance metadata (including whether a knowledge gap was detected) + +Steps 2–3 are the dialectical heart of CORTEX. Steps 4–7 are the Williams-bound-controlled resident-first scoring cascade. **Query Cost Meter:** The query path counts vector operations. If the cumulative cost exceeds a Williams-derived budget, the query early-stops and returns the best result found so far. @@ -377,8 +512,9 @@ Steps 2–5 operate exclusively on the resident set of size H(t), making H(t) th Rather than returning nearest neighbors by similarity, Cortex traces a coherent path through the induced subgraph using a dummy-node open TSP strategy. This produces a natural "narrative flow" through related memories. 
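
The dummy-node trick can be illustrated with a greedy sketch: add a virtual node at distance zero to every real node, build a closed tour, then cut it at the dummy to obtain an open path. The greedy nearest-neighbor heuristic here is an assumption for brevity; the design fixes only the dummy-node open-TSP framing.

```typescript
// Open-path TSP via a dummy node: the dummy is at distance 0 to every real
// node, so cutting a closed tour at the dummy yields an open coherence path.
function coherentPath(
  nodes: string[],
  dist: (a: string, b: string) => number,
): string[] {
  const DUMMY = '__dummy__';
  const d = (a: string, b: string) =>
    a === DUMMY || b === DUMMY ? 0 : dist(a, b);
  const unvisited = new Set([...nodes, DUMMY]);
  let current = DUMMY; // start the closed tour at the dummy
  unvisited.delete(current);
  const tour = [current];
  while (unvisited.size > 0) {
    // Greedy nearest-neighbor step (heuristic choice, assumed).
    let next = '';
    let best = Infinity;
    for (const n of unvisited) {
      const dd = d(current, n);
      if (dd < best) { best = dd; next = n; }
    }
    unvisited.delete(next);
    tour.push(next);
    current = next;
  }
  // Removing the dummy cuts the closed tour into an open path.
  return tour.filter((n) => n !== DUMMY);
}
```

With `distance = 1 - cosineSimilarity` edge weights, as stored on semantic neighbor edges, the resulting order reads as a narrative chain through the subgraph.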
### Key Constraints -- Steps 2–5 operate on the resident hotpath (H(t) entries), not the full corpus -- Subgraph expansion uses dynamic Williams-derived bounds, not a fixed node cap: +- Steps 4–7 operate on the resident hotpath (H(t) entries), not the full corpus +- Metroid construction (step 3) is a prerequisite for dialectically balanced exploration; if it fails, a knowledge gap is declared +- Subgraph expansion (step 9) uses the **semantic neighbor graph** and dynamic Williams-derived bounds, not a fixed node cap: - `maxSubgraphSize = min(30, ⌊√(t · log₂(1+t)) / log₂(t)⌋)` - `maxHops = ⌈log₂(log₂(1 + t))⌉` - `perHopBranching = ⌊maxSubgraphSize ^ (1/maxHops)⌋` @@ -394,11 +530,11 @@ Rather than returning nearest neighbors by similarity, Cortex traces a coherent 3. **Persist Vectors** — Append to OPFS vector file 4. **Persist Pages** — Write page metadata to IndexedDB; initialise `PageActivity` record 5. **Build/Attach Hierarchy** — Construct/update books, volumes, shelves; attempt hotpath admission for each level's medoid/prototype using tier quota via `SalienceEngine` -6. **Fast Neighbor Insert** — Update Metroid neighbors incrementally; bounded degree via `HotpathPolicy`; check new page for hotpath admission +6. **Fast Semantic Neighbor Insert** — Update semantic neighbor graph incrementally; bounded degree via `HotpathPolicy`; check new page for hotpath admission 7. **Mark Dirty** — Flag volumes for full recalc by Daydreamer **Incremental Strategy:** -Fast local Metroid neighbor insertion keeps query-time latency low. Full neighborhood recalculation is deferred to idle Daydreamer passes. Hotpath admission runs at ingest time for new pages and hierarchy prototypes. +Fast local semantic neighbor insertion keeps query-time latency low. Full neighborhood recalculation is deferred to idle Daydreamer passes. Hotpath admission runs at ingest time for new pages and hierarchy prototypes. 
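
The three dynamic bounds in the Key Constraints list above transcribe directly into code; this is a straight transcription of the formulas, with no additional policy:

```typescript
// Williams-derived dynamic subgraph bounds; t is the graph mass |V| + |E|.
function subgraphBounds(t: number) {
  const maxSubgraphSize = Math.min(
    30,
    Math.floor(Math.sqrt(t * Math.log2(1 + t)) / Math.log2(t)),
  );
  const maxHops = Math.ceil(Math.log2(Math.log2(1 + t)));
  const perHopBranching = Math.floor(maxSubgraphSize ** (1 / maxHops));
  return { maxSubgraphSize, maxHops, perHopBranching };
}
```

At t = 1000 this yields a 10-node cap over 4 hops; by t = 1,000,000 the fixed ceiling of 30 dominates.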
## Consolidation Design @@ -407,7 +543,7 @@ Fast local Metroid neighbor insertion keeps query-time latency low. Full neighbo **LTP/LTD (Hebbian Updates):** - Strengthen edges traversed during successful queries - Decay unused edges toward zero -- Prune edges below threshold, keeping Metroid degree within Williams-derived bounds +- Prune edges below threshold, keeping semantic neighbor degree within Williams-derived bounds - After LTP/LTD: recompute σ(v) for all nodes whose incident edges changed; run promotion/eviction sweep via `SalienceEngine` **Prototype Recomputation:** @@ -415,7 +551,7 @@ Fast local Metroid neighbor insertion keeps query-time latency low. Full neighbo - Update prototype vectors in vector file - After recomputation: recompute salience for affected representative entries; run tier-quota promotion/eviction for volume and shelf tiers -**Full Metroid Recalc:** +**Full Neighbor Graph Recalc:** - For dirty volumes, recompute all pairwise similarities - Bound batch size: process at most O(√(t log t)) pairwise comparisons per idle cycle - Prioritise dirtiest volumes first @@ -423,7 +559,7 @@ Fast local Metroid neighbor insertion keeps query-time latency low. Full neighbo - Clear dirty flags; recompute salience for affected nodes; run promotion sweep **Community Detection:** -- Run lightweight label propagation on the Metroid neighbor graph during idle passes +- Run lightweight label propagation on the semantic neighbor graph during idle passes - Store community labels in `PageActivity.communityId` - Rerun when dirty-volume flags indicate meaningful structural change - Empty communities release their slots; new communities receive at least one slot @@ -503,7 +639,7 @@ All operations must complete on WASM fallback, albeit slower. 
The resident hotpa - On-device ingest, query, consolidation, persistence - Multi-backend vector compute (`webgpu`, `webgl`, `webnn`, `wasm`) - Signed graph entities with hash verification -- Sparse Metroid-neighbor graph for coherence routing +- Sparse semantic neighbor graph for coherence routing - Smart interest sharing: opt-in signed subgraph exchange over P2P with pre-share eligibility filtering ### Out of Scope for v1 @@ -523,21 +659,31 @@ Smart sharing is a core capability, not a post-v1 extra. The v1 exchange path mu ## Terminology -**Metroid** (canonical domain term): Sparse nearest-neighbor graph structure inspired by medoid-based clustering. Used throughout API surfaces and documentation. +**Medoid** (mathematical term): The existing memory node selected as the statistical representative of a cluster. Selected by minimising the sum of distances to all other nodes in the cluster. Used throughout algorithmic descriptions and internal implementation comments. + +**Centroid** (mathematical term): The arithmetic mean of a set of vectors — a computed geometric point that may not correspond to any stored page. Used in MetroidBuilder to compute the balanced search origin `c`. -**medoid** (mathematical term): The underlying clustering statistic. Reserved for algorithmic comments and internal statistical descriptions only. +**Metroid** (CORTEX architectural term): A structured dialectical search probe constructed at query time: `{ m1, m2, c }`, where m1 is the thesis medoid, m2 is the antithesis medoid, and c is the centroid between them. **A Metroid is never stored as a persistent graph structure.** It is an ephemeral instrument used by the CORTEX retrieval subsystem. + +**MetroidBuilder**: The CORTEX module responsible for constructing a Metroid for a given query via Matryoshka dimensional unwinding. Planned module: `cortex/MetroidBuilder.ts`. 
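To make the distinction concrete, the ephemeral probe from the Terminology entry above could be typed as below. This is a sketch: the `Hash` alias and field comments are assumptions, and only the `{ m1, m2, c }` shape is specified by the design.

```typescript
type Hash = string; // assumed ID type

// Ephemeral dialectical probe; constructed at query time, never persisted.
interface Metroid {
  m1: Hash;               // thesis medoid (an existing node)
  m2: Hash | null;        // antithesis medoid; null signals a knowledge gap
  c: Float32Array | null; // centroid (m1 + m2) / 2: a computed point, not a stored page
}

// Balanced search origin between the thesis and antithesis vectors.
function centroid(v1: Float32Array, v2: Float32Array): Float32Array {
  const c = new Float32Array(v1.length);
  for (let i = 0; i < v1.length; i++) c[i] = (v1[i] + v2[i]) / 2;
  return c;
}
```

Note that `centroid` returns a fresh vector that need not coincide with any stored page, which is precisely why a Metroid cannot be a persistent graph structure.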
+ +**Semantic neighbor graph** (also: proximity graph, neighbor graph): The sparse radius-graph of cosine-similarity edges between pages, used for subgraph expansion during retrieval. This is **not** the same as a Metroid. The edges connect pages with high cosine similarity and are used for BFS expansion. Currently named `MetroidNeighbor` / `metroid_neighbors` in the codebase — this is a naming error that must be corrected (tracked in TODO as P0-X). **Hotpath**: The in-memory resident index of H(t) entries spanning all four hierarchy tiers. The hotpath is the first lookup target for every query; misses spill to WARM/COLD storage. HOT membership and salience are checkpointed to the `hotpath_index` IndexedDB store by Daydreamer each maintenance cycle, allowing the RAM index to be restored after a page reload or machine reboot without full corpus replay. **Williams Bound**: The theoretical result S = O(√(t log t)) from Williams 2025, applied here as a universal sublinear growth law for all space-time tradeoff subsystems in CORTEX. -**Graph mass (t)**: t = |V| + |E| = total pages plus all edges (Hebbian + Metroid). The canonical input to all capacity and bound formulas. +**Graph mass (t)**: t = |V| + |E| = total pages plus all edges (Hebbian + semantic neighbor). The canonical input to all capacity and bound formulas. **Salience (σ)**: Node-level score combining Hebbian edge weight, recency, and query-hit frequency. Drives admission to and eviction from the hotpath. **Three-zone model**: HOT (resident), WARM (IndexedDB-indexed), COLD (OPFS bytes only). All zones retain data locally; zones differ only in lookup cost. -**Community**: A topically coherent subgraph identified by label propagation on the Metroid neighbor graph. Community quotas prevent any single topic from monopolising the hotpath. +**Community**: A topically coherent subgraph identified by label propagation on the semantic neighbor graph. 
Community quotas prevent any single topic from monopolising the hotpath. + +**Knowledge gap**: A state where MetroidBuilder cannot find a valid antithesis medoid `m2` within dimensional constraints. Triggers a P2P curiosity request. + +**Curiosity probe**: A P2P broadcast containing an incomplete Metroid (`{ m1, partialMetroid, knowledgeBoundary }`) sent when a knowledge gap is detected. Peers respond with graph fragments that may enable antithesis discovery. ## Model-Derived Numerics diff --git a/PLAN.md b/PLAN.md index b4e9cd8..245f86f 100644 --- a/PLAN.md +++ b/PLAN.md @@ -1,6 +1,6 @@ # CORTEX Implementation Plan -**Version:** 1.1 +**Version:** 1.2 **Last Updated:** 2026-03-13 This document tracks the implementation status of each major module in CORTEX. It shows what's been built, what's in progress, and what remains. @@ -37,7 +37,7 @@ This document tracks the implementation status of each major module in CORTEX. I |--------|--------|-------|-------| | Vector Store (OPFS) | ✅ Complete | `storage/OPFSVectorStore.ts` | Append-only binary vector file; byte-offset addressing; test coverage via `tests/Persistence.test.ts` | | Vector Store (Memory) | ✅ Complete | `storage/MemoryVectorStore.ts` | In-memory implementation for testing | -| Metadata Store (IndexedDB) | ✅ Complete | `storage/IndexedDbMetadataStore.ts` | Full CRUD for all entities; reverse indexes; Metroid neighbor operations; dirty-volume flags; includes `hotpath_index` and `page_activity` object stores; hotpath CRUD methods are implemented and covered by `tests/Persistence.test.ts` | +| Metadata Store (IndexedDB) | ✅ Complete | `storage/IndexedDbMetadataStore.ts` | Full CRUD for all entities; reverse indexes; semantic neighbor graph operations (currently misnamed as "Metroid neighbor" — see TODO P0-X); dirty-volume flags; includes `hotpath_index` and `page_activity` object stores; hotpath CRUD methods are implemented and covered by `tests/Persistence.test.ts` | **Storage Status:** 3/3 complete (100%) @@ 
-81,7 +81,7 @@ This document tracks the implementation status of each major module in CORTEX. I | Page ID Generation | ❌ Missing | `hippocampus/PageIdGenerator.ts` (planned) | Deterministic hash-based ID creation | | Ingest Orchestrator | ❌ Missing | `hippocampus/Ingest.ts` (planned) | Main entry point: chunk → embed → persist → initialise PageActivity → build hierarchy → fast neighbor insert → hotpath admission | | Hierarchy Builder | ❌ Missing | `hippocampus/HierarchyBuilder.ts` (planned) | Construct/update Books, Volumes, Shelves; attempt tier-quota hotpath admission for each level's medoid/prototype; Williams-derived fanout bounds; trigger split via ClusterStability when bounds exceeded | -| Fast Neighbor Insert | ❌ Missing | `hippocampus/FastMetroidInsert.ts` (planned) | Incremental Metroid neighbor update; max degree derived from HotpathPolicy (not hardcoded K); evict lowest-weight neighbor on degree overflow; check new page for hotpath admission | +| Fast Semantic Neighbor Insert | ❌ Missing | `hippocampus/FastNeighborInsert.ts` (planned) | Incremental semantic neighbor graph update; max degree derived from HotpathPolicy (not hardcoded K); evict lowest-weight neighbor on degree overflow; check new page for hotpath admission. **Note:** Not to be confused with Metroid construction, which is a CORTEX retrieval concern. | **Hippocampus Status:** 0/5 complete (0%) @@ -94,15 +94,18 @@ This document tracks the implementation status of each major module in CORTEX. 
I | Module | Status | Files | Notes | |--------|--------|-------|-------| | Ranking Pipeline | ❌ Missing | `cortex/Ranking.ts` (planned) | Resident-first scoring cascade: HOT shelves → HOT volumes → HOT books → HOT pages; spill to WARM/COLD only when coverage insufficient | +| MetroidBuilder | ❌ Missing | `cortex/MetroidBuilder.ts` (planned) | Constructs Metroid `{ m1, m2, c }` via Matryoshka dimensional unwinding; antithesis discovery; centroid computation; knowledge gap detection | +| Dialectical Search Pipeline | ❌ Missing | `cortex/DialecticalSearch.ts` (planned) | Orchestrates thesis/antithesis/synthesis zone exploration using a Metroid; prevents confirmation bias | +| Knowledge Gap Detector | ❌ Missing | `cortex/KnowledgeGapDetector.ts` (planned) | Determines when MetroidBuilder cannot find m2; emits curiosity probe | | Seed Selection | ❌ Missing | `cortex/SeedSelection.ts` (planned) | Threshold-based top-k page selection from ranking output | -| Subgraph Expansion | 🟡 Partial | `storage/IndexedDbMetadataStore.ts` (`getInducedMetroidSubgraph`) | BFS expansion implemented in storage layer; needs dynamic Williams bounds; needs orchestration wrapper | +| Subgraph Expansion | 🟡 Partial | `storage/IndexedDbMetadataStore.ts` (`getInducedMetroidSubgraph` — to be renamed `getInducedNeighborSubgraph`) | BFS expansion implemented in storage layer; needs dynamic Williams bounds; needs orchestration wrapper | | Open TSP Solver | ❌ Missing | `cortex/OpenTSPSolver.ts` (planned) | Dummy-node open-path heuristic for coherent ordering | -| Query Orchestrator | ❌ Missing | `cortex/Query.ts` (planned) | Main entry point: embed → resident-first ranking → subgraph expansion with dynamic bounds → TSP path → query cost meter → early-stop; return result | -| Result DTO | ❌ Missing | `cortex/QueryResult.ts` (planned) | Structured query result with provenance metadata (coherence path, subgraph size, hop count, edge weights) | +| Query Orchestrator | ❌ Missing | `cortex/Query.ts` 
(planned) | Main entry point: embed → select m1 → build Metroid → dialectical scoring → subgraph expansion → TSP path → query cost meter → early-stop; return result | +| Result DTO | ❌ Missing | `cortex/QueryResult.ts` (planned) | Structured query result with provenance metadata (coherence path, subgraph size, hop count, edge weights, knowledge gap flag) | -**Cortex Status:** 0.5/6 complete (8%) +**Cortex Status:** 0.5/9 complete (6%) -**Critical Blocker:** Without this, users cannot retrieve memories from the system. +**Critical Blocker:** Without this, users cannot retrieve memories from the system. The MetroidBuilder, dialectical search pipeline, and knowledge gap detector are entirely absent. --- @@ -113,7 +116,7 @@ This document tracks the implementation status of each major module in CORTEX. I | Idle Scheduler | ❌ Missing | `daydreamer/IdleScheduler.ts` (planned) | Cooperative background loop; interruptible; respects CPU budget | | Hebbian Updates | ❌ Missing | `daydreamer/HebbianUpdater.ts` (planned) | LTP (strengthen), LTD (decay), prune below threshold; recompute σ(v) for changed nodes; run promotion/eviction sweep | | Prototype Recomputation | ❌ Missing | `daydreamer/PrototypeRecomputer.ts` (planned) | Recalculate volume/shelf medoids and centroids; recompute salience for affected entries; run tier-quota promotion/eviction | -| Full Metroid Recalc | ❌ Missing | `daydreamer/FullMetroidRecalc.ts` (planned) | Rebuild bounded neighbor lists for dirty volumes; batch size bounded by O(√(t log t)) per idle cycle; recompute salience after recalc | +| Full Neighbor Graph Recalc | ❌ Missing | `daydreamer/FullNeighborRecalc.ts` (planned) | Rebuild bounded neighbor lists for dirty volumes; batch size bounded by O(√(t log t)) per idle cycle; recompute salience after recalc. **Note:** Currently planned as `FullMetroidRecalc` — this is a naming error; see TODO P0-X. 
| | Experience Replay | ❌ Missing | `daydreamer/ExperienceReplay.ts` (planned) | Simulate queries to reinforce connections | | Cluster Stability | ❌ Missing | `daydreamer/ClusterStability.ts` (planned) | Detect/trigger split/merge for unstable clusters; run lightweight label propagation for community detection; store community labels in PageActivity | @@ -152,7 +155,7 @@ This document tracks the implementation status of each major module in CORTEX. I | Module | Status | Files | Notes | |--------|--------|-------|-------| | Unit Tests | ✅ Complete | `tests/*.test.ts`, `tests/**/*.test.ts` | 115 tests across 13 files; all passing | -| Persistence Tests | ✅ Complete | `tests/Persistence.test.ts` | Full storage layer coverage (OPFS, IndexedDB, Metroid neighbors, hotpath indexes) | +| Persistence Tests | ✅ Complete | `tests/Persistence.test.ts` | Full storage layer coverage (OPFS, IndexedDB, semantic neighbor graph — currently tested as "Metroid neighbors", hotpath indexes) | | Model Tests | ✅ Complete | `tests/model/*.test.ts` | Profile resolution, defaults, routing policy | | Embedding Tests | ✅ Complete | `tests/embeddings/*.test.ts` | Provider resolver, runner, real/dummy backends | | Backend Smoke Tests | ✅ Complete | `tests/BackendSmoke.test.ts` | All vector backends instantiate cleanly | @@ -192,7 +195,7 @@ This document tracks the implementation status of each major module in CORTEX. I | Vector Compute | 100% | — | | Embedding | 83% | WebGL provider (low priority) | | Hippocampus | 0% | **CRITICAL** — No ingest path | -| Cortex | 8% | **CRITICAL** — No retrieval path | +| Cortex | 6% | **CRITICAL** — No retrieval path; MetroidBuilder, dialectical search, knowledge gap detection all missing | | Daydreamer | 0% | Not v1 blocker | | Policy | 100% | — | | Runtime | 100% | — | @@ -249,7 +252,7 @@ This document tracks the implementation status of each major module in CORTEX. 
I - Chunk → Embed → Persist orchestration - Build Page entities with proper hashing/signing; initialise `PageActivity` record - Single-Book hierarchy (defer Volume/Shelf) - - Basic Metroid neighbor insertion with Williams-bounded degree + - Basic semantic neighbor insertion with Williams-bounded degree 5. **Cortex Query** (`cortex/Query.ts`) - Embed query @@ -265,9 +268,9 @@ This document tracks the implementation status of each major module in CORTEX. I --- -### Phase 2: Add Hierarchy, Coherence & Resident-First Routing (Ship v0.5) +### Phase 2: Add Hierarchy, Dialectical Search & Resident-First Routing (Ship v0.5) -**Goal:** Hierarchical routing, coherent path ordering, and fully resident-first query path. +**Goal:** Hierarchical routing, MetroidBuilder, dialectical search pipeline, coherent path ordering, and fully resident-first query path. 1. **Hierarchy Builder** (`hippocampus/HierarchyBuilder.ts`) - Cluster pages into Books (medoid selection) @@ -276,22 +279,35 @@ This document tracks the implementation status of each major module in CORTEX. I - Attempt tier-quota hotpath admission for each level's medoid/prototype via `SalienceEngine` - Williams-derived fanout bounds; trigger split via `ClusterStability` when exceeded -2. **Ranking Pipeline** (`cortex/Ranking.ts`) +2. **MetroidBuilder** (`cortex/MetroidBuilder.ts`) + - Select m1 (topic medoid) for a given query embedding + - Freeze protected Matryoshka dimensions + - Search for m2 (antithesis medoid) within unfrozen dimensions + - Compute centroid `c = (m1 + m2) / 2` + - Unwind Matryoshka layers progressively, repeating antithesis search + - Return `Metroid { m1, m2, c }` or signal knowledge gap + +3. **Knowledge Gap Detector** (`cortex/KnowledgeGapDetector.ts`) + - Evaluate MetroidBuilder result + - Emit `KnowledgeGap` DTO with dimensional boundary info + - Trigger P2P curiosity probe emission + +4. 
**Ranking Pipeline** (`cortex/Ranking.ts`) - Resident-first cascade: HOT shelves → HOT volumes → HOT books → HOT pages - Spill to WARM/COLD only when resident coverage insufficient -3. **Open TSP Solver** (`cortex/OpenTSPSolver.ts`) +5. **Open TSP Solver** (`cortex/OpenTSPSolver.ts`) - Dummy-node open-path heuristic - Test on synthetic graphs -4. **Full Query Orchestrator** (`cortex/Query.ts` — upgrade) - - Resident-first hierarchical ranking +6. **Full Query Orchestrator** (`cortex/Query.ts` — upgrade) + - Embed query → select m1 → build Metroid → dialectical scoring cascade - Dynamic subgraph expansion bounds from `HotpathPolicy` - Query cost meter; early-stop on budget exceeded - Coherent path via TSP - - Rich result DTO with provenance + - Rich result DTO with provenance and knowledge gap flag -**Exit Criteria:** User gets coherent ordered context chains through the resident hotpath; query latency controlled by H(t). +**Exit Criteria:** User gets epistemically balanced context chains via MetroidBuilder and dialectical search; knowledge gaps are detected; query latency controlled by H(t). --- @@ -307,7 +323,7 @@ This document tracks the implementation status of each major module in CORTEX. I - LTP/LTD rules; edge pruning - Recompute σ(v) for changed nodes; run promotion/eviction sweep -3. **Full Metroid Recalc** (`daydreamer/FullMetroidRecalc.ts`) +3. **Full Neighbor Graph Recalc** (`daydreamer/FullNeighborRecalc.ts`) - Rebuild neighbor lists for dirty volumes - O(√(t log t)) batch size per idle cycle @@ -316,7 +332,7 @@ This document tracks the implementation status of each major module in CORTEX. I - Tier-quota promotion/eviction after recomputation 5. 
**Community Detection** (`daydreamer/ClusterStability.ts` — extend) - - Label propagation on Metroid neighbor graph + - Label propagation on semantic neighbor graph - Store community labels in `PageActivity.communityId` - Wire community IDs into `SalienceEngine` promotion/eviction @@ -473,7 +489,7 @@ After every implementation pass: ## Notes -- **Metroid vs medoid:** Use `Metroid` in all API surfaces and docs; `medoid` only in algorithmic comments. +- **Metroid vs medoid vs semantic neighbor graph:** These are three distinct concepts. `Metroid` refers only to the dialectical search probe `{ m1, m2, c }` constructed by `MetroidBuilder` at query time. `medoid` refers to a cluster representative node. The sparse proximity/neighbor graph (used for BFS subgraph expansion) is the **semantic neighbor graph** — it is currently misnamed `MetroidNeighbor`/`MetroidSubgraph` in code (see TODO P0-X for the rename task). - **Model-derived numerics:** Never hardcode; always source from `core/` model profile modules. - **Policy-derived constants:** Never hardcode; always source from `core/HotpathPolicy.ts`. - **Test philosophy:** TDD (Red → Green → Refactor) for all new slices. diff --git a/README.md b/README.md index c829e87..4c5025e 100644 --- a/README.md +++ b/README.md @@ -65,12 +65,16 @@ This is the rapid, multi-path "write" system that turns raw experience into stru When you ask a question, Cortex does **not** return a bag of similar vectors. 
Instead it: +- Constructs a **Metroid** `{ m1, m2, c }` for the query — a structured dialectical search probe pairing the thesis medoid (m1) with an antithesis medoid (m2) and a balanced centroid (c) +- Performs Matryoshka dimensional unwinding to discover semantically opposing knowledge - Performs parallel WebGPU "scoops" across the entire active universe (sub-millisecond) - Pulls relevant sub-graphs from IndexedDB - Traces closed-loop paths through Hebbian connections - Returns only self-consistent, coherent context chains +- Detects **knowledge gaps** when no antithesis medoid exists within dimensional constraints +- Broadcasts P2P curiosity probes to discover missing knowledge from peers -The result feels like genuine recollection rather than search. +The result feels like genuine recollection rather than search — and surfaces what you *don't* know as clearly as what you do. ### 🌙 Daydreamer — The Default Mode Network When the agent is idle, a throttled Web Worker takes over: @@ -107,5 +111,6 @@ bun run dev:harness # start the browser runtime harness at http://127.0.0.1:4173 | [`DESIGN.md`](DESIGN.md) | Architecture specification and core design principles | | [`PLAN.md`](PLAN.md) | Module-by-module implementation status and development phases | | [`TODO.md`](TODO.md) | Prioritized actionable tasks to ship v1.0 | +| [`ARCHITECTURE-REVIEW.md`](ARCHITECTURE-REVIEW.md) | Repository-wide architectural drift report and correction tasks | | [`docs/api.md`](docs/api.md) | API reference for developers integrating with CORTEX | | [`docs/development.md`](docs/development.md) | Build, test, debug, and Docker workflow | diff --git a/TODO.md b/TODO.md index ca1034d..be5de12 100644 --- a/TODO.md +++ b/TODO.md @@ -204,6 +204,45 @@ These items **must** be completed to have a usable system. 
Without them, users c --- +### P0-X: Fix Architectural Naming Drift (BLOCKS: correct design implementation) + +**Why:** The codebase uses the term "Metroid" to name the sparse proximity/neighbor graph (`MetroidNeighbor`, `MetroidSubgraph`, `metroid_neighbors`, `getInducedMetroidSubgraph`, `FastMetroidInsert`, `FullMetroidRecalc`). This is architecturally incorrect. In CORTEX, a **Metroid** is a structured dialectical search probe `{ m1, m2, c }` — a concept that does not yet exist in the codebase at all. The proximity graph has nothing to do with Metroids. This naming collision will cause permanent confusion and make the MetroidBuilder impossible to implement cleanly without a rename. + +- [ ] **P0-X1:** Rename `MetroidNeighbor` → `SemanticNeighbor` in `core/types.ts` + - Update all references in `storage/IndexedDbMetadataStore.ts` + - Update all references in test files + - Update JSDoc and inline comments + +- [ ] **P0-X2:** Rename `MetroidSubgraph` → `SemanticNeighborSubgraph` in `core/types.ts` + - Update all references in `storage/IndexedDbMetadataStore.ts` + - Update all references in `cortex/Query.ts` + - Update JSDoc and inline comments + +- [ ] **P0-X3:** Rename `MetadataStore` proximity graph methods: + - `putMetroidNeighbors` → `putSemanticNeighbors` + - `getMetroidNeighbors` → `getSemanticNeighbors` + - `getInducedMetroidSubgraph` → `getInducedNeighborSubgraph` + - `needsMetroidRecalc` → `needsNeighborRecalc` + - `flagVolumeForMetroidRecalc` → `flagVolumeForNeighborRecalc` + - `clearMetroidRecalcFlag` → `clearNeighborRecalcFlag` + - Update all callers in `storage/IndexedDbMetadataStore.ts`, `cortex/Query.ts`, and test files + +- [ ] **P0-X4:** Rename planned Hippocampus file `hippocampus/FastMetroidInsert.ts` → `hippocampus/FastNeighborInsert.ts` + - Rename class/function to `FastNeighborInsert`/`insertSemanticNeighbors` + +- [ ] **P0-X5:** Rename planned Daydreamer file `daydreamer/FullMetroidRecalc.ts` → `daydreamer/FullNeighborRecalc.ts` + - Rename 
class/function to `FullNeighborRecalc`/`runNeighborRecalc` + +- [ ] **P0-X6:** Rename IndexedDB object store from `metroid_neighbors` → `neighbor_graph` + - Increment `DB_VERSION` in `storage/IndexedDbMetadataStore.ts` + - Add migration in `applyUpgrade` to copy data from old store to new store + +- [ ] **P0-X7:** Update all documentation strings and JSDoc that use "Metroid neighbor" to use "semantic neighbor" + +**Exit Criteria:** No source file uses "Metroid" to refer to the proximity graph. The term "Metroid" is reserved exclusively for the `{ m1, m2, c }` dialectical probe type implemented in `cortex/MetroidBuilder.ts`. + +--- + ## 🟡 High Priority — Ship v0.5 (Hierarchical + Coherent) These items add hierarchical routing and coherent path ordering. They transform CORTEX from a flat vector search into a biologically-inspired memory system. @@ -264,30 +303,30 @@ These items add hierarchical routing and coherent path ordering. They transform --- -### P1-C: Fast Metroid Neighbor Insert (UNBLOCKS: graph coherence) +### P1-C: Fast Semantic Neighbor Insert (UNBLOCKS: graph coherence) -**Why:** Need sparse NN graph for coherent path tracing. Degree must be bounded by `HotpathPolicy` to prevent unbounded graph mass growth. +**Why:** Need a sparse semantic neighbor graph for coherent path tracing. This graph connects pages with high cosine similarity and is used for BFS subgraph expansion during retrieval. Degree must be bounded by `HotpathPolicy` to prevent unbounded graph mass growth. **This is not related to Metroid construction** — the semantic neighbor graph is a proximity concept, not a dialectical probe concept. 
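The degree-bounded eviction behaviour described here can be sketched as follows. This is illustrative: `maxDegree` stands in for the `HotpathPolicy`-derived constant, and the type and function names are assumptions.

```typescript
interface NeighborEdge {
  id: string;         // neighbor page ID
  similarity: number; // cosine similarity
}

// Insert one edge into a page's neighbor list, evicting the
// lowest-similarity neighbor (never a random one) on degree overflow.
function insertNeighbor(
  list: NeighborEdge[],
  edge: NeighborEdge,
  maxDegree: number,
): NeighborEdge[] {
  const next = [...list, edge].sort((a, b) => b.similarity - a.similarity);
  return next.slice(0, maxDegree);
}
```

A real implementation would also write the symmetric reverse edge and mark the affected volumes dirty, as the checklist items require.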
-- [ ] **P1-C1:** Implement `hippocampus/FastMetroidInsert.ts` +- [ ] **P1-C1:** Implement `hippocampus/FastNeighborInsert.ts` - For each new page, compute similarity to existing pages - Derive max neighbors per page from `HotpathPolicy` constant (not hardcoded K) - - Insert forward edges (page → neighbors) + - Insert forward edges (page → neighbors) as `SemanticNeighbor` records - Insert reverse edges (neighbors → page), respecting max degree - - If a page is already at max degree, evict the neighbor with the lowest Hebbian edge weight + - If a page is already at max degree, evict the neighbor with the lowest cosine similarity - Mark affected volumes as dirty for full Daydreamer recalc - After insertion, check new page for hotpath admission via `SalienceEngine` - [ ] **P1-C2:** Upgrade `hippocampus/Ingest.ts` - - After persisting pages, call `FastMetroidInsert` + - After persisting pages, call `FastNeighborInsert` -- [ ] **P1-C3:** Add Metroid insert test coverage - - `tests/hippocampus/FastMetroidInsert.test.ts` +- [ ] **P1-C3:** Add semantic neighbor insert test coverage + - `tests/hippocampus/FastNeighborInsert.test.ts` - Test neighbor lists are bounded by the policy-derived max degree - Test symmetry (if A→B, then B→A) - - Test that degree overflow evicts lowest-weight neighbor, not a random one + - Test that degree overflow evicts lowest-similarity neighbor, not a random one - Test that new page is considered for hotpath admission after insertion -**Exit Criteria:** Metroid neighbor graph is maintained during ingest with policy-bounded degree. +**Exit Criteria:** Semantic neighbor graph is maintained during ingest with policy-bounded degree. --- @@ -297,7 +336,7 @@ These items add hierarchical routing and coherent path ordering. 
They transform - [ ] **P1-D1:** Implement `cortex/OpenTSPSolver.ts` - Dummy-node open-path heuristic (greedy nearest-neighbor) - - Input: `MetroidSubgraph` (nodes + edges with distances) + - Input: `SemanticNeighborSubgraph` (nodes + edges with distances; after P0-X2 rename) - Output: ordered path through all nodes - Deterministic for same input @@ -311,46 +350,117 @@ These items add hierarchical routing and coherent path ordering. They transform --- -### P1-E: Full Query Orchestrator (DELIVERS: coherent retrieval) +### P1-M: MetroidBuilder (DELIVERS: dialectical epistemology) + +**Why:** MetroidBuilder is the core of what makes CORTEX an _epistemic_ system rather than a vector search engine. Without it, the system merely returns nearest neighbors and cannot explore opposing perspectives, detect knowledge gaps, or trigger P2P curiosity requests. + +- [ ] **P1-M1:** Implement `cortex/MetroidBuilder.ts` + - Accept a query embedding and a list of resident medoids (shelf/volume/book representatives) + - Select m1: the medoid with highest cosine similarity to the query + - Freeze the protected lower Matryoshka dimensions (dimension count derived from ModelProfile; see `embeddingDimension` and `matryoshkaProtectedDim`) + - In the unfrozen upper dimensions, search for the nearest medoid with **opposing** semantic direction (minimum cosine similarity above a negative threshold, or maximum angular distance) + - This medoid becomes m2 (antithesis) + - Compute centroid: `c = (m1_vec + m2_vec) / 2` + - Return `Metroid { m1, m2, c }`; if no valid m2 found, return `{ m1, m2: null, c: null, knowledgeGap: true }` + +- [ ] **P1-M2:** Implement Matryoshka dimensional unwinding in `cortex/MetroidBuilder.ts` + - After initial Metroid construction, progressively expand the antithesis search into deeper embedding layers + - At each step, lower the protected dimension boundary by one Matryoshka tier + - Re-evaluate `m2` at each tier; prefer the deepest tier's Metroid as the final result 
+ - Stop when the protected dimension floor is reached + +- [ ] **P1-M3:** Add MetroidBuilder test coverage + - `tests/cortex/MetroidBuilder.test.ts` + - Test m1 selection: highest similarity medoid is chosen + - Test m2 selection: most semantically opposite medoid is chosen + - Test centroid computation: midpoint between m1 and m2 vectors + - Test dimensional unwinding: search expands progressively through Matryoshka layers + - Test knowledge gap: when no valid m2 exists in any layer, returns `knowledgeGap: true` + - Test protected dimensions are never searched for antithesis + - Test determinism: same inputs always produce same Metroid + +**Exit Criteria:** MetroidBuilder constructs valid Metroids and correctly detects knowledge gaps. + +--- + +### P1-N: Knowledge Gap Detection & Curiosity Probe (DELIVERS: epistemic honesty) + +**Why:** When MetroidBuilder cannot find m2, the system must acknowledge its knowledge boundary rather than hallucinating. The curiosity probe mechanism enables distributed learning by broadcasting the gap to peers. 
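A minimal sketch of how detection might wrap a MetroidBuilder result. The DTO fields follow the outline in this section's tasks, but the `detectGap` name and input shape are assumptions, and `queryEmbedding` is omitted here for brevity.

```typescript
// Reduced sketch of the gap DTO; the full design also carries queryEmbedding.
interface KnowledgeGap {
  topicMedoidId: string;       // thesis medoid whose antithesis is missing
  dimensionalBoundary: number; // deepest Matryoshka tier searched
  timestamp: string;
}

// Returns a KnowledgeGap DTO when the builder found no antithesis, else null.
function detectGap(result: {
  m1: string;
  m2: string | null;
  dimensionalBoundary: number;
}): KnowledgeGap | null {
  if (result.m2 !== null) return null; // antithesis found: no gap to report
  return {
    topicMedoidId: result.m1,
    dimensionalBoundary: result.dimensionalBoundary,
    timestamp: new Date().toISOString(),
  };
}
```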
+ +- [ ] **P1-N1:** Implement `cortex/KnowledgeGapDetector.ts` + - Accept MetroidBuilder result; if `knowledgeGap: true`, emit a `KnowledgeGap` DTO + - `KnowledgeGap { topicMedoidId: Hash, queryEmbedding: Float32Array, dimensionalBoundary: number, timestamp: string }` + - This DTO is returned to the caller as part of `QueryResult` + +- [ ] **P1-N2:** Implement curiosity probe construction in `cortex/KnowledgeGapDetector.ts` + - Build `CuriosityProbe { m1, partialMetroid, queryContext, knowledgeBoundary }` + - Store probe locally for broadcast via P2P layer (see P2-G) + - Do not broadcast immediately — queue for the P2P sharing layer + +- [ ] **P1-N3:** Upgrade `cortex/QueryResult.ts` + - Add `knowledgeGap?: KnowledgeGap` field — present when MetroidBuilder failed to find m2 + - Document that callers must check this field before treating results as epistemically complete + +- [ ] **P1-N4:** Add knowledge gap test coverage + - `tests/cortex/KnowledgeGapDetector.test.ts` + - Test that a KnowledgeGap DTO is produced when MetroidBuilder returns `knowledgeGap: true` + - Test that a CuriosityProbe is constructed with correct fields + - Test that QueryResult includes the KnowledgeGap when present + - Test that queries against a rich corpus do NOT produce false-positive knowledge gaps -**Why:** This is the "aha" moment — return memories in natural narrative order through the resident hotpath with dynamic, sublinear expansion bounds. +**Exit Criteria:** System correctly signals knowledge boundaries; callers can distinguish epistemically complete from incomplete results. + +--- + +### P1-E: Full Query Orchestrator (DELIVERS: dialectical retrieval) + +**Why:** This is the "aha" moment — return memories in natural narrative order through the resident hotpath via dialectical Metroid exploration, with dynamic, sublinear expansion bounds. 
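The dialectical exploration this orchestrator performs rests on scoring restricted to the unfrozen Matryoshka dimensions. A minimal sketch of that primitive, assuming cosine similarity computed over dims `[protectedDim, d)`; all names here are illustrative, not the planned API:

```typescript
// Cosine similarity computed only over the unfrozen upper dimensions.
// The most negative score marks the strongest antithesis candidate.
function unfrozenCosine(
  a: Float32Array,
  b: Float32Array,
  protectedDim: number,
): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = protectedDim; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}
```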
- [ ] **P1-E1:** Upgrade `cortex/Query.ts` (full version) - - Use resident-first hierarchical ranking to select seed pages + - Use resident-first hierarchical ranking to select topic medoid (m1) + - Call `MetroidBuilder` to construct `{ m1, m2, c }` + - If knowledge gap detected, include in result and continue with partial Metroid (m1 only) + - Use centroid `c` as the primary scoring anchor for page selection - Derive dynamic subgraph bounds from `HotpathPolicy` (`maxSubgraphSize`, `maxHops`, `perHopBranching`) - - Call `MetadataStore.getInducedMetroidSubgraph(seedPages, maxHops)` using dynamic `maxHops` + - Call `MetadataStore.getInducedNeighborSubgraph(seedPages, maxHops)` using dynamic `maxHops` - Call `OpenTSPSolver.solve(subgraph)` - Return ordered page list via coherent path - **Query cost meter:** count vector operations; early-stop and return best-so-far if cost exceeds Williams-derived budget - - Include provenance metadata (hop count, edge weights, subgraph size, cost) + - Include provenance metadata (hop count, edge weights, subgraph size, cost, Metroid details) - [ ] **P1-E2:** Upgrade `cortex/QueryResult.ts` - Add `coherencePath: Hash[]` (ordered page IDs) + - Add `metroid?: { m1: Hash; m2: Hash | null; centroid: Float32Array | null }` (Metroid used for this query) + - Add `knowledgeGap?: KnowledgeGap` (if antithesis discovery failed) - Add `provenance: { subgraphSize: number; hopCount: number; edgeWeights: number[]; vectorOpCost: number; earlyStop: boolean }` - [ ] **P1-E3:** Add full query test coverage - `tests/cortex/Query.test.ts` (upgrade) - Test subgraph expansion stays within `maxSubgraphSize` - Test TSP ordering + - Test Metroid is built and included in provenance + - Test knowledge gap is returned when antithesis not found - Test provenance metadata - Test early-stop fires when cost budget exceeded -**Exit Criteria:** Queries return coherent ordered context chains through the resident hotpath; dynamic bounds and cost meter active. 
+**Exit Criteria:** Queries return dialectically balanced, coherent context chains through the resident hotpath; MetroidBuilder active; knowledge gaps surfaced. --- -### P1-F: Integration Test (Hierarchical + Coherent) +### P1-F: Integration Test (Hierarchical + Dialectical) -**Why:** Validate v0.5 completeness including resident-first routing and dynamic subgraph bounds. +**Why:** Validate v0.5 completeness including resident-first routing, MetroidBuilder, and dialectical subgraph bounds. - [ ] **P1-F1:** Upgrade `tests/integration/IngestQuery.test.ts` - Verify hierarchical structures exist after ingest - Verify hotpath entries exist for hierarchy prototypes after ingest + - Verify queries build a valid Metroid `{ m1, m2, c }` - Verify queries return coherent paths through resident hotpath - Verify dynamic subgraph bounds honoured (no expansion beyond `maxSubgraphSize`) - - Compare coherent path vs flat ranking (show narrative flow improvement) + - Verify knowledge gap is correctly signalled when corpus is sparse + - Compare dialectical retrieval vs flat ranking (show epistemic breadth improvement) -**Exit Criteria:** Integration test demonstrates coherent retrieval with resident-first routing. +**Exit Criteria:** Integration test demonstrates dialectically balanced retrieval with resident-first routing and knowledge gap detection. --- @@ -401,20 +511,20 @@ These items add idle background maintenance and privacy-safe interest sharing. T --- -### P2-C: Full Metroid Recalc (DELIVERS: graph maintenance) +### P2-C: Full Neighbor Graph Recalc (DELIVERS: graph maintenance) -**Why:** Incremental fast insert is approximate; need periodic full recalc. Recalc batch size must be bounded by H(t)-derived maintenance budget to avoid blocking the idle loop. +**Why:** Incremental fast semantic neighbor insert is approximate; need periodic full recalc. Recalc batch size must be bounded by H(t)-derived maintenance budget to avoid blocking the idle loop. 
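A minimal sketch of the H(t)-derived batch bound described above, assuming H(t) = ⌊√(t · log₂ t)⌋; the actual `HotpathPolicy.computeCapacity` derivation may differ in constants, and both function names here are illustrative:

```typescript
// Hypothetical O(√(t log t)) maintenance budget; mirrors the description of
// HotpathPolicy.computeCapacity(graphMass) but is not the repo's code.
function maintenanceBudget(graphMass: number): number {
  if (graphMass <= 1) return 1;
  return Math.max(1, Math.floor(Math.sqrt(graphMass * Math.log2(graphMass))));
}

// Bound one idle cycle: process at most `budget` pairwise comparisons,
// leaving the remainder for the next cycle so the idle loop never blocks.
function boundedPairs(pairs: Array<[string, string]>, graphMass: number): Array<[string, string]> {
  return pairs.slice(0, maintenanceBudget(graphMass));
}
```

Because the budget grows sublinearly, a dirty volume with many pages is recalculated across several idle cycles rather than in one blocking pass.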
-- [ ] **P2-C1:** Implement `daydreamer/FullMetroidRecalc.ts` - - Query `MetadataStore.needsMetroidRecalc(volumeId)` for dirty volumes; prioritise dirtiest first +- [ ] **P2-C1:** Implement `daydreamer/FullNeighborRecalc.ts` + - Query `MetadataStore.needsNeighborRecalc(volumeId)` for dirty volumes; prioritise dirtiest first - Load all pages in volume; compute pairwise similarities - Bound batch: process at most `HotpathPolicy.computeCapacity(graphMass)` pairwise comparisons per idle cycle (O(√(t log t))) - - Select policy-derived max neighbors for each page; update `MetadataStore.putMetroidNeighbors` - - Clear dirty flag via `MetadataStore.clearMetroidRecalcFlag` + - Select policy-derived max neighbors for each page; update `MetadataStore.putSemanticNeighbors` + - Clear dirty flag via `MetadataStore.clearNeighborRecalcFlag` - Recompute σ(v) for affected nodes via `SalienceEngine.batchComputeSalience`; run promotion sweep -- [ ] **P2-C2:** Add Metroid recalc test coverage - - `tests/daydreamer/FullMetroidRecalc.test.ts` +- [ ] **P2-C2:** Add neighbor graph recalc test coverage + - `tests/daydreamer/FullNeighborRecalc.test.ts` - Test dirty flag cleared after recalc - Test neighbor quality improved vs fast insert - Test batch size respects O(√(t log t)) limit per cycle @@ -467,7 +577,7 @@ These items add idle background maintenance and privacy-safe interest sharing. T **Why:** Without community detection, a single dense topic can fill the entire page-tier quota, crowding out unrelated memories. Community quotas ensure the hotpath is both hot (high salience) and diverse (topic-representative). 
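The lightweight label propagation this phase calls for could look like the following synchronous variant (illustrative only; the actual `ClusterStability.ts` implementation may use a different update schedule or tie-breaking rule):

```typescript
// Minimal synchronous label propagation over a weighted adjacency list.
// Each node starts in its own community and repeatedly adopts the label
// with the highest total incident edge weight, until no label changes.
type Edge = { to: string; weight: number };

function labelPropagation(adj: Map<string, Edge[]>, maxIters = 10): Map<string, string> {
  const labels = new Map<string, string>();
  for (const node of adj.keys()) labels.set(node, node);
  for (let iter = 0; iter < maxIters; iter++) {
    let changed = false;
    for (const [node, edges] of adj) {
      const votes = new Map<string, number>();
      for (const { to, weight } of edges) {
        const label = labels.get(to)!;
        votes.set(label, (votes.get(label) ?? 0) + weight);
      }
      let best = labels.get(node)!;
      let bestWeight = -Infinity;
      for (const [label, weight] of votes) {
        if (weight > bestWeight) { bestWeight = weight; best = label; }
      }
      if (best !== labels.get(node)) { labels.set(node, best); changed = true; }
    }
    if (!changed) break; // converged
  }
  return labels;
}
```

Two dense clusters joined by a weak bridge edge settle into two distinct labels, which is exactly the separation the community quotas then enforce.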
- [ ] **P2-F1:** Add community detection to `daydreamer/ClusterStability.ts` - - Implement lightweight label propagation on the Metroid neighbor graph + - Implement lightweight label propagation on the semantic neighbor graph - Run during idle passes when dirty-volume flags indicate meaningful structural change - Store community labels in `PageActivity.communityId` via `MetadataStore.putPageActivity` - Rerun when graph topology changes significantly (post-split, post-merge, post-full-recalc) @@ -490,9 +600,16 @@ These items add idle background maintenance and privacy-safe interest sharing. T --- -### P2-G: Smart Interest Sharing & PII Guardrail (DELIVERS: discovery without identity leakage) +### P2-G: Curiosity Broadcasting & Smart Interest Sharing (DELIVERS: distributed learning without hallucination) + +**Why:** When knowledge gaps are detected, CORTEX must be able to broadcast the incomplete Metroid as a curiosity probe to connected peers. Peers respond with relevant fragments, enabling collaborative learning. Additionally, interest sharing is a core product value for both app and library surfaces. v1 must share public-interest graph sections while preventing personal data leakage. -**Why:** Interest sharing is core product value for both app and library surfaces. v1 must share public-interest graph sections while preventing personal data leakage. 
+- [ ] **P2-G0:** Implement `sharing/CuriosityBroadcaster.ts` + - Consume pending `CuriosityProbe` objects queued by `KnowledgeGapDetector` + - Serialize and broadcast to connected peers via P2P transport + - Handle responses: deserialize incoming graph fragments; pass to `SubgraphImporter` for integration + - Rate-limit broadcasts to prevent spam + - Include `knowledgeBoundary` field in probe so peers can target search precisely - [ ] **P2-G1:** Implement `sharing/EligibilityClassifier.ts` - Classify candidate nodes as share-eligible vs blocked before export @@ -501,6 +618,7 @@ These items add idle background maintenance and privacy-safe interest sharing. T - [ ] **P2-G2:** Implement `sharing/SubgraphExporter.ts` - Build topic-scoped graph slices from eligible nodes only + - For curiosity responses: select graph fragment relevant to the received probe's `knowledgeBoundary` - Preserve node/edge signatures and provenance - Strip or coarsen personal metadata fields that are not needed for discovery @@ -508,13 +626,16 @@ These items add idle background maintenance and privacy-safe interest sharing. 
T - Opt-in peer exchange over P2P transport - Verify signatures and schema on import; reject invalid or tampered payloads - Merge imported slices into discovery pathways without exposing sender identity metadata + - After import, retry MetroidBuilder for any pending knowledge gaps that may be resolved by new data - [ ] **P2-G4:** Add sharing safety and discovery tests - `tests/sharing/EligibilityClassifier.test.ts` + - `tests/sharing/CuriosityBroadcaster.test.ts` - `tests/sharing/SubgraphExchange.test.ts` - - Assert blocked nodes are never exported; assert imported AI-interest updates are discoverable via query + - Assert blocked nodes are never exported; assert imported fragments are discoverable via query + - Assert that after receiving a response to a curiosity probe, MetroidBuilder can now construct m2 for the previously-gapped topic -**Exit Criteria:** v1 can exchange signed public-interest slices over P2P, and share-blocking reliably prevents PII/identity leakage. +**Exit Criteria:** v1 can broadcast curiosity probes for knowledge gaps, receive graph fragments from peers, retry MetroidBuilder with new data, and exchange signed public-interest slices with PII blocking. --- @@ -695,37 +816,44 @@ These items improve quality, performance, and developer experience. 
Not blockers | Phase | Items | Status | Blocking | |-------|-------|--------|----------| -| v0.1 (Minimal Viable) | 23 tasks (P0-A through P0-G + P0-E) | 🟡 In Progress (P0-A complete) | User cannot use system | -| v0.5 (Hierarchical + Coherent) | 14 tasks (P1-A through P1-F) | ❌ Not started | Blocked by v0.1 | -| v1.0 (Background Consolidation + Smart Sharing) | 18 tasks (P2-A through P2-G) | ❌ Not started | Blocked by v0.5 | +| v0.1 (Minimal Viable) | 30 tasks (P0-A through P0-G + P0-E + P0-X) | 🟡 In Progress (P0-A, P0-F, P0-G complete; P0-X architectural rename pending) | User cannot use system correctly; P0-X blocks MetroidBuilder | +| v0.5 (Hierarchical + Dialectical) | 20 tasks (P1-A through P1-F + P1-M + P1-N) | ❌ Not started | Blocked by v0.1 | +| v1.0 (Background Consolidation + Smart Sharing) | 20 tasks (P2-A through P2-G) | ❌ Not started | Blocked by v0.5 | | Polish & Ship | 21 tasks (P3-A through P3-G) | ❌ Not started | Not blocking v1.0 | -**Total:** ~76 actionable tasks +**Total:** ~91 actionable tasks --- -## Quick Reference: Next 7 Tasks to Unblock Everything +## Quick Reference: Next Tasks to Unblock Everything If you're reading this and want to know "what do I work on right now?", here's the answer: -1. **P0-F1:** Implement `core/HotpathPolicy.ts` -2. **P0-F3:** Extend `core/types.ts` (PageActivity, HotpathEntry, TierQuotas) -3. **P0-F4:** Extend `storage/IndexedDbMetadataStore.ts` (hotpath stores) -4. **P0-G1/G2:** Implement `core/SalienceEngine.ts` -5. **P0-B1:** Implement `hippocampus/Chunker.ts` -6. **P0-C1/C2:** Implement `hippocampus/PageBuilder.ts` and `hippocampus/Ingest.ts` -7. **P0-D1:** Implement `cortex/Query.ts` +**Immediate (unblock MetroidBuilder):** +1. **P0-X1–X7:** Fix architectural naming drift (`MetroidNeighbor` → `SemanticNeighbor` and related renames) + +**After P0-X (complete v0.1):** +2. **P0-B1:** Implement `hippocampus/Chunker.ts` +3. **P0-C1/C2:** Implement `hippocampus/PageBuilder.ts` and `hippocampus/Ingest.ts` +4. 
**P0-D1:** Implement `cortex/Query.ts` (minimal) -Items 1–4 (Williams Bound foundation) should be done first — they are small, independently testable, and unlock correct behaviour in everything that follows. +**After v0.1 (start v0.5):** +5. **P1-A1:** Implement `hippocampus/HierarchyBuilder.ts` +6. **P1-C1:** Implement `hippocampus/FastNeighborInsert.ts` +7. **P1-M1/M2:** Implement `cortex/MetroidBuilder.ts` with Matryoshka unwinding +8. **P1-N1/N2:** Implement `cortex/KnowledgeGapDetector.ts` +9. **P1-D1:** Implement `cortex/OpenTSPSolver.ts` +10. **P1-E1:** Upgrade `cortex/Query.ts` to full dialectical orchestrator --- ## Notes -- **Dependencies:** Items are ordered so that completing tasks in sequence minimises blocked work. P0-F and P0-G (Williams Bound foundation) must precede all hotpath-aware modules. +- **Dependencies:** Items are ordered so that completing tasks in sequence minimises blocked work. P0-X (naming drift fix) must precede MetroidBuilder. P0-F and P0-G (Williams Bound foundation) must precede all hotpath-aware modules. - **Estimates:** Each P0/P1/P2 task is roughly 1-4 hours for an experienced developer familiar with the codebase. - **Testing:** Every implementation task should be accompanied by test coverage (explicitly called out). - **TDD Approach:** Write failing tests first, then implement to green. - **Documentation Sync:** Update PLAN.md module status as tasks complete. - **Williams Bound Invariant:** The resident count must never exceed H(t). Every test that touches the hotpath should assert this. - **Policy constants:** Never hardcode hotpath constants outside `core/HotpathPolicy.ts`. P3-E3 will add a guard to enforce this automatically; until then, enforce by convention. +- **Metroid vs medoid vs semantic neighbor graph:** These are three distinct concepts. `Metroid` = dialectical probe `{ m1, m2, c }` (ephemeral, query-time). `medoid` = cluster representative node. 
Semantic neighbor graph = sparse proximity edges used for BFS subgraph expansion. Do not conflate them. See P0-X for the code rename tasks that fix the current conflation. From cda985c3e7b10dbccab98410ad106b625ccac4b2 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Fri, 13 Mar 2026 19:36:43 +0000 Subject: [PATCH 3/6] feat(core): add matryoshkaProtectedDim to ModelProfile; fix CuriosityProbe fields; fix doc accuracy - core/ModelProfile.ts: add matryoshkaProtectedDim?: number to ModelProfileSeed and ModelProfile - core/ModelDefaults.ts: validate + pass through matryoshkaProtectedDim in buildModelProfileFromSeed - core/ModelProfileResolver.ts: carry matryoshkaProtectedDim through register()/resolve() - core/BuiltInModelProfiles.ts: add matryoshkaProtectedDim: 128 to EMBEDDING_GEMMA_300M_PROFILE - scripts/guard-model-derived.mjs: add matryoshkaProtectedDim to MODEL_FIELD_PATTERN - DESIGN.md: clarify centroid c construction (protected dims copied from m1); expand CuriosityProbe with mimeType + modelUrn fields; update model-derived numerics table - TODO.md: update P1-M1 (matryoshkaProtectedDim from ModelProfile), P1-N2/N4 (mimeType+modelUrn) - PLAN.md: fix module statuses (Chunker/PageBuilder/Ingest/Query/QueryResult exist); update What Works/Doesn't Work; fix blockers section - README.md: reframe Cortex description as planned vs current behavior - ARCHITECTURE-REVIEW.md: note implemented components in zero-drift section Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com> --- ARCHITECTURE-REVIEW.md | 11 +++- DESIGN.md | 55 ++++++++++++---- PLAN.md | 108 +++++++++++++++----------------- README.md | 10 ++- TODO.md | 14 +++-- core/BuiltInModelProfiles.ts | 10 +++ core/ModelDefaults.ts | 12 ++++ core/ModelProfile.ts | 34 ++++++++++ core/ModelProfileResolver.ts | 9 +++ scripts/guard-model-derived.mjs | 2 +- 10 files changed, 184 insertions(+), 81 deletions(-) diff --git a/ARCHITECTURE-REVIEW.md 
b/ARCHITECTURE-REVIEW.md index f8a40ef..42ab46d 100644 --- a/ARCHITECTURE-REVIEW.md +++ b/ARCHITECTURE-REVIEW.md @@ -244,7 +244,7 @@ This report catalogs every divergence found and maps each to a correction task i ## Components with Zero Drift -The following components correctly implement their intended architecture and require no changes related to this review: +The following components are correctly implemented (or partially implemented in the correct direction) and require no changes related to this naming review: - `core/HotpathPolicy.ts` — Williams Bound policy implementation; correct - `core/SalienceEngine.ts` — Promotion/eviction lifecycle; correct @@ -252,10 +252,15 @@ The following components correctly implement their intended architecture and req - `storage/OPFSVectorStore.ts` — Append-only vector file; correct - `storage/MemoryVectorStore.ts` — In-memory testing backend; correct - `embeddings/` — All embedding providers; correct -- `hippocampus/Chunker.ts` — Text chunking; correct -- `hippocampus/PageBuilder.ts` — Page entity construction; correct +- `hippocampus/Chunker.ts` — Text chunking; **implemented and correct** +- `hippocampus/PageBuilder.ts` — Page entity construction; **implemented and correct** +- `hippocampus/Ingest.ts` — Minimal ingest path; **partially implemented** (chunk→embed→persist→Book→hotpath); correct direction, hierarchy and neighbor insertion deferred +- `cortex/Query.ts` — Minimal query path; **partially implemented** (hotpath-first flat scoring); correct direction, MetroidBuilder deferred +- `cortex/QueryResult.ts` — Minimal result DTO; **partially implemented**; correct direction, provenance fields deferred - All `VectorBackend` implementations — correct +> **Note:** PLAN.md v1.2 has been updated to reflect the actual implementation status of all Hippocampus and Cortex modules. 
The initial v1.1 plan incorrectly marked `Chunker.ts`, `PageBuilder.ts`, `Ingest.ts`, `Query.ts`, and `QueryResult.ts` as missing; this has been corrected. + --- ## Recommended Fix Order diff --git a/DESIGN.md b/DESIGN.md index 7ff3a2f..d63bd21 100644 --- a/DESIGN.md +++ b/DESIGN.md @@ -115,7 +115,10 @@ The Metroid is constructed at query time by the `MetroidBuilder`. It is **not** 1. **Select m1** — Identify the topic medoid most relevant to the query embedding. 2. **Freeze protected dimensions** — Lock the lower Matryoshka embedding dimensions that encode invariant semantic context (domain, language register, topic class). These dimensions are never searched for antithesis. 3. **Search for m2** — Within the remaining (unfrozen) upper dimensions, search for the nearest medoid that represents semantic opposition to m1. -4. **Compute centroid** — `c = (m1_vec + m2_vec) / 2` (element-wise average over the unfrozen dimensions). +4. **Compute centroid** — Compute `c` as follows: + - Protected dimensions (index < `matryoshkaProtectedDim`): copy directly from m1. These dimensions are invariant; averaging them would dilute the domain anchor that makes the antithesis search meaningful. + - Unfrozen dimensions (index >= `matryoshkaProtectedDim`): compute the element-wise average of m1 and m2 — `c[i] = (m1[i] + m2[i]) / 2`. + - The result is a full-dimensional vector that can be used directly as a scoring anchor. 5. **Prefer centroid as search origin** — Use `c` as the primary starting point for subgraph expansion. This prevents semantic drift toward either pole. 6. **Unwind Matryoshka layers** — Progressively free deeper embedding dimensions and repeat from step 3. Each unwinding broadens the antithesis search. 7. **Stop at the protected dimension** — The protected lower dimensions are never unwound. This preserves semantic invariants throughout all levels of search. 
@@ -181,14 +184,33 @@ This means CORTEX does not possess sufficient knowledge to provide an epistemica When a knowledge gap is detected, CORTEX broadcasts the incomplete Metroid as a curiosity probe to connected peers: ``` -CuriosityProbe = { m1, partialMetroid, queryContext, knowledgeBoundary } +CuriosityProbe = { + m1, + partialMetroid, + queryContext, + knowledgeBoundary, + mimeType, + modelUrn +} ``` -Where `knowledgeBoundary` encodes the dimensional layer where antithesis discovery failed. Peers receiving this probe: +Where: +- **m1** — the thesis medoid (the topic for which antithesis was not found) +- **partialMetroid** — the incomplete Metroid at the boundary of local knowledge +- **queryContext** — the original query embedding, used for scoring by the responding peer +- **knowledgeBoundary** — the Matryoshka dimensional layer at which antithesis search failed +- **mimeType** — the MIME type of the embedded content (e.g. `text/plain`, `image/jpeg`). Required so receiving peers can validate commensurability of their graph sections. +- **modelUrn** — a URN identifying the specific embedding model and version used to produce the vectors (e.g. `urn:model:onnx-community/embeddinggemma-300m-ONNX:v1`). Peers **must** reject probes whose `modelUrn` does not match a model they can compare against. Accepting graph fragments embedded by a different model would produce incommensurable similarity scores at the dimensional boundaries where the models' Matryoshka layers overlap. + +> **Why `mimeType` and `modelUrn` are required:** +> Embedding models project content into incompatible latent spaces. A fragment embedded with `nomic-embed-text-v1.5` (matryoshkaProtectedDim=64) cannot be meaningfully compared against a fragment embedded with `embeddinggemma-300m` (matryoshkaProtectedDim=128). 
Without explicit model and content-type identity on the probe, a peer could return graph sections that appear similar by cosine score but are semantically incommensurable — introducing hallucination-equivalent errors at the knowledge boundary. -1. Search their own memory graphs for medoids that could serve as `m2`. -2. If found, respond with the relevant graph fragment (subject to eligibility filtering; see Smart Sharing Guardrails). -3. The originating node integrates the received fragment and may retry MetroidBuilder. +Peers receiving this probe: + +1. Verify `mimeType` and `modelUrn` match a supported local model. +2. Search their own memory graphs for medoids that could serve as `m2` using the same embedding space. +3. If found, respond with the relevant graph fragment (subject to eligibility filtering; see Smart Sharing Guardrails). +4. The originating node integrates the received fragment and may retry MetroidBuilder. This mechanism enables **distributed learning without hallucination**: the system discovers knowledge through structured peer exchange rather than generating plausible-sounding but ungrounded content. @@ -661,9 +683,9 @@ Smart sharing is a core capability, not a post-v1 extra. The v1 exchange path mu **Medoid** (mathematical term): The existing memory node selected as the statistical representative of a cluster. Selected by minimising the sum of distances to all other nodes in the cluster. Used throughout algorithmic descriptions and internal implementation comments. -**Centroid** (mathematical term): The arithmetic mean of a set of vectors — a computed geometric point that may not correspond to any stored page. Used in MetroidBuilder to compute the balanced search origin `c`. +**Centroid** (mathematical term): In MetroidBuilder, the centroid `c` is a full-dimensional vector where protected dimensions are copied from m1 (domain invariant) and unfrozen dimensions are the element-wise average of m1 and m2. 
Used as the balanced search origin in dialectical scoring. -**Metroid** (CORTEX architectural term): A structured dialectical search probe constructed at query time: `{ m1, m2, c }`, where m1 is the thesis medoid, m2 is the antithesis medoid, and c is the centroid between them. **A Metroid is never stored as a persistent graph structure.** It is an ephemeral instrument used by the CORTEX retrieval subsystem. +**Metroid** (CORTEX architectural term): A structured dialectical search probe constructed at query time: `{ m1, m2, c }`, where m1 is the thesis medoid, m2 is the antithesis medoid, and c is the centroid (protected dims from m1; unfrozen dims averaged). **A Metroid is never stored as a persistent graph structure.** It is an ephemeral instrument used by the CORTEX retrieval subsystem. **MetroidBuilder**: The CORTEX module responsible for constructing a Metroid for a given query via Matryoshka dimensional unwinding. Planned module: `cortex/MetroidBuilder.ts`. @@ -687,15 +709,22 @@ Smart sharing is a core capability, not a post-v1 extra. The v1 exchange path mu ## Model-Derived Numerics -**Critical Rule:** All numeric values derived from ML model architecture (embedding dimensions, context lengths, thresholds) must **never** be hardcoded as magic numbers. +**Critical Rule:** All numeric values derived from ML model architecture (embedding dimensions, context lengths, thresholds, and Matryoshka sub-dimension boundaries) must **never** be hardcoded as magic numbers. 
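The rule can be illustrated with a small contrast. The profile shape here is assumed from the `ModelProfile.ts` description (the optional `matryoshkaProtectedDim` field); the helper itself is hypothetical:

```typescript
// Illustrative contrast for the no-magic-numbers rule; ModelProfileLike is an
// assumed shape, not the repo's actual ModelProfile interface.
interface ModelProfileLike {
  embeddingDim: number;
  matryoshkaProtectedDim?: number;
}

// BAD:  const protectedDim = 128;  // hardcoded magic number
// GOOD: derive from the resolved model profile, with an explicit failure mode
function protectedDim(profile: ModelProfileLike): number {
  if (profile.matryoshkaProtectedDim === undefined) {
    throw new Error("model profile does not define matryoshkaProtectedDim");
  }
  return profile.matryoshkaProtectedDim;
}
```

Throwing on a missing value surfaces misconfigured profiles immediately instead of silently falling back to a number baked in at a call site.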
**Source of Truth:** -- `core/ModelProfile.ts` — Interface definition -- `core/ModelDefaults.ts` — Default fallback values -- `core/BuiltInModelProfiles.ts` — Concrete model registrations +- `core/ModelProfile.ts` — Interface definition (includes `matryoshkaProtectedDim`) +- `core/ModelDefaults.ts` — Default derivation from seed values +- `core/BuiltInModelProfiles.ts` — Concrete model registrations (includes per-model `matryoshkaProtectedDim`) - `core/ModelProfileResolver.ts` — Runtime resolution -**Enforcement:** `npm run guard:model-derived` scans for violations before CI merge. +**Model-specific `matryoshkaProtectedDim` values (must be sourced from `BuiltInModelProfiles.ts`):** + +| Model | `matryoshkaProtectedDim` | Notes | +|-------|--------------------------|-------| +| `onnx-community/embeddinggemma-300m-ONNX` | 128 | Smallest supported Matryoshka sub-dimension | +| `nomic-ai/nomic-embed-text-v1.5` | 64 | To be added when nomic provider is wired | + +**Enforcement:** `npm run guard:model-derived` scans for violations before CI merge. The guard now checks for `matryoshkaProtectedDim` in addition to the standard embedding dimension and context length fields. ## Policy-Derived Constants diff --git a/PLAN.md b/PLAN.md index 245f86f..17657cf 100644 --- a/PLAN.md +++ b/PLAN.md @@ -77,15 +77,15 @@ This document tracks the implementation status of each major module in CORTEX. 
I | Module | Status | Files | Notes | |--------|--------|-------|-------| -| Text Chunking | ❌ Missing | `hippocampus/Chunker.ts` (planned) | Token-aware page boundary detection respecting ModelProfile limits | -| Page ID Generation | ❌ Missing | `hippocampus/PageIdGenerator.ts` (planned) | Deterministic hash-based ID creation | -| Ingest Orchestrator | ❌ Missing | `hippocampus/Ingest.ts` (planned) | Main entry point: chunk → embed → persist → initialise PageActivity → build hierarchy → fast neighbor insert → hotpath admission | +| Text Chunking | ✅ Complete | `hippocampus/Chunker.ts` | Token-aware sentence-boundary splitting respecting `ModelProfile.maxChunkTokens`; covered by `tests/hippocampus/Chunker.test.ts` | +| Page Builder | ✅ Complete | `hippocampus/PageBuilder.ts` | Builds signed `Page` entities with `contentHash`, `vectorHash`, `prevPageId`/`nextPageId` linkage; covered by `tests/hippocampus/PageBuilder.test.ts` | +| Ingest Orchestrator | 🟡 Partial | `hippocampus/Ingest.ts` | `ingestText()` implemented: chunk → embed → persist pages + PageActivity → create Book → run hotpath promotion sweep. **Missing:** hierarchy building (Volume/Shelf), semantic neighbor insertion. | | Hierarchy Builder | ❌ Missing | `hippocampus/HierarchyBuilder.ts` (planned) | Construct/update Books, Volumes, Shelves; attempt tier-quota hotpath admission for each level's medoid/prototype; Williams-derived fanout bounds; trigger split via ClusterStability when bounds exceeded | -| Fast Semantic Neighbor Insert | ❌ Missing | `hippocampus/FastNeighborInsert.ts` (planned) | Incremental semantic neighbor graph update; max degree derived from HotpathPolicy (not hardcoded K); evict lowest-weight neighbor on degree overflow; check new page for hotpath admission. **Note:** Not to be confused with Metroid construction, which is a CORTEX retrieval concern. 
| +| Fast Semantic Neighbor Insert | ❌ Missing | `hippocampus/FastNeighborInsert.ts` (planned) | Incremental semantic neighbor graph update; max degree derived from HotpathPolicy (not hardcoded K); evict lowest-cosine-similarity neighbor on degree overflow; check new page for hotpath admission. **Note:** Not to be confused with Metroid construction, which is a CORTEX retrieval concern. | -**Hippocampus Status:** 0/5 complete (0%) +**Hippocampus Status:** 2.5/5 complete (50%) -**Critical Blocker:** Without this, users cannot ingest text into the memory system. +**Critical Blocker:** Hierarchy builder and semantic neighbor insertion missing; ingest produces no graph structure beyond a single Book. --- @@ -100,12 +100,12 @@ This document tracks the implementation status of each major module in CORTEX. I | Seed Selection | ❌ Missing | `cortex/SeedSelection.ts` (planned) | Threshold-based top-k page selection from ranking output | | Subgraph Expansion | 🟡 Partial | `storage/IndexedDbMetadataStore.ts` (`getInducedMetroidSubgraph` — to be renamed `getInducedNeighborSubgraph`) | BFS expansion implemented in storage layer; needs dynamic Williams bounds; needs orchestration wrapper | | Open TSP Solver | ❌ Missing | `cortex/OpenTSPSolver.ts` (planned) | Dummy-node open-path heuristic for coherent ordering | -| Query Orchestrator | ❌ Missing | `cortex/Query.ts` (planned) | Main entry point: embed → select m1 → build Metroid → dialectical scoring → subgraph expansion → TSP path → query cost meter → early-stop; return result | -| Result DTO | ❌ Missing | `cortex/QueryResult.ts` (planned) | Structured query result with provenance metadata (coherence path, subgraph size, hop count, edge weights, knowledge gap flag) | +| Query Orchestrator | 🟡 Partial | `cortex/Query.ts` | `query()` implemented: hotpath-first scoring → warm/cold spill → PageActivity update → promotion sweep. **Missing:** MetroidBuilder, dialectical zone scoring, subgraph expansion, TSP coherence, query cost meter. 
| +| Result DTO | 🟡 Partial | `cortex/QueryResult.ts` | Minimal DTO: `pages`, `scores`, `metadata`. **Missing:** `coherencePath`, `metroid`, `knowledgeGap`, `provenance` fields. | -**Cortex Status:** 0.5/9 complete (6%) +**Cortex Status:** 1.5/9 complete (17%) -**Critical Blocker:** Without this, users cannot retrieve memories from the system. The MetroidBuilder, dialectical search pipeline, and knowledge gap detector are entirely absent. +**Critical Blocker:** MetroidBuilder, dialectical search pipeline, and knowledge gap detector entirely absent. Existing `Query.ts` implements flat top-K retrieval only. --- @@ -194,15 +194,15 @@ This document tracks the implementation status of each major module in CORTEX. I | Storage | 100% | — | | Vector Compute | 100% | — | | Embedding | 83% | WebGL provider (low priority) | -| Hippocampus | 0% | **CRITICAL** — No ingest path | -| Cortex | 6% | **CRITICAL** — No retrieval path; MetroidBuilder, dialectical search, knowledge gap detection all missing | +| Hippocampus | 50% | Chunker + PageBuilder + minimal Ingest done; hierarchy builder and semantic neighbor insertion missing | +| Cortex | 17% | Minimal Query + QueryResult done; MetroidBuilder, dialectical search, knowledge gap detection all missing | | Daydreamer | 0% | Not v1 blocker | | Policy | 100% | — | | Runtime | 100% | — | | Testing | 67% | Integration tests, scaling benchmarks | | Build/CI | 83% | — | -**System-Wide Completion:** ~70% (core infrastructure and policy foundation complete; ingest/query/benchmarks remain.) +**System-Wide Completion:** ~75% (core infrastructure, policy foundation, chunking, page building, and minimal ingest/query implemented; hierarchy builder, MetroidBuilder, and graph coherence remain.) --- @@ -211,17 +211,21 @@ This document tracks the implementation status of each major module in CORTEX. 
I - ✅ Store/retrieve vectors and metadata - ✅ Vector similarity operations on all backends - ✅ Generate real embeddings via Transformers.js -- ✅ Resolve model profiles and derive routing policies +- ✅ Resolve model profiles and derive routing policies (including `matryoshkaProtectedDim` for Matryoshka models) - ✅ Run browser/Electron runtime harness - ✅ Pass 115 unit tests - ✅ Hash text/binary content (SHA-256) and sign/verify Ed25519 signatures +- ✅ Chunk text and build signed `Page` entities +- ✅ Ingest text (minimal): chunk → embed → persist pages + PageActivity → create Book → hotpath promotion ## What Doesn't Work Today -- ❌ **Cannot ingest text** — No chunking or hierarchy builder -- ❌ **Cannot query memories** — No ranking pipeline or TSP solver +- ❌ **No hierarchy beyond single Book** — Volume/Shelf hierarchy builder not yet implemented +- ❌ **No semantic neighbor graph** — `FastNeighborInsert` not yet implemented; subgraph expansion has no edges +- ❌ **No dialectical retrieval** — `MetroidBuilder`, `KnowledgeGapDetector`, and dialectical pipeline not yet implemented; current `Query.ts` is flat top-K retrieval only +- ❌ **No coherent path ordering** — No TSP solver; results are ranked list, not narrative chain - ❌ **Cannot consolidate** — No Daydreamer loop -- ❌ **Cannot share discovery updates safely** — No privacy-filtered interest-subgraph exchange path +- ❌ **Cannot share discovery updates safely** — No P2P curiosity broadcasting or privacy-filtered exchange --- @@ -236,33 +240,25 @@ This document tracks the implementation status of each major module in CORTEX. I - Ed25519 signing/verification - 26 tests passing -2. 
**Williams Bound Policy Foundation** - - `core/HotpathPolicy.ts` — `computeCapacity`, `computeSalience`, `deriveTierQuotas`, `deriveCommunityQuotas`; all constants as frozen default policy object - - `core/SalienceEngine.ts` — `computeNodeSalience`, `batchComputeSalience`, `shouldPromote`, `selectEvictionTarget`; bootstrap and steady-state lifecycle - - Extend `core/types.ts` — `PageActivity`, `HotpathEntry`, `TierQuotas`, `MetadataStore` hotpath method signatures - - Extend `storage/IndexedDbMetadataStore.ts` — `hotpath_index` and `page_activity` object stores; implement new `MetadataStore` hotpath methods - - Tests: `tests/HotpathPolicy.test.ts`, `tests/SalienceEngine.test.ts`, extend `tests/Persistence.test.ts` - -3. **Text Chunking** (`hippocampus/Chunker.ts`) - - Token-aware splitting respecting ModelProfile limits - - Preserve sentence boundaries where possible - - Test with various text lengths - -4. **Hippocampus Ingest** (`hippocampus/Ingest.ts`) - - Chunk → Embed → Persist orchestration - - Build Page entities with proper hashing/signing; initialise `PageActivity` record - - Single-Book hierarchy (defer Volume/Shelf) - - Basic semantic neighbor insertion with Williams-bounded degree - -5. **Cortex Query** (`cortex/Query.ts`) - - Embed query - - Flat page ranking against resident hotpath (skip full hierarchy for now) - - Return top-K pages by similarity - - Skip TSP coherence path (just ranked list) - -6. **Integration Test** (`tests/integration/IngestQuery.test.ts`) - - Ingest text → Retrieve by query → Validate results - - Persistence across sessions +2. **Williams Bound Policy Foundation** ✅ **Complete** + - `core/HotpathPolicy.ts`, `core/SalienceEngine.ts`, `core/types.ts` extensions, `storage/IndexedDbMetadataStore.ts` hotpath stores + +3. **Text Chunking** (`hippocampus/Chunker.ts`) ✅ **Complete** + - Token-aware sentence-boundary splitting; tests passing + +4. 
**Page Builder** (`hippocampus/PageBuilder.ts`) ✅ **Complete** + - Signed Page entities with hash linkage; tests passing + +5. **Hippocampus Ingest** (`hippocampus/Ingest.ts`) 🟡 **Partial** + - Minimal `ingestText()` implemented (chunk → embed → persist pages → single Book → hotpath admission) + - **Remaining:** semantic neighbor insertion (deferred to Phase 2) + +6. **Cortex Query** (`cortex/Query.ts`) 🟡 **Partial** + - Minimal `query()` implemented (hotpath-first flat scoring; warm/cold spill) + - **Remaining:** MetroidBuilder, dialectical pipeline (deferred to Phase 2) + +7. **Integration Test** (`tests/integration/IngestQuery.test.ts`) ✅ **Complete** + - Ingest text → Retrieve by query → Validate results; persistence across sessions **Exit Criteria:** User can ingest text and retrieve relevant pages by query; Williams Bound policy is in place. @@ -392,21 +388,21 @@ This document tracks the implementation status of each major module in CORTEX. I ## Known Blockers & Risks -### Blocker 1: No Ingest Orchestration -**Impact:** Cannot use the system at all. -**Mitigation:** Phase 1 priority; single-book hierarchy sufficient for v0.1. +### Blocker 1: No Hierarchy Builder or Semantic Neighbor Graph +**Impact:** Ingest produces only a single flat Book; no Volume/Shelf structure; subgraph expansion has no edges to traverse. +**Mitigation:** Phase 2 priority; `HierarchyBuilder` and `FastNeighborInsert` must be implemented before dialectical retrieval is possible. -### Blocker 2: No Query Orchestration -**Impact:** Cannot retrieve memories. -**Mitigation:** Phase 1 priority; flat ranking against resident hotpath acceptable for v0.1. +### Blocker 2: No MetroidBuilder or Dialectical Pipeline +**Impact:** Queries return flat top-K results only; no epistemic balance, no knowledge gap detection, no P2P curiosity. +**Mitigation:** Phase 2 priority; depends on semantic neighbor graph (Blocker 1) and hierarchy builder. 
-### Blocker 3: No HotpathPolicy or SalienceEngine -**Impact:** Cannot enforce Williams Bound invariants; all subsequent phases depend on these. -**Mitigation:** Phase 1 priority; implement before ingest/query orchestration. +### Blocker 3: No Privacy-Safe Sharing or Curiosity Broadcasting Pipeline +**Impact:** Core discovery-sharing value proposition is missing; knowledge gaps cannot be resolved via P2P. +**Mitigation:** Phase 3 required track; implement eligibility classifier + curiosity broadcaster + signed subgraph exchange as v1 scope. CuriosityProbe must include `mimeType` and `modelUrn` to prevent incommensurable graph merges. -### Blocker 4: No Privacy-Safe Sharing Pipeline -**Impact:** Core discovery-sharing value proposition is missing. -**Mitigation:** Phase 3 required track; implement eligibility classifier + signed subgraph exchange as v1 scope. +### Blocker 4: Naming Drift (P0-X) +**Impact:** The term "Metroid" is currently used for the proximity graph in all code. MetroidBuilder cannot be introduced without a rename collision. +**Mitigation:** P0-X tasks (rename `MetroidNeighbor` → `SemanticNeighbor`, etc.) must be completed before MetroidBuilder is implemented. ### Risk 1: TSP Complexity Open TSP is NP-hard; heuristic may be slow on large subgraphs. diff --git a/README.md b/README.md index 4c5025e..ed31306 100644 --- a/README.md +++ b/README.md @@ -64,7 +64,7 @@ This is the rapid, multi-path "write" system that turns raw experience into stru ### 🧩 Cortex — Intelligent Routing & Coherence When you ask a question, Cortex does **not** return a bag of similar vectors. 
-Instead it: +**Planned target behavior (v0.5+):** - Constructs a **Metroid** `{ m1, m2, c }` for the query — a structured dialectical search probe pairing the thesis medoid (m1) with an antithesis medoid (m2) and a balanced centroid (c) - Performs Matryoshka dimensional unwinding to discover semantically opposing knowledge - Performs parallel WebGPU "scoops" across the entire active universe (sub-millisecond) @@ -72,9 +72,13 @@ Instead it: - Traces closed-loop paths through Hebbian connections - Returns only self-consistent, coherent context chains - Detects **knowledge gaps** when no antithesis medoid exists within dimensional constraints -- Broadcasts P2P curiosity probes to discover missing knowledge from peers +- Broadcasts P2P curiosity probes (with `mimeType` + `modelUrn` for commensurability) to discover missing knowledge from peers -The result feels like genuine recollection rather than search — and surfaces what you *don't* know as clearly as what you do. +**Current behavior (v0.1):** +- Flat top-K similarity scoring against the hotpath resident index with warm/cold spill +- No MetroidBuilder, no dialectical pipeline, no knowledge gap detection yet + +The result of the full v0.5 system will feel like genuine recollection rather than search — and will surface what you *don't* know as clearly as what you do. ### 🌙 Daydreamer — The Default Mode Network When the agent is idle, a throttled Web Worker takes over: diff --git a/TODO.md b/TODO.md index be5de12..c5395e6 100644 --- a/TODO.md +++ b/TODO.md @@ -357,10 +357,11 @@ These items add hierarchical routing and coherent path ordering. 
They transform - [ ] **P1-M1:** Implement `cortex/MetroidBuilder.ts` - Accept a query embedding and a list of resident medoids (shelf/volume/book representatives) - Select m1: the medoid with highest cosine similarity to the query - - Freeze the protected lower Matryoshka dimensions (dimension count derived from ModelProfile; see `embeddingDimension` and `matryoshkaProtectedDim`) - - In the unfrozen upper dimensions, search for the nearest medoid with **opposing** semantic direction (minimum cosine similarity above a negative threshold, or maximum angular distance) + - Read `matryoshkaProtectedDim` from `ModelProfile` (the field added to `core/ModelProfile.ts` as the per-model protected floor — e.g. 128 for embeddinggemma-300m, 64 for nomic-embed-text-v1.5). If `undefined` on the current model, return `{ m1, m2: null, c: null, knowledgeGap: true }` immediately. + - Freeze all dimensions with index < `matryoshkaProtectedDim` + - In the unfrozen upper dimensions (index >= `matryoshkaProtectedDim`), search for the nearest medoid with **opposing** semantic direction (minimum cosine similarity above a negative threshold, or maximum angular distance) - This medoid becomes m2 (antithesis) - - Compute centroid: `c = (m1_vec + m2_vec) / 2` + - Compute centroid: protected dims (< matryoshkaProtectedDim) copied from m1 vector; unfrozen dims averaged element-wise: `c[i] = (m1[i] + m2[i]) / 2` - Return `Metroid { m1, m2, c }`; if no valid m2 found, return `{ m1, m2: null, c: null, knowledgeGap: true }` - [ ] **P1-M2:** Implement Matryoshka dimensional unwinding in `cortex/MetroidBuilder.ts` @@ -393,7 +394,9 @@ These items add hierarchical routing and coherent path ordering. 
They transform - This DTO is returned to the caller as part of `QueryResult` - [ ] **P1-N2:** Implement curiosity probe construction in `cortex/KnowledgeGapDetector.ts` - - Build `CuriosityProbe { m1, partialMetroid, queryContext, knowledgeBoundary }` + - Build `CuriosityProbe { m1, partialMetroid, queryContext, knowledgeBoundary, mimeType, modelUrn }` + - `mimeType`: MIME type of embedded content (e.g. `text/plain`). Enables receiving peers to validate content-type compatibility before comparing graph sections. + - `modelUrn`: URN of the embedding model (e.g. `urn:model:onnx-community/embeddinggemma-300m-ONNX:v1`) sourced from the active `ModelProfile.modelId`. Peers **must** reject probes whose `modelUrn` does not match a model they support — accepting fragments from a different embedding model would produce incommensurable similarity scores at Matryoshka layer boundaries. - Store probe locally for broadcast via P2P layer (see P2-G) - Do not broadcast immediately — queue for the P2P sharing layer @@ -404,7 +407,8 @@ These items add hierarchical routing and coherent path ordering. 
They transform - [ ] **P1-N4:** Add knowledge gap test coverage - `tests/cortex/KnowledgeGapDetector.test.ts` - Test that a KnowledgeGap DTO is produced when MetroidBuilder returns `knowledgeGap: true` - - Test that a CuriosityProbe is constructed with correct fields + - Test that a CuriosityProbe is constructed with correct fields including `mimeType` and `modelUrn` + - Test that `modelUrn` is derived from `ModelProfile.modelId` (not hardcoded) - Test that QueryResult includes the KnowledgeGap when present - Test that queries against a rich corpus do NOT produce false-positive knowledge gaps diff --git a/core/BuiltInModelProfiles.ts b/core/BuiltInModelProfiles.ts index ec814ad..caca66f 100644 --- a/core/BuiltInModelProfiles.ts +++ b/core/BuiltInModelProfiles.ts @@ -22,6 +22,10 @@ import type { ModelProfileRegistryEntry } from "./ModelProfileResolver"; * The default dimension registered here (768) is the full-fidelity output. * Callers may slice to a smaller sub-dimension for compressed retrieval tiers. * + * matryoshkaProtectedDim = 128: the most coarse-grained (smallest) sub-dimension + * officially supported by the model. MetroidBuilder uses this as the protected + * floor — dimensions below 128 are not a supported embedding granularity. + * * Task prompts (required for best retrieval quality): * Query prefix: "query: " * Document prefix: "passage: " @@ -35,11 +39,17 @@ export const EMBEDDING_GEMMA_300M_MODEL_ID = export const EMBEDDING_GEMMA_300M_PROFILE: ModelProfileRegistryEntry = { embeddingDimension: 768, contextWindowTokens: 512, + matryoshkaProtectedDim: 128, }; /** * Canonical registry of all built-in model profiles, keyed by model ID. * This record is used as the default registry in `ModelProfileResolver`. + * + * When adding a new Matryoshka embedding model, set `matryoshkaProtectedDim` + * to the smallest sub-dimension the model officially supports. 
Known values: + * - embeddinggemma-300m: 128 + * - nomic-embed-text-v1.5: 64 (to be added when nomic provider is wired) + */ export const BUILT_IN_MODEL_REGISTRY: Record<string, ModelProfileRegistryEntry> = Object.freeze({ diff --git a/core/ModelDefaults.ts b/core/ModelDefaults.ts index 255064a..0f629db 100644 --- a/core/ModelDefaults.ts +++ b/core/ModelDefaults.ts @@ -77,6 +77,15 @@ export function buildModelProfileFromSeed( assertPositiveInteger("embeddingDimension", seed.embeddingDimension); assertPositiveInteger("contextWindowTokens", seed.contextWindowTokens); + if (seed.matryoshkaProtectedDim !== undefined) { + assertPositiveInteger("matryoshkaProtectedDim", seed.matryoshkaProtectedDim); + if (seed.matryoshkaProtectedDim > seed.embeddingDimension) { + throw new Error( + "matryoshkaProtectedDim cannot exceed embeddingDimension", + ); + } + } + return { modelId, embeddingDimension: seed.embeddingDimension, @@ -84,5 +93,8 @@ truncationTokens: deriveTruncationTokens(seed.contextWindowTokens, policy), maxChunkTokens: deriveChunkTokenLimit(seed.contextWindowTokens, policy), source: seed.source, + ...(seed.matryoshkaProtectedDim !== undefined ? { matryoshkaProtectedDim: seed.matryoshkaProtectedDim } : {}), }; } diff --git a/core/ModelProfile.ts b/core/ModelProfile.ts index f641366..baa4a05 100644 --- a/core/ModelProfile.ts +++ b/core/ModelProfile.ts @@ -5,6 +5,22 @@ export interface ModelProfileSeed { embeddingDimension: number; contextWindowTokens: number; source: ModelProfileSource; + /** + * The most coarse-grained Matryoshka sub-dimension for this model. + * + * This is the smallest nested embedding size the model officially supports. + * It defines the "protected floor" used by MetroidBuilder: lower dimensions + * encode invariant domain context and are never searched for antithesis. + * + * Known values: + * - embeddinggemma-300m: 128 + * - nomic-embed-text-v1.5: 64 + * + * `undefined` for models that do not use Matryoshka Representation Learning.
+ * When undefined, MetroidBuilder cannot perform dimensional unwinding and will + * always declare a knowledge gap (antithesis search is not possible). + */ + matryoshkaProtectedDim?: number; } export interface PartialModelMetadata { @@ -19,4 +35,22 @@ export interface ModelProfile { truncationTokens: number; maxChunkTokens: number; source: ModelProfileSource; + /** + * The most coarse-grained Matryoshka sub-dimension for this model. + * + * This is the smallest nested embedding size the model officially supports. + * It defines the "protected floor" used by MetroidBuilder: dimensions below + * this boundary encode invariant domain context and are never searched for + * antithesis during Matryoshka dimensional unwinding. + * + * Known values: + * - embeddinggemma-300m: 128 + * - nomic-embed-text-v1.5: 64 + * + * `undefined` for models that do not use Matryoshka Representation Learning. + * When undefined, MetroidBuilder cannot perform dimensional unwinding and will + * always declare a knowledge gap (antithesis search is not possible without + * a protected-dimension floor). + */ + matryoshkaProtectedDim?: number; } diff --git a/core/ModelProfileResolver.ts b/core/ModelProfileResolver.ts index 7a7a2d4..3cb9384 100644 --- a/core/ModelProfileResolver.ts +++ b/core/ModelProfileResolver.ts @@ -12,6 +12,11 @@ import type { export interface ModelProfileRegistryEntry { embeddingDimension: number; contextWindowTokens: number; + /** + * The most coarse-grained Matryoshka sub-dimension for this model. + * Required for MetroidBuilder dimensional unwinding. See `ModelProfile.matryoshkaProtectedDim`. + */ + matryoshkaProtectedDim?: number; } export interface ModelProfileResolverOptions { @@ -46,6 +51,9 @@ export class ModelProfileResolver { this.registry.set(normalizeModelId(modelId), { embeddingDimension: entry.embeddingDimension, contextWindowTokens: entry.contextWindowTokens, + ...(entry.matryoshkaProtectedDim !== undefined + ? 
{ matryoshkaProtectedDim: entry.matryoshkaProtectedDim } + : {}), }); } @@ -78,6 +86,7 @@ export class ModelProfileResolver { embeddingDimension, contextWindowTokens, source, + matryoshkaProtectedDim: registryEntry?.matryoshkaProtectedDim, }, this.derivationPolicy, ); diff --git a/scripts/guard-model-derived.mjs b/scripts/guard-model-derived.mjs index 069d9b7..b12be7e 100644 --- a/scripts/guard-model-derived.mjs +++ b/scripts/guard-model-derived.mjs @@ -20,7 +20,7 @@ const ALLOWED_SOURCE_FILES = new Set([ ]); const MODEL_FIELD_PATTERN = - /\b(embeddingDim(?:ension)?|contextWindowTokens|maxInputTokens|maxChunkTokens|truncationTokens|maxTokens)\b/i; + /\b(embeddingDim(?:ension)?|contextWindowTokens|maxInputTokens|maxChunkTokens|truncationTokens|maxTokens|matryoshkaProtectedDim)\b/i; const ASSIGNMENT_PATTERN = /[:=]/; const NUMERIC_LITERAL_PATTERN = /(^|[^\w.])-?\d+(?:\.\d+)?([^\w.]|$)/; From be349866485e52c06d7fac92e138d3486c87e417 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Fri, 13 Mar 2026 19:45:13 +0000 Subject: [PATCH 4/6] docs: clarify FastNeighborInsert algorithm, edge-role invariants, and Query.ts rework scope PLAN.md: - FastNeighborInsert row: use Williams-cutoff distance (not K), lazy Daydreamer reconnection, cosine=discovery+Bayesian vs Hebbian=TSP traversal; DESIGN.md cross-ref - Query.ts/QueryResult.ts: status "Needs Rework"; existing flat top-K code is superseded TODO.md: - P1-C1: neighbors found within Williams-cutoff distance; Daydreamer builds additional edges lazily; edge-role invariant (SemanticNeighbor.cosineSimilarity vs edges_hebbian) - P1-C3: add test that FastNeighborInsert does NOT create Hebbian edges - P1-E: "Rewrite" not "Upgrade"; note flat-scoring code path is fully superseded; add Hebbian edge traversal in P1-E1; recommended order updated DESIGN.md: - SemanticNeighbor: add edge-role distinction table; remove misleading "TSP-ready" comment - SemanticNeighborSubgraph: inline 
note that TSP uses Hebbian weights for tour traversal - Incremental Strategy: Williams-cutoff distance; Daydreamer lazy reconnection Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com> --- DESIGN.md | 26 ++++++++++++++++++++++++-- PLAN.md | 6 +++--- TODO.md | 24 ++++++++++++++---------- 3 files changed, 41 insertions(+), 15 deletions(-) diff --git a/DESIGN.md b/DESIGN.md index d63bd21..d2ea1ac 100644 --- a/DESIGN.md +++ b/DESIGN.md @@ -434,11 +434,28 @@ Sparse radius-graph edge connecting pages with high cosine similarity. Used for > **Note:** The current codebase names this type `MetroidNeighbor` — this is an architectural naming error introduced by early conceptual drift. The correct term is `SemanticNeighbor` (or equivalent). A code-level rename is tracked in the TODO. The edge is a proximity concept, not a Metroid concept. +**Critical distinction — two edge types, two roles:** + +| Edge type | Storage | Role | +|-----------|---------|------| +| `SemanticNeighbor` | `neighbor_graph` IDB store | Neighbor discovery during ingest; Bayesian belief updates | +| Hebbian edge (`Edge`) | `edges_hebbian` IDB store | TSP tour traversal distance; LTP/LTD strengthening/decay | + +`SemanticNeighbor.cosineSimilarity` drives: +- Which pages become neighbors during `FastNeighborInsert` (Williams-cutoff distance, not a fixed K) +- Bayesian belief score updates for the retrieved page set + +Hebbian `Edge.weight` drives: +- The distance metric used by `OpenTSPSolver` when ordering pages into a coherent narrative path +- Strength of connection for LTP/LTD during Daydreamer consolidation + +These two edge types must **never** be conflated or substituted for one another. 
+ ```typescript interface SemanticNeighbor { neighborPageId: Hash; cosineSimilarity: number; - distance: number; // 1 - cosineSimilarity (TSP-ready) + distance: number; // 1 - cosineSimilarity; used for subgraph edge weight } ``` @@ -450,6 +467,9 @@ Induced subgraph for BFS-based coherence path expansion. ```typescript interface SemanticNeighborSubgraph { nodes: Hash[]; + // distance: 1 - cosineSimilarity; used for BFS expansion candidate selection. + // OpenTSPSolver uses Hebbian edge weights (from edges_hebbian) as the tour + // traversal distance to determine how far to walk — not these cosine distances. edges: { from: Hash; to: Hash; distance: number }[]; } ``` @@ -556,7 +576,9 @@ Rather than returning nearest neighbors by similarity, Cortex traces a coherent 7. **Mark Dirty** — Flag volumes for full recalc by Daydreamer **Incremental Strategy:** -Fast local semantic neighbor insertion keeps query-time latency low. Full neighborhood recalculation is deferred to idle Daydreamer passes. Hotpath admission runs at ingest time for new pages and hierarchy prototypes. +Fast local semantic neighbor insertion keeps ingest-time latency low. At ingest time, only the initial forward and reverse edges are created — neighbors are selected by cosine similarity within Williams-cutoff **distance** (not a fixed K; the cutoff is derived from `HotpathPolicy`). On degree overflow, the lowest-cosine-similarity neighbor is evicted. + +Full cross-edge reconnection is intentionally deferred: Daydreamer walks the graph during idle passes to build additional edges, strengthening or pruning connections via LTP/LTD. This avoids a full graph recalculation on every insert while still converging to a well-connected graph over time. Hotpath admission runs at ingest time for new pages and hierarchy prototypes. 
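The Williams-cutoff insert-and-evict rule described in the Incremental Strategy above can be sketched as follows. This is a minimal illustration, not the final API: `insertNeighbor`, `SemanticNeighborEdge`, `cutoffDistance`, and `maxDegree` are hypothetical names, and the real cutoff radius and degree bound are derived from `HotpathPolicy`, never hardcoded.

```typescript
// Minimal sketch (hypothetical API) of Williams-cutoff neighbor insertion.
// Real values for cutoffDistance and maxDegree come from HotpathPolicy.
interface SemanticNeighborEdge {
  neighborPageId: string; // Hash in the real codebase
  cosineSimilarity: number;
  distance: number; // 1 - cosineSimilarity
}

function insertNeighbor(
  neighbors: SemanticNeighborEdge[],
  candidate: SemanticNeighborEdge,
  cutoffDistance: number,
  maxDegree: number,
): SemanticNeighborEdge[] {
  // Only pages inside the Williams-cutoff radius become neighbors at all.
  if (candidate.distance > cutoffDistance) return neighbors;
  const next = [...neighbors, candidate];
  if (next.length <= maxDegree) return next;
  // Degree overflow: evict the lowest-cosine-similarity neighbor, not a random one.
  next.sort((a, b) => a.cosineSimilarity - b.cosineSimilarity);
  return next.slice(1);
}
```

Note that the sketch creates no Hebbian edges; per the edge-role invariant above, `edges_hebbian` records are Daydreamer's concern.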
## Consolidation Design diff --git a/PLAN.md b/PLAN.md index 17657cf..0d8c942 100644 --- a/PLAN.md +++ b/PLAN.md @@ -81,7 +81,7 @@ This document tracks the implementation status of each major module in CORTEX. I | Page Builder | ✅ Complete | `hippocampus/PageBuilder.ts` | Builds signed `Page` entities with `contentHash`, `vectorHash`, `prevPageId`/`nextPageId` linkage; covered by `tests/hippocampus/PageBuilder.test.ts` | | Ingest Orchestrator | 🟡 Partial | `hippocampus/Ingest.ts` | `ingestText()` implemented: chunk → embed → persist pages + PageActivity → create Book → run hotpath promotion sweep. **Missing:** hierarchy building (Volume/Shelf), semantic neighbor insertion. | | Hierarchy Builder | ❌ Missing | `hippocampus/HierarchyBuilder.ts` (planned) | Construct/update Books, Volumes, Shelves; attempt tier-quota hotpath admission for each level's medoid/prototype; Williams-derived fanout bounds; trigger split via ClusterStability when bounds exceeded | -| Fast Semantic Neighbor Insert | ❌ Missing | `hippocampus/FastNeighborInsert.ts` (planned) | Incremental semantic neighbor graph update; max degree derived from HotpathPolicy (not hardcoded K); evict lowest-cosine-similarity neighbor on degree overflow; check new page for hotpath admission. **Note:** Not to be confused with Metroid construction, which is a CORTEX retrieval concern. | +| Fast Semantic Neighbor Insert | ❌ Missing | `hippocampus/FastNeighborInsert.ts` (planned) | Cosine-nearest neighbors within Williams-cutoff distance (not fixed K). Degree overflow evicts lowest-cosine-similarity neighbor. Initial edges only at ingest; Daydreamer builds additional edges lazily. `SemanticNeighbor.cosineSimilarity` drives discovery + Bayesian updates; Hebbian weights (separate) drive TSP traversal. See DESIGN.md §Graph Structures for the full edge-role invariant. | **Hippocampus Status:** 2.5/5 complete (50%) @@ -100,8 +100,8 @@ This document tracks the implementation status of each major module in CORTEX. 
I | Seed Selection | ❌ Missing | `cortex/SeedSelection.ts` (planned) | Threshold-based top-k page selection from ranking output | | Subgraph Expansion | 🟡 Partial | `storage/IndexedDbMetadataStore.ts` (`getInducedMetroidSubgraph` — to be renamed `getInducedNeighborSubgraph`) | BFS expansion implemented in storage layer; needs dynamic Williams bounds; needs orchestration wrapper | | Open TSP Solver | ❌ Missing | `cortex/OpenTSPSolver.ts` (planned) | Dummy-node open-path heuristic for coherent ordering | -| Query Orchestrator | 🟡 Partial | `cortex/Query.ts` | `query()` implemented: hotpath-first scoring → warm/cold spill → PageActivity update → promotion sweep. **Missing:** MetroidBuilder, dialectical zone scoring, subgraph expansion, TSP coherence, query cost meter. | -| Result DTO | 🟡 Partial | `cortex/QueryResult.ts` | Minimal DTO: `pages`, `scores`, `metadata`. **Missing:** `coherencePath`, `metroid`, `knowledgeGap`, `provenance` fields. | +| Query Orchestrator | 🟡 Needs Rework | `cortex/Query.ts` | Flat top-K scoring implemented (hotpath-first → warm/cold spill → PageActivity update → promotion sweep). **Must be substantially reworked** to implement the full dialectical pipeline: replace flat scoring with hierarchical resident-first ranking, add MetroidBuilder, dialectical zone scoring (thesis/antithesis/synthesis), subgraph expansion with dynamic Williams bounds, TSP coherence path, and query cost meter. The existing implementation does not use Hebbian edges or cosine-similarity-bounded subgraph expansion; it is a functional placeholder only. | +| Result DTO | 🟡 Needs Rework | `cortex/QueryResult.ts` | Minimal DTO (`pages`, `scores`, `metadata`). **Must be reworked** to add `coherencePath: Hash[]`, `metroid?: { m1, m2, centroid }`, `knowledgeGap?: KnowledgeGap`, and `provenance: { subgraphSize, hopCount, edgeWeights, vectorOpCost, earlyStop }`. 
| **Cortex Status:** 1.5/9 complete (17%) diff --git a/TODO.md b/TODO.md index c5395e6..b8289f4 100644 --- a/TODO.md +++ b/TODO.md @@ -308,11 +308,12 @@ These items add hierarchical routing and coherent path ordering. They transform **Why:** Need a sparse semantic neighbor graph for coherent path tracing. This graph connects pages with high cosine similarity and is used for BFS subgraph expansion during retrieval. Degree must be bounded by `HotpathPolicy` to prevent unbounded graph mass growth. **This is not related to Metroid construction** — the semantic neighbor graph is a proximity concept, not a dialectical probe concept. - [ ] **P1-C1:** Implement `hippocampus/FastNeighborInsert.ts` - - For each new page, compute similarity to existing pages - - Derive max neighbors per page from `HotpathPolicy` constant (not hardcoded K) - - Insert forward edges (page → neighbors) as `SemanticNeighbor` records - - Insert reverse edges (neighbors → page), respecting max degree + - For each new page, find cosine-nearest neighbors within Williams-cutoff **distance** (not a fixed K); derive the cutoff radius from `HotpathPolicy` rather than a hardcoded constant + - Insert forward edges (page → neighbors) as `SemanticNeighbor` records, respecting max degree + - Insert reverse edges (neighbors → page), respecting max degree per direction - If a page is already at max degree, evict the neighbor with the lowest cosine similarity + - Insert only initial edges at ingest time; do not attempt full cross-edge reconnection — Daydreamer walks the graph during idle passes to build additional edges (avoids full graph recalc on every insert) + - **Edge role invariant:** `SemanticNeighbor.cosineSimilarity` is used for neighbor discovery and Bayesian belief updates. Hebbian edge weights (in `edges_hebbian`) are used for TSP tour traversal. These are separate edge types with separate roles; do not mix them. 
- Mark affected volumes as dirty for full Daydreamer recalc - After insertion, check new page for hotpath admission via `SalienceEngine` @@ -321,10 +322,11 @@ These items add hierarchical routing and coherent path ordering. They transform - [ ] **P1-C3:** Add semantic neighbor insert test coverage - `tests/hippocampus/FastNeighborInsert.test.ts` - - Test neighbor lists are bounded by the policy-derived max degree + - Test neighbor lists are bounded by Williams-cutoff distance (not a fixed K) - Test symmetry (if A→B, then B→A) - - Test that degree overflow evicts lowest-similarity neighbor, not a random one + - Test that degree overflow evicts lowest-cosine-similarity neighbor, not a random one - Test that new page is considered for hotpath admission after insertion + - Test that `edges_hebbian` records are NOT created by FastNeighborInsert (Hebbian is Daydreamer's concern) **Exit Criteria:** Semantic neighbor graph is maintained during ingest with policy-bounded degree. @@ -420,19 +422,21 @@ These items add hierarchical routing and coherent path ordering. They transform **Why:** This is the "aha" moment — return memories in natural narrative order through the resident hotpath via dialectical Metroid exploration, with dynamic, sublinear expansion bounds. -- [ ] **P1-E1:** Upgrade `cortex/Query.ts` (full version) +> **Note on scope:** The existing `cortex/Query.ts` is a flat top-K scorer that does not use MetroidBuilder, Hebbian edge traversal, or cosine-similarity-bounded subgraph expansion. It must be **substantially reworked** — not merely extended — to implement the dialectical pipeline described below. The same applies to `cortex/QueryResult.ts`. Do not attempt to preserve the flat-scoring code path; it is superseded entirely. 
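One self-contained piece of that rework is the query cost meter: the orchestrator counts vector operations as it runs and early-stops with best-so-far results once a Williams-derived budget is exceeded. A minimal sketch follows; `QueryCostMeter` is a hypothetical name, and the real budget comes from `HotpathPolicy`, not a constant.

```typescript
// Hypothetical sketch of the query cost meter: the orchestrator charges the
// meter for each batch of vector operations and returns best-so-far results
// once the Williams-derived budget is exhausted.
class QueryCostMeter {
  private vectorOps = 0;

  constructor(private readonly budget: number) {}

  // Record `ops` vector operations against the budget.
  charge(ops: number): void {
    this.vectorOps += ops;
  }

  // True once spending meets or exceeds the budget (triggers early-stop).
  get exhausted(): boolean {
    return this.vectorOps >= this.budget;
  }

  get spent(): number {
    return this.vectorOps;
  }
}
```

The orchestrator would check `meter.exhausted` after each pipeline stage (scoop, subgraph expansion, TSP refinement) and record `meter.spent` in the result's provenance metadata.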
+ +- [ ] **P1-E1:** Rewrite `cortex/Query.ts` (full dialectical version) - Use resident-first hierarchical ranking to select topic medoid (m1) - Call `MetroidBuilder` to construct `{ m1, m2, c }` - If knowledge gap detected, include in result and continue with partial Metroid (m1 only) - Use centroid `c` as the primary scoring anchor for page selection - Derive dynamic subgraph bounds from `HotpathPolicy` (`maxSubgraphSize`, `maxHops`, `perHopBranching`) - - Call `MetadataStore.getInducedNeighborSubgraph(seedPages, maxHops)` using dynamic `maxHops` + - Call `MetadataStore.getInducedNeighborSubgraph(seedPages, maxHops)` using dynamic `maxHops`; traverse edges using Hebbian weights for tour distance (not cosine similarity) - Call `OpenTSPSolver.solve(subgraph)` - Return ordered page list via coherent path - **Query cost meter:** count vector operations; early-stop and return best-so-far if cost exceeds Williams-derived budget - Include provenance metadata (hop count, edge weights, subgraph size, cost, Metroid details) -- [ ] **P1-E2:** Upgrade `cortex/QueryResult.ts` +- [ ] **P1-E2:** Rewrite `cortex/QueryResult.ts` - Add `coherencePath: Hash[]` (ordered page IDs) - Add `metroid?: { m1: Hash; m2: Hash | null; centroid: Float32Array | null }` (Metroid used for this query) - Add `knowledgeGap?: KnowledgeGap` (if antithesis discovery failed) @@ -847,7 +851,7 @@ If you're reading this and want to know "what do I work on right now?", here's t 7. **P1-M1/M2:** Implement `cortex/MetroidBuilder.ts` with Matryoshka unwinding 8. **P1-N1/N2:** Implement `cortex/KnowledgeGapDetector.ts` 9. **P1-D1:** Implement `cortex/OpenTSPSolver.ts` -10. **P1-E1:** Upgrade `cortex/Query.ts` to full dialectical orchestrator +10. 
**P1-E1:** Rewrite `cortex/Query.ts` to full dialectical orchestrator (substantial rework; not backward-compatible with flat top-K version) --- From 622570b261ad8433752fb92416326bb824093aee Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Fri, 13 Mar 2026 19:53:55 +0000 Subject: [PATCH 5/6] docs: enrich centroid as Kansas-space vantage, ARCH-REVIEW zero-drift caveat, README must-language MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit DESIGN.md: - c definition: synthetic center of mass, typically empty "Kansas space", neutral vantage point for unbiased scoring; properly formatted continuation lines - Step 4: Kansas space explanation in short focused sentences - Step 5: rewritten as bulleted zone-classification list (thesis/antithesis/synthesis/ third-region); cross-ref to Dialectical Search; scoring-from-c rationale kept here - Dialectical Search: removed duplicate "key property" paragraph; merged anchoring-bias explanation into single scoring paragraph; zone table uses closer-to-c framing - Terminology Centroid: multiline, concise Kansas-space explanation ARCHITECTURE-REVIEW.md: - Query.ts / QueryResult.ts explicitly flagged "must be substantially rewritten" - Zero-drift caveat restructured as 4 bulleted points for clarity: what it means, what it does not mean, per-file impact, authoritative status pointer (PLAN.md) - Recommended Fix Order step 4: "Rewrite" not "Upgrade" README.md: - Cortex section: "Required behavior (v0.5+ engineering target)" with "must" language on every bullet; Kansas space note as sub-bullet of Metroid bullet - Current behavior relabeled "(v0.1 — placeholder)" Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com> --- ARCHITECTURE-REVIEW.md | 13 +++++++++---- DESIGN.md | 38 ++++++++++++++++++++++++++++++-------- README.md | 27 ++++++++++++++------------- 3 files changed, 53 insertions(+), 25 deletions(-) diff --git 
a/ARCHITECTURE-REVIEW.md b/ARCHITECTURE-REVIEW.md index 42ab46d..c894929 100644 --- a/ARCHITECTURE-REVIEW.md +++ b/ARCHITECTURE-REVIEW.md @@ -255,11 +255,16 @@ The following components are correctly implemented (or partially implemented in - `hippocampus/Chunker.ts` — Text chunking; **implemented and correct** - `hippocampus/PageBuilder.ts` — Page entity construction; **implemented and correct** - `hippocampus/Ingest.ts` — Minimal ingest path; **partially implemented** (chunk→embed→persist→Book→hotpath); correct direction, hierarchy and neighbor insertion deferred -- `cortex/Query.ts` — Minimal query path; **partially implemented** (hotpath-first flat scoring); correct direction, MetroidBuilder deferred -- `cortex/QueryResult.ts` — Minimal result DTO; **partially implemented**; correct direction, provenance fields deferred +- `cortex/Query.ts` — Minimal query path; **partially implemented** (hotpath-first flat scoring); **must be substantially rewritten** for the dialectical pipeline (P1-E) +- `cortex/QueryResult.ts` — Minimal result DTO; **partially implemented**; **must be rewritten** to add coherencePath, metroid, knowledgeGap, provenance fields (P1-E2) - All `VectorBackend` implementations — correct -> **Note:** PLAN.md v1.2 has been updated to reflect the actual implementation status of all Hippocampus and Cortex modules. The initial v1.1 plan incorrectly marked `Chunker.ts`, `PageBuilder.ts`, `Ingest.ts`, `Query.ts`, and `QueryResult.ts` as missing; this has been corrected. +> **Important caveat on "zero drift":** +> +> - **What it means:** No architectural logic in these files conflicts with the corrected design. They do not need to be deleted or redesigned from scratch. +> - **What it does not mean:** Unaffected by future work. The "roughed in" implementations (`Ingest.ts`, `Query.ts`, `QueryResult.ts`) were scaffolded before the MetroidBuilder design was fully specified. 
+> - **Impact:** `Query.ts` and `QueryResult.ts` must be substantially rewritten (P1-E); `Ingest.ts` must gain hierarchy building and neighbor insertion (P1-B, P1-C). Each is a correct stub in the right direction, but not a complete implementation. +> - **Authoritative status:** Refer to **PLAN.md**, not this section, when assessing whether a file needs additional work. --- @@ -268,5 +273,5 @@ The following components are correctly implemented (or partially implemented in 1. **P0-X1–X7** — Fix naming drift in `core/types.ts`, `storage/IndexedDbMetadataStore.ts`, `cortex/Query.ts`, and planned file names. This unblocks MetroidBuilder without risking collision. 2. **P1-M1–M3** — Add `Metroid` and `KnowledgeGap` types; implement `MetroidBuilder`. 3. **P1-N1–N4** — Implement `KnowledgeGapDetector`. -4. **P1-E1–E3** — Upgrade `cortex/Query.ts` to full dialectical orchestrator. +4. **P1-E1–E3** — Rewrite `cortex/Query.ts` to full dialectical orchestrator (not backward-compatible with existing flat top-K code). 5. **P1-C1–C3** — Implement `FastNeighborInsert` (correctly named after P0-X). diff --git a/DESIGN.md b/DESIGN.md index d2ea1ac..9438c74 100644 --- a/DESIGN.md +++ b/DESIGN.md @@ -104,7 +104,10 @@ Metroid = { m1, m2, c } Where: - **m1** — thesis medoid: the cluster representative most relevant to the query topic - **m2** — antithesis medoid: a cluster representative discovered through constrained Matryoshka search to represent semantic opposition to m1 -- **c** — centroid: the geometric midpoint between m1 and m2, used as the balanced search origin +- **c** — centroid: the synthetic center of mass between m1 and m2. + `c` is a "Kansas space" position — typically empty; no real node lives at the centroid. + Its value is as a neutral vantage point: from `c`, distances to both poles and all + candidates can be measured without anchoring bias toward either m1 or m2. The Metroid is constructed at query time by the `MetroidBuilder`. 
It is **not** a persistent graph structure. It is a transient epistemological instrument. @@ -115,11 +118,23 @@ The Metroid is constructed at query time by the `MetroidBuilder`. It is **not** 1. **Select m1** — Identify the topic medoid most relevant to the query embedding. 2. **Freeze protected dimensions** — Lock the lower Matryoshka embedding dimensions that encode invariant semantic context (domain, language register, topic class). These dimensions are never searched for antithesis. 3. **Search for m2** — Within the remaining (unfrozen) upper dimensions, search for the nearest medoid that represents semantic opposition to m1. -4. **Compute centroid** — Compute `c` as follows: +4. **Compute centroid** — Compute `c` as a center of mass between m1 and m2: - Protected dimensions (index < `matryoshkaProtectedDim`): copy directly from m1. These dimensions are invariant; averaging them would dilute the domain anchor that makes the antithesis search meaningful. - Unfrozen dimensions (index >= `matryoshkaProtectedDim`): compute the element-wise average of m1 and m2 — `c[i] = (m1[i] + m2[i]) / 2`. - The result is a full-dimensional vector that can be used directly as a scoring anchor. -5. **Prefer centroid as search origin** — Use `c` as the primary starting point for subgraph expansion. This prevents semantic drift toward either pole. + + **Important:** `c` is a synthetic position — a "Kansas space". In most cases nothing actually + exists at the centroid; it is an empty field in embedding space, equidistant from both poles. + Its value is as a neutral vantage point. Standing at `c`, you can immediately measure whether + any candidate is closer to m1 (thesis), closer to m2 (antithesis), or equidistant from both + (genuinely synthetic). Scoring by proximity to `c` produces unbiased, balanced retrieval. + Scoring from m1 or m2 would pull all results toward one pole. +5. 
**Use centroid as scoring vantage point** — Weight candidates by their distance to `c`, not to m1 or m2. + - Near `c`: synthesis territory — balanced between both poles. + - Much closer to m1 than to `c`: thesis-supporting. + - Much closer to m2 than to `c`: antithesis-supporting. + - Far from `c`, m1, and m2 simultaneously: a third conceptual region not captured by either pole — signal for further Matryoshka unwinding or a knowledge gap. + Scoring from `c` avoids anchoring bias; see the Dialectical Search section for the full zone model. 6. **Unwind Matryoshka layers** — Progressively free deeper embedding dimensions and repeat from step 3. Each unwinding broadens the antithesis search. 7. **Stop at the protected dimension** — The protected lower dimensions are never unwound. This preserves semantic invariants throughout all levels of search. @@ -148,13 +163,15 @@ This produces progressively wider dialectical exploration while maintaining sema ### Dialectical Search -Every Metroid-driven query explores three zones: +Every Metroid-driven query explores three zones, with all scoring anchored at the centroid `c`: | Zone | Pole | Meaning | |------|------|---------| -| Thesis zone | around m1 | Supporting ideas, corroborating evidence | -| Antithesis zone | around m2 | Opposing ideas, counterevidence, alternative perspectives | -| Synthesis zone | around c | Conceptually balanced territory between both poles | +| Thesis zone | closer to m1 than to c | Supporting ideas, corroborating evidence | +| Antithesis zone | closer to m2 than to c | Opposing ideas, counterevidence, alternative perspectives | +| Synthesis zone | near c, equidistant from m1 and m2 | Conceptually balanced territory between both poles | + +**Scoring from the centroid vantage point:** candidates are ranked by their distance to `c`. A candidate significantly closer to m1 than to `c` is thesis-supporting; significantly closer to m2 is antithesis-supporting; near `c` is synthesis-zone content. 
Candidates far from all three (`c`, m1, m2) indicate a third conceptual region — either an undiscovered knowledge area or a signal to unwind another Matryoshka layer. Scoring from m1 or m2 instead of `c` would anchor all results toward one pole, introducing confirmation bias. This three-zone exploration prevents **confirmation bias**: a system that only retrieves nearest neighbors to m1 returns documents that confirm the query's premise. By also exploring m2 and c, CORTEX surfaces contradictions, alternatives, and knowledge gaps. @@ -705,7 +722,12 @@ Smart sharing is a core capability, not a post-v1 extra. The v1 exchange path mu **Medoid** (mathematical term): The existing memory node selected as the statistical representative of a cluster. Selected by minimising the sum of distances to all other nodes in the cluster. Used throughout algorithmic descriptions and internal implementation comments. -**Centroid** (mathematical term): In MetroidBuilder, the centroid `c` is a full-dimensional vector where protected dimensions are copied from m1 (domain invariant) and unfrozen dimensions are the element-wise average of m1 and m2. Used as the balanced search origin in dialectical scoring. +**Centroid** (mathematical term): In MetroidBuilder, the centroid `c` is a full-dimensional vector +where protected dimensions are copied from m1 (domain invariant) and unfrozen dimensions are the +element-wise average of m1 and m2. `c` is a synthetic "Kansas space" position — a center of mass +where nothing in the memory graph typically exists. Its value is as a neutral vantage point: +scoring candidates by distance to `c` gives equal weight to both poles. A candidate closer to m1 +is thesis-supporting; closer to m2 is antithesis-supporting; near `c` is genuinely balanced. 
**Metroid** (CORTEX architectural term): A structured dialectical search probe constructed at query time: `{ m1, m2, c }`, where m1 is the thesis medoid, m2 is the antithesis medoid, and c is the centroid (protected dims from m1; unfrozen dims averaged). **A Metroid is never stored as a persistent graph structure.** It is an ephemeral instrument used by the CORTEX retrieval subsystem. diff --git a/README.md b/README.md index ed31306..810472c 100644 --- a/README.md +++ b/README.md @@ -62,19 +62,20 @@ When new observations arrive, Hippocampus immediately: This is the rapid, multi-path "write" system that turns raw experience into structured memory scaffolding. ### 🧩 Cortex — Intelligent Routing & Coherence -When you ask a question, Cortex does **not** return a bag of similar vectors. - -**Planned target behavior (v0.5+):** -- Constructs a **Metroid** `{ m1, m2, c }` for the query — a structured dialectical search probe pairing the thesis medoid (m1) with an antithesis medoid (m2) and a balanced centroid (c) -- Performs Matryoshka dimensional unwinding to discover semantically opposing knowledge -- Performs parallel WebGPU "scoops" across the entire active universe (sub-millisecond) -- Pulls relevant sub-graphs from IndexedDB -- Traces closed-loop paths through Hebbian connections -- Returns only self-consistent, coherent context chains -- Detects **knowledge gaps** when no antithesis medoid exists within dimensional constraints -- Broadcasts P2P curiosity probes (with `mimeType` + `modelUrn` for commensurability) to discover missing knowledge from peers - -**Current behavior (v0.1):** +Cortex does **not** return a bag of similar vectors. 
+ +**Required behavior (v0.5+ engineering target):** +- Must construct a **Metroid** `{ m1, m2, c }` for every query — a structured dialectical search probe pairing the thesis medoid (m1) with an antithesis medoid (m2) and a balanced centroid (c) + - The centroid `c` is a synthetic "Kansas space" vantage point (no real node lives there); scoring from `c` must give equal weight to both poles +- Must perform Matryoshka dimensional unwinding to discover semantically opposing knowledge +- Must perform parallel WebGPU "scoops" across the entire active universe (sub-millisecond) +- Must pull relevant sub-graphs from IndexedDB +- Must trace closed-loop paths through Hebbian connections +- Must return only self-consistent, coherent context chains +- Must detect **knowledge gaps** when no antithesis medoid exists within dimensional constraints +- Must broadcast P2P curiosity probes (with `mimeType` + `modelUrn` for commensurability) to discover missing knowledge from peers + +**Current behavior (v0.1 — placeholder):** - Flat top-K similarity scoring against the hotpath resident index with warm/cold spill - No MetroidBuilder, no dialectical pipeline, no knowledge gap detection yet From 62a719e1af7e2d409e647e9f8ee215e7fcf4fb72 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Fri, 13 Mar 2026 20:15:53 +0000 Subject: [PATCH 6/6] =?UTF-8?q?docs:=20align=20MetroidBuilder=20to=20origi?= =?UTF-8?q?nal=20spec=20=E2=80=94=20cosine-opposite=20medoid,=20frozen=20c?= =?UTF-8?q?entroid,=20loop=20structure?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit DESIGN.md: - The Metroid: added conceptual framing — antithesis medoid (m2) produces the frozen centroid (c) which becomes the stable platform for deeper exploration; added philosophical foundation (centroid=gravitational pull, medoid=data point anchor; neither alone sufficient); Metroid replaces prior sparse NN-graph constructions - m2 
definition: explicit parallel structure with m1; m2 is always an existing memory node (medoid of cosine-opposite set), never a phantom computed position - MetroidBuilder Algorithm: complete rewrite as thesis→freeze→antithesis→synthesis loop - Step 1 (Thesis): medoid search for m1 (not centroid, always existing node) - Step 2 (Freeze): lock protected Matryoshka dimensions - Step 3 (Antithesis): score each candidate as -cosine_similarity in free dims; find medoid of top-scoring (cosine-opposite) set — m2 is the medoid, not a raw vector negation - Step 4 (Synthesis): compute c once and freeze it; never recomputed - Step 5 (Evaluate): all subsequent candidates measured against frozen c - Steps 6-7: unwind and stop as before, but with frozen c invariant - Matryoshka Dimensional Unwinding: new candidates evaluated against frozen c, not a recomputed centroid; stop on knowledge gap → broadcast curiosity - Terminology: Metroid and MetroidBuilder entries updated with frozen c and cosine-opposite medoid algorithm TODO.md P1-M: - Added game-inspired framing (opposition becomes stepping stone via frozen c) - Step-by-step algorithm: exact formula -cosine_similarity; medoid of top-scoring candidates; frozen c never recomputed - Exit criteria now explicitly mentions frozen centroid invariant - Updated test cases: test c is frozen; m2 is medoid not vector negation Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com> --- DESIGN.md | 116 ++++++++++++++++++++++++++++++++++++++++-------------- TODO.md | 59 +++++++++++++++++++-------- 2 files changed, 128 insertions(+), 47 deletions(-) diff --git a/DESIGN.md b/DESIGN.md index 9438c74..2f1dfe9 100644 --- a/DESIGN.md +++ b/DESIGN.md @@ -95,52 +95,93 @@ Three separate mathematical constructs are central to CORTEX. They must never be ### The Metroid -A Metroid is a structured search probe used for epistemically balanced exploration of a topic. 
+A Metroid is a structured search primitive for epistemically balanced exploration of a topic. + +The name captures a key architectural insight: what looks like an obstacle to progress — a medoid representing conceptual opposition — is not an enemy. The centroid computed from that opposition can be **held as a stable, frozen platform**, turning semantic divergence into a navigable step toward a goal. Every Metroid construction converts the antithesis (m2) into the anchor for the frozen centroid (c), which then provides structural support for deeper exploration. + +A Metroid replaces all prior sparse nearest-neighbor graph constructions as the canonical mechanism for guided semantic exploration in CORTEX. Opposition, divergence, and curiosity-driven augmentation are the designed search dynamics — not similarity-chasing. ``` Metroid = { m1, m2, c } ``` Where: -- **m1** — thesis medoid: the cluster representative most relevant to the query topic -- **m2** — antithesis medoid: a cluster representative discovered through constrained Matryoshka search to represent semantic opposition to m1 -- **c** — centroid: the synthetic center of mass between m1 and m2. +- **m1** — thesis medoid: found via medoid search from the query vector. A medoid (not a centroid) is always an existing memory node — it keeps the search on the correct conceptual road. +- **m2** — antithesis medoid: the medoid of the cosine-opposite set — not merely the nearest semantically-opposing node, but the **most coherent existing memory node in the direction of maximal divergence** from m1. Like m1, m2 is always an actual memory node, never a computed phantom position. +- **c** — centroid: the synthetic center of mass between m1 and m2, computed **once** and **frozen** as a stable platform. `c` is a "Kansas space" position — typically empty; no real node lives at the centroid. 
Its value is as a neutral vantage point: from `c`, distances to both poles and all candidates can be measured without anchoring bias toward either m1 or m2. +**Philosophical foundation:** Centroids (means) provide gravitational pull toward the midpoint. Medoids (medians) keep the search on the right road by anchoring to actual existing nodes. Neither alone guarantees epistemic honesty. The Metroid loop combines them: the medoid ensures the search never drifts to a phantom position; the frozen centroid ensures all subsequent evaluation is unbiased between the poles. + The Metroid is constructed at query time by the `MetroidBuilder`. It is **not** a persistent graph structure. It is a transient epistemological instrument. --- ### MetroidBuilder Algorithm -1. **Select m1** — Identify the topic medoid most relevant to the query embedding. -2. **Freeze protected dimensions** — Lock the lower Matryoshka embedding dimensions that encode invariant semantic context (domain, language register, topic class). These dimensions are never searched for antithesis. -3. **Search for m2** — Within the remaining (unfrozen) upper dimensions, search for the nearest medoid that represents semantic opposition to m1. -4. **Compute centroid** — Compute `c` as a center of mass between m1 and m2: - - Protected dimensions (index < `matryoshkaProtectedDim`): copy directly from m1. These dimensions are invariant; averaging them would dilute the domain anchor that makes the antithesis search meaningful. - - Unfrozen dimensions (index >= `matryoshkaProtectedDim`): compute the element-wise average of m1 and m2 — `c[i] = (m1[i] + m2[i]) / 2`. - - The result is a full-dimensional vector that can be used directly as a scoring anchor. - - **Important:** `c` is a synthetic position — a "Kansas space". In most cases nothing actually - exists at the centroid; it is an empty field in embedding space, equidistant from both poles. - Its value is as a neutral vantage point. 
Standing at `c`, you can immediately measure whether - any candidate is closer to m1 (thesis), closer to m2 (antithesis), or equidistant from both - (genuinely synthetic). Scoring by proximity to `c` produces unbiased, balanced retrieval. - Scoring from m1 or m2 would pull all results toward one pole. -5. **Use centroid as scoring vantage point** — Weight candidates by their distance to `c`, not to m1 or m2. +One full Metroid step is a **thesis → freeze → antithesis → synthesis** cycle: + +1. **Thesis — Select m1** — From the query vector `q`, perform a medoid search to find `m1`: the + median representative of the most relevant cluster. A medoid is always an existing memory node, + ensuring the search stays on the correct conceptual road. Centroids (means) provide + gravitational pull; medoids (medians) provide the road. + +2. **Freeze** — Lock the first `n` protected Matryoshka dimensions in place. These dimensions + encode invariant semantic context (domain, language register, topic class). Locking them + preserves early decisions as fixed structure — preventing the search from drifting into + vocabulary that shares surface-level patterns but belongs to a different conceptual domain. + +3. **Antithesis — Find m2** — On the remaining free (unfrozen) dimensions: + - Compute the **cosine-opposite score** for every candidate medoid: score each candidate as + `-cosine_similarity(candidate_free_dims, m1_free_dims)`. The highest-scoring candidates are + farthest from m1 in the free dimensions — representing maximal conceptual divergence. + - Find the **medoid of that cosine-opposite set** (the top-scoring candidates). This is `m2`. + - `m2` is the medoid of the top-scoring candidates — not the result of a direct vector + negation. The medoid operation selects the most coherent existing memory node in the + direction of maximal divergence, which guarantees `m2` is always + an actual memory node. + +4. 
**Synthesis — Freeze the centroid** — Compute `c` as the center of mass between m1 and m2 + and immediately **freeze it**. `c` is computed once per Metroid construction and never + recalculated: + - Protected dimensions (index < `matryoshkaProtectedDim`): copy directly from m1. These + dimensions are invariant; averaging them would dilute the domain anchor. + - Free dimensions (index >= `matryoshkaProtectedDim`): element-wise average of m1 and m2 — + `c[i] = (m1[i] + m2[i]) / 2`. + - `c` is a "Kansas space" position — typically empty; no real node lives at the centroid. + Its value is as a neutral vantage point: from `c`, distances to both poles and all + candidates can be measured without anchoring bias toward either m1 or m2. + +5. **Evaluate subsequent candidates against the frozen centroid** — All further medoids + (`m3`, `m4`, ...) found during Matryoshka unwinding are evaluated relative to this frozen `c`: - Near `c`: synthesis territory — balanced between both poles. - Much closer to m1 than to `c`: thesis-supporting. - Much closer to m2 than to `c`: antithesis-supporting. - - Far from `c`, m1, and m2 simultaneously: a third conceptual region not captured by either pole — signal for further Matryoshka unwinding or a knowledge gap. - Scoring from `c` avoids anchoring bias; see the Dialectical Search section for the full zone model. -6. **Unwind Matryoshka layers** — Progressively free deeper embedding dimensions and repeat from step 3. Each unwinding broadens the antithesis search. -7. **Stop at the protected dimension** — The protected lower dimensions are never unwound. This preserves semantic invariants throughout all levels of search. + - Far from `c`, m1, and m2 simultaneously: third conceptual region — signal for further + unwinding or a knowledge gap. + The centroid is a platform. Opposition has been frozen into a stepping stone. + +6. **Unwind Matryoshka layers** — Progressively free deeper embedding dimensions and repeat from + step 3. 
Each unwinding broadens the antithesis search space. Subsequent antithesis candidates + are still evaluated relative to the original frozen `c` — it is never recomputed. + +7. **Stop at the protected dimension** — The protected lower dimensions are never unwound. Once + the Matryoshka unwind has reached the protected floor, no further antithesis search is possible. + If no satisfactory `m2` was found at any layer, set `knowledgeGap = true` and broadcast a + curiosity query (see Knowledge Gap Detection). **Why protect dimensions?** -Without dimensional protection, high-dimensional similarity in unrelated vocabulary can dominate the search. Specifically, upper Matryoshka dimensions encode fine-grained distinctions that may closely match surface-level word patterns regardless of topic. Protected lower dimensions encode domain context (e.g., "food/cooking") that anchors the search. Without this anchor, a query about pizza toppings could accumulate similarity mass toward adhesive-related terms in the high dimensions — because words describing how things stick together are statistically present in both culinary and industrial glue contexts. The protected dimensions ensure the culinary domain context is never overridden by this incidental high-dimensional similarity. +Without dimensional protection, high-dimensional similarity in unrelated vocabulary can dominate +the search. Upper Matryoshka dimensions encode fine-grained distinctions that may closely match +surface-level word patterns regardless of topic. Protected lower dimensions encode domain context +(e.g., "food/cooking") that anchors the search. Without this anchor, a query about pizza toppings +could accumulate similarity mass toward adhesive-related terms — because words describing how +things stick together are statistically present in both culinary and industrial glue contexts. +The protected dimensions ensure the culinary domain context is never overridden by this incidental +high-dimensional similarity. 
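The free-dimension cosine-opposite scoring (Antithesis step) and the frozen centroid construction (Synthesis step) described above can be sketched as follows. This is a minimal illustration, not the planned `cortex/MetroidBuilder.ts` API; the names `cosineFree`, `cosineOppositeScore`, `buildCentroid`, and the `Vec` alias are hypothetical:

```typescript
type Vec = Float32Array;

// Cosine similarity restricted to the free (unfrozen) dimensions,
// i.e. indices >= matryoshkaProtectedDim.
function cosineFree(a: Vec, b: Vec, protectedDim: number): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = protectedDim; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  const denom = Math.sqrt(na) * Math.sqrt(nb);
  return denom === 0 ? 0 : dot / denom;
}

// Antithesis scoring: -cosine_similarity in the free dims.
// The highest-scoring candidates are farthest from m1 (maximal divergence);
// m2 is then the medoid of that top-scoring set (medoid step not shown here).
function cosineOppositeScore(candidate: Vec, m1: Vec, protectedDim: number): number {
  return -cosineFree(candidate, m1, protectedDim);
}

// Synthesis: centroid c with protected dims copied from m1 (domain invariant)
// and free dims averaged element-wise. Computed once per Metroid construction;
// callers must treat the returned vector as frozen and never recompute it.
function buildCentroid(m1: Vec, m2: Vec, protectedDim: number): Vec {
  const c = new Float32Array(m1.length);
  for (let i = 0; i < m1.length; i++) {
    c[i] = i < protectedDim ? m1[i] : (m1[i] + m2[i]) / 2;
  }
  return c;
}
```

For example, with `protectedDim = 2`, `buildCentroid` on `[1, 2, 4, 0]` and `[9, 9, 0, 4]` yields `[1, 2, 2, 2]`: the first two dimensions come from m1 unchanged, the rest are element-wise averages.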
--- @@ -154,10 +195,15 @@ CORTEX uses Matryoshka Representation Learning (MRL) models that pack semantic i At each unwinding step: 1. The protected dimension boundary shifts one layer outward. 2. The antithesis search space expands into the newly freed dimensions. -3. A new `m2` candidate is evaluated against the expanded space. -4. The Metroid `{ m1, m2, c }` is recomputed with the updated `m2`. +3. A new `m2` candidate is found via cosine-opposite medoid search in the expanded space. +4. The new candidate is evaluated relative to the **frozen** `c` (computed in the first synthesis + step and never recalculated). If it is close enough to `c`, the step is accepted; otherwise + the search continues unwinding or declares a knowledge gap. -This produces progressively wider dialectical exploration while maintaining semantic coherence. The search terminates either when the protected dimension is reached or when a satisfactory `m2` is found. +This produces progressively wider dialectical exploration while maintaining semantic coherence. +The frozen centroid ensures that each expansion step is measured against a stable platform rather +than a shifting target. The search terminates either when the protected dimension floor is reached +or when a satisfactory `m2` is found. --- @@ -729,9 +775,19 @@ where nothing in the memory graph typically exists. Its value is as a neutral va scoring candidates by distance to `c` gives equal weight to both poles. A candidate closer to m1 is thesis-supporting; closer to m2 is antithesis-supporting; near `c` is genuinely balanced. -**Metroid** (CORTEX architectural term): A structured dialectical search probe constructed at query time: `{ m1, m2, c }`, where m1 is the thesis medoid, m2 is the antithesis medoid, and c is the centroid (protected dims from m1; unfrozen dims averaged). **A Metroid is never stored as a persistent graph structure.** It is an ephemeral instrument used by the CORTEX retrieval subsystem. 
- -**MetroidBuilder**: The CORTEX module responsible for constructing a Metroid for a given query via Matryoshka dimensional unwinding. Planned module: `cortex/MetroidBuilder.ts`. +**Metroid** (CORTEX architectural term): A structured dialectical search primitive constructed at +query time: `{ m1, m2, c }`. m1 is the thesis medoid (found via medoid search from query vector q); +m2 is the antithesis medoid (the medoid of the cosine-opposite set in the free dimensions — not +merely a semantically-opposing node, but the most coherent representative of maximal divergence); +c is the centroid (protected dims from m1; free dims averaged), computed **once and frozen** as a +stable evaluation platform. All subsequent candidates in the Matryoshka unwind are evaluated +relative to this frozen c. **A Metroid is never stored as a persistent graph structure.** It is an +ephemeral instrument used by the CORTEX retrieval subsystem. + +**MetroidBuilder**: The CORTEX module responsible for constructing a Metroid for a given query via +Matryoshka dimensional unwinding. Runs the thesis→freeze→antithesis→synthesis loop: m1 via medoid +search; m2 via cosine-opposite medoid; c computed once and frozen; subsequent candidates evaluated +relative to frozen c. Planned module: `cortex/MetroidBuilder.ts`. **Semantic neighbor graph** (also: proximity graph, neighbor graph): The sparse radius-graph of cosine-similarity edges between pages, used for subgraph expansion during retrieval. This is **not** the same as a Metroid. The edges connect pages with high cosine similarity and are used for BFS expansion. Currently named `MetroidNeighbor` / `metroid_neighbors` in the codebase — this is a naming error that must be corrected (tracked in TODO as P0-X). diff --git a/TODO.md b/TODO.md index b8289f4..bc21997 100644 --- a/TODO.md +++ b/TODO.md @@ -354,35 +354,60 @@ These items add hierarchical routing and coherent path ordering. 
They transform ### P1-M: MetroidBuilder (DELIVERS: dialectical epistemology) -**Why:** MetroidBuilder is the core of what makes CORTEX an _epistemic_ system rather than a vector search engine. Without it, the system merely returns nearest neighbors and cannot explore opposing perspectives, detect knowledge gaps, or trigger P2P curiosity requests. +**Why:** MetroidBuilder is the core of what makes CORTEX an _epistemic_ system rather than a vector search engine. Without it, the system merely returns nearest neighbors and cannot explore opposing perspectives, detect knowledge gaps, or trigger P2P curiosity requests. The Metroid loop converts conceptual opposition into navigable exploration steps. - [ ] **P1-M1:** Implement `cortex/MetroidBuilder.ts` - - Accept a query embedding and a list of resident medoids (shelf/volume/book representatives) - - Select m1: the medoid with highest cosine similarity to the query - - Read `matryoshkaProtectedDim` from `ModelProfile` (the field added to `core/ModelProfile.ts` as the per-model protected floor — e.g. 128 for embeddinggemma-300m, 64 for nomic-embed-text-v1.5). If `undefined` on the current model, return `{ m1, m2: null, c: null, knowledgeGap: true }` immediately. 
- - Freeze all dimensions with index < `matryoshkaProtectedDim` - - In the unfrozen upper dimensions (index >= `matryoshkaProtectedDim`), search for the nearest medoid with **opposing** semantic direction (minimum cosine similarity above a negative threshold, or maximum angular distance) - - This medoid becomes m2 (antithesis) - - Compute centroid: protected dims (< matryoshkaProtectedDim) copied from m1 vector; unfrozen dims averaged element-wise: `c[i] = (m1[i] + m2[i]) / 2` - - Return `Metroid { m1, m2, c }`; if no valid m2 found, return `{ m1, m2: null, c: null, knowledgeGap: true }` + - Accept a query embedding `q` and a list of resident medoids (shelf/volume/book representatives) + - **Thesis (select m1):** Find `m1` via medoid search — the medoid minimizing distance to `q`. A + medoid (not a centroid) is always an existing memory node; it ensures the search anchor is an + actual data point rather than an averaged phantom position. This keeps the search on the + correct conceptual road. + - Read `matryoshkaProtectedDim` from `ModelProfile` (e.g. 128 for embeddinggemma-300m, 64 for + nomic-embed-text-v1.5). If `undefined` on the current model (non-Matryoshka), return + `{ m1, m2: null, c: null, knowledgeGap: true }` immediately. + - **Freeze:** Lock all dimensions with index < `matryoshkaProtectedDim`. + - **Antithesis (find m2):** In the unfrozen upper dimensions (index >= `matryoshkaProtectedDim`): + 1. Score every candidate medoid as `-cosine_similarity(candidate_free_dims, m1_free_dims)`. + The highest-scoring candidates are farthest from m1 in the free dimensions — maximal + conceptual divergence. + 2. Find the **medoid of that cosine-opposite set** (the top-scoring candidates). This is `m2`. + 3. `m2` must be an existing memory node (not a computed position). The medoid operation + ensures this. This is distinct from simply finding the node with the lowest cosine + similarity to m1. 
+ - **Synthesis (freeze centroid):** Compute `c` once and freeze it: + - Protected dims (< `matryoshkaProtectedDim`): copy from m1 (domain invariant). + - Free dims (>= `matryoshkaProtectedDim`): `c[i] = (m1[i] + m2[i]) / 2`. + - This frozen `c` is never recalculated. All future candidates in the Matryoshka unwind are + evaluated relative to this frozen platform. + - Return `Metroid { m1, m2, c }`; if no valid m2 found, return + `{ m1, m2: null, c: null, knowledgeGap: true }` - [ ] **P1-M2:** Implement Matryoshka dimensional unwinding in `cortex/MetroidBuilder.ts` - - After initial Metroid construction, progressively expand the antithesis search into deeper embedding layers - - At each step, lower the protected dimension boundary by one Matryoshka tier - - Re-evaluate `m2` at each tier; prefer the deepest tier's Metroid as the final result - - Stop when the protected dimension floor is reached + - After the initial Metroid construction, progressively expand the antithesis search into deeper + embedding layers by shifting the protected dimension boundary outward one Matryoshka tier at a + time. + - At each new tier, find a new `m2` candidate via cosine-opposite medoid search in the expanded + free dimensions. + - Evaluate each candidate against the **frozen** `c` (not a recomputed centroid). If close + enough to `c`, accept and freeze this step; take the next conceptual leap. If not, + continue unwinding. + - Stop when the protected dimension floor is reached or a satisfactory `m2` is accepted. + - If no satisfactory `m2` is found at any layer, return `knowledgeGap: true`. 
- [ ] **P1-M3:** Add MetroidBuilder test coverage - `tests/cortex/MetroidBuilder.test.ts` - - Test m1 selection: highest similarity medoid is chosen - - Test m2 selection: most semantically opposite medoid is chosen - - Test centroid computation: midpoint between m1 and m2 vectors + - Test m1 selection: the medoid minimizing distance to `q` is chosen (not the centroid) + - Test m2 selection: medoid of cosine-opposite set — not merely nearest semantically-opposing node + - Test centroid computation: protected dims copied from m1; free dims averaged element-wise + - Test centroid is frozen: subsequent unwinding steps do not recompute c - Test dimensional unwinding: search expands progressively through Matryoshka layers - Test knowledge gap: when no valid m2 exists in any layer, returns `knowledgeGap: true` - Test protected dimensions are never searched for antithesis - Test determinism: same inputs always produce same Metroid -**Exit Criteria:** MetroidBuilder constructs valid Metroids and correctly detects knowledge gaps. +**Exit Criteria:** MetroidBuilder constructs valid Metroids (m1 via medoid search, m2 via +cosine-opposite medoid of the top-scoring candidates, c computed once and never recomputed during +Matryoshka unwinding) and correctly detects knowledge gaps. ---
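The zone model these tests exercise (thesis, antithesis, synthesis, or a third conceptual region, all measured from the frozen `c`) can be sketched as a simple classifier. The Euclidean metric, the `farThreshold` cutoff, and the function names are illustrative assumptions, not part of the spec:

```typescript
type Zone = "thesis" | "antithesis" | "synthesis" | "third-region";

// Euclidean distance between two vectors of equal length.
function dist(a: Float32Array, b: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

// Classify a candidate relative to the frozen centroid c and the two poles.
// A candidate far from c, m1, and m2 simultaneously signals a third
// conceptual region: a cue for further unwinding or a knowledge gap.
function classifyZone(
  x: Float32Array,
  m1: Float32Array,
  m2: Float32Array,
  c: Float32Array,
  farThreshold: number,
): Zone {
  const dc = dist(x, c);
  const d1 = dist(x, m1);
  const d2 = dist(x, m2);
  if (dc > farThreshold && d1 > farThreshold && d2 > farThreshold) return "third-region";
  if (d1 < dc && d1 < d2) return "thesis";     // closer to the thesis pole than to c
  if (d2 < dc && d2 < d1) return "antithesis"; // closer to the antithesis pole than to c
  return "synthesis";                          // balanced territory near c
}
```

Because `c` is frozen, repeated calls with the same poles classify every candidate against the same stable platform, which is the invariant the P1-M3 "centroid is frozen" test is meant to pin down.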