From 5361f514551ee1556a8cd2fb761f22913c30161d Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Fri, 13 Mar 2026 02:20:04 +0000 Subject: [PATCH 1/8] Initial plan From 79eecdc0d55b599d1f6fccda8e349db9fe5efbf6 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Fri, 13 Mar 2026 02:34:27 +0000 Subject: [PATCH 2/8] docs: integrate ERRATA.md Williams Bound into README, DESIGN, PLAN, TODO; delete ERRATA.md Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com> --- DESIGN.md | 252 +++++++++++++++++++++++++++++++++---- ERRATA.md | 363 ------------------------------------------------------ PLAN.md | 144 +++++++++++++--------- README.md | 8 +- TODO.md | 303 +++++++++++++++++++++++++++++++++------------ 5 files changed, 547 insertions(+), 523 deletions(-) delete mode 100644 ERRATA.md diff --git a/DESIGN.md b/DESIGN.md index bf22a19..32387b8 100644 --- a/DESIGN.md +++ b/DESIGN.md @@ -1,7 +1,7 @@ # CORTEX Design Specification -**Version:** 1.0 -**Last Updated:** 2026-03-12 +**Version:** 1.1 +**Last Updated:** 2026-03-13 ## Executive Summary @@ -48,7 +48,125 @@ Idle background consolidation that prevents catastrophic forgetting. **Performance Target:** Opportunistic, interruptible, no foreground blocking -## Data Model +## The Williams Bound & Sublinear Growth + +### Motivation + +CORTEX applies the Williams 2025 result — S = O(√(t log t)) — as a universal sublinear growth law everywhere the system trades space against time: the resident hotpath index, per-tier hierarchy quotas, per-community graph budgets, Metroid degree limits, and Daydreamer maintenance batch sizing. This single principle ensures the system stays efficient as the memory graph scales from hundreds to millions of nodes. 
+ +### Graph Mass Definition + +``` +t = |V| + |E| = total pages + (Hebbian edges + Metroid edges) +``` + +This is the canonical measure of graph complexity used in all capacity formulas. + +### Resident Hotpath Capacity + +``` +H(t) = ⌈c · √(t · log₂(1 + t))⌉ +``` + +`c` is an empirically tuned constant (default in `core/HotpathPolicy.ts`; not a theorem output). H(t) defines the maximum number of entities resident in the in-memory hotpath index across all tiers. + +**Growth properties (required by tests):** +- H(t) is monotonically non-decreasing as t grows +- H(t) grows sublinearly relative to t (confirmed by benchmark at 1K, 10K, 100K, 1M) + +### Three-Zone Memory Model + +| Zone | Resident? | Storage | Typical Lookup Cost | +|------|-----------|---------|---------------------| +| **HOT** | Yes — in resident index, capacity H(t) | RAM | Sub-millisecond | +| **WARM** | No — indexed, not resident | IndexedDB | Single-digit milliseconds | +| **COLD** | No — raw bytes only, no index entry | OPFS | Tens of milliseconds | + +All data is retained locally across all three zones. Zones control lookup **cost**, not data **lifetime**. The runtime continuously promotes and evicts entries between HOT and WARM based on salience. + +### Node Salience + +Each page `v` carries a node-level salience score that drives promotion into and eviction from the hotpath: + +``` +σ(v) = α · H_in(v) + β · R(v) + γ · Q(v) +``` + +| Component | Meaning | +|-----------|---------| +| `H_in(v)` | Sum of incident Hebbian edge weights | +| `R(v)` | Recency score — exponential decay from `createdAt` / `lastQueryAt` | +| `Q(v)` | Query-hit count for the node | +| α, β, γ | Tunable weights summing to 1.0 (defaults: 0.5 / 0.3 / 0.2) | + +Salience requires lightweight per-page activity metadata (`queryHitCount`, `lastQueryAt`) stored in the `page_activity` IndexedDB object store. 
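The capacity and salience formulas above can be sketched in TypeScript roughly as follows. `core/HotpathPolicy.ts` is still planned, so this is an illustration under assumptions rather than the actual module: the function names mirror the planned exports, `c = 0.5` follows the policy-constant default, and `recencyScore` with its `halfLifeMs` parameter is a hypothetical helper, since the text only says that recency decays exponentially.

```typescript
// Illustrative sketch only; the real core/HotpathPolicy.ts may differ.

interface SalienceWeights {
  alpha: number; // weight of Hebbian connectivity H_in(v)
  beta: number;  // weight of recency R(v)
  gamma: number; // weight of query-hit count Q(v)
}

// Defaults quoted from the design text (0.5 / 0.3 / 0.2).
const DEFAULT_WEIGHTS: Readonly<SalienceWeights> = Object.freeze({
  alpha: 0.5,
  beta: 0.3,
  gamma: 0.2,
});

// H(t) = ceil(c * sqrt(t * log2(1 + t))), where t = |V| + |E|.
function computeCapacity(graphMass: number, c: number = 0.5): number {
  if (!Number.isFinite(graphMass) || graphMass < 0) {
    throw new RangeError("graphMass must be a finite, non-negative number");
  }
  return Math.ceil(c * Math.sqrt(graphMass * Math.log2(1 + graphMass)));
}

// Hypothetical recency helper: the score halves every `halfLifeMs`
// that has elapsed since the page was created or last queried.
function recencyScore(lastTouchedMs: number, nowMs: number, halfLifeMs: number): number {
  const ageMs = Math.max(0, nowMs - lastTouchedMs);
  return Math.pow(0.5, ageMs / halfLifeMs);
}

// sigma(v) = alpha * H_in(v) + beta * R(v) + gamma * Q(v)
function computeSalience(
  hebbianIn: number,
  recency: number,
  queryHits: number,
  weights: SalienceWeights = DEFAULT_WEIGHTS,
): number {
  return weights.alpha * hebbianIn + weights.beta * recency + weights.gamma * queryHits;
}
```

A caller would pass `pages + hebbianEdges + metroidEdges` as `graphMass`. Note that Q(v) as specified is a raw count, so it can outweigh the other two terms unless it is normalised; the design text does not say how, and the sketch leaves it untouched.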
+ +### Hierarchical Tier Quotas + +H(t) is partitioned across the four-tier hierarchy so no single tier can monopolise the resident index: + +| Tier | Default Quota | Purpose | +|------|--------------|---------| +| Shelf | q_s = 10% | Routing prototypes | +| Volume | q_v = 20% | Cluster prototypes | +| Book | q_b = 20% | Book medoids | +| Page | q_p = 50% | Individual page representatives | + +**Constraint:** q_s + q_v + q_b + q_p = 1.0 + +Within each tier, entries are ranked by salience; the highest-salience representatives are admitted up to the tier budget. Shelf, Volume, and Book representatives are selected by the medoid statistic within their cluster, then ranked by salience for admission. + +### Graph-Community Coverage Quotas + +Within each tier's budget, slots are allocated proportionally across detected graph communities to prevent a single dense topic from consuming all capacity: + +``` +community_quota(Cᵢ) = max(1, ⌈tier_budget · nᵢ / N⌉) +``` + +where `nᵢ` is the number of pages in community Cᵢ and N is the total page count. Community detection runs via lightweight label propagation on the Metroid neighbor graph during Daydreamer idle passes. + +This **dual constraint** — tier quota × community quota — ensures both vertical coverage across hierarchy levels and horizontal coverage across topics. + +### Promotion and Eviction Lifecycle + +**Bootstrap phase** (while resident count < H(t)): admit the highest-salience candidate not yet resident. + +**Steady-state phase**: promote a new or updated node only if its salience exceeds the weakest resident in the same tier and community bucket. On promotion, evict the weakest; break ties by recency. 
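The quota and promotion rules above might look like this in TypeScript. This is a hedged sketch, not the planned `HotpathPolicy` or `SalienceEngine` code: the rounding rule (the page tier absorbs the remainder), the zero quota for empty communities, the `Resident` shape, and `pickEvictionTarget` are all assumptions made for illustration, and "break ties by recency" is read here as "evict the least recently touched".

```typescript
// Illustrative sketch only; rounding and tie-breaking details are assumptions.

interface TierRatios { shelf: number; volume: number; book: number; page: number; }
type TierQuotas = TierRatios; // same shape, but holding slot counts

const DEFAULT_RATIOS: Readonly<TierRatios> = Object.freeze({
  shelf: 0.1, volume: 0.2, book: 0.2, page: 0.5,
});

// Split H(t) across tiers; the page tier takes the rounding remainder
// so that the four quotas always sum to `capacity`.
function deriveTierQuotas(capacity: number, ratios: TierRatios = DEFAULT_RATIOS): TierQuotas {
  const shelf = Math.floor(capacity * ratios.shelf);
  const volume = Math.floor(capacity * ratios.volume);
  const book = Math.floor(capacity * ratios.book);
  return { shelf, volume, book, page: capacity - shelf - volume - book };
}

// community_quota(C_i) = max(1, ceil(tier_budget * n_i / N)).
// Empty communities get 0 here (assumed reading of "release their slots").
function deriveCommunityQuotas(tierBudget: number, communitySizes: number[]): number[] {
  const total = communitySizes.reduce((sum, n) => sum + n, 0);
  return communitySizes.map((n) =>
    n === 0 ? 0 : Math.max(1, Math.ceil((tierBudget * n) / total)),
  );
}

// Bootstrap: admit while capacity remains. Steady state: the candidate
// must strictly beat the weakest resident of its tier/community bucket.
function shouldPromote(
  candidateSalience: number,
  weakestResidentSalience: number,
  capacityRemaining: number,
): boolean {
  if (capacityRemaining > 0) return true;
  return candidateSalience > weakestResidentSalience;
}

interface Resident { entityId: string; salience: number; lastTouchedMs: number; }

// Lowest salience loses; on equal salience the least recently touched loses.
function pickEvictionTarget(bucket: Resident[]): Resident | undefined {
  return bucket.reduce<Resident | undefined>((weakest, r) => {
    if (weakest === undefined) return r;
    if (r.salience !== weakest.salience) return r.salience < weakest.salience ? r : weakest;
    return r.lastTouchedMs < weakest.lastTouchedMs ? r : weakest;
  }, undefined);
}
```

One thing the sketch makes visible: because of the ceiling and the minimum of one slot, the community quotas can add up to more than the tier budget (for example, sizes 90, 9 and 1 with a budget of 10). The text does not say how that overshoot is reconciled, so the sketch leaves it to the caller.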
+ **Trigger points:** +- On ingest — newly ingested pages become candidates +- On query hit — `queryHitCount` increases; salience is recomputed; promotion sweep runs +- On Daydreamer pass — after LTP/LTD, recompute salience for affected nodes; run promotion sweep + +### Sublinear Fanout Bounds + +The maximum number of children per hierarchy node also respects Williams-derived limits to prevent unbounded fan-out: + +``` +Max volumes per shelf = O(√(|volumes| · log |volumes|)) +Max books per volume = O(√(|books_in_volume| · log |books_in_volume|)) +``` + +When exceeded, `HierarchyBuilder` or `ClusterStability` triggers a split. + +### Dynamic Subgraph Expansion Bounds + +The fixed `<30 node` subgraph target is replaced by dynamic formulas that scale with graph mass `t`, with the node budget capped at 30: + +``` +maxSubgraphSize = min(30, ⌊√(t · log₂(1+t)) / log₂(t)⌋) +maxHops = ⌈log₂(log₂(1 + t))⌉ +perHopBranching = ⌊maxSubgraphSize ^ (1 / maxHops)⌋ +``` + +This keeps subgraph expansion cost sublinear in graph mass. The formulas assume t ≥ 2, since log₂(t) = 0 at t = 1. + +### Policy Source of Truth + +All hotpath constants — `c`, `α`, `β`, `γ`, `q_s`, `q_v`, `q_b`, `q_p` — live in `core/HotpathPolicy.ts` as a frozen default policy object. These are **policy-derived constants** (not model-derived) and are kept strictly separate from `core/ModelDefaults.ts`. A companion guard (or an extension to `guard:model-derived`) prevents these constants from being hardcoded elsewhere. + +--- ### Entity Hierarchy @@ -142,6 +260,32 @@ interface MetroidNeighbor { } ``` +### Hotpath Entities + +#### PageActivity +Lightweight per-page activity metadata maintained alongside each Page. Drives salience computation and community assignment. + +```typescript +interface PageActivity { + pageId: Hash; + queryHitCount: number; // incremented on each query hit + lastQueryAt: string; // ISO timestamp of most recent query hit + communityId?: string; // set by Daydreamer label propagation +} +``` + +#### HotpathEntry +A record in the resident in-memory index. 
Tracks which entity is HOT and at what salience level. + +```typescript +interface HotpathEntry { + entityId: Hash; // pageId, bookId, volumeId, or shelfId + tier: 'shelf' | 'volume' | 'book' | 'page'; + salience: number; // σ value at last computation + communityId?: string; // community this entry counts against +} +``` + ## Storage Architecture ### Vector Storage (OPFS) @@ -164,28 +308,39 @@ Structured entity storage with automatic reverse indexes. - `metroid_neighbors` (sparse NN graph) - `flags` (dirty-volume recalc markers) - `page_to_book`, `book_to_volume`, `volume_to_shelf` (reverse indexes) +- `hotpath_index` (resident hotpath entries, keyed by `entityId`) +- `page_activity` (per-page activity metadata for salience computation) ## Retrieval Design ### Cortex Query Path 1. **Embed Query** — Generate query embedding -2. **Rank Shelves** — Score using coarse prototypes -3. **Rank Volumes** — Within top shelves -4. **Rank Books** — Within top volumes -5. **Rank Pages** — Select seed pages -6. **Expand Subgraph** — BFS through Metroid neighbors (bounded hops) -7. **Solve Coherent Path** — Open TSP with dummy-node heuristic -8. **Return Result** — Ordered memory chain + provenance metadata - -**Key Constraints:** -- Keep query-time subgraphs small (target <30 nodes) -- Prefer sparse graph expansion over global traversal -- Deterministic under same input for reproducibility +2. **Score Resident Shelves** — Score query against HOT shelf prototypes in H(t) resident index +3. **Score Resident Volumes** — Score against HOT volume prototypes within top-ranked shelves +4. **Score Resident Books** — Score against HOT book medoids within top-ranked volumes +5. **Score Resident Pages** — Score against HOT page representatives within top-ranked books +6. **Spill to Warm/Cold** — If resident coverage is insufficient, expand lookup to WARM (IndexedDB) and COLD (OPFS) tiers +7. **Expand Subgraph** — BFS through Metroid neighbors using dynamic bounds (see below) +8. 
**Solve Coherent Path** — Open TSP with dummy-node heuristic +9. **Return Result** — Ordered memory chain + provenance metadata + +Steps 2–5 operate exclusively on the resident set of size H(t), making H(t) the primary latency-control mechanism. Spill to WARM/COLD (step 6) occurs only when the resident set does not contain sufficient coverage. + +**Query Cost Meter:** The query path counts vector operations. If the cumulative cost exceeds a Williams-derived budget, the query early-stops and returns the best result found so far. ### Coherence via Open TSP Rather than returning nearest neighbors by similarity, Cortex traces a coherent path through the induced subgraph using a dummy-node open TSP strategy. This produces a natural "narrative flow" through related memories. +### Key Constraints +- Steps 2–5 operate on the resident hotpath (H(t) entries), not the full corpus +- Subgraph expansion uses dynamic Williams-derived bounds, not a fixed node cap: + - `maxSubgraphSize = min(30, ⌊√(t · log₂(1+t)) / log₂(t)⌋)` + - `maxHops = ⌈log₂(log₂(1 + t))⌉` + - `perHopBranching = ⌊maxSubgraphSize ^ (1/maxHops)⌋` +- Deterministic under same input for reproducibility +- Query cost is metered; early-stop prevents unbounded latency + ## Ingestion Design ### Hippocampus Ingest Path @@ -193,13 +348,13 @@ Rather than returning nearest neighbors by similarity, Cortex traces a coherent 1. **Chunk Text** — Split into pages respecting token budgets from ModelProfile 2. **Generate Embeddings** — Batch embed with selected provider 3. **Persist Vectors** — Append to OPFS vector file -4. **Persist Pages** — Write page metadata to IndexedDB -5. **Build/Attach Hierarchy** — Construct/update books, volumes, shelves -6. **Fast Neighbor Insert** — Update Metroid neighbors incrementally +4. **Persist Pages** — Write page metadata to IndexedDB; initialise `PageActivity` record +5. 
**Build/Attach Hierarchy** — Construct/update books, volumes, shelves; attempt hotpath admission for each level's medoid/prototype using tier quota via `SalienceEngine` +6. **Fast Neighbor Insert** — Update Metroid neighbors incrementally; bounded degree via `HotpathPolicy`; check new page for hotpath admission 7. **Mark Dirty** — Flag volumes for full recalc by Daydreamer **Incremental Strategy:** -Fast local Metroid neighbor insertion keeps query-time latency low. Full neighborhood recalculation is deferred to idle Daydreamer passes. +Fast local Metroid neighbor insertion keeps query-time latency low. Full neighborhood recalculation is deferred to idle Daydreamer passes. Hotpath admission runs at ingest time for new pages and hierarchy prototypes. ## Consolidation Design @@ -208,24 +363,35 @@ Fast local Metroid neighbor insertion keeps query-time latency low. Full neighbo **LTP/LTD (Hebbian Updates):** - Strengthen edges traversed during successful queries - Decay unused edges toward zero -- Prune edges below threshold +- Prune edges below threshold, keeping Metroid degree within Williams-derived bounds +- After LTP/LTD: recompute σ(v) for all nodes whose incident edges changed; run promotion/eviction sweep via `SalienceEngine` **Prototype Recomputation:** - Recompute volume/shelf medoids and centroids - Update prototype vectors in vector file +- After recomputation: recompute salience for affected representative entries; run tier-quota promotion/eviction for volume and shelf tiers **Full Metroid Recalc:** - For dirty volumes, recompute all pairwise similarities -- Rebuild bounded neighbor lists -- Clear dirty flags +- Bound batch size: process at most O(√(t log t)) pairwise comparisons per idle cycle +- Prioritise dirtiest volumes first +- Rebuild bounded neighbor lists; degree limit derived from `HotpathPolicy` +- Clear dirty flags; recompute salience for affected nodes; run promotion sweep + +**Community Detection:** +- Run lightweight label propagation on the 
Metroid neighbor graph during idle passes +- Store community labels in `PageActivity.communityId` +- Rerun when dirty-volume flags indicate meaningful structural change +- Empty communities release their slots; new communities receive at least one slot **Experience Replay:** - Simulate queries over recent memories - Reinforce important connection patterns **Cluster Stability:** -- Detect unstable clusters (high variance, imbalanced size) +- Detect unstable clusters (high variance, imbalanced size, Williams fanout violation) - Trigger split/merge when thresholds exceeded +- Run community detection after structural changes ## Security & Trust @@ -270,12 +436,13 @@ Keep cryptographic service separate from routing/storage concerns. All hashing/s | Operation | Target | Hardware Assumption | |-----------|--------|---------------------| | Ingest single page | <50ms | WebGPU-class | -| Query seed ranking | <20ms | Moderate corpus | -| Coherence path solve | <10ms | <30 node subgraph | +| Query seed ranking (resident) | <20ms | H(t) resident index | +| Coherence path solve | <10ms | Dynamic subgraph (≤30 nodes) | | Daydreamer work | Interruptible | No blocking | +| Hotpath promotion/eviction | <5ms | Per trigger point | **Graceful Degradation:** -All operations must complete on WASM fallback, albeit slower. +All operations must complete on WASM fallback, albeit slower. The resident hotpath index reduces query latency proportionally to H(t) coverage of the working set. ## Non-Negotiable Constraints @@ -283,6 +450,7 @@ All operations must complete on WASM fallback, albeit slower. 2. **Fast Local Retrieval** — Must work on degraded hardware 3. **Persistent Local State** — Survive browser restart with integrity checks 4. **Idle Consolidation** — Background quality improvements, not expensive write-time computation +5. 
**Sublinear Growth** — The resident hotpath index must never exceed H(t); all space-time tradeoff subsystems must target O(√(t log t)) scaling ## System Boundaries @@ -303,6 +471,18 @@ All operations must complete on WASM fallback, albeit slower. **medoid** (mathematical term): The underlying clustering statistic. Reserved for algorithmic comments and internal statistical descriptions only. +**Hotpath**: The in-memory resident index of H(t) entries spanning all four hierarchy tiers. The hotpath is the first lookup target for every query; misses spill to WARM/COLD storage. + +**Williams Bound**: The theoretical result S = O(√(t log t)) from Williams 2025, applied here as a universal sublinear growth law for all space-time tradeoff subsystems in CORTEX. + +**Graph mass (t)**: t = |V| + |E| = total pages plus all edges (Hebbian + Metroid). The canonical input to all capacity and bound formulas. + +**Salience (σ)**: Node-level score combining Hebbian edge weight, recency, and query-hit frequency. Drives admission to and eviction from the hotpath. + +**Three-zone model**: HOT (resident), WARM (IndexedDB-indexed), COLD (OPFS bytes only). All zones retain data locally; zones differ only in lookup cost. + +**Community**: A topically coherent subgraph identified by label propagation on the Metroid neighbor graph. Community quotas prevent any single topic from monopolising the hotpath. + ## Model-Derived Numerics **Critical Rule:** All numeric values derived from ML model architecture (embedding dimensions, context lengths, thresholds) must **never** be hardcoded as magic numbers. @@ -315,6 +495,25 @@ All operations must complete on WASM fallback, albeit slower. **Enforcement:** `npm run guard:model-derived` scans for violations before CI merge. +## Policy-Derived Constants + +A parallel class of constants governs the Williams Bound hotpath architecture. These are **not** model-derived (they do not depend on ML architecture); they are empirically tuned policy values. 
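As a rough illustration of what a frozen default policy object could look like, the sketch below groups the defaults tabulated in this section and adds a small sanity check. The field names (`salienceWeights`, `tierQuotaRatios`), the `validatePolicy` helper, and the requirement that `c` be positive are assumptions for the example, not the contents of the planned `core/HotpathPolicy.ts`.

```typescript
// Illustrative sketch only; field and function names are hypothetical.

interface HotpathPolicy {
  readonly c: number; // scaling factor in H(t)
  readonly salienceWeights: {
    readonly alpha: number;
    readonly beta: number;
    readonly gamma: number;
  };
  readonly tierQuotaRatios: {
    readonly shelf: number;
    readonly volume: number;
    readonly book: number;
    readonly page: number;
  };
}

// Nested objects are frozen too, because Object.freeze is shallow.
const DEFAULT_HOTPATH_POLICY: HotpathPolicy = Object.freeze({
  c: 0.5,
  salienceWeights: Object.freeze({ alpha: 0.5, beta: 0.3, gamma: 0.2 }),
  tierQuotaRatios: Object.freeze({ shelf: 0.1, volume: 0.2, book: 0.2, page: 0.5 }),
});

// Returns a list of human-readable problems; an empty list means the
// policy satisfies the sum-to-one constraints stated in the design.
function validatePolicy(policy: HotpathPolicy, epsilon: number = 1e-9): string[] {
  const problems: string[] = [];
  const w = policy.salienceWeights;
  const q = policy.tierQuotaRatios;
  if (!(policy.c > 0)) {
    problems.push("c must be positive");
  }
  if (Math.abs(w.alpha + w.beta + w.gamma - 1) > epsilon) {
    problems.push("salience weights must sum to 1.0");
  }
  if (Math.abs(q.shelf + q.volume + q.book + q.page - 1) > epsilon) {
    problems.push("tier quota ratios must sum to 1.0");
  }
  return problems;
}
```

Keeping the check next to the defaults means a custom policy passed in by tests or benchmarks can be rejected early, before it skews capacity or quota calculations.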
+ +**Source of Truth:** `core/HotpathPolicy.ts` — frozen default policy object + +| Constant | Default | Meaning | +|----------|---------|---------| +| `c` | 0.5 | Scaling factor in H(t) formula | +| `α` | 0.5 | Salience weight for Hebbian connectivity | +| `β` | 0.3 | Salience weight for recency | +| `γ` | 0.2 | Salience weight for query-hit frequency | +| `q_s` | 0.10 | Shelf tier quota fraction | +| `q_v` | 0.20 | Volume tier quota fraction | +| `q_b` | 0.20 | Book tier quota fraction | +| `q_p` | 0.50 | Page tier quota fraction | + +**Enforcement:** Policy constants must not be hardcoded outside `core/HotpathPolicy.ts`. A companion guard or ESLint rule prevents silent duplication. + ## Future Directions (Post-v1) - **P2P Memory Exchange** — Signed subgraph payloads over WebRTC @@ -323,3 +522,4 @@ All operations must complete on WASM fallback, albeit slower. - **Adaptive Chunking** — Context-aware page boundary detection - **Multi-Modal Support** — Image/audio embeddings alongside text - **CRDT-based Merge** — Conflict-free replicated data structures for multi-device sync +- **Empirical Calibration of c** — Instrument real workloads to tune the Williams Bound scaling constant across diverse corpus profiles diff --git a/ERRATA.md b/ERRATA.md deleted file mode 100644 index 0674b8e..0000000 --- a/ERRATA.md +++ /dev/null @@ -1,363 +0,0 @@ -# ERRATA - -## Williams Bound - Comprehensive Hotpath Architecture - -### TL;DR - -Apply the Williams 2025 result S = O(sqrt(t log t)) as a universal sublinear growth law everywhere the system trades space against time: the resident hotpath index, per-tier hierarchy quotas, per-community graph budgets, and Daydreamer maintenance batch sizing. Define t = |V| + |E| (total graph mass). Derive the resident representative capacity H(t) = ceil(c * sqrt(t * log2(1 + t))). 
Hebbian-derived node salience drives promotion and eviction, but representative selection also enforces hierarchical tier quotas and graph-community coverage quotas so the hotpath is both hot and diverse. - ---- - -### Phase A - Theoretical Foundation - -#### A1. Formalize the theorem mapping - -- Define t = |V| + |E| (pages + Hebbian edges + Metroid edges). -- Define H(t) = ceil(c * sqrt(t * log2(1 + t))), the resident hotpath capacity. -- State the design principle: every subsystem that can trade space for time must target sublinear growth at this rate. -- List what counts toward resident capacity: promoted pages, tier prototypes, and active Metroid neighbor entries. -- Define the three-zone model: - - HOT: resident index, capacity H(t) - - WARM: indexed in IndexedDB but not memory-resident - - COLD: vector bytes in OPFS, metadata in IndexedDB, no index entry -- Note that all data stays local; zones affect lookup cost, not retention. -- Reference Williams 2025 as the source and state that c is an empirically tuned constant, not a theorem output. - -#### A2. Define node salience - -The current schema has edge-level Hebbian weights but no node-level score. Define node salience sigma(v) for a page v: - -sigma(v) = alpha * H_in(v) + beta * R(v) + gamma * Q(v) - -Where: - -- H_in(v) = sum of incident Hebbian edge weights -- R(v) = recency score using exponential decay from createdAt or lastUpdatedAt -- Q(v) = query-hit count for the node -- alpha, beta, gamma are tunable weights summing to 1.0 - -This requires lightweight per-page activity metadata such as queryHitCount and lastQueryAt. - -#### A3. 
Define hierarchical tier quotas - -Partition H(t) across the 4-level hierarchy so no single tier monopolizes the hotpath: - -- Shelf quota: q_s * H(t), example q_s = 0.10 for routing prototypes -- Volume quota: q_v * H(t), example q_v = 0.20 for cluster prototypes -- Book quota: q_b * H(t), example q_b = 0.20 for book medoids -- Page quota: q_p * H(t), example q_p = 0.50 for individual page representatives - -Subject to: - -q_s + q_v + q_b + q_p = 1.0 - -Each quota tier holds the highest-salience representatives of that tier's entities. Shelf, Volume, and Book representatives are selected by medoid statistic within their cluster and then ranked by salience for admission. - -#### A4. Define graph-community coverage quotas - -Within each tier's budget, allocate slots proportionally across detected communities so one dense topic cannot consume all capacity. Community detection uses the existing Metroid neighbor graph through connected components or lightweight label propagation during Daydreamer idle passes. - -For community C_i with n_i pages out of N total: - -community_quota(C_i) = max(1, ceil(tier_budget * n_i / N)) - -This dual constraint, tier plus community, ensures both vertical coverage across hierarchy levels and horizontal coverage across topics. - ---- - -### Phase B - Core Policy Module - -#### B1. Create core/HotpathPolicy.ts - -This becomes the central source of truth. It should export: - -- computeCapacity(graphMass: number): number -- computeSalience(hebbianIn: number, recency: number, queryHits: number, weights?): number -- deriveTierQuotas(capacity: number, quotaRatios?): TierQuotas -- deriveCommunityQuotas(tierBudget: number, communitySizes: number[]): number[] - -All numeric constants such as c, alpha, beta, gamma, q_s, q_v, q_b, and q_p should live here as a frozen default policy object, analogous to the existing routing-policy and model-derivation defaults. - -#### B2. 
Add tests for HotpathPolicy - -Write tests first for: - -- H(t) grows sublinearly -- H(t) is monotonically non-decreasing -- Tier quotas sum to capacity -- Community quotas sum to tier budget and each remain at least 1 -- Salience is deterministic for the same inputs - -#### B3. Extend core/types.ts - -Add: - -- PageActivity interface with queryHitCount and lastQueryAt -- HotpathEntry interface with entityId, tier, salience, and optional communityId -- MetadataStore hotpath methods such as putHotpathEntry, getHotpathEntries, evictWeakest, and getResidentCount - -#### B4. Extend storage/IndexedDbMetadataStore.ts - -Add: - -- hotpath_index object store keyed by entityId -- page_activity object store or equivalent page metadata extension -- persistence methods for the new hotpath interfaces -- storage tests covering hotpath persistence and resident counts - ---- - -### Phase C - Salience Engine and Promotion Lifecycle - -#### C1. Create core/SalienceEngine.ts - -Add helpers such as: - -- computeNodeSalience(pageId, metadataStore) -- batchComputeSalience(pageIds, metadataStore) -- shouldPromote(candidateSalience, weakestResidentSalience, capacityRemaining) -- selectEvictionTarget(tier, communityId, metadataStore) - -#### C2. Promotion and eviction lifecycle - -Bootstrap phase: - -- While hotpath size is below H(t), admit the highest-salience node not yet resident. - -Steady-state phase: - -- When a new or updated node has salience greater than the weakest resident in its tier and community bucket, evict the weakest and promote the candidate. -- Break ties by recency. - -Trigger points: - -- On ingest: newly ingested pages become candidates -- On query: queryHitCount increases and salience is recomputed -- On Daydreamer pass: after LTP or LTD, recompute salience and run a promotion sweep - -#### C3. 
Add tests for promotion and eviction - -- Promotion during bootstrap fills to H(t) -- Promotion in steady state evicts the weakest resident -- Community quotas prevent topic collapse -- Tier quotas prevent one hierarchy level from dominating -- Eviction is deterministic under the same state - ---- - -### Phase D - Hierarchical Quota Integration - -#### D1. Upgrade hippocampus/HierarchyBuilder.ts - -After building Books, Volumes, and Shelves, compute the medoid or prototype for each and attempt hotpath admission: - -- Book medoid -> page-tier quota -- Volume prototypes -> volume-tier quota -- Shelf routing prototypes -> shelf-tier quota - -If a tier is full, evict the weakest-salience entry in that tier. - -#### D2. Upgrade cortex/Ranking.ts - -The ranking cascade should search the resident hotpath first: - -- Hot shelves first -- Then hot volumes -- Then hot books -- Then hot pages - -Only spill to warm or cold lookup when resident coverage is insufficient. This makes H(t) the primary latency-control mechanism. - -#### D3. Apply the bound to per-level fanout - -Max children per hierarchy node should also respect a Williams-derived limit: - -- Max volumes per shelf: O(sqrt(|volumes| * log |volumes|)) -- Max books per volume: O(sqrt(|books_in_volume| * log |books_in_volume|)) - -When exceeded, trigger a split through HierarchyBuilder or ClusterStability. - ---- - -### Phase E - Graph-Community Quota Integration - -#### E1. Add community detection to Daydreamer - -Use lightweight label propagation on the Metroid neighbor graph during idle passes. Store community labels in page activity metadata or a dedicated community-label store. Rerun when dirty-volume flags indicate meaningful structural change. - -#### E2. Wire community labels into promotion - -- If a community has remaining quota, promote freely. -- If a community is at quota, the candidate must beat the weakest resident in that community. 
-- If the community is unknown, place the node into a temporary pending pool that borrows from the page-tier budget. - -#### E3. Add community-aware eviction tests - -- Dense communities do not consume all slots -- New communities get at least one slot -- Empty communities release their slots - ---- - -### Phase F - Metroid Maintenance Under the Bound - -#### F1. Upgrade hippocampus/FastMetroidInsert.ts - -- Derive max neighbors per page from H(t) or a related hotpath policy constant instead of hardcoded K -- If a page is already at max degree, evict the neighbor with the lowest Hebbian edge weight -- After insertion, check whether the new page qualifies for hotpath admission - -#### F2. Upgrade daydreamer/FullMetroidRecalc.ts - -- Bound dirty-volume recalc batch size by an H(t)-derived maintenance budget -- Process at most O(sqrt(t log t)) pairwise comparisons per idle cycle -- Prioritize dirtiest volumes first -- Recompute salience for affected nodes and run a promotion sweep after recalculation - -#### F3. Upgrade daydreamer/HebbianUpdater.ts - -- After LTP or LTD, recompute sigma(v) for all nodes whose incident edges changed -- Run a promotion and eviction sweep for changed nodes -- Prune edges whose weight falls below threshold while keeping Metroid degree within bounds - -#### F4. Upgrade daydreamer/PrototypeRecomputer.ts - -- After recomputing volume or shelf prototypes, recompute salience for affected representative entries -- Run tier-quota promotion or eviction for volume and shelf tiers - ---- - -### Phase G - Retrieval Path Under the Bound - -#### G1. Upgrade cortex/Query.ts - -Full query flow: - -1. Embed query -2. Score against resident shelf prototypes -3. Score against resident volume prototypes within top shelves -4. Score against resident book medoids within top volumes -5. Score against resident pages within top books -6. Expand subgraph via getInducedMetroidSubgraph(seeds, maxHops) -7. Solve coherent path via OpenTSPSolver -8. 
Return result with provenance - -The key constraint is that steps 2 through 5 operate on the resident set of size H(t), not the full corpus. Step 6 may touch warm or cold storage but remains bounded by maxHops and degree limits derived from the same policy. - -Add a query cost meter that counts vector operations. If cost exceeds a Williams-derived budget, early-stop and return best-so-far. - -#### G2. Apply the bound to subgraph expansion - -Replace the fixed <30 node target with a dynamic bound: - -- maxSubgraphSize = min(30, floor(sqrt(t * log2(1 + t)) / log2(t))) -- maxHops = ceil(log2(log2(1 + t))) -- perHopBranching = floor(maxSubgraphSize^(1 / maxHops)) - -These formulas shrink gracefully as the graph grows and keep expansion cost sublinear. - ---- - -### Phase H - Verification and Benchmarks - -#### H1. Unit tests per phase - -- HotpathPolicy tests for capacity, quotas, and salience -- SalienceEngine tests for promotion, eviction, and determinism -- Hierarchy quota tests for tier budgets, fanout bounds, and spill behavior -- Community quota tests for label propagation, proportional allocation, and minimum guarantees -- Metroid tests for bounded degree and maintenance batch limits -- Query tests for cost metering and subgraph size bounds - -#### H2. Scaling benchmarks - -Add tests/benchmarks/HotpathScaling.bench.ts with synthetic graphs at 1K, 10K, 100K, and 1M node-plus-edge counts. - -Measure: - -- resident set size vs H(t) -- query latency vs corpus size -- promotion and eviction throughput - -Assert: - -- resident count never exceeds H(t) -- query cost scales sublinearly - -#### H3. Guard extension - -Treat c and the quota ratios as policy-derived, not model-derived. Keep them in core/HotpathPolicy.ts and consider adding a separate guard or lint rule to prevent hotpath constants from being hardcoded elsewhere. - -#### H4. 
CI gate commands - -- npm run guard:model-derived -- npm run build -- npm run lint -- npm run test:unit -- npm run benchmark -- npm run test:browser -- npm run test:electron - ---- - -### Relevant Files - -- DESIGN.md for theorem mapping, three-zone model, salience, quotas, fanout, and subgraph bounds -- PLAN.md for rescoping Hippocampus, Cortex, and Daydreamer around the hotpath lifecycle -- TODO.md for concrete tasks covering HotpathPolicy, SalienceEngine, community detection, and upgrades to ingest, retrieval, and maintenance -- core/types.ts for PageActivity, HotpathEntry, and MetadataStore hotpath methods -- core/HotpathPolicy.ts for central hotpath policy -- core/SalienceEngine.ts for per-node salience and promotion logic -- storage/IndexedDbMetadataStore.ts for hotpath persistence and resident metadata -- Policy.ts for interaction points with routing policy -- core/ModelDefaults.ts remains unchanged and separate from hotpath policy -- hippocampus/FastMetroidInsert.ts for bounded degree and hotpath admission -- hippocampus/HierarchyBuilder.ts for medoid admission and fanout bounds -- cortex/Query.ts for resident-first retrieval and dynamic query limits -- cortex/Ranking.ts for hot, warm, and cold spill logic -- daydreamer/HebbianUpdater.ts for post-LTP or LTD salience recomputation and promotion sweeps -- daydreamer/FullMetroidRecalc.ts for bounded maintenance batches and salience-aware recalculation -- daydreamer/PrototypeRecomputer.ts for tier-quota promotion after prototype updates -- daydreamer/ClusterStability.ts for community detection and split or merge triggers -- tests/Persistence.test.ts for hotpath persistence and bounded graph behavior -- tests/benchmarks/HotpathScaling.bench.ts for scaling validation - ---- - -### Decisions - -- t = |V| + |E| (pages + all edge types) -- H(t) = ceil(c * sqrt(t * log2(1 + t))) -- c is empirically tuned, not theorem-given -- sigma(v) = alpha * H_in(v) + beta * R(v) + gamma * Q(v) -- Default salience weights: alpha = 
0.5, beta = 0.3, gamma = 0.2 -- Tier quotas: Shelf 10%, Volume 20%, Book 20%, Page 50% -- Community quotas: proportional to community size with a minimum of 1 slot -- Bootstrap rule: fill the hotpath greedily by salience until H(t) -- Steady-state rule: promote only if candidate salience exceeds the weakest resident in the same tier and community bucket -- Preserve the existing 4-level hierarchy, but bound fanout using Williams-derived limits and trigger split or merge through ClusterStability -- Keep model-derived numerics entirely separate from hotpath policy -- Apply the bound wherever space-time tradeoffs exist: resident index size, per-tier fanout, subgraph expansion, Metroid degree, and Daydreamer batch size - ---- - -### Dependency Graph - -A1 theorem docs -A2 salience definition -A3 tier quotas -A4 community quotas - -> B1 HotpathPolicy - -> B2 HotpathPolicy tests - -> B3 core types extension - -> B4 IndexedDB extension - -> C1 SalienceEngine - -> C2 promotion lifecycle - -> C3 promotion tests - -> D1-D3 hierarchy integration - -> E1-E3 community integration - -> F1-F4 Metroid maintenance integration - -> G1-G2 retrieval integration - -> H1-H4 verification and benchmarks - -D, E, and F can proceed in parallel once the policy and salience foundations are in place. Retrieval depends on hierarchy and community integration. Verification runs continuously. diff --git a/PLAN.md b/PLAN.md index 68c7f3c..b480572 100644 --- a/PLAN.md +++ b/PLAN.md @@ -1,7 +1,7 @@ # CORTEX Implementation Plan -**Version:** 1.0 -**Last Updated:** 2026-03-12 +**Version:** 1.1 +**Last Updated:** 2026-03-13 This document tracks the implementation status of each major module in CORTEX. It shows what's been built, what's in progress, and what remains. @@ -20,12 +20,14 @@ This document tracks the implementation status of each major module in CORTEX. 
I | Module | Status | Files | Notes | |--------|--------|-------|-------| -| Core Types | ✅ Complete | `core/types.ts` | All entity interfaces defined (Page, Book, Volume, Shelf, Edge, MetroidNeighbor, storage interfaces) | +| Core Types | 🟡 Partial | `core/types.ts` | All entity interfaces defined; needs `PageActivity`, `HotpathEntry`, `TierQuotas`, and `MetadataStore` hotpath method signatures | | Model Profiles | ✅ Complete | `core/ModelProfile.ts`, `core/ModelDefaults.ts`, `core/ModelProfileResolver.ts`, `core/BuiltInModelProfiles.ts` | Source-of-truth for model-derived numerics; guard script enforces compliance | | Numeric Constants | ✅ Complete | `core/NumericConstants.ts` | Runtime constants (byte sizes, workgroup limits) centralized | | Crypto Helpers | ✅ Complete | `core/crypto/hash.ts`, `core/crypto/sign.ts`, `core/crypto/verify.ts` | SHA-256 hashing; Ed25519 sign/verify; 26 tests passing | +| Hotpath Policy | ❌ Missing | `core/HotpathPolicy.ts` (planned) | Central Williams Bound policy: `computeCapacity`, `computeSalience`, `deriveTierQuotas`, `deriveCommunityQuotas`; all policy constants (c, α, β, γ, tier quotas) as frozen default policy object | +| Salience Engine | ❌ Missing | `core/SalienceEngine.ts` (planned) | Per-node salience computation, promotion/eviction lifecycle helpers, community-aware admission logic | -**Foundation Status:** 4/4 complete (100%) +**Foundation Status:** 4/6 complete (67%) --- @@ -35,9 +37,9 @@ This document tracks the implementation status of each major module in CORTEX. 
I |--------|--------|-------|-------| | Vector Store (OPFS) | ✅ Complete | `storage/OPFSVectorStore.ts` | Append-only binary vector file; byte-offset addressing; test coverage via `tests/Persistence.test.ts` | | Vector Store (Memory) | ✅ Complete | `storage/MemoryVectorStore.ts` | In-memory implementation for testing | -| Metadata Store (IndexedDB) | ✅ Complete | `storage/IndexedDbMetadataStore.ts` | Full CRUD for all entities; reverse indexes; Metroid neighbor operations; dirty-volume flags; test coverage via `tests/Persistence.test.ts` | +| Metadata Store (IndexedDB) | 🟡 Partial | `storage/IndexedDbMetadataStore.ts` | Full CRUD for all entities; reverse indexes; Metroid neighbor operations; dirty-volume flags; needs `hotpath_index` and `page_activity` object stores; hotpath CRUD methods; test coverage via `tests/Persistence.test.ts` | -**Storage Status:** 3/3 complete (100%) +**Storage Status:** 2.5/3 complete (83%) --- @@ -77,9 +79,9 @@ This document tracks the implementation status of each major module in CORTEX. 
I |--------|--------|-------|-------| | Text Chunking | ❌ Missing | `hippocampus/Chunker.ts` (planned) | Token-aware page boundary detection respecting ModelProfile limits | | Page ID Generation | ❌ Missing | `hippocampus/PageIdGenerator.ts` (planned) | Deterministic hash-based ID creation | -| Ingest Orchestrator | ❌ Missing | `hippocampus/Ingest.ts` (planned) | Main entry point: chunk → embed → persist → build hierarchy → fast neighbor insert | -| Hierarchy Builder | ❌ Missing | `hippocampus/HierarchyBuilder.ts` (planned) | Construct/update Books, Volumes, Shelves from new Pages | -| Fast Neighbor Insert | ❌ Missing | `hippocampus/FastMetroidInsert.ts` (planned) | Incremental Metroid neighbor update (avoid full recalc) | +| Ingest Orchestrator | ❌ Missing | `hippocampus/Ingest.ts` (planned) | Main entry point: chunk → embed → persist → initialise PageActivity → build hierarchy → fast neighbor insert → hotpath admission | +| Hierarchy Builder | ❌ Missing | `hippocampus/HierarchyBuilder.ts` (planned) | Construct/update Books, Volumes, Shelves; attempt tier-quota hotpath admission for each level's medoid/prototype; Williams-derived fanout bounds; trigger split via ClusterStability when bounds exceeded | +| Fast Neighbor Insert | ❌ Missing | `hippocampus/FastMetroidInsert.ts` (planned) | Incremental Metroid neighbor update; max degree derived from HotpathPolicy (not hardcoded K); evict lowest-weight neighbor on degree overflow; check new page for hotpath admission | **Hippocampus Status:** 0/5 complete (0%) @@ -91,12 +93,12 @@ This document tracks the implementation status of each major module in CORTEX. 
I | Module | Status | Files | Notes | |--------|--------|-------|-------| -| Ranking Pipeline | ❌ Missing | `cortex/Ranking.ts` (planned) | Shelf → Volume → Book → Page hierarchical scoring | -| Seed Selection | ❌ Missing | `cortex/SeedSelection.ts` (planned) | Threshold-based top-k page selection | -| Subgraph Expansion | 🟡 Partial | `storage/IndexedDbMetadataStore.ts` (`getInducedMetroidSubgraph`) | BFS expansion implemented in storage layer; needs orchestration wrapper | +| Ranking Pipeline | ❌ Missing | `cortex/Ranking.ts` (planned) | Resident-first scoring cascade: HOT shelves → HOT volumes → HOT books → HOT pages; spill to WARM/COLD only when coverage insufficient | +| Seed Selection | ❌ Missing | `cortex/SeedSelection.ts` (planned) | Threshold-based top-k page selection from ranking output | +| Subgraph Expansion | 🟡 Partial | `storage/IndexedDbMetadataStore.ts` (`getInducedMetroidSubgraph`) | BFS expansion implemented in storage layer; needs dynamic Williams bounds; needs orchestration wrapper | | Open TSP Solver | ❌ Missing | `cortex/OpenTSPSolver.ts` (planned) | Dummy-node open-path heuristic for coherent ordering | -| Query Orchestrator | ❌ Missing | `cortex/Query.ts` (planned) | Main entry point: embed query → rank → expand → solve path → return result | -| Result DTO | ❌ Missing | `cortex/QueryResult.ts` (planned) | Structured query result with provenance metadata | +| Query Orchestrator | ❌ Missing | `cortex/Query.ts` (planned) | Main entry point: embed → resident-first ranking → subgraph expansion with dynamic bounds → TSP path → query cost meter → early-stop; return result | +| Result DTO | ❌ Missing | `cortex/QueryResult.ts` (planned) | Structured query result with provenance metadata (coherence path, subgraph size, hop count, edge weights) | **Cortex Status:** 0.5/6 complete (8%) @@ -109,15 +111,15 @@ This document tracks the implementation status of each major module in CORTEX. 
I | Module | Status | Files | Notes | |--------|--------|-------|-------| | Idle Scheduler | ❌ Missing | `daydreamer/IdleScheduler.ts` (planned) | Cooperative background loop; interruptible; respects CPU budget | -| Hebbian Updates | ❌ Missing | `daydreamer/HebbianUpdater.ts` (planned) | LTP (strengthen), LTD (decay), prune below threshold | -| Prototype Recomputation | ❌ Missing | `daydreamer/PrototypeRecomputer.ts` (planned) | Recalculate volume/shelf medoids and centroids | -| Full Metroid Recalc | ❌ Missing | `daydreamer/FullMetroidRecalc.ts` (planned) | Rebuild bounded neighbor lists for dirty volumes | +| Hebbian Updates | ❌ Missing | `daydreamer/HebbianUpdater.ts` (planned) | LTP (strengthen), LTD (decay), prune below threshold; recompute σ(v) for changed nodes; run promotion/eviction sweep | +| Prototype Recomputation | ❌ Missing | `daydreamer/PrototypeRecomputer.ts` (planned) | Recalculate volume/shelf medoids and centroids; recompute salience for affected entries; run tier-quota promotion/eviction | +| Full Metroid Recalc | ❌ Missing | `daydreamer/FullMetroidRecalc.ts` (planned) | Rebuild bounded neighbor lists for dirty volumes; batch size bounded by O(√(t log t)) per idle cycle; recompute salience after recalc | | Experience Replay | ❌ Missing | `daydreamer/ExperienceReplay.ts` (planned) | Simulate queries to reinforce connections | -| Cluster Stability | ❌ Missing | `daydreamer/ClusterStability.ts` (planned) | Detect/trigger split/merge for unstable clusters | +| Cluster Stability | ❌ Missing | `daydreamer/ClusterStability.ts` (planned) | Detect/trigger split/merge for unstable clusters; run lightweight label propagation for community detection; store community labels in PageActivity | **Daydreamer Status:** 0/6 complete (0%) -**Note:** Not a v1 blocker — system can ship without background consolidation (manual recalc only). +**Note:** Not a v1 blocker — system can ship without background consolidation (manual recalc only). 
Community detection is required before graph-community quota enforcement is active. --- @@ -126,8 +128,9 @@ This document tracks the implementation status of each major module in CORTEX. I | Module | Status | Files | Notes | |--------|--------|-------|-------| | Routing Policy | ✅ Complete | `Policy.ts` | Derives routing dimensions from ModelProfile; integration tested | +| Hotpath Policy | ❌ Missing | `core/HotpathPolicy.ts` (planned) | Williams Bound capacity formula, salience weights, tier quotas, community quotas; separate from model-derived numerics | -**Policy Status:** 1/1 complete (100%) +**Policy Status:** 1/2 complete (50%) --- @@ -149,15 +152,18 @@ This document tracks the implementation status of each major module in CORTEX. I | Module | Status | Files | Notes | |--------|--------|-------|-------| | Unit Tests | ✅ Complete | `tests/*.test.ts`, `tests/**/*.test.ts` | 115 tests across 13 files; all passing | -| Persistence Tests | ✅ Complete | `tests/Persistence.test.ts` | Full storage layer coverage (OPFS, IndexedDB, Metroid neighbors) | +| Persistence Tests | ✅ Complete | `tests/Persistence.test.ts` | Full storage layer coverage (OPFS, IndexedDB, Metroid neighbors); needs extension for hotpath stores | | Model Tests | ✅ Complete | `tests/model/*.test.ts` | Profile resolution, defaults, routing policy | | Embedding Tests | ✅ Complete | `tests/embeddings/*.test.ts` | Provider resolver, runner, real/dummy backends | | Backend Smoke Tests | ✅ Complete | `tests/BackendSmoke.test.ts` | All vector backends instantiate cleanly | | Runtime Tests | ✅ Complete | `tests/runtime/*.spec.mjs` | Browser harness validated; Electron context-sensitive | | Integration Tests | ❌ Missing | `tests/integration/*.test.ts` (planned) | End-to-end: ingest → persist → query → coherent result | -| Benchmarks | 🟡 Partial | `tests/benchmarks/DummyEmbedderHotpath.bench.ts` | Baseline dummy embedder benchmark; real-provider benchmarks needed | +| Hotpath Policy Tests | ❌ Missing | 
`tests/HotpathPolicy.test.ts` (planned) | H(t) sublinearity and monotonicity; tier quota sums; community quota minimums; salience determinism | +| Salience Engine Tests | ❌ Missing | `tests/SalienceEngine.test.ts` (planned) | Bootstrap fills to H(t); steady-state eviction; community/tier quota enforcement; determinism | +| Scaling Benchmarks | ❌ Missing | `tests/benchmarks/HotpathScaling.bench.ts` (planned) | Synthetic graphs at 1K/10K/100K/1M; assert resident count ≤ H(t); query cost sublinear | +| Benchmarks | 🟡 Partial | `tests/benchmarks/DummyEmbedderHotpath.bench.ts` | Baseline dummy embedder benchmark; real-provider and hotpath scaling benchmarks needed | -**Testing Status:** 7/9 complete (78%) +**Testing Status:** 6/12 complete (50%) --- @@ -180,19 +186,19 @@ This document tracks the implementation status of each major module in CORTEX. I | Layer | Completion | Critical Gap | |-------|-----------|--------------| -| Foundation | 100% | — | -| Storage | 100% | — | +| Foundation | 67% | HotpathPolicy, SalienceEngine, type extensions | +| Storage | 83% | Hotpath IndexedDB stores | | Vector Compute | 100% | — | | Embedding | 83% | WebGL provider (low priority) | | Hippocampus | 0% | **CRITICAL** — No ingest path | | Cortex | 8% | **CRITICAL** — No retrieval path | | Daydreamer | 0% | Not v1 blocker | -| Policy | 100% | — | +| Policy | 50% | HotpathPolicy | | Runtime | 100% | — | -| Testing | 78% | Integration tests | +| Testing | 50% | Integration tests, hotpath tests, scaling benchmarks | | Build/CI | 83% | — | -**System-Wide Completion:** ~60% (infrastructure complete; orchestration layers missing) +**System-Wide Completion:** ~55% (infrastructure complete; Williams Bound policy foundation and orchestration layers missing) --- @@ -218,83 +224,101 @@ This document tracks the implementation status of each major module in CORTEX. I ### Phase 1: Unblock Basic Functionality (Ship v0.1) -**Goal:** Enable ingest and retrieval for a single user session. 
+**Goal:** Enable ingest and retrieval for a single user session, with Williams Bound policy foundation in place. 1. **Crypto Helpers** (`core/crypto/*`) ✅ **Complete** - SHA-256 hashing for text and binary - Ed25519 signing/verification - 26 tests passing -2. **Text Chunking** (`hippocampus/Chunker.ts`) +2. **Williams Bound Policy Foundation** + - `core/HotpathPolicy.ts` — `computeCapacity`, `computeSalience`, `deriveTierQuotas`, `deriveCommunityQuotas`; all constants as frozen default policy object + - `core/SalienceEngine.ts` — `computeNodeSalience`, `batchComputeSalience`, `shouldPromote`, `selectEvictionTarget`; bootstrap and steady-state lifecycle + - Extend `core/types.ts` — `PageActivity`, `HotpathEntry`, `TierQuotas`, `MetadataStore` hotpath method signatures + - Extend `storage/IndexedDbMetadataStore.ts` — `hotpath_index` and `page_activity` object stores; implement new `MetadataStore` hotpath methods + - Tests: `tests/HotpathPolicy.test.ts`, `tests/SalienceEngine.test.ts`, extend `tests/Persistence.test.ts` + +3. **Text Chunking** (`hippocampus/Chunker.ts`) - Token-aware splitting respecting ModelProfile limits - Preserve sentence boundaries where possible - Test with various text lengths -3. **Hippocampus Ingest** (`hippocampus/Ingest.ts`) +4. **Hippocampus Ingest** (`hippocampus/Ingest.ts`) - Chunk → Embed → Persist orchestration - - Build Page entities with proper hashing/signing + - Build Page entities with proper hashing/signing; initialise `PageActivity` record - Single-Book hierarchy (defer Volume/Shelf) - - Basic Metroid neighbor insertion (K-nearest) + - Basic Metroid neighbor insertion with Williams-bounded degree -4. **Cortex Query** (`cortex/Query.ts`) +5. **Cortex Query** (`cortex/Query.ts`) - Embed query - - Flat page ranking (skip hierarchy for now) + - Flat page ranking against resident hotpath (skip full hierarchy for now) - Return top-K pages by similarity - Skip TSP coherence path (just ranked list) -5. 
**Integration Test** (`tests/integration/IngestQuery.test.ts`) +6. **Integration Test** (`tests/integration/IngestQuery.test.ts`) - Ingest text → Retrieve by query → Validate results - Persistence across sessions -**Exit Criteria:** User can ingest text and retrieve relevant pages by query. +**Exit Criteria:** User can ingest text and retrieve relevant pages by query; Williams Bound policy is in place. --- -### Phase 2: Add Hierarchy & Coherence (Ship v0.5) +### Phase 2: Add Hierarchy, Coherence & Resident-First Routing (Ship v0.5) -**Goal:** Hierarchical routing and coherent path ordering. +**Goal:** Hierarchical routing, coherent path ordering, and fully resident-first query path. 1. **Hierarchy Builder** (`hippocampus/HierarchyBuilder.ts`) - Cluster pages into Books (medoid selection) - Cluster books into Volumes (prototype computation) - Build Shelves for coarse routing + - Attempt tier-quota hotpath admission for each level's medoid/prototype via `SalienceEngine` + - Williams-derived fanout bounds; trigger split via `ClusterStability` when exceeded 2. **Ranking Pipeline** (`cortex/Ranking.ts`) - - Shelf → Volume → Book → Page cascade + - Resident-first cascade: HOT shelves → HOT volumes → HOT books → HOT pages + - Spill to WARM/COLD only when resident coverage insufficient 3. **Open TSP Solver** (`cortex/OpenTSPSolver.ts`) - Dummy-node open-path heuristic - Test on synthetic graphs 4. **Full Query Orchestrator** (`cortex/Query.ts` — upgrade) - - Hierarchical ranking - - Subgraph expansion + - Resident-first hierarchical ranking + - Dynamic subgraph expansion bounds from `HotpathPolicy` + - Query cost meter; early-stop on budget exceeded - Coherent path via TSP - Rich result DTO with provenance -**Exit Criteria:** User gets coherent ordered context chains, not just similarity-ranked pages. +**Exit Criteria:** User gets coherent ordered context chains through the resident hotpath; query latency controlled by H(t). 
--- -### Phase 3: Background Consolidation (Ship v1.0) +### Phase 3: Background Consolidation & Community Quotas (Ship v1.0) -**Goal:** Idle maintenance keeps memory healthy. +**Goal:** Idle maintenance keeps memory healthy; community-aware hotpath coverage active. 1. **Idle Scheduler** (`daydreamer/IdleScheduler.ts`) - Cooperative, interruptible loop - CPU budget awareness 2. **Hebbian Updater** (`daydreamer/HebbianUpdater.ts`) - - LTP/LTD rules - - Edge pruning + - LTP/LTD rules; edge pruning + - Recompute σ(v) for changed nodes; run promotion/eviction sweep 3. **Full Metroid Recalc** (`daydreamer/FullMetroidRecalc.ts`) - Rebuild neighbor lists for dirty volumes + - O(√(t log t)) batch size per idle cycle 4. **Prototype Recomputer** (`daydreamer/PrototypeRecomputer.ts`) - Update volume/shelf prototypes + - Tier-quota promotion/eviction after recomputation + +5. **Community Detection** (`daydreamer/ClusterStability.ts` — extend) + - Label propagation on Metroid neighbor graph + - Store community labels in `PageActivity.communityId` + - Wire community IDs into `SalienceEngine` promotion/eviction -**Exit Criteria:** System self-maintains over extended use; no manual intervention required. +**Exit Criteria:** System self-maintains over extended use; community-aware hotpath quotas enforced. --- @@ -307,10 +331,10 @@ This document tracks the implementation status of each major module in CORTEX. I - Edge cases and error paths - Performance regression tests -2. **Benchmark Suite** (`tests/benchmarks/*`) - - Real-provider throughput - - Query latency across corpus sizes - - Storage overhead +2. **Scaling Benchmark Suite** (`tests/benchmarks/HotpathScaling.bench.ts`) + - Synthetic graphs at 1K, 10K, 100K, 1M nodes+edges + - Assert: resident count never exceeds H(t); query cost scales sublinearly + - Record baselines in `benchmarks/BASELINES.md` 3. 
**Documentation** (`docs/*`) - API reference @@ -319,7 +343,7 @@ This document tracks the implementation status of each major module in CORTEX. I 4. **CI Hardening** - Electron runtime gate policy - - Guard scripts in merge checks + - Guard scripts in merge checks (model-derived + hotpath policy) - Benchmark baselines **Exit Criteria:** All tests pass; benchmarks recorded; docs complete; ready for public use. @@ -334,11 +358,15 @@ This document tracks the implementation status of each major module in CORTEX. I ### Blocker 2: No Query Orchestration **Impact:** Cannot retrieve memories. -**Mitigation:** Phase 1 priority; flat ranking acceptable for v0.1. +**Mitigation:** Phase 1 priority; flat ranking against resident hotpath acceptable for v0.1. + +### Blocker 3: No HotpathPolicy or SalienceEngine +**Impact:** Cannot enforce Williams Bound invariants; all subsequent phases depend on these. +**Mitigation:** Phase 1 priority; implement before ingest/query orchestration. ### Risk 1: TSP Complexity Open TSP is NP-hard; heuristic may be slow on large subgraphs. -**Mitigation:** Bound subgraph size (<30 nodes); defer to Phase 2; use deterministic greedy heuristic. +**Mitigation:** Dynamic Williams-derived subgraph bounds shrink the problem as graph grows; defer to Phase 2; use deterministic greedy heuristic. ### Risk 2: Electron Runtime Stability Host-shell Electron can SIGSEGV in constrained contexts. @@ -348,6 +376,10 @@ Host-shell Electron can SIGSEGV in constrained contexts. Transformers.js doesn't expose `webgl` device directly. **Mitigation:** Low priority; `webgpu` and `wasm` sufficient for most users; explicit ORT adapter deferred to Phase 4. +### Risk 4: Empirical Calibration of `c` +The Williams Bound scaling constant `c` is not theorem-given; wrong value causes either hotpath over-allocation (wastes RAM) or under-allocation (defeats purpose). +**Mitigation:** Default `c = 0.5` is conservative; scaling benchmarks in Phase 4 will validate and tune. 
Keep `c` in `core/HotpathPolicy.ts` as an overrideable policy constant. + --- ## Development Workflow @@ -415,5 +447,7 @@ After every implementation pass: - **Metroid vs medoid:** Use `Metroid` in all API surfaces and docs; `medoid` only in algorithmic comments. - **Model-derived numerics:** Never hardcode; always source from `core/` model profile modules. +- **Policy-derived constants:** Never hardcode; always source from `core/HotpathPolicy.ts`. - **Test philosophy:** TDD (Red → Green → Refactor) for all new slices. - **Runtime realism:** Browser and Electron lanes are required merge gates. +- **Williams Bound invariant:** The resident hotpath count must never exceed H(t). Enforce in tests and assert in benchmarks. diff --git a/README.md b/README.md index afc2af8..8179a7a 100644 --- a/README.md +++ b/README.md @@ -54,9 +54,11 @@ This is the "dreaming" phase that prevents catastrophic forgetting and forces ab ## Core Design Principles -- **Biological Scarcity** — Only a fixed number of active prototypes live in VRAM. Everything else is gracefully demoted to disk. -- **Hierarchical & Sparse** — Progressive dimensionality reduction + medoid clustering keeps memory efficient at any scale. -- **Hebbian & Dynamic** — Connections strengthen and weaken naturally. +- **Biological Scarcity** — Only a fixed number of active prototypes live in memory. Everything else is gracefully demoted to disk. +- **Sublinear Growth (Williams Bound)** — The resident hotpath index is bounded to H(t) = ⌈c·√(t·log₂(1+t))⌉ where t = total graph mass (pages + edges). Memory scales sublinearly as the graph grows, trading time for space at a mathematically principled rate. See [`DESIGN.md`](DESIGN.md) for the full theorem mapping. +- **Three-Zone Memory** — HOT (resident in-memory index, capacity H(t)), WARM (indexed in IndexedDB), COLD (raw bytes in OPFS only). All data is retained locally; zones control lookup cost, not data lifetime. 
+- **Hierarchical & Sparse** — Progressive dimensionality reduction + medoid clustering keeps memory efficient at any scale, with Williams-derived fanout bounds preventing any single tier from monopolising the index. +- **Hebbian & Dynamic** — Connections strengthen and weaken naturally. Node salience (σ = α·H_in + β·R + γ·Q) drives promotion into and eviction from the resident hotpath. - **Zero-Copy & Persistent** — OPFS + IndexedDB with cryptographic signing. ## Quick Start diff --git a/TODO.md b/TODO.md index da90b11..7e4ab9c 100644 --- a/TODO.md +++ b/TODO.md @@ -1,6 +1,6 @@ # CORTEX TODO — Path to v1.0 -**Last Updated:** 2026-03-12 +**Last Updated:** 2026-03-13 This document contains a prioritized, actionable list of specific tasks required to ship CORTEX v1.0. Items are ordered by dependency: highest-priority items are those blocking other work. @@ -37,6 +37,76 @@ These items **must** be completed to have a usable system. Without them, users c --- +### P0-F: Williams Bound Policy Foundation (BLOCKS: all hotpath-aware modules) + +**Why:** The HotpathPolicy and SalienceEngine are the central source of truth for the Williams Bound architecture. Every subsequent module (ingest, query, hierarchy, Daydreamer) depends on them. Implementing these first ensures the bound is enforced from day one rather than retrofitted. 
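The capacity formula, salience formula, and tier ratios stated in this patch can be sketched in a few lines of TypeScript. This is only an illustration under assumptions: the function names mirror the P0-F1 task list, but the bodies, the shape of the policy object, and the largest-remainder rounding used to split slots are hypothetical and not taken from the repository.

```typescript
// Hypothetical sketch of core/HotpathPolicy.ts. Names follow the task list;
// the bodies and the rounding strategy are assumptions for illustration.

interface SalienceWeights { alpha: number; beta: number; gamma: number; }
interface TierQuotaRatios { shelf: number; volume: number; book: number; page: number; }
interface TierQuotas { shelf: number; volume: number; book: number; page: number; }

// Defaults quoted from the patch: c = 0.5, weights 0.5/0.3/0.2, ratios 10/20/20/50.
const DEFAULT_HOTPATH_POLICY = Object.freeze({
  c: 0.5,
  weights: Object.freeze({ alpha: 0.5, beta: 0.3, gamma: 0.2 }),
  quotaRatios: Object.freeze({ shelf: 0.1, volume: 0.2, book: 0.2, page: 0.5 }),
});

// H(t) = ceil(c * sqrt(t * log2(1 + t))), where t = |V| + |E|.
function computeCapacity(graphMass: number, c: number = DEFAULT_HOTPATH_POLICY.c): number {
  if (!Number.isFinite(graphMass) || graphMass <= 0) return 0;
  return Math.ceil(c * Math.sqrt(graphMass * Math.log2(1 + graphMass)));
}

// sigma(v) = alpha * H_in(v) + beta * R(v) + gamma * Q(v).
function computeSalience(
  hebbianIn: number,
  recency: number,
  queryHits: number,
  weights: SalienceWeights = DEFAULT_HOTPATH_POLICY.weights,
): number {
  return weights.alpha * hebbianIn + weights.beta * recency + weights.gamma * queryHits;
}

// Split an integer capacity across tiers so the slots sum exactly to capacity.
// Assumes the ratios sum to 1 and capacity is a non-negative integer.
function deriveTierQuotas(
  capacity: number,
  ratios: TierQuotaRatios = DEFAULT_HOTPATH_POLICY.quotaRatios,
): TierQuotas {
  const tiers = ["shelf", "volume", "book", "page"] as const;
  const exact = tiers.map((tier) => capacity * ratios[tier]);
  const slots = exact.map((x) => Math.floor(x));
  let leftover = capacity - slots.reduce((sum, n) => sum + n, 0);
  // Hand leftover slots to the tiers with the largest fractional remainder.
  const order = tiers
    .map((_, i) => i)
    .sort((a, b) => (exact[b] - slots[b]) - (exact[a] - slots[a]) || a - b);
  for (const i of order) {
    if (leftover <= 0) break;
    slots[i] += 1;
    leftover -= 1;
  }
  return { shelf: slots[0], volume: slots[1], book: slots[2], page: slots[3] };
}
```

Rounding each tier independently could make the quotas drift from H(t) by a slot or two, which is why this sketch distributes the remainder explicitly; the real module may choose a different rule.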
+ +- [ ] **P0-F1:** Implement `core/HotpathPolicy.ts` + - `computeCapacity(graphMass: number): number` — H(t) = ⌈c · √(t · log₂(1+t))⌉ + - `computeSalience(hebbianIn: number, recency: number, queryHits: number, weights?: SalienceWeights): number` — σ = α·H_in + β·R + γ·Q + - `deriveTierQuotas(capacity: number, quotaRatios?: TierQuotaRatios): TierQuotas` — allocate H(t) across shelf/volume/book/page tiers + - `deriveCommunityQuotas(tierBudget: number, communitySizes: number[]): number[]` — proportional with min(1) guarantee + - Export a frozen `DEFAULT_HOTPATH_POLICY` object containing all constants: `c = 0.5`, `α = 0.5`, `β = 0.3`, `γ = 0.2`, `q_s = 0.10`, `q_v = 0.20`, `q_b = 0.20`, `q_p = 0.50` + - Keep strictly separate from `core/ModelDefaults.ts` (policy-derived ≠ model-derived) + +- [ ] **P0-F2:** Add HotpathPolicy test coverage (`tests/HotpathPolicy.test.ts`) + - H(t) grows sublinearly: verify `H(10_000) / 10_000 < H(1_000) / 1_000` + - H(t) is monotonically non-decreasing: verify H(t+1) ≥ H(t) for all t + - Tier quotas sum exactly to capacity: `q_s + q_v + q_b + q_p === 1.0` + - Community quotas sum to tier budget and each slot ≥ 1 + - Salience is deterministic for same inputs + +- [ ] **P0-F3:** Extend `core/types.ts` + - Add `PageActivity` interface: `{ pageId: Hash; queryHitCount: number; lastQueryAt: string; communityId?: string }` + - Add `HotpathEntry` interface: `{ entityId: Hash; tier: 'shelf' | 'volume' | 'book' | 'page'; salience: number; communityId?: string }` + - Add `TierQuotas` type: `{ shelf: number; volume: number; book: number; page: number }` + - Add hotpath method signatures to `MetadataStore` interface: + - `putHotpathEntry(entry: HotpathEntry): Promise` + - `getHotpathEntries(tier?: HotpathEntry['tier']): Promise` + - `evictWeakest(tier: HotpathEntry['tier'], communityId?: string): Promise` + - `getResidentCount(): Promise` + - `putPageActivity(activity: PageActivity): Promise` + - `getPageActivity(pageId: Hash): Promise` + +- [ ] 
**P0-F4:** Extend `storage/IndexedDbMetadataStore.ts` + - Add `hotpath_index` object store keyed by `entityId`; secondary index by `tier` + - Add `page_activity` object store keyed by `pageId` + - Implement all six new `MetadataStore` hotpath methods + - Extend `tests/Persistence.test.ts` with hotpath store tests: + - put/get/evict cycle for `HotpathEntry` + - put/get for `PageActivity` + - `getResidentCount` returns correct value after multiple puts + +**Exit Criteria:** `HotpathPolicy` module passes all tests; `types.ts` has hotpath interfaces; IndexedDB hotpath stores are implemented and tested. + +--- + +### P0-G: Salience Engine (BLOCKS: hotpath promotion in ingest and Daydreamer) + +**Why:** The SalienceEngine is the decision-making layer for hotpath admission. It is needed by ingest (new page admission), query (hit-count update), and Daydreamer (post-LTP/LTD sweeps). Implementing it before ingest ensures promotion logic is correct from the first page written. + +- [ ] **P0-G1:** Implement `core/SalienceEngine.ts` + - `computeNodeSalience(pageId: Hash, metadataStore: MetadataStore): Promise` — fetch PageActivity and incident Hebbian edges; apply σ formula via HotpathPolicy + - `batchComputeSalience(pageIds: Hash[], metadataStore: MetadataStore): Promise>` — efficient batch version + - `shouldPromote(candidateSalience: number, weakestResidentSalience: number, capacityRemaining: number): boolean` — admission gating + - `selectEvictionTarget(tier: HotpathEntry['tier'], communityId: string | undefined, metadataStore: MetadataStore): Promise` — find weakest resident in tier/community bucket + +- [ ] **P0-G2:** Implement promotion/eviction lifecycle helpers in `core/SalienceEngine.ts` + - `bootstrapHotpath(metadataStore: MetadataStore, policy: HotpathPolicy): Promise` — fill hotpath greedily by salience while resident count < H(t) + - `runPromotionSweep(candidateIds: Hash[], metadataStore: MetadataStore, policy: HotpathPolicy): Promise` — steady-state: promote if 
salience > weakest in same tier/community bucket; evict weakest on promotion + +- [ ] **P0-G3:** Add SalienceEngine test coverage (`tests/SalienceEngine.test.ts`) + - Bootstrap fills hotpath to exactly H(t) given enough candidates + - Steady-state promotes only when candidate beats the weakest resident + - Steady-state evicts exactly the weakest resident (not a random entry) + - Community quotas prevent a single community from consuming all page-tier slots + - Tier quotas prevent one hierarchy level from dominating + - Eviction is deterministic under the same state + +**Exit Criteria:** `SalienceEngine` module passes all tests; promotion/eviction lifecycle is correct and deterministic. + +--- + ### P0-B: Text Chunking (BLOCKS: ingest orchestration) **Why:** Must split text into page-sized chunks respecting ModelProfile token limits. @@ -64,23 +134,25 @@ These items **must** be completed to have a usable system. Without them, users c - Generate `pageId`, `contentHash`, `vectorHash` - Sign with provided key pair - Link pages via `prevPageId`/`nextPageId` + - Initialise `PageActivity` records with zero counts - [ ] **P0-C2:** Implement `hippocampus/Ingest.ts` (minimal version) - Entry point: `ingestText(text, modelProfile, vectorStore, metadataStore, keyPair)` - Chunk text via `Chunker` - Batch embed chunks via `EmbeddingRunner` - Persist vectors to `VectorStore` - - Build pages via `PageBuilder` - - Persist pages to `MetadataStore` + - Build pages via `PageBuilder`; persist pages and `PageActivity` to `MetadataStore` - Build single `Book` containing all pages (medoid = first page for now) + - After persisting pages, check each new page for hotpath admission via `SalienceEngine.runPromotionSweep` - **Defer:** Volume/Shelf hierarchy, fast neighbor insert - [ ] **P0-C3:** Add ingest test coverage - `tests/hippocampus/Ingest.test.ts` - Test happy path (text → pages → book) - Test persistence (can retrieve pages after ingest) + - Test that new pages are considered for 
hotpath admission after ingest -**Exit Criteria:** User can call `ingestText(...)` and pages are persisted. +**Exit Criteria:** User can call `ingestText(...)` and pages are persisted; PageActivity records exist; hotpath admission runs. --- @@ -91,11 +163,11 @@ These items **must** be completed to have a usable system. Without them, users c - [ ] **P0-D1:** Implement `cortex/Query.ts` (minimal version) - Entry point: `query(queryText, modelProfile, vectorStore, metadataStore, topK)` - Embed query via `EmbeddingRunner` - - Load all page embeddings (flat search for now) + - Score resident hotpath entries first (HOT pages); fall back to full scan for WARM/COLD - Compute similarities via `VectorBackend` - - Select top-K pages + - Select top-K pages; increment `queryHitCount` in `PageActivity`; recompute salience; run promotion sweep - Return `QueryResult` with page IDs and scores - - **Defer:** Hierarchical ranking, subgraph expansion, TSP coherence + - **Defer:** Full hierarchical ranking, subgraph expansion, TSP coherence, query cost meter - [ ] **P0-D2:** Implement `cortex/QueryResult.ts` - DTO with `pages: Page[]`, `scores: number[]`, `metadata: object` @@ -105,8 +177,9 @@ These items **must** be completed to have a usable system. Without them, users c - Test happy path (query → top-K pages) - Test empty corpus (no results) - Test relevance (query for known content returns expected pages) + - Test `PageActivity.queryHitCount` incremented after query hit -**Exit Criteria:** User can call `query(...)` and get ranked pages. +**Exit Criteria:** User can call `query(...)` and get ranked pages; query hits update PageActivity and trigger salience recomputation. --- @@ -134,14 +207,18 @@ These items add hierarchical routing and coherent path ordering. They transform ### P1-A: Hierarchy Builder (UNBLOCKS: hierarchical routing) -**Why:** Need Volume and Shelf structures for efficient coarse-to-fine routing. 
+**Why:** Need Volume and Shelf structures for efficient coarse-to-fine routing. Tier-quota hotpath admission must be integrated so hierarchy prototypes enter the resident index from the moment they are created. - [ ] **P1-A1:** Implement `hippocampus/HierarchyBuilder.ts` - Cluster pages into Books (K-means or similar; select medoid) - Cluster books into Volumes (compute prototype vectors) - Cluster volumes into Shelves (coarse routing prototypes) - - Persist prototypes to `VectorStore` - - Update metadata in `MetadataStore` + - Persist prototypes to `VectorStore`; update metadata in `MetadataStore` + - After each level: attempt hotpath admission via `SalienceEngine.runPromotionSweep`: + - Book medoid → page-tier quota + - Volume prototypes → volume-tier quota + - Shelf routing prototypes → shelf-tier quota + - Enforce Williams-derived fanout bounds (see `HotpathPolicy`); when exceeded, trigger split via `ClusterStability` - [ ] **P1-A2:** Upgrade `hippocampus/Ingest.ts` - After persisting pages, call `HierarchyBuilder` @@ -151,55 +228,63 @@ These items add hierarchical routing and coherent path ordering. They transform - `tests/hippocampus/HierarchyBuilder.test.ts` - Test clustering produces valid Books/Volumes/Shelves - Test prototypes are valid vectors + - Test that hierarchy medoids/prototypes are admitted to correct tier quota + - Test fanout bounds respected; split triggered when exceeded -**Exit Criteria:** Ingestion produces full Page → Book → Volume → Shelf hierarchy. +**Exit Criteria:** Ingestion produces full Page → Book → Volume → Shelf hierarchy with tier-quota hotpath admission at every level. --- ### P1-B: Ranking Pipeline (UNBLOCKS: efficient queries) -**Why:** Hierarchical ranking avoids scanning all pages; reduces query latency. +**Why:** Hierarchical ranking avoids scanning all pages; reduces query latency. The resident hotpath is the primary lookup target — WARM/COLD spill happens only when the hot set provides insufficient coverage. 
- [ ] **P1-B1:** Implement `cortex/Ranking.ts` - - `rankShelves(queryEmbedding, shelves, topK)` - - `rankVolumes(queryEmbedding, volumes, topK)` - - `rankBooks(queryEmbedding, books, topK)` - - `rankPages(queryEmbedding, pages, topK)` - - Each step narrows search space via prototype similarity + - `rankShelves(queryEmbedding, residentShelves, topK)` — score HOT shelf prototypes first + - `rankVolumes(queryEmbedding, residentVolumes, topK)` — score HOT volume prototypes within top shelves + - `rankBooks(queryEmbedding, residentBooks, topK)` — score HOT book medoids within top volumes + - `rankPages(queryEmbedding, residentPages, topK)` — score HOT page representatives within top books + - `spillToWarm(tier, queryEmbedding, metadataStore, topK)` — spill to IndexedDB lookup when resident set insufficient + - Each step narrows the search space; H(t) is the primary latency lever - [ ] **P1-B2:** Upgrade `cortex/Query.ts` - - Replace flat search with hierarchical ranking cascade - - Shelf → Volume → Book → Page + - Replace flat search with resident-first hierarchical ranking cascade + - HOT shelves → HOT volumes → HOT books → HOT pages → WARM/COLD spill - [ ] **P1-B3:** Add ranking test coverage - `tests/cortex/Ranking.test.ts` - Test each ranking function independently - Test full cascade produces correct top pages + - Test that resident entries are scored before non-resident entries -**Exit Criteria:** Queries use hierarchical routing; latency reduced. +**Exit Criteria:** Queries use resident-first hierarchical routing; latency scales with H(t), not corpus size. --- ### P1-C: Fast Metroid Neighbor Insert (UNBLOCKS: graph coherence) -**Why:** Need sparse NN graph for coherent path tracing. +**Why:** Need sparse NN graph for coherent path tracing. Degree must be bounded by `HotpathPolicy` to prevent unbounded graph mass growth. 
- [ ] **P1-C1:** Implement `hippocampus/FastMetroidInsert.ts` - For each new page, compute similarity to existing pages - - Select K-nearest neighbors (bounded degree) + - Derive max neighbors per page from `HotpathPolicy` constant (not hardcoded K) - Insert forward edges (page → neighbors) - Insert reverse edges (neighbors → page), respecting max degree - - Mark affected volumes as dirty for full recalc + - If a page is already at max degree, evict the neighbor with the lowest Hebbian edge weight + - Mark affected volumes as dirty for full Daydreamer recalc + - After insertion, check new page for hotpath admission via `SalienceEngine` - [ ] **P1-C2:** Upgrade `hippocampus/Ingest.ts` - After persisting pages, call `FastMetroidInsert` - [ ] **P1-C3:** Add Metroid insert test coverage - `tests/hippocampus/FastMetroidInsert.test.ts` - - Test neighbor lists are bounded + - Test neighbor lists are bounded by the policy-derived max degree - Test symmetry (if A→B, then B→A) + - Test that degree overflow evicts lowest-weight neighbor, not a random one + - Test that new page is considered for hotpath admission after insertion -**Exit Criteria:** Metroid neighbor graph is maintained during ingest. +**Exit Criteria:** Metroid neighbor graph is maintained during ingest with policy-bounded degree. --- @@ -225,39 +310,44 @@ These items add hierarchical routing and coherent path ordering. They transform ### P1-E: Full Query Orchestrator (DELIVERS: coherent retrieval) -**Why:** This is the "aha" moment — return memories in natural narrative order. +**Why:** This is the "aha" moment — return memories in natural narrative order through the resident hotpath with dynamic, sublinear expansion bounds. 
- [ ] **P1-E1:** Upgrade `cortex/Query.ts` (full version) - - Use hierarchical ranking to select seed pages - - Call `MetadataStore.getInducedMetroidSubgraph(seedPages, maxHops)` + - Use resident-first hierarchical ranking to select seed pages + - Derive dynamic subgraph bounds from `HotpathPolicy` (`maxSubgraphSize`, `maxHops`, `perHopBranching`) + - Call `MetadataStore.getInducedMetroidSubgraph(seedPages, maxHops)` using dynamic `maxHops` - Call `OpenTSPSolver.solve(subgraph)` - Return ordered page list via coherent path - - Include provenance metadata (hop count, edge weights) + - **Query cost meter:** count vector operations; early-stop and return best-so-far if cost exceeds Williams-derived budget + - Include provenance metadata (hop count, edge weights, subgraph size, cost) - [ ] **P1-E2:** Upgrade `cortex/QueryResult.ts` - Add `coherencePath: Hash[]` (ordered page IDs) - - Add `provenance: { subgraphSize, hopCount, edgeWeights }` + - Add `provenance: { subgraphSize: number; hopCount: number; edgeWeights: number[]; vectorOpCost: number; earlyStop: boolean }` - [ ] **P1-E3:** Add full query test coverage - `tests/cortex/Query.test.ts` (upgrade) - - Test subgraph expansion + - Test subgraph expansion stays within `maxSubgraphSize` - Test TSP ordering - Test provenance metadata + - Test early-stop fires when cost budget exceeded -**Exit Criteria:** Queries return coherent ordered context chains, not just ranked pages. +**Exit Criteria:** Queries return coherent ordered context chains through the resident hotpath; dynamic bounds and cost meter active. --- ### P1-F: Integration Test (Hierarchical + Coherent) -**Why:** Validate v0.5 completeness. +**Why:** Validate v0.5 completeness including resident-first routing and dynamic subgraph bounds. 
- [ ] **P1-F1:** Upgrade `tests/integration/IngestQuery.test.ts` - Verify hierarchical structures exist after ingest - - Verify queries return coherent paths + - Verify hotpath entries exist for hierarchy prototypes after ingest + - Verify queries return coherent paths through resident hotpath + - Verify dynamic subgraph bounds honoured (no expansion beyond `maxSubgraphSize`) - Compare coherent path vs flat ranking (show narrative flow improvement) -**Exit Criteria:** Integration test demonstrates coherent retrieval. +**Exit Criteria:** Integration test demonstrates coherent retrieval with resident-first routing. --- @@ -286,77 +376,114 @@ These items add idle background maintenance. System self-improves over time with ### P2-B: Hebbian Updater (DELIVERS: connection plasticity) -**Why:** Strengthen useful connections, decay unused ones. +**Why:** Strengthen useful connections, decay unused ones. Edge changes alter σ(v) values and can trigger hotpath promotions or evictions. - [ ] **P2-B1:** Implement `daydreamer/HebbianUpdater.ts` - LTP: strengthen edges traversed during successful queries - LTD: decay all edges by small factor each pass - - Prune: remove edges below threshold + - Prune: remove edges below threshold; keep Metroid degree within `HotpathPolicy`-derived bounds + - After LTP/LTD: recompute σ(v) for all nodes whose incident edges changed (via `SalienceEngine.batchComputeSalience`) + - Run promotion/eviction sweep for changed nodes via `SalienceEngine.runPromotionSweep` - Update `MetadataStore.putEdges` - [ ] **P2-B2:** Add Hebbian test coverage - `tests/daydreamer/HebbianUpdater.test.ts` - Test strengthen increases weight - Test decay decreases weight - - Test pruning removes weak edges + - Test pruning removes weak edges and keeps degree within bounds + - Test that salience is recomputed for changed nodes + - Test that promotion sweep runs after LTP increases salience above weakest resident -**Exit Criteria:** Edge weights adapt based on usage. 
+**Exit Criteria:** Edge weights adapt based on usage; salience and hotpath updated accordingly. --- ### P2-C: Full Metroid Recalc (DELIVERS: graph maintenance) -**Why:** Incremental fast insert is approximate; need periodic full recalc. +**Why:** Incremental fast insert is approximate; need periodic full recalc. Recalc batch size must be bounded by H(t)-derived maintenance budget to avoid blocking the idle loop. - [ ] **P2-C1:** Implement `daydreamer/FullMetroidRecalc.ts` - - Query `MetadataStore.needsMetroidRecalc(volumeId)` for dirty volumes - - Load all pages in volume - - Compute all pairwise similarities - - Select K-nearest for each page (bounded degree) - - Update `MetadataStore.putMetroidNeighbors` + - Query `MetadataStore.needsMetroidRecalc(volumeId)` for dirty volumes; prioritise dirtiest first + - Load all pages in volume; compute pairwise similarities + - Bound batch: process at most `HotpathPolicy.computeCapacity(graphMass)` pairwise comparisons per idle cycle (O(√(t log t))) + - Select policy-derived max neighbors for each page; update `MetadataStore.putMetroidNeighbors` - Clear dirty flag via `MetadataStore.clearMetroidRecalcFlag` + - Recompute σ(v) for affected nodes via `SalienceEngine.batchComputeSalience`; run promotion sweep - [ ] **P2-C2:** Add Metroid recalc test coverage - `tests/daydreamer/FullMetroidRecalc.test.ts` - Test dirty flag cleared after recalc - Test neighbor quality improved vs fast insert + - Test batch size respects O(√(t log t)) limit per cycle + - Test salience recomputed and promotion sweep runs after recalc -**Exit Criteria:** Dirty volumes are recalculated in background. +**Exit Criteria:** Dirty volumes are recalculated in background within bounded compute budget; salience updated. --- ### P2-D: Prototype Recomputer (DELIVERS: prototype quality) -**Why:** Keep volume/shelf prototypes accurate as pages/books change. +**Why:** Keep volume/shelf prototypes accurate as pages/books change. 
Prototype updates change which entries should occupy the volume and shelf tier quotas. - [ ] **P2-D1:** Implement `daydreamer/PrototypeRecomputer.ts` - Recompute volume medoids (select medoid page per volume) - Recompute volume centroids (average of book embeddings) - Recompute shelf routing prototypes - Update vectors in `VectorStore` (append new, update offsets) + - After recomputing each level: recompute salience for affected representative entries via `SalienceEngine`; run tier-quota promotion/eviction for that tier - [ ] **P2-D2:** Add prototype recomputer test coverage - `tests/daydreamer/PrototypeRecomputer.test.ts` - Test medoid selection algorithm - Test centroid computation + - Test that tier-quota hotpath entries are updated after prototype recomputation -**Exit Criteria:** Prototypes stay accurate over time. +**Exit Criteria:** Prototypes stay accurate over time; tier quota entries reflect current prototypes. --- ### P2-E: Integration Test (Background Consolidation) -**Why:** Validate Daydreamer improves system health. +**Why:** Validate Daydreamer improves system health and hotpath stays consistent. - [ ] **P2-E1:** Implement `tests/integration/Daydreamer.test.ts` - Ingest corpus - - Run queries (generate edge traversals) + - Run queries (generate edge traversals and PageActivity updates) - Run Daydreamer for N passes - Verify edge weights updated - Verify dirty volumes recalculated - Verify prototypes updated + - Verify resident count never exceeds H(t) after any Daydreamer pass + +**Exit Criteria:** Daydreamer demonstrably maintains system health; Williams Bound invariant holds. -**Exit Criteria:** Daydreamer demonstrably maintains system health. +--- + +### P2-F: Community Detection & Graph Coverage Quotas (DELIVERS: topic-diverse hotpath) + +**Why:** Without community detection, a single dense topic can fill the entire page-tier quota, crowding out unrelated memories. 
Community quotas ensure the hotpath is both hot (high salience) and diverse (topic-representative). + +- [ ] **P2-F1:** Add community detection to `daydreamer/ClusterStability.ts` + - Implement lightweight label propagation on the Metroid neighbor graph + - Run during idle passes when dirty-volume flags indicate meaningful structural change + - Store community labels in `PageActivity.communityId` via `MetadataStore.putPageActivity` + - Rerun when graph topology changes significantly (post-split, post-merge, post-full-recalc) + +- [ ] **P2-F2:** Wire community labels into `SalienceEngine` promotion/eviction + - `selectEvictionTarget` uses `communityId` to find weakest resident in the community bucket + - Promotion checks community quota remaining before admitting + - If community quota is full: candidate must beat weakest resident in that community + - If community is unknown (`communityId` not yet set): place node in temporary pending pool borrowing from page-tier budget + - Empty communities release their slots back to the page-tier budget + +- [ ] **P2-F3:** Add community-aware eviction tests + - `tests/daydreamer/ClusterStability.test.ts` + - Test that a single dense community cannot consume all page-tier hotpath slots + - Test that a new community (previously unknown) receives at least one slot + - Test that an empty community releases its slots correctly + - Test that label propagation converges and produces stable community assignments + +**Exit Criteria:** Community-aware hotpath quotas active; topic diversity enforced; label propagation stable. --- @@ -396,26 +523,30 @@ These items improve quality, performance, and developer experience. Not blockers --- -### P3-C: Cluster Stability +### P3-C: Cluster Stability (full implementation) -**Why:** Detect and fix unstable clusters (split oversized, merge undersized). +**Why:** Detect and fix unstable clusters (split oversized, merge undersized). 
The community detection added in P2-F is a subset of this module; here we add the full split/merge machinery. -- [ ] **P3-C1:** Implement `daydreamer/ClusterStability.ts` - - Detect high-variance volumes +- [ ] **P3-C1:** Complete `daydreamer/ClusterStability.ts` + - Detect high-variance volumes (unstable) - Trigger split (K-means with K=2) - Detect low-count volumes - Trigger merge with nearest neighbor volume + - Re-run community detection and update PageActivity after split/merge - [ ] **P3-C2:** Add cluster stability test coverage - - `tests/daydreamer/ClusterStability.test.ts` + - `tests/daydreamer/ClusterStability.test.ts` (extend from P2-F) + - Test split produces two balanced volumes + - Test merge produces one combined volume + - Test community labels updated after structural change -**Exit Criteria:** Clusters stay balanced over time. +**Exit Criteria:** Clusters stay balanced over time; community labels stay current. --- ### P3-D: Benchmark Suite -**Why:** Measure performance and track regressions. +**Why:** Measure performance, validate Williams Bound invariants, and track regressions. - [ ] **P3-D1:** Implement real-provider benchmarks - `tests/benchmarks/TransformersJsEmbedding.bench.ts` @@ -429,16 +560,24 @@ These items improve quality, performance, and developer experience. Not blockers - `tests/benchmarks/StorageOverhead.bench.ts` - Disk usage vs page count -- [ ] **P3-D4:** Record baseline measurements - - Add `benchmarks/BASELINES.md` with results +- [ ] **P3-D4:** Implement hotpath scaling benchmarks + - `tests/benchmarks/HotpathScaling.bench.ts` + - Synthetic graphs at 1K, 10K, 100K, 1M nodes+edges + - Measure: resident set size vs H(t), query latency vs corpus size, promotion/eviction throughput + - **Assert:** resident count never exceeds H(t); query cost scales sublinearly with corpus size + - Assert: H(t) values match expected sublinear curve at each scale point -**Exit Criteria:** Benchmark suite exists; baselines recorded. 
+- [ ] **P3-D5:** Record baseline measurements + - Add `benchmarks/BASELINES.md` with results from all benchmarks + - Include H(t) curve data at 1K/10K/100K/1M + +**Exit Criteria:** Benchmark suite exists; baselines recorded; Williams Bound invariants asserted. --- ### P3-E: CI Hardening -**Why:** Ensure tests run reliably in CI. +**Why:** Ensure tests run reliably in CI; enforce both model-derived and policy-derived numeric guards. - [ ] **P3-E1:** Add GitHub Actions workflow - `.github/workflows/ci.yml` @@ -449,7 +588,13 @@ These items improve quality, performance, and developer experience. Not blockers - Decide CI runner capabilities (software vs hardware rendering) - Update `scripts/run-electron-runtime-tests.mjs` gate logic -**Exit Criteria:** CI runs on every PR; merge blocked if tests fail. +- [ ] **P3-E3:** Add hotpath policy constants guard + - Extend `scripts/guard-model-derived.mjs` or add `scripts/guard-hotpath-policy.mjs` + - Scan for numeric literals assigned to hotpath policy fields outside `core/HotpathPolicy.ts` + - Add as required CI gate alongside `guard:model-derived` + - Add `npm run guard:hotpath-policy` script to `package.json` + +**Exit Criteria:** CI runs on every PR; merge blocked if tests or guards fail; both model-derived and policy-derived constants are enforced. --- @@ -479,31 +624,37 @@ These items improve quality, performance, and developer experience. 
Not blockers | Phase | Items | Status | Blocking | |-------|-------|--------|----------| -| v0.1 (Minimal Viable) | 17 tasks (P0-A through P0-E) | 🟡 In Progress (P0-A complete) | User cannot use system | -| v0.5 (Hierarchical + Coherent) | 13 tasks (P1-A through P1-F) | ❌ Not started | Blocked by v0.1 | -| v1.0 (Background Consolidation) | 11 tasks (P2-A through P2-E) | ❌ Not started | Blocked by v0.5 | -| Polish & Ship | 14 tasks (P3-A through P3-F) | ❌ Not started | Not blocking v1.0 | +| v0.1 (Minimal Viable) | 23 tasks (P0-A through P0-G + P0-E) | 🟡 In Progress (P0-A complete) | User cannot use system | +| v0.5 (Hierarchical + Coherent) | 14 tasks (P1-A through P1-F) | ❌ Not started | Blocked by v0.1 | +| v1.0 (Background Consolidation) | 14 tasks (P2-A through P2-F) | ❌ Not started | Blocked by v0.5 | +| Polish & Ship | 17 tasks (P3-A through P3-F) | ❌ Not started | Not blocking v1.0 | -**Total:** 55 actionable tasks +**Total:** ~68 actionable tasks --- -## Quick Reference: Next 5 Tasks to Unblock Everything +## Quick Reference: Next 7 Tasks to Unblock Everything If you're reading this and want to know "what do I work on right now?", here's the answer: -1. **P0-B1:** Implement `hippocampus/Chunker.ts` -2. **P0-C1:** Implement `hippocampus/PageBuilder.ts` -3. **P0-C2:** Implement `hippocampus/Ingest.ts` -4. **P0-D1:** Implement `cortex/Query.ts` -5. **P0-E1:** Implement `tests/integration/IngestQuery.test.ts` +1. **P0-F1:** Implement `core/HotpathPolicy.ts` +2. **P0-F3:** Extend `core/types.ts` (PageActivity, HotpathEntry, TierQuotas) +3. **P0-F4:** Extend `storage/IndexedDbMetadataStore.ts` (hotpath stores) +4. **P0-G1/G2:** Implement `core/SalienceEngine.ts` +5. **P0-B1:** Implement `hippocampus/Chunker.ts` +6. **P0-C1/C2:** Implement `hippocampus/PageBuilder.ts` and `hippocampus/Ingest.ts` +7. 
**P0-D1:** Implement `cortex/Query.ts` + +Items 1–4 (Williams Bound foundation) should be done first — they are small, independently testable, and unlock correct behaviour in everything that follows. --- ## Notes -- **Dependencies:** Items are ordered so that completing tasks in sequence minimizes blocked work. +- **Dependencies:** Items are ordered so that completing tasks in sequence minimises blocked work. P0-F and P0-G (Williams Bound foundation) must precede all hotpath-aware modules. - **Estimates:** Each P0/P1/P2 task is roughly 1-4 hours for an experienced developer familiar with the codebase. - **Testing:** Every implementation task should be accompanied by test coverage (explicitly called out). - **TDD Approach:** Write failing tests first, then implement to green. - **Documentation Sync:** Update PLAN.md module status as tasks complete. +- **Williams Bound Invariant:** The resident count must never exceed H(t). Every test that touches the hotpath should assert this. +- **Policy constants:** Never hardcode hotpath constants outside `core/HotpathPolicy.ts`. The guard in P3-E3 enforces this. From 4de44530e15cf6335f44b6096b715a5319d60ef7 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Fri, 13 Mar 2026 02:50:14 +0000 Subject: [PATCH 3/8] docs: clarify hotpath policy guard is planned, not yet implemented Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com> --- DESIGN.md | 2 +- TODO.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/DESIGN.md b/DESIGN.md index 32387b8..1a82998 100644 --- a/DESIGN.md +++ b/DESIGN.md @@ -164,7 +164,7 @@ This keeps subgraph expansion cost sublinear in graph mass. ### Policy Source of Truth -All hotpath constants — `c`, `α`, `β`, `γ`, `q_s`, `q_v`, `q_b`, `q_p` — live in `core/HotpathPolicy.ts` as a frozen default policy object. 
These are **policy-derived constants** (not model-derived) and are kept strictly separate from `core/ModelDefaults.ts`. A companion guard (or an extension to `guard:model-derived`) prevents these constants from being hardcoded elsewhere. +All hotpath constants — `c`, `α`, `β`, `γ`, `q_s`, `q_v`, `q_b`, `q_p` — live in `core/HotpathPolicy.ts` as a frozen default policy object. These are **policy-derived constants** (not model-derived) and are kept strictly separate from `core/ModelDefaults.ts`. A companion guard (or an extension to `guard:model-derived`) is planned (see TODO.md P3-E3) to prevent these constants from being hardcoded elsewhere; until that guard is in place, discipline is enforced by convention. --- diff --git a/TODO.md b/TODO.md index 7e4ab9c..db3acd3 100644 --- a/TODO.md +++ b/TODO.md @@ -657,4 +657,4 @@ Items 1–4 (Williams Bound foundation) should be done first — they are small, - **TDD Approach:** Write failing tests first, then implement to green. - **Documentation Sync:** Update PLAN.md module status as tasks complete. - **Williams Bound Invariant:** The resident count must never exceed H(t). Every test that touches the hotpath should assert this. -- **Policy constants:** Never hardcode hotpath constants outside `core/HotpathPolicy.ts`. The guard in P3-E3 enforces this. +- **Policy constants:** Never hardcode hotpath constants outside `core/HotpathPolicy.ts`. P3-E3 will add a guard to enforce this automatically; until then, enforce by convention. 
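The salience formula and the steady-state promotion rule listed in the P0-G3 test bullets (promote only when the candidate beats the weakest resident, evict exactly that resident, stay deterministic) can be sketched roughly as follows. This is a minimal illustration, not the planned `SalienceEngine`: the names `computeSalience`, `tryPromote`, `Resident` and `PromotionOutcome` are hypothetical, tier and community buckets are ignored, and the tie-break on `entityId` is an assumption made only to keep eviction deterministic.

```typescript
// Weights follow the documented defaults (alpha/beta/gamma = 0.5 / 0.3 / 0.2).
interface SalienceWeights { alpha: number; beta: number; gamma: number; }
const DEFAULT_WEIGHTS: SalienceWeights = { alpha: 0.5, beta: 0.3, gamma: 0.2 };

// sigma(v) = alpha * H_in(v) + beta * R(v) + gamma * Q(v)
function computeSalience(
  hebbianIn: number,
  recency: number,
  queryHits: number,
  w: SalienceWeights = DEFAULT_WEIGHTS,
): number {
  const sigma = w.alpha * hebbianIn + w.beta * recency + w.gamma * queryHits;
  // Assumption: non-finite results collapse to 0 so they can never win a slot.
  return Number.isFinite(sigma) ? sigma : 0;
}

// Hypothetical shapes; the real HotpathEntry carries more fields.
interface Resident { entityId: string; salience: number; }
interface PromotionOutcome { admitted: boolean; evictedId: string | null; }

// Single-bucket promotion: fill free slots first, then replace the weakest
// resident only if the candidate is strictly more salient.
function tryPromote(
  residents: Resident[],
  candidate: Resident,
  capacity: number,
): PromotionOutcome {
  if (capacity <= 0) return { admitted: false, evictedId: null };
  if (residents.length < capacity) {
    residents.push(candidate);
    return { admitted: true, evictedId: null };
  }
  // Weakest resident; ties broken by entityId so the choice is repeatable.
  const weakest = residents.reduce((min, entry) =>
    entry.salience < min.salience ||
    (entry.salience === min.salience && entry.entityId < min.entityId)
      ? entry
      : min,
  );
  if (candidate.salience <= weakest.salience) {
    return { admitted: false, evictedId: null };
  }
  residents.splice(residents.indexOf(weakest), 1, candidate);
  return { admitted: true, evictedId: weakest.entityId };
}
```

Because a candidate only ever fills a free slot or replaces one resident, the resident count cannot grow past `capacity`, which is the invariant the notes above ask every hotpath test to assert.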
From df00a1cbd55b762050c6372961f8858e4a851dc2 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Fri, 13 Mar 2026 02:51:52 +0000 Subject: [PATCH 4/8] =?UTF-8?q?docs:=20fix=20community=20quota=20overflow?= =?UTF-8?q?=20=E2=80=94=20use=20largest-remainder=20method=20with=20exact?= =?UTF-8?q?=20sum=20guarantee?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com> --- DESIGN.md | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/DESIGN.md b/DESIGN.md index 1a82998..4cc415a 100644 --- a/DESIGN.md +++ b/DESIGN.md @@ -118,11 +118,20 @@ Within each tier, entries are ranked by salience; the highest-salience represent ### Graph-Community Coverage Quotas -Within each tier's budget, slots are allocated proportionally across detected graph communities to prevent a single dense topic from consuming all capacity: - -``` -community_quota(Cᵢ) = max(1, ⌈tier_budget · nᵢ / N⌉) -``` +Within each tier's budget, slots are allocated proportionally across detected graph communities to prevent a single dense topic from consuming all capacity. The allocation uses the **largest-remainder method** to guarantee the quotas sum exactly to `tier_budget`: + +1. Compute the ideal fractional share for each community: + ``` + share(Cᵢ) = tier_budget · nᵢ / N + ``` +2. Floor each share to get a base allocation: + ``` + base(Cᵢ) = ⌊share(Cᵢ)⌋ + ``` +3. Distribute the remaining `tier_budget − Σ base(Cᵢ)` slots one-by-one to the communities with the largest fractional remainders (`share(Cᵢ) − base(Cᵢ)`), breaking ties by community size (larger community wins). +4. Communities that receive a base of 0 and are not selected in step 3 are **excluded** from this tier (no slot). This is intentional: sparse communities are not promoted until they grow. 
+ +The resulting quotas sum to exactly `tier_budget` regardless of the number or sizes of communities, even when there are more communities than `tier_budget`. where `nᵢ` is the number of pages in community Cᵢ and N is the total page count. Community detection runs via lightweight label propagation on the Metroid neighbor graph during Daydreamer idle passes. From 06702d548c6f251387db7ce9df1b13f260bddf58 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Fri, 13 Mar 2026 02:53:55 +0000 Subject: [PATCH 5/8] =?UTF-8?q?docs:=20fix=20dynamic=20subgraph=20bound=20?= =?UTF-8?q?formulas=20=E2=80=94=20add=20t=5Feff=20bootstrap=20floor=20and?= =?UTF-8?q?=20domain=20documentation?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com> --- DESIGN.md | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/DESIGN.md b/DESIGN.md index 4cc415a..0a29d81 100644 --- a/DESIGN.md +++ b/DESIGN.md @@ -164,12 +164,15 @@ When exceeded, `HierarchyBuilder` or `ClusterStability` triggers a split. The fixed `<30 node` subgraph target is replaced by dynamic formulas that shrink gracefully as the graph grows: ``` -maxSubgraphSize = min(30, ⌊√(t · log₂(1+t)) / log₂(t)⌋) -maxHops = ⌈log₂(log₂(1 + t))⌉ -perHopBranching = ⌊maxSubgraphSize ^ (1 / maxHops)⌋ +t_eff = max(t, 2) -- bootstrap floor (see below) +maxSubgraphSize = min(30, ⌊√(t_eff · log₂(1+t_eff)) / log₂(t_eff)⌋) +maxHops = max(1, ⌈log₂(log₂(1 + t_eff))⌉) +perHopBranching = max(1, ⌊maxSubgraphSize ^ (1 / maxHops)⌋) ``` -This keeps subgraph expansion cost sublinear in graph mass. +**Domain and bootstrap floor.** The raw formulas are undefined when t ≤ 1 (`log₂(1) = 0` → division by zero; `log₂(t) < 0` for t < 1). The effective-mass floor `t_eff = max(t, 2)` eliminates these edge cases. 
At cold-start (t < 2) the formulas evaluate conservatively to `maxSubgraphSize = 1, maxHops = 1, perHopBranching = 1`, which is safe and correct — a single-node subgraph is the only valid result when fewer than two nodes exist. As the corpus grows past the floor the clamp becomes inactive (`t_eff = t` for all t ≥ 2), so large-corpus dynamics are completely unaffected. The explicit `max(1, …)` guards on `maxHops` and `perHopBranching` provide a secondary safety net against rounding to zero on very small but valid inputs. + +This keeps subgraph expansion cost sublinear in graph mass at scale while remaining well-behaved during cold-start and for tiny corpora. ### Policy Source of Truth From 2ce4dc6297fa054135248c5b5b49907d3005afe2 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Fri, 13 Mar 2026 02:56:16 +0000 Subject: [PATCH 6/8] =?UTF-8?q?docs:=20clarify=20hotpath=5Findex=20dual=20?= =?UTF-8?q?role=20=E2=80=94=20RAM=20live=20index=20+=20IndexedDB=20persist?= =?UTF-8?q?ence=20checkpoint?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com> --- DESIGN.md | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/DESIGN.md b/DESIGN.md index 0a29d81..cda8dab 100644 --- a/DESIGN.md +++ b/DESIGN.md @@ -287,7 +287,12 @@ interface PageActivity { ``` #### HotpathEntry -A record in the resident in-memory index. Tracks which entity is HOT and at what salience level. +The shared record type for HOT membership. Used in two complementary roles: + +1. **Live RAM index** — the active resident set (size ≤ H(t)) that every query scans first. +2. **IndexedDB persistence** — the `hotpath_index` store holds a periodic snapshot of the live index so that HOT membership and salience values survive a page reload or machine reboot. 
On startup, `HotpathEntry` rows are loaded from IndexedDB to reconstruct the RAM index without requiring a full corpus replay. + +The Daydreamer worker owns the write path to `hotpath_index`; it checkpoints the live index whenever it runs its maintenance cycle (LTP/LTD pass), making the persisted snapshot no more than one cycle stale. ```typescript interface HotpathEntry { @@ -320,7 +325,7 @@ Structured entity storage with automatic reverse indexes. - `metroid_neighbors` (sparse NN graph) - `flags` (dirty-volume recalc markers) - `page_to_book`, `book_to_volume`, `volume_to_shelf` (reverse indexes) -- `hotpath_index` (resident hotpath entries, keyed by `entityId`) +- `hotpath_index` (periodic HOT-membership checkpoint, keyed by `entityId`; loaded on startup to reconstruct the RAM resident index; written by Daydreamer each maintenance cycle) - `page_activity` (per-page activity metadata for salience computation) ## Retrieval Design @@ -483,7 +488,7 @@ All operations must complete on WASM fallback, albeit slower. The resident hotpa **medoid** (mathematical term): The underlying clustering statistic. Reserved for algorithmic comments and internal statistical descriptions only. -**Hotpath**: The in-memory resident index of H(t) entries spanning all four hierarchy tiers. The hotpath is the first lookup target for every query; misses spill to WARM/COLD storage. +**Hotpath**: The in-memory resident index of H(t) entries spanning all four hierarchy tiers. The hotpath is the first lookup target for every query; misses spill to WARM/COLD storage. HOT membership and salience are checkpointed to the `hotpath_index` IndexedDB store by Daydreamer each maintenance cycle, allowing the RAM index to be restored after a page reload or machine reboot without full corpus replay. **Williams Bound**: The theoretical result S = O(√(t log t)) from Williams 2025, applied here as a universal sublinear growth law for all space-time tradeoff subsystems in CORTEX. 
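The two DESIGN.md changes above (largest-remainder community quotas and the `t_eff`-floored subgraph bounds) are easy to get subtly wrong, so a rough sketch may help. It is written under stated assumptions: `deriveCommunityQuotas` borrows the name used in the TODO.md test bullets but its signature here is a guess, `deriveSubgraphBounds` and `SubgraphBounds` are hypothetical names, community sizes are assumed to be non-negative integers, and the final tie-break by array index is an addition for determinism that the patch does not specify.

```typescript
// Largest-remainder allocation of a tier budget across community sizes.
// Integer arithmetic keeps floors and remainders exact for safe integers.
function deriveCommunityQuotas(tierBudget: number, sizes: readonly number[]): number[] {
  const total = sizes.reduce((sum, n) => sum + n, 0);
  // Assumption: a zero budget or an empty graph yields all-zero quotas.
  if (tierBudget <= 0 || total <= 0) return sizes.map(() => 0);

  const rows = sizes.map((size, index) => {
    // remainder / total is the fractional part of share(C_i).
    const remainder = (tierBudget * size) % total;
    return { index, size, remainder, quota: (tierBudget * size - remainder) / total };
  });

  // Slots still unassigned after flooring every share.
  let leftover = tierBudget - rows.reduce((sum, row) => sum + row.quota, 0);

  // Largest fractional remainder first; ties go to the larger community,
  // then (assumption) to the lower index.
  const ranked = rows.slice().sort(
    (a, b) => b.remainder - a.remainder || b.size - a.size || a.index - b.index,
  );
  for (const row of ranked) {
    if (leftover === 0) break;
    row.quota += 1;
    leftover -= 1;
  }
  return rows.map((row) => row.quota);
}

interface SubgraphBounds {
  maxSubgraphSize: number;
  maxHops: number;
  perHopBranching: number;
}

// Dynamic subgraph bounds with the bootstrap floor t_eff = max(t, 2).
function deriveSubgraphBounds(graphMass: number): SubgraphBounds {
  const tEff = Math.max(graphMass, 2);
  const raw = Math.floor(Math.sqrt(tEff * Math.log2(1 + tEff)) / Math.log2(tEff));
  const maxSubgraphSize = Math.min(30, raw);
  const maxHops = Math.max(1, Math.ceil(Math.log2(Math.log2(1 + tEff))));
  const perHopBranching = Math.max(
    1,
    Math.floor(Math.pow(maxSubgraphSize, 1 / maxHops)),
  );
  return { maxSubgraphSize, maxHops, perHopBranching };
}
```

For example, `deriveCommunityQuotas(7, [6, 3, 1])` floors the shares 4.2, 2.1 and 0.7 to 4, 2 and 0 and hands the single leftover slot to the third community, while `deriveSubgraphBounds(0)` evaluates to the cold-start triple of ones described in the patch. Note that an empty `sizes` array returns an empty list here, so its sum cannot equal a positive budget; how the real policy should treat that input is left open.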
From 2905eb9762e5e718609de62b1a64f6dbbc673747 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Fri, 13 Mar 2026 02:58:36 +0000 Subject: [PATCH 7/8] =?UTF-8?q?docs:=20fix=20HotpathPolicy=20test=20bullet?= =?UTF-8?q?s=20=E2=80=94=20counts=20not=20ratios,=20finite=20range,=20NaN/?= =?UTF-8?q?edge=20guards?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com> --- TODO.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/TODO.md b/TODO.md index db3acd3..9eab6da 100644 --- a/TODO.md +++ b/TODO.md @@ -51,10 +51,13 @@ These items **must** be completed to have a usable system. Without them, users c - [ ] **P0-F2:** Add HotpathPolicy test coverage (`tests/HotpathPolicy.test.ts`) - H(t) grows sublinearly: verify `H(10_000) / 10_000 < H(1_000) / 1_000` - - H(t) is monotonically non-decreasing: verify H(t+1) ≥ H(t) for all t - - Tier quotas sum exactly to capacity: `q_s + q_v + q_b + q_p === 1.0` - - Community quotas sum to tier budget and each slot ≥ 1 + - H(t) is monotonically non-decreasing over a representative range: verify `H(t+1) >= H(t)` for each `t` in `[0, 1, 2, 10, 100, 1_000, 10_000, 100_000]` + - H(t) is a finite integer ≥ 1 for edge inputs: `t = 0`, `t = 1`, `t = Number.MAX_SAFE_INTEGER`; result must never be `NaN`, `Infinity`, or `< 1` + - Derived tier-quota *counts* sum exactly to capacity: `deriveTierQuotas(cap).shelf + .volume + .book + .page === cap` for `cap` in `[1, 10, 100, 1_000]` + - Community quota counts sum exactly to `tier_budget`: `sum(deriveCommunityQuotas(budget, sizes)) === budget` for representative `(budget, sizes)` inputs including edge cases (`budget = 0`, empty `sizes` array, `budget < sizes.length`) + - Community quotas never produce `NaN`, `Infinity`, or negative values for any valid input, including `sizes` with a single community or all equal sizes - 
Salience is deterministic for same inputs + - Salience clamps output to a finite number: never `NaN` or `Infinity` for extreme weight or hit-count values - [ ] **P0-F3:** Extend `core/types.ts` - Add `PageActivity` interface: `{ pageId: Hash; queryHitCount: number; lastQueryAt: string; communityId?: string }` From 5d1abcb2e515b26154ff0aaadfc19e345552472b Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Fri, 13 Mar 2026 03:01:06 +0000 Subject: [PATCH 8/8] =?UTF-8?q?docs:=20clarify=20COLD=20zone=20=E2=80=94?= =?UTF-8?q?=20metadata=20in=20IndexedDB=20+=20semantically=20isolated,=20r?= =?UTF-8?q?andom-walk-only=20discoverable?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: devlux76 <86517969+devlux76@users.noreply.github.com> --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 8179a7a..1a29941 100644 --- a/README.md +++ b/README.md @@ -56,7 +56,7 @@ This is the "dreaming" phase that prevents catastrophic forgetting and forces ab - **Biological Scarcity** — Only a fixed number of active prototypes live in memory. Everything else is gracefully demoted to disk. - **Sublinear Growth (Williams Bound)** — The resident hotpath index is bounded to H(t) = ⌈c·√(t·log₂(1+t))⌉ where t = total graph mass (pages + edges). Memory scales sublinearly as the graph grows, trading time for space at a mathematically principled rate. See [`DESIGN.md`](DESIGN.md) for the full theorem mapping. -- **Three-Zone Memory** — HOT (resident in-memory index, capacity H(t)), WARM (indexed in IndexedDB), COLD (raw bytes in OPFS only). All data is retained locally; zones control lookup cost, not data lifetime. 
+- **Three-Zone Memory** — HOT (resident in-memory index, capacity H(t)), WARM (indexed in IndexedDB, reachable via nearest-neighbour search), COLD (metadata in IndexedDB + raw vectors in OPFS, but semantically isolated from the search path — no strong nearest neighbours in vector space at insertion time; only discoverable by a deliberate random walk). All data is retained locally forever; zones control lookup cost and discoverability, not data lifetime. - **Hierarchical & Sparse** — Progressive dimensionality reduction + medoid clustering keeps memory efficient at any scale, with Williams-derived fanout bounds preventing any single tier from monopolising the index. - **Hebbian & Dynamic** — Connections strengthen and weaken naturally. Node salience (σ = α·H_in + β·R + γ·Q) drives promotion into and eviction from the resident hotpath. - **Zero-Copy & Persistent** — OPFS + IndexedDB with cryptographic signing.
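The capacity and salience formulas quoted in the README bullets above, together with the edge-case guards listed under P0-F2, can be sketched roughly as follows. This is a minimal illustration and not the contents of `core/HotpathPolicy.ts`: the function names `hotpathCapacity` and `salience`, the placeholder constant `ASSUMED_C`, and the choice to treat invalid inputs as zero are all assumptions made for the example. The default weights 0.5 / 0.3 / 0.2 are the ones given in DESIGN.md.

```typescript
// Hypothetical sketch only. Names, the constant, and the guard policy are
// assumptions for illustration, not the project's actual implementation.

// Placeholder for the empirically tuned constant c (the real default is not shown here).
const ASSUMED_C = 0.5;

// H(t) = ceil(c * sqrt(t * log2(1 + t))), clamped to a finite integer >= 1.
function hotpathCapacity(t: number, c: number = ASSUMED_C): number {
  // Assumption: non-finite or negative graph mass is treated as an empty graph.
  const mass = Number.isFinite(t) && t > 0 ? t : 0;
  const raw = Math.ceil(c * Math.sqrt(mass * Math.log2(1 + mass)));
  // Guard against 0, NaN, and Infinity so callers always get a usable capacity.
  return Number.isFinite(raw) && raw >= 1 ? raw : 1;
}

interface SalienceWeights {
  alpha: number; // weight of H_in (incident Hebbian edge weight)
  beta: number;  // weight of R (recency score)
  gamma: number; // weight of Q (query-hit count)
}

// Defaults taken from DESIGN.md.
const DEFAULT_WEIGHTS: SalienceWeights = { alpha: 0.5, beta: 0.3, gamma: 0.2 };

// sigma(v) = alpha * H_in(v) + beta * R(v) + gamma * Q(v), clamped to a finite number.
function salience(
  hIn: number,
  recency: number,
  queryHits: number,
  w: SalienceWeights = DEFAULT_WEIGHTS,
): number {
  // Assumption: non-finite components contribute nothing to the score.
  const safe = (x: number): number => (Number.isFinite(x) ? x : 0);
  const s = w.alpha * safe(hIn) + w.beta * safe(recency) + w.gamma * safe(queryHits);
  if (Number.isNaN(s)) return 0;
  // Overflow to +/-Infinity is clamped to the largest finite magnitude.
  return Math.min(Math.max(s, -Number.MAX_VALUE), Number.MAX_VALUE);
}
```

A test file along the lines of `tests/HotpathPolicy.test.ts` could then check the properties from the TODO bullets directly against these functions: `H(t+1) >= H(t)` over the representative range, `H(10_000) / 10_000 < H(1_000) / 1_000`, and finite results for `t = 0`, `t = 1`, and `t = Number.MAX_SAFE_INTEGER`.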