feat(db): schema v3-v4 and data access layer for knowledge pipeline #32
Merged
gordonkjlee merged 10 commits into main on Apr 7, 2026
Add schema v3 (session_facts, session_fact_sources, domains, consolidation_lock) and v4 (facts with FTS5, entities, fact_entities, entity_edges, sources, consolidations) migrations.

Implement synchronous data access modules:
- session-facts: insert with SHA-256 dedup, claim for consolidation, provenance linking via junction table
- facts: CRUD, FTS5 keyword search, supersession chains
- entities: find-or-create with canonical name normalisation, fact-entity linking, weighted graph edges with strength capping
- domains: get/create/ensure (idempotent)
- consolidation-lock: advisory lock with 5-minute stale detection

48 new tests covering insertion, dedup, FTS5 triggers, supersession chains, entity graph edges, provenance linking, lock acquisition.
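The hash-based dedup in session-facts can be pictured with a small in-memory sketch. This is not the module's code: `contentHashSketch` and `SessionFactDedup` are hypothetical names, the `Set` stands in for the per-session UNIQUE constraint plus INSERT OR IGNORE, and hashing the raw content string without normalisation is an assumption.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: SHA-256 hex digest of the fact content.
// Assumption: the raw string is hashed with no prior normalisation.
function contentHashSketch(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// In-memory stand-in for a table with a UNIQUE (session, content_hash) pair.
class SessionFactDedup {
  private seen = new Set<string>();

  // Returns true when the fact is stored, false when it is ignored as a
  // duplicate within the same session (the INSERT OR IGNORE behaviour).
  insert(sessionId: string, content: string): boolean {
    const key = `${sessionId}:${contentHashSketch(content)}`;
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}
```

Because the key includes the session, the same sentence captured in two different sessions is kept twice, while a repeat inside one session is dropped.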
Replace linear edge strength increment (0.1 per co-occurrence) with logarithmic potentiation curve: strength = 1 - 1/(1 + count × K). K=0.5 models LTP saturation: early co-occurrences cause large jumps, later ones diminish. EDGE_POTENTIATION_K is a named constant for Phase 3 parametric feedback adjustment.

Reduce consolidation lock stale threshold from 5 minutes to 2 minutes. Heuristic consolidation takes milliseconds; even Tier 1 sampling is well under 60 seconds. Faster recovery from crashed processes.
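To make the shape of this curve concrete, the formula from the commit message can be evaluated directly. The function name `edgeStrengthFromCount` is hypothetical; only the formula and K = 0.5 come from the message.

```typescript
// K = 0.5 as stated in the commit message.
const EDGE_POTENTIATION_K = 0.5;

// strength = 1 - 1/(1 + count × K); count is the number of co-occurrences.
function edgeStrengthFromCount(count: number, k: number = EDGE_POTENTIATION_K): number {
  if (count < 0) throw new RangeError("count must be non-negative");
  return 1 - 1 / (1 + count * k);
}

// Early co-occurrences move the strength a lot, later ones much less.
const strengthCurve = [1, 2, 4, 8].map((n) => edgeStrengthFromCount(n));
```

With these inputs the strengths are roughly 0.33, 0.5, 0.67 and 0.8: doubling the count from 4 to 8 adds less than the first co-occurrence did on its own.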
- Add FTS5 DELETE trigger on facts table with comment explaining why UPDATE trigger is omitted (immutable fact content per ADR-4)
- Add missing partial index idx_session_facts_unclaimed on session_facts(created_at) WHERE consolidation_id IS NULL
- Hardcode user_version in applyV4 (was using CURRENT_VERSION variable)
- Add NOT NULL constraint on entities.canonical_name (code always writes it)
- Throw in supersedeFact when oldId does not exist (was silent no-op)
- Add test for findEntityByCanonical contract (no normalisation)
- Add test for supersedeFact with nonexistent oldId
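The supersession contract (mark the old fact as no longer latest, insert the replacement, throw when the old id is unknown) can be sketched without a database. `FactSketch`, `FactChainSketch` and the `supersedes` field are hypothetical; the real module does this inside a SQLite transaction.

```typescript
interface FactSketch {
  id: number;
  content: string;
  isLatest: boolean;
  supersedes: number | null; // hypothetical back-pointer to the replaced fact
}

class FactChainSketch {
  private facts = new Map<number, FactSketch>();
  private nextId = 1;

  insert(content: string, supersedes: number | null = null): FactSketch {
    const fact: FactSketch = { id: this.nextId++, content, isLatest: true, supersedes };
    this.facts.set(fact.id, fact);
    return fact;
  }

  // Content is never edited in place: the old row is flagged and a new row added.
  supersede(oldId: number, content: string): FactSketch {
    const old = this.facts.get(oldId);
    if (!old) throw new Error(`Cannot supersede fact ${oldId}: not found`);
    old.isLatest = false;
    return this.insert(content, oldId);
  }

  latest(): FactSketch[] {
    return Array.from(this.facts.values()).filter((f) => f.isLatest);
  }
}
```

Superseding A with B and then B with C leaves three rows, of which only C is still flagged as latest.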
9d4391c to b781a6d
Replace internal document references with self-explanatory descriptions. Design docs are gitignored — source code must be understandable without access to them.
b781a6d to bf0f0d4
- Fix Entity.canonical_name type: string | null → string (matches NOT NULL schema constraint)
- Remove DEFAULT 1.0 from entity_edges.strength (force explicit values via upsertEntityEdge logarithmic formula)
- Document findEntity non-determinism without type filter
- Document keywordSearch FTS5 syntax throw behaviour
Schema:
- Add UNIQUE(canonical_name, type) on entities; prevents duplicate entities and makes findOrCreateEntity safe across processes
- Remove DEFAULT from entity_edges.strength; forces explicit values

Edge potentiation:
- Replace inverse-formula approach with exponential saturation: new = old + (1 - old) * alpha. Monotonically increasing by construction, no floating-point precision loss, no schema change. Rename EDGE_POTENTIATION_K → EDGE_POTENTIATION_ALPHA (0.3).

Terminology:
- "saturating potentiation" not "logarithmic" (the formula is hyperbolic/exponential, not logarithmic)
- "inspired by LTP saturation" not "models LTP"
- Remove "(pattern separation)" from content_hash comment; hash dedup collapses identical inputs, the opposite of pattern separation

Defensive checks:
- findOrCreateEntity wrapped in transaction
- insertFact checks result.changes
- releaseLock returns boolean
- claimForConsolidation JSDoc documents lock precondition

Tests:
- Tighten FTS5 rank assertion (typeof number, not toBeDefined)
- Tighten search result count (toHaveLength, not >= 1)
- Edge tests verify monotonic increase across 50 iterations
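The update rule in this commit needs only the previous strength, which is why no schema change is required. A minimal sketch follows, with the hypothetical names `potentiate` and `strengthAfter`, and the assumption that a new edge starts at 0 (the commit does not state the initial value).

```typescript
// alpha = 0.3 as stated in the commit message.
const EDGE_POTENTIATION_ALPHA = 0.3;

// One co-occurrence: close a fixed fraction of the remaining gap to 1.
function potentiate(old: number, alpha: number = EDGE_POTENTIATION_ALPHA): number {
  return old + (1 - old) * alpha;
}

// Apply the rule repeatedly. Assumption: the edge starts at `initial` = 0.
function strengthAfter(coOccurrences: number, initial: number = 0): number {
  let strength = initial;
  for (let i = 0; i < coOccurrences; i++) {
    strength = potentiate(strength);
  }
  return strength;
}
```

Starting from 0, the strength after n updates equals 1 - 0.7^n, so it rises on every update while staying below 1.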
Schema:
- consolidations.session_id nullable (consolidation spans sessions)
- Add Phase 2 comment on sources table (no data access yet)
- Add comment explaining intentional FK omission on v3/v4 tables

Lock:
- Reset started_at on re-acquisition to prevent stale detection while holder is still active

Facts:
- Export sanitiseFtsQuery() helper: wraps terms in double quotes to force literal matching, strips stray quote characters
- insertFact: distinguish undefined (default to now) from null (explicitly unknown valid_from) for bitemporal correctness
- JSDoc documenting valid_from default behaviour

Entities:
- findEntityByCanonical: document non-determinism without type
- createEntity: document UNIQUE constraint throw
- upsertEntityEdge: document entity existence responsibility
- Fix precision claim: "no practical precision concern" not "no loss"
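One way a helper like sanitiseFtsQuery could behave, based only on the description in this commit (quote each term, strip stray quote characters), is sketched below. `sanitiseFtsQuerySketch` is a hypothetical stand-in, and splitting on whitespace is an assumption rather than the exported helper's confirmed behaviour.

```typescript
// Hypothetical re-implementation for illustration only.
function sanitiseFtsQuerySketch(input: string): string {
  return input
    .split(/\s+/)                            // assumption: terms are whitespace-separated
    .map((term) => term.replace(/"/g, ""))   // strip stray double quotes
    .filter((term) => term.length > 0)       // drop empty pieces
    .map((term) => `"${term}"`)              // quote each term so it is read literally
    .join(" ");
}

const sanitisedExample = sanitiseFtsQuerySketch('foo AND "bar');
```

Each term is quoted on its own, so operator words such as AND are treated as plain text and the terms are matched individually rather than as one phrase.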
- sanitiseFtsQuery: empty input, stray quotes, FTS5 operators, single term
- insertFact with valid_from: null stores null (unknown validity start)
- insertFact without valid_from defaults to now
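The undefined-versus-null distinction these tests cover comes down to a strict comparison with `undefined`. The sketch below uses a hypothetical `resolveValidFrom` helper and assumes ISO 8601 strings for timestamps, which the commit does not specify.

```typescript
// Hypothetical helper. Assumption: timestamps are ISO 8601 strings.
function resolveValidFrom(
  validFrom: string | null | undefined,
  now: () => string = () => new Date().toISOString(),
): string | null {
  // undefined: caller said nothing, so default to the current time.
  // null: caller explicitly marked the validity start as unknown; keep it.
  return validFrom === undefined ? now() : validFrom;
}
```

Using `validFrom ?? now()` here would be wrong for this contract, because it would also replace an explicit null with the current time.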
- Fix Consolidation.session_id type: string → string | null (matches nullable schema; consolidation spans multiple sessions)
- Add optimistic WHERE clause to stale lock takeover (verify holder + timestamp unchanged between SELECT and UPDATE)
- Document supersedeFact: valid_from always set to now (intentional)
- Document sanitiseFtsQuery: per-term matching, not phrase
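The optimistic takeover is a compare-and-set: the update applies only if the row still looks the way it did when it was read. The sketch below models the single lock row in memory; `LockRowSketch`, `InMemoryLockRow`, `tryAcquireSketch` and the `holder` field name are hypothetical, while the 2-minute threshold and the reset of the start time on re-acquisition come from the earlier commits.

```typescript
interface LockRowSketch {
  holder: string | null;
  startedAt: number | null; // milliseconds since epoch
}

const STALE_THRESHOLD_MS = 2 * 60 * 1000;

class InMemoryLockRow {
  private row: LockRowSketch = { holder: null, startedAt: null };

  read(): LockRowSketch {
    return { ...this.row };
  }

  // Mimics an UPDATE whose WHERE clause re-checks holder and timestamp:
  // it changes nothing if another process modified the row after our read.
  compareAndSet(expected: LockRowSketch, next: LockRowSketch): boolean {
    if (this.row.holder !== expected.holder || this.row.startedAt !== expected.startedAt) {
      return false;
    }
    this.row = { ...next };
    return true;
  }
}

function tryAcquireSketch(lock: InMemoryLockRow, holder: string, now: number): boolean {
  const seen = lock.read();
  const free = seen.holder === null;
  const mine = seen.holder === holder; // re-acquisition resets the start time
  const stale = seen.startedAt !== null && now - seen.startedAt > STALE_THRESHOLD_MS;
  if (!free && !mine && !stale) return false;
  return lock.compareAndSet(seen, { holder, startedAt: now });
}
```

If two processes both observe the same stale row, only the first compare-and-set succeeds; the second finds the holder and timestamp changed and backs off.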
Infrastructure Change
Summary
Stakeholder
Links
Ticket: N/A
Requirements: N/A
Description
Schema v3 creates the Information layer and supporting tables:
- `session_facts`: captured/extracted facts awaiting consolidation, with `content_hash` (SHA-256, UNIQUE per session for dedup), `source_origin` ('explicit'/'inferred' for hybrid capture), `consolidation_id` (claimed by which run). Partial index on `consolidation_id IS NULL` for fast unclaimed-fact queries.
- `session_fact_sources`: provenance junction table linking facts to multiple source events with relevance and extraction_type.
- `domains`: domain registry (data, not code). Seeded from config or created at runtime.
- `consolidation_lock`: single-row advisory lock (`CHECK(id = 1)`) preventing concurrent consolidation. 2-minute stale threshold for crash recovery.

Schema v4 creates the Knowledge layer:
- `facts` + `facts_fts` (FTS5 virtual table) + INSERT and DELETE sync triggers. Graduated, entity-linked, deduplicated facts. `is_latest` flag for fast current-state queries. FTS5 UPDATE trigger intentionally omitted: fact content is immutable (never modified, only superseded).
- `entities`: graph nodes with `canonical_name NOT NULL` and `UNIQUE(canonical_name, type)` constraint for dedup. `access_count` / `last_accessed_at` for activation tracking (inspired by ACT-R frequency + recency signals).
- `fact_entities`: junction table linking facts to entities with relationship type.
- `entity_edges`: entity-to-entity relationships with saturating potentiation for `strength` (spreading activation, Collins & Loftus, 1975). Strength follows `new = old + (1 - old) × α` (α = 0.3), inspired by LTP saturation: early co-occurrences cause large jumps, later ones diminish. Monotonically increasing by construction, with no practical floating-point precision concern. `EDGE_POTENTIATION_ALPHA` is a named constant for future parametric feedback.
- `sources`: provenance records.
- `consolidations`: run records with stats (facts_in, graduated, rejected, entities_created, supersessions).

Data access modules (all synchronous, `better-sqlite3`):
- `session-facts.ts`: `insertSessionFact` (INSERT OR IGNORE for hash dedup), `getSessionFacts`, `getUnconsolidatedFacts`, `claimForConsolidation` (atomic claim; caller must hold consolidation lock), `linkFactSource`, `getFactSources`
- `facts.ts`: `insertFact` (validates INSERT succeeded), `getFact`, `getFactsByDomain`, `getFactsByEntity`, `supersedeFact` (transaction: mark old + insert new; throws if old fact not found), `keywordSearch` (FTS5 BM25; throws on malformed FTS5 syntax, so callers should sanitise or catch), `incrementFactAccess`
- `entities.ts`: `findEntity` (canonical name match; non-deterministic without type filter), `findEntityByCanonical` (exact match, no normalisation), `findOrCreateEntity` (transaction-wrapped, safe with UNIQUE constraint), `createEntity`, `linkFactEntity`, `upsertEntityEdge` (saturating potentiation with `EDGE_POTENTIATION_ALPHA`), `getEntityEdges`, `updateEntityAccess`
- `domains.ts`: `getDomains`, `createDomain`, `ensureDomain` (idempotent)
- `consolidation-lock.ts`: `acquireLock` (with 2-minute stale detection and takeover), `releaseLock` (returns boolean), `getLockState`

Testing
48 new tests across 3 test files + 2 updated assertions in `sessions.test.ts`:
- `is_latest` boolean cast, domain filtering, subdomain filtering, entity-linked retrieval, supersession chains (A→B→C with only C `is_latest`), FTS5 keyword search with rank type assertion, access count increment, throws on superseding nonexistent fact
- `findEntityByCanonical` exact-match contract, type filter, find-or-create, metadata round-trip, fact-entity linking + idempotency, saturating edge potentiation (monotonic increase verified across 50 iterations), edge retrieval, access tracking, domain CRUD + idempotency, lock acquire/release/stale takeover (2-min threshold)

All tests use `:memory:` databases, so there are no file system side effects.

Impact Assessment
Breaking changes: None. New tables only; existing `sessions` and `session_events` tables unchanged.

Components affected: Schema version moves from 2 → 4. Migrations are additive (CREATE TABLE IF NOT EXISTS). Existing databases auto-migrate on server start.
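Rollback plan: Drop the new tables with one `DROP TABLE IF EXISTS <table>` statement per table (SQLite's `DROP TABLE` takes a single table name): `session_facts`, `session_fact_sources`, `domains`, `consolidation_lock`, `facts`, `facts_fts`, `entities`, `fact_entities`, `entity_edges`, `sources`, `consolidations`. Then reset `PRAGMA user_version = 2`.

Complexity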
Checklist
- `npm run build` succeeds
- `npm test` passes