
K66 context ready ingestion #8

Open
ryanjosebrosas wants to merge 12 commits into main from k66-context-ready-ingestion


@ryanjosebrosas
Owner

No description provided.

ryanjosebrosas and others added 11 commits March 23, 2026 09:02
- add explicit lineage/grouping/provenance fields to canonical storage
- persist retrieval-oriented defaults and richer vector payload identity
- prove packet assembly can consume persisted grouping hints
- preserve custom packet grouping on partial re-ingest
- keep grouped packets source-linked and graph-safe
- align k66 artifact wording and verification state
- include sibling supporting chunks for packet-grounded matches
- preserve packet-first grounded output during context expansion
- fuse retrieval lanes before final packet ordering
- add provider-neutral reranker seam with no-op fallback
- lock rerank score assertions for packet ordering
- cover deterministic reranker fallback during retrieval
- keep full packet pool until fused shortlist selection
- retain fallback vector citations after chunk reordering
- rank omitted rerank packets below explicit results
- keep fusion lane ordering lane-local
- add regression coverage for both cases
- merge latest retrieval/runtime foundation changes
- preserve PR4 reranking and packet-context fixes
- verify merged branch with typecheck, lint, and tests
feat(bead-6il): contextual retrieval and packet reranking
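The commit notes above mention a no-op reranker fallback and ranking packets the reranker omits below the ones it scores explicitly. A minimal sketch of that ordering rule, assuming hypothetical FusedPacket and RerankResult shapes (not the PR's actual types): packets the reranker scored come first by rerank score, and omitted packets follow in their original fused order.

```typescript
// Hypothetical shapes for illustration; the PR's actual types may differ.
interface FusedPacket {
  packetKey: string;
  fusedScore: number;
}

interface RerankResult {
  packetKey: string;
  score: number;
}

// Returns packet keys with every packet the reranker scored first (highest
// rerank score first), then the packets it omitted, in their fused order.
function orderAfterRerank(fused: FusedPacket[], reranked: RerankResult[]): string[] {
  const known = new Set(fused.map((p) => p.packetKey));
  const explicit: string[] = [];
  const placed = new Set<string>();
  // Copy before sorting so the caller's array is not mutated; the sort is
  // stable, so equal scores keep the reranker's own order.
  for (const r of [...reranked].sort((a, b) => b.score - a.score)) {
    // Ignore keys that are not in the fused pool and duplicate results.
    if (known.has(r.packetKey) && !placed.has(r.packetKey)) {
      explicit.push(r.packetKey);
      placed.add(r.packetKey);
    }
  }
  const omitted = fused.map((p) => p.packetKey).filter((key) => !placed.has(key));
  return [...explicit, ...omitted];
}
```

Passing an empty rerank result returns the fused order unchanged, so a reranker that scores nothing never reorders the shortlist.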
@coderabbitai

coderabbitai bot commented Mar 25, 2026

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Free

Run ID: 087414ec-3190-4e3a-b98b-9c9c5a825c99

📥 Commits

Reviewing files that changed from the base of the PR and between 56fe6c0 and 14437d7.

⛔ Files ignored due to path filters (1)
  • .beads/verify.log is excluded by !**/*.log
📒 Files selected for processing (15)
  • .beads/artifacts/second-brain-engine-k66/plan.md
  • .beads/artifacts/second-brain-engine-k66/prd.json
  • .beads/artifacts/second-brain-engine-k66/prd.md
  • .beads/artifacts/second-brain-engine-k66/progress.txt
  • .beads/artifacts/second-brain-engine-k66/research.md
  • .beads/issues.jsonl
  • src/index.ts
  • src/ingestion/service.ts
  • src/retrieval/service.ts
  • src/subsystems/reranker/port.ts
  • src/subsystems/supabase/repository.ts
  • src/subsystems/supabase/schema.ts
  • test/ingestion.relational.test.mjs
  • test/retrieval.hybrid.test.mjs
  • test/retrieval.packet.test.mjs

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added reranker subsystem support for improved retrieval ranking
    • Enhanced packet grouping with persisted metadata for better organization
    • Improved citations with item identity and provenance tracking
  • Improvements

    • Enriched vector metadata with additional context fields
    • Extended source and item attributes for hierarchy and grouping support
    • Added fused and reranked scoring for context packets
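One way to picture "packet grouping with persisted metadata": group retrieved chunks by their stored packetKey, fall back to sourceId when no key was persisted, and order each packet's chunks by ordinal. This is a sketch under assumptions; the Chunk and Packet shapes, the sourceId fallback, and the name groupIntoPackets are hypothetical.

```typescript
// Hypothetical chunk shape; field names follow the PR summary, types are assumed.
interface Chunk {
  itemId: string;
  sourceId: string;
  packetKey?: string;
  ordinal?: number;
  text: string;
}

interface Packet {
  packetKey: string;
  chunks: Chunk[];
}

// Groups chunks by persisted packetKey (falling back to sourceId), keeps the
// order in which packets were first seen, and sorts chunks by ordinal.
function groupIntoPackets(chunks: Chunk[]): Packet[] {
  const byKey = new Map<string, Chunk[]>();
  for (const chunk of chunks) {
    const key = chunk.packetKey ?? chunk.sourceId;
    const bucket = byKey.get(key);
    if (bucket) bucket.push(chunk);
    else byKey.set(key, [chunk]);
  }
  // Chunks without an ordinal sort after those that have one.
  const rank = (c: Chunk) => c.ordinal ?? Number.MAX_SAFE_INTEGER;
  return Array.from(byKey, ([packetKey, group]) => ({
    packetKey,
    chunks: [...group].sort((a, b) => rank(a) - rank(b)),
  }));
}
```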

Walkthrough

The changes implement the "second-brain-engine-k66" feature, enriching ingestion persistence with retrieval-oriented metadata. Schema extensions add optional hierarchy/grouping fields (rootSourceId, parentSourceId, sourceGroupKey) for sources and tracking fields (ordinal, parentItemId, packetKey, sectionKey, provenanceLocation) for items. Ingestion and retrieval pipelines are updated to persist and consume these fields, with new vector metadata enrichment (occurredAt, packetKey). A new reranker subsystem port is introduced, integrated into the retrieval pipeline with packet-level reranking and fusion scoring. Tests verify persisted structure, packet assembly behavior, and retrieval grounding.
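The walkthrough says the repository upsert persists the new item fields with defaults "based on existing records and provided values", but it does not state the rules. A minimal sketch of one plausible precedence (provided value, then persisted value, then a derived fallback), assuming hypothetical names (IngestionItemHints, PersistedItemHints, resolveItemHints) and fallbacks (ordinal from batch position, packetKey from sourceId). Preferring the persisted packetKey is what would keep a custom grouping stable across a partial re-ingest.

```typescript
// Optional retrieval hints named in the PR summary; the types are assumed.
interface IngestionItemHints {
  ordinal?: number;
  parentItemId?: string;
  packetKey?: string;
  sectionKey?: string;
  provenanceLocation?: string;
}

interface PersistedItemHints {
  ordinal: number;
  parentItemId: string | null;
  packetKey: string;
  sectionKey: string | null;
  provenanceLocation: string | null;
}

// Hypothetical default resolution: an explicitly provided value wins, then the
// value already persisted for this item, then a fallback derived from context.
function resolveItemHints(
  sourceId: string,
  index: number,
  provided: IngestionItemHints,
  existing?: PersistedItemHints,
): PersistedItemHints {
  return {
    ordinal: provided.ordinal ?? existing?.ordinal ?? index,
    parentItemId: provided.parentItemId ?? existing?.parentItemId ?? null,
    // Reusing the stored key means a partial re-ingest does not regroup packets.
    packetKey: provided.packetKey ?? existing?.packetKey ?? sourceId,
    sectionKey: provided.sectionKey ?? existing?.sectionKey ?? null,
    provenanceLocation: provided.provenanceLocation ?? existing?.provenanceLocation ?? null,
  };
}
```

For example, resolveItemHints("src-1", 3, { packetKey: "intro" }) yields packetKey "intro" and ordinal 3; resolving the same item again with no provided hints and that result as existing keeps packetKey "intro".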

Changes

Cohort / File(s) / Summary
Planning & Documentation
.beads/artifacts/second-brain-engine-k66/plan.md, .beads/artifacts/second-brain-engine-k66/prd.json, .beads/artifacts/second-brain-engine-k66/prd.md, .beads/artifacts/second-brain-engine-k66/progress.txt, .beads/artifacts/second-brain-engine-k66/research.md, .beads/issues.jsonl
Added comprehensive bead documentation including implementation plan, PRD (JSON and Markdown), research guidance, and progress log. Closed issue k66 with "Shipped" status after all tasks passed verification.
Schema & Persistence Layer
src/subsystems/supabase/schema.ts, src/subsystems/supabase/repository.ts
Extended Supabase schema to add optional retrieval-oriented fields: sources gain rootSourceId, parentSourceId, sourceGroupKey; items gain ordinal, parentItemId, packetKey, sectionKey, provenanceLocation. Updated repository upsert to compute and persist these fields with sensible defaults based on existing records and provided values.
Ingestion Model
src/ingestion/service.ts
Added optional properties to IngestionSource (rootSourceId, parentSourceId, sourceGroupKey) and IngestionItem (ordinal, parentItemId, packetKey, sectionKey, provenanceLocation) to carry retrieval hints through ingestion pipeline.
Reranker Subsystem
src/subsystems/reranker/port.ts
Introduced new RerankerPort abstraction with async rerank method accepting queryText, packets, and limit; includes createNoopReranker() for deterministic fallback scoring and slicing.
Retrieval Core Logic
src/index.ts, src/retrieval/service.ts
Substantially refactored retrieval pipeline: vector metadata now includes occurredAt and packetKey; added packet-level reranking with fallback to noop reranker; introduced packet-by-key aggregation replacing packet-by-source; added vector-item hydration matching and candidate merging; extended score tracking with fused and reranked fields; updated RetrievalCitation to include optional itemId. Multiple new helper functions for packet key generation, reranking, fusion scoring, citation deduplication, and candidate merging.
Ingestion Tests
test/ingestion.relational.test.mjs
Added schema validation assertions for new retrieval fields in source and item records; updated vector metadata expectations to include occurredAt and packetKey; added tests verifying persisted defaults for source hierarchy and item ordinals; added test confirming packetKey stability across re-ingestion.
Retrieval Tests
test/retrieval.hybrid.test.mjs, test/retrieval.packet.test.mjs
Substantially rewrote retrieval test expectations to reflect reranking integration, packet-key-based grouping, and grounding behavior; added comprehensive packet-level tests verifying vector-match grounding, packet expansion to sibling chunks, relational score propagation, and vector-to-relational candidate merging. Updated mock returns to use new packet/item identity fields.
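The reranker seam summarized in the table can be sketched as a small port plus a deterministic fallback. The RerankerPort, rerank, queryText, packets, limit, and createNoopReranker names come from the summary; the packet and result shapes, the 1 / (index + 1) scoring, and the rerankWithFallback helper are assumptions for illustration, not the PR's code.

```typescript
// Hypothetical shapes; the PR's real port signature may differ.
interface RerankablePacket {
  packetKey: string;
  text: string;
}

interface RerankedPacket {
  packetKey: string;
  score: number;
}

interface RerankerPort {
  rerank(input: {
    queryText: string;
    packets: RerankablePacket[];
    limit: number;
  }): Promise<RerankedPacket[]>;
}

// No-op fallback: keeps the incoming (already fused) order, assigns strictly
// decreasing scores so downstream sorting is deterministic, and slices to limit.
function createNoopReranker(): RerankerPort {
  return {
    async rerank({ packets, limit }) {
      return packets.slice(0, Math.max(0, limit)).map((packet, index) => ({
        packetKey: packet.packetKey,
        score: 1 / (index + 1),
      }));
    },
  };
}

// Hypothetical helper: uses the provider reranker, falling back to the no-op
// reranker if the provider call rejects.
async function rerankWithFallback(
  reranker: RerankerPort,
  input: { queryText: string; packets: RerankablePacket[]; limit: number },
): Promise<RerankedPacket[]> {
  try {
    return await reranker.rerank(input);
  } catch {
    return createNoopReranker().rerank(input);
  }
}
```

Keeping the no-op scores strictly decreasing means callers can sort by score without a tie-break and still get the fused order back.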

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 The persisted whispers now hold their ground,
With packets grouped by keys so sound,
Retrieval hints in every store—
No more shall context need recompute's chore!
From schema to cite, the memories align,
Second-brain enriched: a blessing divine! 🧠✨


Note

🎁 Summarized by CodeRabbit Free


@chatgpt-codex-connector

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@ryanjosebrosas
Owner Author

@copilot review

Copilot AI review requested due to automatic review settings March 25, 2026 20:43
ryanjosebrosas review requested due to automatic review settings March 25, 2026 20:43

Copilot AI commented Mar 25, 2026

@copilot review

Reviewed and addressed three issues in 3a6255f:

Merged double-fetch (hydrateVectorMatchedItems + hydrateVectorItemsByDocumentId) — both functions were independently calling getPersistedItems for the same set of sourceIds on every retrieval request. Combined into a single hydrateVectorData pass that builds both outputs (matchedItems and itemsByDocumentId) in one loop, halving the store round-trips.
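The single-pass idea can be sketched as follows: fetch persisted items once for the distinct source ids behind the vector matches, then build both outputs in one loop. Only hydrateVectorData, getPersistedItems, matchedItems, and itemsByDocumentId are names from the comment; the ItemStore interface, the documentId field, and matching by item id are assumptions.

```typescript
// Hypothetical persisted item and store shapes for illustration.
interface PersistedItem {
  id: string;
  sourceId: string;
  documentId: string;
}

interface ItemStore {
  getPersistedItems(sourceIds: string[]): Promise<PersistedItem[]>;
}

interface VectorMatch {
  itemId: string;
  sourceId: string;
}

// One store round-trip builds both outputs that previously needed two fetches:
// the items hit by vector matches, and every fetched item grouped by documentId.
async function hydrateVectorData(
  store: ItemStore,
  matches: VectorMatch[],
): Promise<{ matchedItems: PersistedItem[]; itemsByDocumentId: Map<string, PersistedItem[]> }> {
  const sourceIds = Array.from(new Set(matches.map((m) => m.sourceId)));
  const matchedIds = new Set(matches.map((m) => m.itemId));
  const items = sourceIds.length === 0 ? [] : await store.getPersistedItems(sourceIds);

  const matchedItems: PersistedItem[] = [];
  const itemsByDocumentId = new Map<string, PersistedItem[]>();
  for (const item of items) {
    if (matchedIds.has(item.id)) matchedItems.push(item);
    const siblings = itemsByDocumentId.get(item.documentId);
    if (siblings) siblings.push(item);
    else itemsByDocumentId.set(item.documentId, [item]);
  }
  return { matchedItems, itemsByDocumentId };
}
```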

Removed dead ?? candidate.entityId fallback: GraphContextCandidate.sourceId is a required string in the port contract, so the entityId fallback was unreachable. Removed it in both buildContextPackets and collectExpansionSourceIds.

Simplified Math.max(input.limit * 3, input.limit) to input.limit * 3. At this call site limit is always > 0 (guarded by the limit === 0 ? [] : ... branches above), so the Math.max was redundant.

Everything else looks solid — the three-lane hybrid pipeline (relational + graph + vector), RRF fusion pre-reranking, pluggable RerankerPort with graceful fallback, workspace access guard before any I/O, and the ContextPacket abstraction are all well-structured. All 56 tests pass, typecheck and lint clean.
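The RRF fusion step mentioned above (and the "keep fusion lane ordering lane-local" commit) can be illustrated with standard Reciprocal Rank Fusion: each lane contributes 1 / (k + rank) for a packet, where rank is that packet's 1-based position within that lane only. k = 60 is the commonly used default; whether this codebase uses it, and the example lane contents, are assumptions.

```typescript
// One lane = packet keys in that lane's own best-first order
// (e.g. relational, graph, vector). Lane contents are illustrative.
type Lane = string[];

interface FusedEntry {
  key: string;
  fused: number;
}

// Reciprocal Rank Fusion: fused(key) = sum over lanes of 1 / (k + rank),
// where rank is the key's 1-based position within that lane only, so raw
// lane scores on different scales never need to be compared.
function reciprocalRankFusion(lanes: Lane[], k = 60): FusedEntry[] {
  const scores = new Map<string, number>();
  for (const lane of lanes) {
    const seenInLane = new Set<string>();
    lane.forEach((key, index) => {
      if (seenInLane.has(key)) return; // count a key at most once per lane
      seenInLane.add(key);
      scores.set(key, (scores.get(key) ?? 0) + 1 / (k + index + 1));
    });
  }
  // Map keeps first-seen order and Array.prototype.sort is stable (ES2019+),
  // so exact ties fall back to first appearance across lanes.
  return Array.from(scores, ([key, fused]) => ({ key, fused })).sort(
    (a, b) => b.fused - a.fused,
  );
}
```

With lanes ["a", "b", "c"], ["b", "a"], and ["c", "b"], packet b ranks first because it scores in all three lanes, while a and c each appear in only two.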


Labels

None yet

Projects

None yet


2 participants