ref(kb): track semantic-layer coordinator and handle refactor
Background
Current KB behavior is spread across API routes, management helpers, pipelines, retrieval helpers, vector storage helpers, file services, and version-management modules. The existing low-level store contracts are useful primitives, but they are not the right semantic boundary for KB lifecycle operations.
Operations such as ingest, search, delete, rename, rollback, uploaded-file reconciliation, file status, version promotion, and cascade cleanup all need one KB-level owner. Without that owner, adding a new storage layout or an external vector database backend would keep leaking backend details into API, tool, and pipeline code.
Goal
Introduce a KB semantic layer with two responsibilities:
- KBCoordinator: the upper-layer entry point for current API, tool, pipeline, file-service, management, and version-management callers.
- KBCollectionHandle: a collection-scoped backend handle that owns backend-specific data-plane operations.
The first implementation should preserve current behavior and public contracts. This is a refactor-first plan, not a new backend rollout.
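The split between the two layers can be pictured with a minimal Python sketch. Everything here beyond the two class names is an assumption made for illustration: the method names, the `InMemoryHandle` stand-in, and the dictionary-based handle lookup are hypothetical and do not describe the planned interfaces.

```python
from dataclasses import dataclass, field
from typing import Protocol


class KBCollectionHandle(Protocol):
    """Collection-scoped backend handle: data-plane operations only (assumed methods)."""

    def search(self, query: str, top_k: int) -> list[str]: ...

    def delete_document(self, doc_id: str) -> bool: ...


@dataclass
class InMemoryHandle:
    """Hypothetical toy handle, present only so the sketch runs."""

    docs: dict[str, str] = field(default_factory=dict)

    def search(self, query: str, top_k: int) -> list[str]:
        # Naive substring match stands in for real retrieval.
        hits = [doc_id for doc_id, text in self.docs.items() if query in text]
        return hits[:top_k]

    def delete_document(self, doc_id: str) -> bool:
        return self.docs.pop(doc_id, None) is not None


@dataclass
class KBCoordinator:
    """Upper-layer entry point: picks the handle for a collection, then delegates."""

    handles: dict[str, KBCollectionHandle] = field(default_factory=dict)

    def search(self, collection: str, query: str, top_k: int = 5) -> list[str]:
        return self.handles[collection].search(query, top_k)

    def delete_document(self, collection: str, doc_id: str) -> bool:
        return self.handles[collection].delete_document(doc_id)
```

In this sketch callers only ever name a collection; which backend serves it stays behind the handle.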
Why split this into two phases?
The refactor is intentionally split into two large phases to keep the migration reviewable and low-risk.
Phase 1 builds a compatibility layer first. Existing public functions, API responses, tool behavior, and sync/async shapes remain stable while calls are routed through KBCoordinator. This gives us contract tests and a safe migration boundary before moving deeper data-plane ownership.
Phase 2 moves backend-specific semantics into KBCollectionHandle. After the compatibility layer is stable, each handle step moves one data-plane family into the collection-scoped handle. The coordinator then becomes mostly routing and orchestration, and the compatibility layer shrinks into thin legacy adapters.
Phase 1: Compatibility Layer
Purpose: preserve current behavior while routing existing surfaces through KBCoordinator.
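Sub-issues:
- #495 Coordinator minimum skeleton
- #496 Storage shim compatibility facade
- #497 File and physical compatibility facade
- #498 Core management compatibility facade
- #499 Maintenance compatibility facade
- #500 Parse display compatibility facade
- #501 Retrieval helper compatibility facade
- #502 Vector storage compatibility facade
- #503 Version compatibility facade
- #504 Pipeline and legacy step compatibility facade
- #505 Tool compatibility facade
- #506 API compatibility facade
- #570 Operation rollback compatibility facade
- #507 Public surface boundary audit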
Expected phase result:
- Existing KB APIs, tools, module-level helpers, and public imports keep their current behavior.
- KBCoordinator becomes the compatibility entry point for KB semantics.
- Collection-level backend binding is created for new collections and first-ingest paths.
- Public-but-not-coordinator-owned surfaces are explicitly classified instead of accidentally wrapped or broken.
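The collection-level backend binding could be recorded and resolved along these lines. This is a sketch under stated assumptions: the `backend_binding` config key, the `"lancedb"` identifier, and both function names are hypothetical, and the fallback reflects the requirement that collections without a binding keep the current LanceDB-compatible behavior.

```python
DEFAULT_BACKEND = "lancedb"  # assumed identifier for the current behavior


def bind_backend(config: dict, backend: str = DEFAULT_BACKEND) -> dict:
    """Return a copy of the collection config with a binding recorded.

    An existing binding is kept, so a repeated first-ingest call cannot rebind.
    """
    bound = dict(config)
    bound.setdefault("backend_binding", backend)
    return bound


def resolve_backend(config: dict) -> str:
    """Historical collections with no recorded binding resolve to the default."""
    return config.get("backend_binding", DEFAULT_BACKEND)
```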
Phase 2: KBCollectionHandle Replacement
Purpose: move backend-specific data-plane semantics into collection-scoped handles.
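Sub-issues:
- #508 Handle document lifecycle
- #509 Handle parse and chunk lifecycle
- #510 Handle embedding lifecycle
- #511 Handle search lifecycle
- #512 Handle collection lifecycle
- #513 Handle status and version lifecycle
- #514 Coordinator becomes router and orchestrator
- #515 Compatibility layer shrink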
Expected phase result:
- KBCollectionHandle owns collection-scoped backend mechanics.
- KBCoordinator owns context resolution, access policy, backend binding, capability-aware fallback, cross-collection orchestration, and multi-owner orchestration.
- Compatibility facades remain available but become thin legacy adapters.
- A future backend can be introduced by adding a handle implementation instead of changing upper-layer call sites again.
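One way the "add a handle implementation" path could work is a small registry keyed by backend name, sketched below. The registry, the `register_handle` decorator, `open_handle`, and the `LanceDBHandle` class are all hypothetical names; the class does not call the real LanceDB library and exists only to show the registration shape.

```python
from typing import Callable

# Maps a backend name to a factory that builds a handle for one collection.
HANDLE_FACTORIES: dict[str, Callable[[str], object]] = {}


def register_handle(backend: str):
    """Class decorator that registers a handle implementation for a backend name."""

    def decorator(factory):
        HANDLE_FACTORIES[backend] = factory
        return factory

    return decorator


@register_handle("lancedb")
class LanceDBHandle:
    """Hypothetical placeholder; a real handle would wrap the storage layer."""

    def __init__(self, collection: str):
        self.collection = collection

    def describe(self) -> str:
        return f"lancedb:{self.collection}"


def open_handle(backend: str, collection: str):
    """Build the handle for a collection, failing loudly on unknown backends."""
    try:
        factory = HANDLE_FACTORIES[backend]
    except KeyError:
        raise ValueError(f"no handle registered for backend {backend!r}") from None
    return factory(collection)
```

Under this design a new backend would add one decorated class, and upper-layer call sites would stay untouched.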
Non-Goals
- Do not introduce a production external vector database backend in this tracking issue.
- Do not change public API response schemas.
- Do not remove public module-level helper functions unless a separate deprecation decision is made.
- Do not move parser algorithms, chunking algorithms, crawler behavior, progress infrastructure, prompt formatting, provider implementations, or generic uploaded-file storage ownership into the KB coordinator.
Success Criteria
- Current KB API contract tests still pass.
- Existing module-level public imports still work.
- Existing sync functions remain sync and async functions remain async.
- LanceDB behavior remains equivalent after each migration step.
- New collection config or first ingest records a collection-level backend binding.
- Historical collections without a binding keep resolving to the current LanceDB-compatible behavior.
- Delete, rename, rollback, search, file status, version promotion, and cascade cleanup go through one KB semantic owner.
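The sync/async criterion lends itself to a mechanical contract check. The sketch below uses the standard-library `inspect.iscoroutinefunction`; the helper name `find_shape_mismatches`, the two example functions, and the expectation table are hypothetical.

```python
import inspect


def find_shape_mismatches(namespace: dict, expected_async: dict[str, bool]) -> list[str]:
    """Return names that are missing or whose sync/async shape has changed."""
    mismatches = []
    for name, should_be_async in expected_async.items():
        func = namespace.get(name)
        if func is None or inspect.iscoroutinefunction(func) != should_be_async:
            mismatches.append(name)
    return mismatches


# Hypothetical public helpers whose shapes were recorded before the refactor.
def delete_document(doc_id):
    return True


async def search_documents(query):
    return []


EXPECTED_ASYNC = {"delete_document": False, "search_documents": True}
```

A contract test could run this against each public module after every migration step and fail if the returned list is non-empty.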
Rollback Refactor Integration
Add #570 as step C12.5, "add operation rollback compatibility facade", after #506 and before #507.
Long-term rollback ownership should follow the same two-phase plan as the rest of the KB refactor:
- Phase 1 adds a compatibility-layer operation model so current API, tool, pipeline, and background flows can share one rollback outcome contract without changing public response schemas.
- Phase 2 moves the actual collection-local compensation mechanics into KBCollectionHandle through #508, #509, #510, #512, and #513.
- KBCoordinator owns operation policy, child operation aggregation, handle selection, and final outcome reporting.
- File/upload/durable/physical-file compensation remains in the file compatibility boundary from #497, not in collection handles.
Additional success criteria:
- Failed ingest rollback is represented by structured operation outcomes, not inferred from document counters.
- API, tool, pipeline, and background ingest paths use the same rollback coordinator semantics.
- Rollback incomplete state, including side_effects_may_remain, is observable internally and must not be hidden by config or metadata cleanup.
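A structured rollback outcome with child aggregation might be modeled as in the sketch below. Only `side_effects_may_remain` comes from this issue; the `RollbackStatus` values, the `OperationOutcome` fields, and the `aggregate` function are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class RollbackStatus(Enum):
    NOT_NEEDED = "not_needed"
    COMPLETE = "complete"
    INCOMPLETE = "incomplete"


@dataclass
class OperationOutcome:
    operation: str
    succeeded: bool
    rollback: RollbackStatus = RollbackStatus.NOT_NEEDED
    side_effects_may_remain: bool = False
    children: list["OperationOutcome"] = field(default_factory=list)


def aggregate(operation: str, children: list[OperationOutcome]) -> OperationOutcome:
    """Combine child outcomes; an incomplete rollback is never masked by a parent."""
    if any(c.rollback is RollbackStatus.INCOMPLETE for c in children):
        status = RollbackStatus.INCOMPLETE
    elif any(c.rollback is RollbackStatus.COMPLETE for c in children):
        status = RollbackStatus.COMPLETE
    else:
        status = RollbackStatus.NOT_NEEDED
    return OperationOutcome(
        operation=operation,
        succeeded=all(c.succeeded for c in children),
        rollback=status,
        side_effects_may_remain=any(c.side_effects_may_remain for c in children),
        children=children,
    )
```

Because the flag propagates upward with `any(...)`, later config or metadata cleanup cannot silently clear it in this model.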