
ref(kb): track semantic-layer coordinator and handle refactor #494

@rogercloud


Background

Current KB behavior is spread across API routes, management helpers, pipelines, retrieval helpers, vector storage helpers, file services, and version-management modules. The existing low-level store contracts are useful primitives, but they are not the right semantic boundary for KB lifecycle operations.

Operations such as ingest, search, delete, rename, rollback, uploaded-file reconciliation, file status, version promotion, and cascade cleanup all need one KB-level owner. Without that owner, adding a new storage layout or an external vector database backend would keep leaking backend details into API, tool, and pipeline code.

Goal

Introduce a KB semantic layer with two responsibilities:

  • KBCoordinator: the upper-layer entry point for current API, tool, pipeline, file-service, management, and version-management callers.
  • KBCollectionHandle: a collection-scoped backend handle that owns backend-specific data-plane operations.
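As a rough illustration of the split, the sketch below assumes Python `typing.Protocol` interfaces. Every method name here (`ingest_documents`, `search`, `delete_documents`, `resolve_handle`) and the `InMemoryHandle` class are hypothetical, not the project's actual API. The coordinator only resolves a handle and delegates to it. The handle owns the data-plane mechanics.

```python
from dataclasses import dataclass, field
from typing import Protocol


class KBCollectionHandle(Protocol):
    """Collection-scoped backend handle (hypothetical method names)."""

    collection_name: str

    def ingest_documents(self, docs: dict[str, str]) -> int: ...
    def search(self, query: str, top_k: int) -> list[str]: ...
    def delete_documents(self, doc_ids: list[str]) -> int: ...


@dataclass
class InMemoryHandle:
    """Toy handle standing in for a real backend such as LanceDB."""

    collection_name: str
    docs: dict[str, str] = field(default_factory=dict)

    def ingest_documents(self, docs: dict[str, str]) -> int:
        self.docs.update(docs)
        return len(docs)

    def search(self, query: str, top_k: int) -> list[str]:
        # Naive substring match; a real handle would query its vector index.
        hits = [doc_id for doc_id, text in self.docs.items() if query in text]
        return sorted(hits)[:top_k]

    def delete_documents(self, doc_ids: list[str]) -> int:
        # Count only the ids that actually existed.
        return sum(self.docs.pop(doc_id, None) is not None for doc_id in doc_ids)


class KBCoordinator:
    """Upper-layer entry point: resolves the collection's handle, then delegates."""

    def __init__(self) -> None:
        self._handles: dict[str, KBCollectionHandle] = {}

    def resolve_handle(self, collection: str) -> KBCollectionHandle:
        if collection not in self._handles:
            self._handles[collection] = InMemoryHandle(collection)
        return self._handles[collection]

    def ingest(self, collection: str, docs: dict[str, str]) -> int:
        return self.resolve_handle(collection).ingest_documents(docs)

    def search(self, collection: str, query: str, top_k: int = 5) -> list[str]:
        return self.resolve_handle(collection).search(query, top_k)

    def delete(self, collection: str, doc_ids: list[str]) -> int:
        return self.resolve_handle(collection).delete_documents(doc_ids)
```

Callers such as API routes or tools would only ever hold a `KBCoordinator`. Swapping `InMemoryHandle` for another handle implementation would not touch them.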

The first implementation should preserve current behavior and public contracts. This is a refactor-first plan, not a new backend rollout.

Why split this into two phases?

This is intentionally split into two large phases to keep the migration reviewable and low-risk.

Phase 1 builds a compatibility layer first. Existing public functions, API responses, tool behavior, and sync/async shapes remain stable while calls are routed through KBCoordinator. This gives us contract tests and a safe migration boundary before moving deeper data-plane ownership.

Phase 2 moves backend-specific semantics into KBCollectionHandle. After the compatibility layer is stable, each handle step moves one data-plane family into the collection-scoped handle. The coordinator then becomes mostly routing and orchestration, and the compatibility layer shrinks into thin legacy adapters.

Phase 1: Compatibility Layer

Purpose: preserve current behavior while routing existing surfaces through KBCoordinator.

Sub-issues:

  • #495 Coordinator minimum skeleton
  • #496 Storage shim compatibility facade
  • #497 File and physical compatibility facade
  • #498 Core management compatibility facade
  • #499 Maintenance compatibility facade
  • #500 Parse display compatibility facade
  • #501 Retrieval helper compatibility facade
  • #502 Vector storage compatibility facade
  • #503 Version compatibility facade
  • #504 Pipeline and legacy step compatibility facade
  • #505 Tool compatibility facade
  • #506 API compatibility facade
  • #570 Operation rollback compatibility facade
  • #507 Public surface boundary audit

Expected phase result:

  • Existing KB APIs, tools, module-level helpers, and public imports keep their current behavior.
  • KBCoordinator becomes the compatibility entry point for KB semantics.
  • A collection-level backend binding is recorded for new collections and on first-ingest paths.
  • Public-but-not-coordinator-owned surfaces are explicitly classified instead of accidentally wrapped or broken.
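A Phase 1 facade might keep a legacy helper's name, signature, and sync/async shape while routing the call through the coordinator. The sketch below is a minimal version of that idea. `delete_document`, `adelete_document`, and `_Coordinator` are hypothetical stand-ins, not existing project functions.

```python
import asyncio
import inspect


class _Coordinator:
    """Hypothetical stand-in for KBCoordinator with sync and async entry points."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str, str]] = []

    def delete_document(self, kb_name: str, doc_id: str) -> bool:
        self.calls.append(("sync", kb_name, doc_id))
        return True

    async def adelete_document(self, kb_name: str, doc_id: str) -> bool:
        self.calls.append(("async", kb_name, doc_id))
        return True


_coordinator = _Coordinator()


# Legacy sync helper: same module-level name and signature, still sync.
def delete_document(kb_name: str, doc_id: str) -> bool:
    return _coordinator.delete_document(kb_name, doc_id)


# Legacy async helper: same module-level name and signature, still async.
async def adelete_document(kb_name: str, doc_id: str) -> bool:
    return await _coordinator.adelete_document(kb_name, doc_id)


def shapes_preserved() -> bool:
    """Contract check: the sync helper stays sync and the async helper stays async."""
    return not inspect.iscoroutinefunction(delete_document) and inspect.iscoroutinefunction(
        adelete_document
    )


def run_both() -> list[tuple[str, str, str]]:
    """Exercise both helpers and return what reached the coordinator."""
    delete_document("kb", "d1")
    asyncio.run(adelete_document("kb", "d2"))
    return _coordinator.calls
```

A contract test built on `shapes_preserved()` can catch a facade that accidentally turns a sync helper into a coroutine, or the reverse.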

Phase 2: KBCollectionHandle Replacement

Purpose: move backend-specific data-plane semantics into collection-scoped handles.

Sub-issues:

  • #508 Handle document lifecycle
  • #509 Handle parse and chunk lifecycle
  • #510 Handle embedding lifecycle
  • #511 Handle search lifecycle
  • #512 Handle collection lifecycle
  • #513 Handle status and version lifecycle
  • #514 Coordinator becomes router and orchestrator
  • #515 Compatibility layer shrink

Expected phase result:

  • KBCollectionHandle owns collection-scoped backend mechanics.
  • KBCoordinator owns context resolution, access policy, backend binding, capability-aware fallback, cross-collection orchestration, and multi-owner orchestration.
  • Compatibility facades remain available but become thin legacy adapters.
  • A future backend can be introduced by adding a handle implementation instead of changing upper-layer call sites again.
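Capability-aware fallback and cross-collection orchestration on the coordinator side could look like the sketch below. It assumes each handle advertises a set of capability strings. `Handle`, `Router`, `capabilities`, `"hybrid_search"`, and the fallback to vector-only search are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Handle:
    """Hypothetical collection handle advertising what its backend supports."""

    name: str
    capabilities: frozenset[str] = field(default_factory=frozenset)

    def vector_search(self, query: str) -> str:
        return f"{self.name}:vector:{query}"

    def hybrid_search(self, query: str) -> str:
        return f"{self.name}:hybrid:{query}"


class Router:
    """Coordinator-side routing: pick the best operation each handle supports."""

    def __init__(self, handles: dict[str, Handle]) -> None:
        self.handles = handles

    def search(self, collection: str, query: str) -> str:
        handle = self.handles[collection]
        if "hybrid_search" in handle.capabilities:
            return handle.hybrid_search(query)
        # The fallback lives here, so call sites never branch on the backend.
        return handle.vector_search(query)

    def search_many(self, collections: list[str], query: str) -> list[str]:
        # Cross-collection orchestration: one call fans out to each collection's handle.
        return [self.search(name, query) for name in collections]
```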

Non-Goals

  • Do not introduce a production external vector database backend in this tracking issue.
  • Do not change public API response schemas.
  • Do not remove public module-level helper functions unless a separate deprecation decision is made.
  • Do not move parser algorithms, chunking algorithms, crawler behavior, progress infrastructure, prompt formatting, provider implementations, or generic uploaded-file storage ownership into the KB coordinator.

Success Criteria

  • Current KB API contract tests still pass.
  • Existing module-level public imports still work.
  • Existing sync functions remain sync and async functions remain async.
  • LanceDB behavior remains equivalent after each migration step.
  • New collection config or first ingest records a collection-level backend binding.
  • Historical collections without a binding keep resolving to the current LanceDB-compatible behavior.
  • Delete, rename, rollback, search, file status, version promotion, and cascade cleanup go through one KB semantic owner.
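The two binding criteria above can be sketched as a write-once rule plus a read-only fallback. This assumes the binding is stored under a `backend` key in a per-collection config dict. The key name, the `"lancedb"` default string, and both helper functions are hypothetical.

```python
from typing import Any

# Hypothetical default: unbound (historical) collections keep LanceDB-compatible behavior.
DEFAULT_BACKEND = "lancedb"


def record_binding_on_first_ingest(config: dict[str, Any], backend: str = DEFAULT_BACKEND) -> str:
    """Write a binding only if none exists; an existing binding is never replaced."""
    return config.setdefault("backend", backend)


def resolve_backend(config: dict[str, Any]) -> str:
    """Read-only resolution: a missing binding falls back to the default backend."""
    return config.get("backend", DEFAULT_BACKEND)
```

Keeping resolution read-only means that reading a historical collection does not silently write a binding into its config. Only new-collection config and first ingest record one.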

Rollback Refactor Integration

Add #570 (C12.5, operation rollback compatibility facade) to Phase 1, after #506 and before #507.

Long-term rollback ownership should follow the same two-phase plan as the rest of the KB refactor:

  • Phase 1 adds a compatibility-layer operation model so current API, tool, pipeline, and background flows can share one rollback outcome contract without changing public response schemas.
  • Phase 2 moves the actual collection-local compensation mechanics into KBCollectionHandle through #508, #509, #510, #512, and #513.
  • KBCoordinator owns operation policy, child operation aggregation, handle selection, and final outcome reporting.
  • File/upload/durable/physical-file compensation remains in the file compatibility boundary from #497, not in collection handles.
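One possible shape for the coordinator's child-outcome aggregation is sketched below. It assumes a three-state status. `OperationOutcome`, `Status`, and `aggregate` are hypothetical names. Only the `side_effects_may_remain` flag comes from this issue.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    SUCCEEDED = "succeeded"
    ROLLED_BACK = "rolled_back"
    ROLLBACK_INCOMPLETE = "rollback_incomplete"


@dataclass
class OperationOutcome:
    """Structured outcome for one operation or child operation (hypothetical)."""

    name: str
    status: Status
    side_effects_may_remain: bool = False
    children: list["OperationOutcome"] = field(default_factory=list)


def aggregate(name: str, children: list[OperationOutcome]) -> OperationOutcome:
    """Coordinator-side roll-up: the parent reports the worst child status."""
    if any(c.status is Status.ROLLBACK_INCOMPLETE for c in children):
        status = Status.ROLLBACK_INCOMPLETE
    elif any(c.status is Status.ROLLED_BACK for c in children):
        status = Status.ROLLED_BACK
    else:
        status = Status.SUCCEEDED
    # Propagate the flag upward so later config or metadata cleanup cannot hide it.
    leftover = any(c.side_effects_may_remain for c in children)
    return OperationOutcome(name, status, leftover, children)
```

Because the parent outcome is computed from explicit child outcomes, a failed ingest's rollback state does not have to be inferred from document counters.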

Additional success criteria:

  • Failed ingest rollback is represented by structured operation outcomes, not inferred from document counters.
  • API, tool, pipeline, and background ingest paths use the same rollback coordinator semantics.
  • Incomplete-rollback state, including side_effects_may_remain, is observable internally and is not hidden by config or metadata cleanup.
