libSQL: add brute-force vector search fallback for missing index #88

Merged: leynos merged 40 commits into main from libsql-brute-force-vector-fallback-63cr2c on Apr 16, 2026

Owner

@leynos leynos commented Mar 29, 2026

Summary

Implements a robust fallback path for libSQL semantic search when a fixed-dimension vector index is unavailable after the V9 migration. If the indexed vector_top_k query cannot run, we fall back to a Rust-based brute-force cosine similarity over stored embeddings and feed results through the existing RRF fusion path. This aligns libSQL behavior with the intended fallback parity while preserving semantic retrieval.

Changes

Core functionality

  • Introduced VectorSearchOutcome enum to differentiate between indexed results and index-unavailable scenarios.
  • Refactored vector ranking to return VectorSearchOutcome instead of a plain vector.
  • Added logic to fall back to brute-force cosine similarity when the vector index is missing or its query fails.
  • Added informative tracing/log messages to indicate when we’re using the index or brute-force path and to report results.
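
The outcome type and fallback dispatch described in the bullets above can be sketched as follows. This is a minimal illustration, not the PR's code: the `RankedResult` fields and the `resolve_vector_results` helper are hypothetical names, and the brute-force path is passed in as a closure so the sketch stays self-contained.

```rust
/// One ranked hit from the vector side of hybrid search.
/// ASSUMPTION: field names and types are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct RankedResult {
    pub chunk_id: i64,
    pub rank: u32,
}

/// Outcome of the indexed attempt: either results from the index, or a
/// signal that the index could not be used.
#[derive(Debug, Clone, PartialEq)]
pub enum VectorSearchOutcome {
    Indexed(Vec<RankedResult>),
    IndexUnavailable,
}

/// Hypothetical helper: use indexed results when present, otherwise run the
/// supplied brute-force closure. The closure is only invoked on fallback.
pub fn resolve_vector_results<F>(outcome: VectorSearchOutcome, brute_force: F) -> Vec<RankedResult>
where
    F: FnOnce() -> Vec<RankedResult>,
{
    match outcome {
        VectorSearchOutcome::Indexed(results) => results,
        VectorSearchOutcome::IndexUnavailable => brute_force(),
    }
}
```

Returning an enum rather than an empty vector lets the caller tell "the index returned nothing" apart from "the index could not be queried", which is the distinction the fallback depends on.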

libSQL backend changes

  • src/db/libsql/workspace.rs
    • Implemented vector_ranked_results to attempt vector_top_k querying against the fixed-dimension index.
    • On success, return VectorSearchOutcome::Indexed(results).
    • On index unavailability or query failure, return VectorSearchOutcome::IndexUnavailable and log fallback information.
    • Updated hybrid search integration to fall back to brute-force vector search when the index is unavailable, preserving the existing RRF fusion path.
  • Documentation updates reflect the new behavior:
    • Updated workspace and libSQL docs to describe the fallback behavior when a fixed-dimension index is not present.
    • Clarified that libSQL can perform brute-force cosine similarity in Rust as a fallback for semantic search.
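
For readers unfamiliar with the fallback technique, a brute-force cosine search over in-memory embeddings might look like the sketch below. It assumes embeddings have already been loaded as `(id, Vec<f32>)` pairs; the function names and the choice to skip rows with mismatched dimensions or zero norm are assumptions for illustration, not the PR's implementation.

```rust
/// Cosine similarity between two embeddings.
/// Returns `None` when the lengths differ, the input is empty, or either
/// vector has zero norm (the similarity is undefined in those cases).
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let mut dot = 0.0f32;
    let mut norm_a = 0.0f32;
    let mut norm_b = 0.0f32;
    for (x, y) in a.iter().zip(b) {
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a.sqrt() * norm_b.sqrt()))
}

/// Score every stored embedding against the query and keep the best `limit`.
/// Rows that cannot be scored (for example, a different dimension) are skipped.
pub fn brute_force_top_k(
    query: &[f32],
    stored: &[(i64, Vec<f32>)],
    limit: usize,
) -> Vec<(i64, f32)> {
    let mut scored: Vec<(i64, f32)> = stored
        .iter()
        .filter_map(|(id, emb)| cosine_similarity(query, emb).map(|s| (*id, s)))
        .collect();
    // Highest similarity first.
    scored.sort_by(|l, r| r.1.total_cmp(&l.1));
    scored.truncate(limit);
    scored
}
```

The cost is linear in the number of stored embeddings per query, which is why the docs frame the difference from the indexed path as a performance trade-off.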

Tests

  • Added test: hybrid_search_uses_brute_force_when_vector_index_is_unavailable
    • Bootstraps a local libSQL workspace, inserts a document and embedding, and runs a hybrid search with a vector embedding.
    • Verifies that when the vector index is unavailable, the fallback path is used and results are produced via brute-force search.

Documentation

  • Docs updated to reflect libSQL’s hybrid search path after V9 migration:
    • docs/database-integrations.md: clarifies that libSQL uses an indexed vector_top_k when available, otherwise falls back to brute-force cosine similarity in Rust.
    • docs/configuration-guide.md: explains that workspace memory search uses pgvector cosine distance in PostgreSQL, and libSQL falls back to brute-force cosine similarity when an index is not available.
    • CLAUDE.md: documents the updated vector dimension and fallback behavior for libSQL.
    • src/workspace/README.md: clarifies the current libSQL hybrid search behavior and fallback path.

Test plan

  • Run unit tests for LibSqlBackend hybrid search to ensure proper fallback behavior when vector index is unavailable.
  • Validate that hybrid search returns correct results via brute-force cosine similarity and preserves RRF fusion semantics.
  • Manual end-to-end testing in a locally provisioned environment with mixed index availability to observe behavior switch.

Why this is needed

  • Parity with the intended behavior after V9 migrations: libSQL can no longer rely solely on a fixed-dimension vector index. Providing a robust, observable brute-force fallback ensures semantic search remains available and predictable, while still leveraging indexed search when possible. This maintains usable parity and improves resilience in edge deployments.

◳ Generated by DevBoxer

📎 Task: https://www.devboxer.com/task/cf1e74de-dca1-45eb-95be-c5ba909697a5

📝 Closes #5

Summary by Sourcery

Add a brute-force cosine similarity fallback for libSQL vector search when the fixed-dimension index is unavailable, strengthen remote tool proxy and worker-orchestrator transport tests for schema and routing fidelity, and update documentation and roadmap to reflect the new behavior and completed test plan.

New Features:

  • Support brute-force cosine similarity vector search in libSQL when the vector index is missing or its query fails, while preserving hybrid search fusion semantics.
  • Introduce shared complex tool definition fixtures and additional worker/orchestrator tests to guarantee remote tool schema fidelity, routing correctness, and transport contract parity.

Bug Fixes:

  • Remove test-only webhook server listener APIs and refactor restart tests to avoid TOCTOU issues and port conflicts.
  • Ensure libSQL hybrid search no longer silently drops semantic results when the vector index is absent by falling back to brute-force search instead of returning no vector results.

Enhancements:

  • Differentiate libSQL vector search outcomes via a dedicated enum so callers can distinguish indexed results from index-unavailable scenarios.
  • Refine test infrastructure, including worker API fixtures and hosted remote-tool fidelity harnesses, to improve reliability and coverage without changing runtime behavior.
  • Tighten type derivations (e.g., adding PartialEq to transport and tool output types) to support structural equality assertions in tests.

Documentation:

  • Document libSQL’s post-V9 hybrid search behavior, including brute-force cosine fallback semantics and backend trade-offs versus PostgreSQL.
  • Update internal RFC, roadmap, contents, CLAUDE, workspace README, configuration guide, and feature parity docs to describe the new test matrix, completed roadmap item 1.1.4, and libSQL vector search behavior.

Tests:

  • Add libSQL hybrid search regression test covering brute-force fallback when the vector index is unavailable.
  • Add comprehensive worker/orchestrator tests for remote tool catalogue fidelity, execution routing, transport type round-tripping, URL construction, and finish-reason/status handling.
  • Refactor and expand test helpers and fixtures for remote tool mocks and worker API failure modes to support more robust integration tests.

@coderabbitai
Contributor

coderabbitai Bot commented Mar 29, 2026

Walkthrough

libSQL workspace search now attempts an indexed vector_top_k(...) query and falls back to Rust brute-force cosine similarity when the fixed-dimension index is unavailable. The PR also introduces a LibSqlDatabase wrapper that uses temp-file-backed test databases with cleanup, splits the libSQL workspace into modular document/chunk/fts/vector modules with tests, propagates the wrapper across consumers, refactors WebhookServer listener handling, and adds test helpers and docs.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation updates: FEATURE_PARITY.md, docs/configuration-guide.md, docs/database-integrations.md, src/db/CLAUDE.md, src/workspace/README.md, docs/execplans/..., docs/developers-guide.md, docs/users-guide.md | Document libSQL post-V9 semantics: attempt indexed vector_top_k(...) when a compatible fixed-dimension index exists; otherwise fall back to Rust brute-force cosine. Add operator visibility via ironclaw doctor/status and reframe differences as performance/latency trade-offs. |
| libSQL workspace split & vector search: src/db/libsql/workspace/mod.rs, src/db/libsql/workspace/vector_search.rs, src/db/libsql/workspace/fts.rs, src/db/libsql/workspace/document_ops.rs, src/db/libsql/workspace/chunk_ops.rs, src/db/libsql/workspace/tests.rs | Replace monolithic workspace with modular implementation: FTS helper, vector search (indexed path + VectorSearchOutcome::IndexUnavailable), brute-force fallback in Rust, embedding (de)serialisation, document/chunk ops, and unit/integration tests validating fallback and dimension filtering. |
| LibSQL DB wrapper & lifecycle: src/db/libsql/mod.rs | Add LibSqlDatabase wrapper holding raw DB and optional temp_path; change new_memory() to create UUID-named temp .db, provide async connect() with retry/busy behaviour, and implement Drop to remove .db and -wal/-shm sidecars when appropriate. |
| LibSQL handle propagation: src/db/mod.rs, src/channels/wasm/storage.rs, src/tools/wasm/storage.rs, src/secrets/store.rs | Replace Arc<libsql::Database> with Arc<crate::db::libsql::LibSqlDatabase> across consumers; update constructors and connect() sites; remove duplicate PRAGMA busy_timeout calls now handled by wrapper. |
| Remove old libSQL workspace file: src/db/libsql/workspace.rs | Delete legacy workspace implementation (previous CRUD, hybrid search and helpers) in favour of the new modular files. |
| Webhook server test infra: src/channels/webhook_server.rs, docs/developers-guide.md | Add resolved_addr: Option<SocketAddr> to WebhookServer; resolve and store OS-chosen listener address; extract spawn_with_listener; add #[cfg(test)] helpers start_with_listener/restart_with_listener accepting pre-bound TcpListener; document test helpers and advise pre-binding to avoid port races. |
| Hot reload listener matching: src/reload/manager.rs, src/reload/manager/tests/restart_tests.rs | Change restart guard to treat listener unchanged when any resolved addr matches current host either by full SocketAddr equality or by matching IP with resolved port 0; add test ensuring no restart for ephemeral bind on same host. |
| WASM channel/tool stores: src/channels/wasm/storage.rs, src/tools/wasm/storage.rs | Update stored libSQL handle types to Arc<crate::db::libsql::LibSqlDatabase> and simplify connect() to call wrapper connect() without extra PRAGMA. |
| Secrets store: src/secrets/store.rs | Change LibSqlSecretsStore db field and constructor to use Arc<crate::db::libsql::LibSqlDatabase> and delegate connection to wrapper connect(). |
| Orchestrator tests & fixtures: src/orchestrator/api/tests/catalogue_fidelity.rs, src/orchestrator/api/tests/fixtures/remote_tool_mocks.rs | Add ToolFixture::CatalogAlphaWithDifferentPayload; update test to register it and assert catalogue version sensitivity to payload changes; add fixture builder arm producing altered JSON-schema. |
| Minor tests/docs edits: src/test_support.rs, docs/execplans/..., src/workspace/README.md | Spelling/formatting tweaks, adjust ExecPlan references, reformat README blocks, and tweak test assertions/messages. |
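
The cleanup behaviour summarised for the LibSQL DB wrapper (removing the temp `.db` file together with its `-wal` and `-shm` sidecars on drop) can be sketched with the standard library alone. `TempDbGuard` and `sidecar_paths` are hypothetical names used for illustration; the actual LibSqlDatabase wrapper also holds the database handle and may apply extra conditions before deleting anything.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the wrapper: it owns an optional temp path and
/// removes that file plus its sidecars when dropped.
pub struct TempDbGuard {
    temp_path: Option<PathBuf>,
}

impl TempDbGuard {
    pub fn new(temp_path: Option<PathBuf>) -> Self {
        Self { temp_path }
    }
}

/// The main database file followed by the `-wal` and `-shm` sidecar paths.
pub fn sidecar_paths(db: &Path) -> Vec<PathBuf> {
    let mut paths = vec![db.to_path_buf()];
    for suffix in ["-wal", "-shm"] {
        // Append the suffix to the full file name, e.g. `x.db` -> `x.db-wal`.
        let mut name = db.as_os_str().to_os_string();
        name.push(suffix);
        paths.push(PathBuf::from(name));
    }
    paths
}

impl Drop for TempDbGuard {
    fn drop(&mut self) {
        if let Some(path) = self.temp_path.take() {
            for p in sidecar_paths(&path) {
                // Best effort: a missing sidecar is not an error during cleanup.
                let _ = fs::remove_file(p);
            }
        }
    }
}
```

Keeping the path optional means a guard created for a persistent database (with `None`) never deletes anything.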

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant HybridSearch as Hybrid Search
    participant VectorIndex as Vector Index Query
    participant BruteForce as Brute‑Force Cosine
    participant FTS as FTS5 Search
    participant RRF as Reciprocal Rank Fusion

    Client->>HybridSearch: hybrid_search(query, embedding)
    par Vector Search Path
        HybridSearch->>VectorIndex: vector_top_k('idx_memory_chunks_embedding', vector(?), k)
        alt Index Available
            VectorIndex-->>HybridSearch: indexed results
        else Index Unavailable
            VectorIndex-->>HybridSearch: query fails (index missing)
            HybridSearch->>BruteForce: load embeddings, compute cosine distances
            BruteForce-->>HybridSearch: brute‑force results
        end
    and Full‑Text Search Path
        HybridSearch->>FTS: fts_search(query)
        FTS-->>HybridSearch: FTS results
    end
    HybridSearch->>RRF: combine vector + FTS results
    RRF-->>Client: ranked merged results
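
The final fusion step in the diagram, Reciprocal Rank Fusion, scores each item as the sum of 1 / (k + rank) over every list it appears in. The sketch below is a generic version of that formula, not the project's implementation: the `rrf_fuse` name, the use of `i64` ids, and the tie-break by id are assumptions, and the PR does not state which `k` the project uses (60 is a commonly used default).

```rust
use std::collections::HashMap;

/// Fuse ranked id lists with Reciprocal Rank Fusion.
/// Each inner list is ordered best-first; `k` dampens the weight of top ranks.
pub fn rrf_fuse(lists: &[Vec<i64>], k: f64) -> Vec<(i64, f64)> {
    let mut scores: HashMap<i64, f64> = HashMap::new();
    for list in lists {
        for (idx, id) in list.iter().enumerate() {
            let rank = (idx + 1) as f64; // ranks are 1-based
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(i64, f64)> = scores.into_iter().collect();
    // Highest score first; break ties by id so the order is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then(a.0.cmp(&b.0)));
    fused
}
```

Because the formula only consumes ranks, it does not matter whether the vector list came from the index or from the brute-force path, which is why the fallback can reuse the existing fusion step unchanged.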

Poem

When indices fell, restore the chase,
Try top_k first, then brute‑force pace;
Fuse ranks from FTS and vector line,
Surface the mode and log the sign —
Ensure search sings across each place.

🚥 Pre-merge checks | ✅ 7 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Testing | ⚠️ Warning | New functionality across vector_search.rs (300 lines), document_ops.rs (408 lines), chunk_ops.rs (163 lines), and fts.rs lacks comprehensive unit test coverage within respective modules; only 177 lines of centralised tests.rs exists; critical error paths untested. | Add module-level unit tests for vector_search.rs, document_ops.rs, chunk_ops.rs, and fts.rs covering success and error paths; create integration tests for hybrid_search pipeline; add tests for embedding serialisation round-trips and UUID parsing edge cases. |
| Developer Documentation | ⚠️ Warning | The PR introduces significant undocumented internal APIs including LibSqlDatabase wrapper, changed shared_db() return types, workspace store module architecture, and parameter-object structs across multiple store implementations. | Expand docs/developers-guide.md with subsections documenting LibSqlDatabase wrapper usage, workspace store implementation patterns, parameter-object structs for function arity reduction, and concrete examples of type changes propagating through store constructors. |
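
The embedding serialisation round-trip test requested by the Testing check could target helpers shaped like the sketch below. It assumes embeddings are stored as a blob of little-endian `f32` values; that encoding and both function names are assumptions for illustration, and the actual helpers in vector_search.rs may differ.

```rust
/// Encode an embedding as a little-endian `f32` byte blob.
/// ASSUMPTION: the on-disk layout is 4 bytes per component, little-endian.
pub fn embedding_to_bytes(embedding: &[f32]) -> Vec<u8> {
    embedding.iter().flat_map(|v| v.to_le_bytes()).collect()
}

/// Decode a blob produced by `embedding_to_bytes`.
/// Returns `None` when the length is not a multiple of four bytes.
pub fn embedding_from_bytes(bytes: &[u8]) -> Option<Vec<f32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

A round-trip test then asserts that decoding the encoded bytes yields the original vector and that a truncated blob is rejected rather than silently shortened.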
✅ Passed checks (7 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: adding brute-force vector search fallback for libSQL when the index is missing. |
| Description check | ✅ Passed | The description comprehensively covers objectives, changes, testing, and documentation updates with sufficient detail across all relevant sections. |
| Linked Issues check | ✅ Passed | The PR implementation fully addresses issue #5 by restoring hybrid search parity through brute-force cosine fallback, improving observability, and updating documentation. |
| Out of Scope Changes check | ✅ Passed | All code changes align with the primary objective of implementing brute-force vector search fallback for libSQL; supporting infrastructure refactors (webhook server, database wrapper, test helpers) are scoped to enable the core feature. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 93.55% which is sufficient. The required threshold is 80.00%. |
| User-Facing Documentation | ✅ Passed | New user-facing functionality (libSQL brute-force vector search fallback) is thoroughly documented in docs/users-guide.md with dedicated section explaining backend differences and fallback behaviour. |
| Module-Level Documentation | ✅ Passed | All six new module files in src/db/libsql/workspace/ carry proper module-level docstrings using Rust's //! convention, clearly explaining each module's purpose. |

@github-actions github-actions Bot added scope: docs Documentation scope: workspace Persistent memory / workspace size: M 50-199 changed lines risk: medium Business logic, config, or moderate-risk modules contributor: core 20+ merged PRs labels Mar 29, 2026
@sourcery-ai
Contributor

sourcery-ai Bot commented Mar 29, 2026

Reviewer's Guide

Implements a robust libSQL hybrid search fallback that uses brute-force cosine similarity when the vector index is unavailable, and simultaneously hardens the hosted remote-tool contract by refactoring worker/orchestrator transport types, adding comprehensive schema-fidelity and routing tests, and adjusting webhook/server helpers and docs to reflect the new behavior and completed roadmap item 1.1.4.

Sequence diagram for libSQL hybrid search with vector index fallback to brute-force

sequenceDiagram
    actor Caller
    participant LibSqlBackend
    participant LibsqlConnection
    participant VectorIndex
    participant RustBruteForceEngine
    participant RRFEngine

    Caller->>LibSqlBackend: hybrid_search(HybridSearchParams)
    LibSqlBackend->>LibSqlBackend: read SearchConfig
    alt config.use_vector and embedding present
        LibSqlBackend->>LibsqlConnection: vector_ranked_results(user_id, agent_id, embedding, pre_limit)
        alt vector index available
            LibsqlConnection->>VectorIndex: vector_top_k(idx_memory_chunks_embedding, embedding, pre_limit)
            VectorIndex-->>LibsqlConnection: rows
            LibsqlConnection-->>LibSqlBackend: VectorSearchOutcome::Indexed(results)
            LibSqlBackend->>LibSqlBackend: vector_results = results
        else index unavailable or query fails
            LibsqlConnection-->>LibSqlBackend: VectorSearchOutcome::IndexUnavailable
            LibSqlBackend->>RustBruteForceEngine: brute_force_vector_search(user_id, agent_id, embedding, pre_limit)
            RustBruteForceEngine-->>LibSqlBackend: Vec_RankedResult_
            LibSqlBackend->>LibSqlBackend: vector_results = brute_force_results
        end
    else no vector search
        LibSqlBackend->>LibSqlBackend: vector_results = empty
    end

    LibSqlBackend->>LibsqlConnection: fts_ranked_results(user_id, agent_id, query, pre_limit)
    LibsqlConnection-->>LibSqlBackend: fts_results

    LibSqlBackend->>RRFEngine: fuse(vector_results, fts_results)
    RRFEngine-->>LibSqlBackend: hybrid_results

    LibSqlBackend-->>Caller: hybrid_results

Class diagram for libSQL hybrid search fallback and vector search outcome

classDiagram
    class LibSqlBackend {
        +hybrid_search(params: HybridSearchParams) Result~Vec_HybridSearchResult_~
        +brute_force_vector_search(user_id: &str, agent_id: Option_&str_, embedding: &[f32], limit: usize) Result~Vec_RankedResult_~
    }

    class VectorSearchOutcome {
        <<enum>>
        Indexed(results: Vec_RankedResult_)
        IndexUnavailable
    }

    class RankedResult {
        +chunk_id: i64
        +document_id: i64
        +similarity: f32
        +rank: u32
    }

    class Candidate {
        +chunk_id: i64
        +document_id: i64
        +similarity: f32
    }

    class LibsqlConnection {
        +query(sql: &str, params: &[Value]) Result~Rows~
    }

    class WorkspaceError {
        +reason: String
    }

    class HybridSearchParams {
        +user_id: String
        +agent_id: Option_String_
        +query: String
        +embedding: Option_Vec_f32_
        +config: SearchConfigRef
    }

    class SearchConfigRef {
        +use_vector: bool
        +limit: u32
    }

    LibSqlBackend --> LibsqlConnection : uses
    LibSqlBackend --> RankedResult : returns
    LibSqlBackend --> VectorSearchOutcome : handles
    Candidate --> RankedResult : converted_to

    %% free functions represented as utility classes
    class VectorSearchFunctions {
        +vector_ranked_results(conn: &LibsqlConnection, user_id: &str, agent_id: Option_&str_, embedding: &[f32], limit: i64) Result~VectorSearchOutcome~
        +rank_candidates(candidates: Vec_Candidate_, limit: usize) Vec_RankedResult_
    }

    VectorSearchFunctions --> LibsqlConnection : queries
    VectorSearchFunctions --> VectorSearchOutcome : returns
    VectorSearchFunctions --> WorkspaceError : error_type
    VectorSearchFunctions --> Candidate : ranks
    VectorSearchFunctions --> RankedResult : produces

    LibSqlBackend ..> VectorSearchFunctions : calls
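
The `rank_candidates` step named in the class diagram (turning scored candidates into ranked results) might be sketched as below. The struct fields follow the diagram, but the sort order, the tie-break by `chunk_id`, and the 1-based rank numbering are assumptions rather than confirmed behaviour.

```rust
/// A scored chunk before ranking. Fields mirror the class diagram above.
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub chunk_id: i64,
    pub document_id: i64,
    pub similarity: f32,
}

/// A candidate with its position in the ranked list attached.
#[derive(Debug, Clone, PartialEq)]
pub struct RankedResult {
    pub chunk_id: i64,
    pub document_id: i64,
    pub similarity: f32,
    pub rank: u32,
}

/// Sort by similarity (highest first), keep at most `limit`, and assign
/// 1-based ranks. Ties are broken by `chunk_id` for a deterministic order.
pub fn rank_candidates(mut candidates: Vec<Candidate>, limit: usize) -> Vec<RankedResult> {
    candidates.sort_by(|a, b| {
        b.similarity
            .total_cmp(&a.similarity)
            .then(a.chunk_id.cmp(&b.chunk_id))
    });
    candidates
        .into_iter()
        .take(limit)
        .enumerate()
        .map(|(idx, c)| RankedResult {
            chunk_id: c.chunk_id,
            document_id: c.document_id,
            similarity: c.similarity,
            rank: (idx + 1) as u32,
        })
        .collect()
}
```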

Flow diagram for libSQL semantic search path after V9 migration

flowchart TD
    A[Start hybrid_search
    use_vector true
    embedding present] --> B[Call vector_ranked_results]
    B --> C{vector_top_k available
    and rows stream OK?}

    C -->|Yes| D[VectorSearchOutcome::Indexed
    with ranked results]
    D --> E[Use indexed vector results
    as vector_results]

    C -->|No
    index missing or query error| F[VectorSearchOutcome::IndexUnavailable]
    F --> G[Log brute-force fallback]
    G --> H[Run brute_force_vector_search
    cosine similarity in Rust]
    H --> I[Use brute-force results
    as vector_results]

    E --> J[Fetch FTS results via FTS5]
    I --> J

    J --> K[Apply RRF fusion
    over vector_results and fts_results]
    K --> L[Return fused hybrid results]

File-Level Changes

Change Details Files
Introduce explicit VectorSearchOutcome handling and brute-force cosine similarity fallback in libSQL hybrid search when the fixed-dimension vector index is unavailable.
  • Added VectorSearchOutcome enum to distinguish indexed results from index-unavailable scenarios.
  • Refactored vector_ranked_results to return VectorSearchOutcome, logging index query failures and row-fetch errors as IndexUnavailable.
  • Updated hybrid_search to switch to brute_force_vector_search when the vector index is missing or fails, preserving RRF fusion semantics and adding an integration test for the fallback path.
src/db/libsql/workspace.rs
Clean up webhook server listener management and simplify tests to use normal bind/restart paths rather than test-only pre-bound listener helpers.
  • Removed test-only start_with_listener and restart_with_listener helpers and the shared spawn_with_listener method.
  • Simplified bind_and_spawn to bind and spawn directly, logging the configured address instead of rewriting it from the listener.
  • Reworked webhook_server tests to spin up full servers on ephemeral ports via a StartedWebhookServer fixture and exercise restart_with_addr and rollback semantics over TCP port conflicts.
src/channels/webhook_server.rs
Refactor worker remote-tool proxy tests into a dedicated module and expand coverage to ensure full ToolDefinition and ToolOutput fidelity and correct routing to the orchestrator endpoint.
  • Moved inline tests from worker_remote_tool_proxy.rs into a new tests module file, preserving the original round-trip test with a fixture-based ProxyTestServer.
  • Added tests that reconstruct ToolDefinition from proxy accessors to guarantee no field loss and that ToolOutput metadata (cost, raw, duration) is preserved.
  • Introduced a route-capturing test server to assert the proxy sends a single request to the expected /worker/{job_id}/tools/execute endpoint with correct tool_name and job_id.
src/tools/builtin/worker_remote_tool_proxy.rs
src/tools/builtin/worker_remote_tool_proxy/tests.rs
Add shared test support and fixtures for complex ToolDefinition payloads and worker/orchestrator transport error paths to lock down schema fidelity and error propagation.
  • Introduced src/test_support.rs with builders for complex nested JSON Schema parameter definitions reused across orchestrator and worker tests.
  • Extended orchestrator remote tool mocks with complex_tool_definition and complex_tool_stub and added catalogue_fidelity tests to assert full ToolDefinition equality and catalog_version determinism.
  • Added worker/api test fixtures for failure-mode mock servers and sample catalog/execution payloads, plus tests covering URL construction, remote tool catalog/execute error mapping, and serde round-trips for transport types and finish reasons.
src/test_support.rs
src/orchestrator/api/tests/fixtures/remote_tool_mocks.rs
src/orchestrator/api/tests/catalogue_fidelity.rs
src/orchestrator/api/tests/transport_parity.rs
src/worker/api/tests/mod.rs
src/worker/api/tests/fixtures.rs
src/worker/api/tests/url_construction.rs
src/worker/api/tests/remote_tool_catalog.rs
src/worker/api/tests/remote_tool_execute.rs
src/worker/api/tests/transport_types.rs
src/worker/api/tests/finish_reason.rs
Add end-to-end hosted remote-tool fidelity tests in the worker container to ensure worker-advertised proxy definitions match orchestrator canonical definitions.
  • Introduced a hosted_fidelity test module that spins up a test catalog server serving a complex ToolDefinition and wires a WorkerRuntime against it.
  • Registered remote tools in the runtime and asserted that the proxy ToolDefinition (name, description, parameters) reconstructed from proxy methods equals the orchestrator’s complex definition.
  • Reused the existing spawn_test_server helper and TestState from worker/container tests to keep server wiring consistent.
src/worker/container/tests/mod.rs
src/worker/container/tests/remote_tools.rs
src/worker/container/tests/hosted_fidelity.rs
Align orchestrator/worker documentation and roadmap to reflect completed hosted-tool schema fidelity and execution routing work and clarified libSQL hybrid search behavior and performance trade-offs.
  • Added a detailed ExecPlan document for roadmap item 1.1.4 describing constraints, risks, milestones, and implemented test matrix, and indexed it from docs/contents.md.
  • Updated RFC 0001 and docs/roadmap.md to mark 1.1.4 as complete and describe the new tests and parity guarantees.
  • Clarified database integration and configuration docs plus CLAUDE and feature parity notes to explain libSQL’s vector_top_k versus brute-force cosine fallback behavior and to emphasize that libSQL’s main difference from PostgreSQL is now performance rather than capability.
docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md
docs/contents.md
docs/rfcs/0001-expose-mcp-tool-definitions.md
docs/roadmap.md
docs/database-integrations.md
docs/configuration-guide.md
src/db/CLAUDE.md
FEATURE_PARITY.md
src/workspace/README.md
Tighten various tool/test utilities to support complex tool descriptions and equality-based assertions on tool outputs and remote-tool transport structures.
  • Changed StubTool.description from &'static str to String and adjusted helpers and tests to accept owned Strings, allowing reuse of complex UTF-8 descriptions from shared builders.
  • Derived PartialEq for RemoteToolExecutionRequest, RemoteToolExecutionResponse, RemoteToolCatalogResponse, and ToolOutput to enable structural equality assertions in transport and proxy tests.
  • Tweaked orchestrator remote-tool tests to construct StubTool instances using owned descriptions and to reuse the new complex_tool_definition/complex_tool_stub fixtures where appropriate.
src/orchestrator/api/tests/fixtures/remote_tool_mocks.rs
src/orchestrator/api/tests/remote_tools.rs
src/tools/tool/traits.rs
src/worker/api/types.rs

Assessment against linked issues

Issue Objective Addressed Explanation
#5 Implement a real libSQL vector search fallback (e.g., brute-force cosine in Rust) when the fixed-dimension vector index is missing after the V9 migration, so hybrid search (FTS + vector) still works.
#5 Align documentation and comments with actual libSQL behavior by clearly describing the post‑V9 hybrid search path and vector fallback, including how libSQL differs from PostgreSQL.
#5 Prevent silent loss of semantic search on libSQL by making vector capability and fallback behavior observable (logging/tests/operator-facing information) so operators can tell whether semantic retrieval is active.

codescene-delta-analysis[bot]

This comment was marked as outdated.

@leynos leynos force-pushed the main branch 2 times, most recently from 368458a to 7e4ec1b Compare March 29, 2026 15:59
@leynos leynos marked this pull request as ready for review March 30, 2026 18:25
sourcery-ai[bot]

This comment was marked as resolved.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 36a8cf61a1

Comment thread src/channels/webhook_server.rs Outdated
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/channels/webhook_server.rs (1)

64-71: ⚠️ Potential issue | 🟠 Major

Log the actual bound address, not the configured address.

The code binds to self.config.addr but logs that same value rather than querying listener.local_addr(). If the configured port is 0 (ephemeral), the log message will misleadingly show port 0 instead of the actual bound port. Additionally, current_addr() will return the wrong value.

The AI summary confirms prior behaviour was to overwrite self.config.addr with the resolved address—this removal is a regression for ephemeral port usage.

🐛 Proposed fix to capture the actual bound address
     async fn bind_and_spawn(&mut self, app: Router) -> Result<(), ChannelError> {
         let listener = tokio::net::TcpListener::bind(self.config.addr)
             .await
             .map_err(|e| ChannelError::StartupFailed {
                 name: "webhook_server".to_string(),
                 reason: format!("Failed to bind to {}: {}", self.config.addr, e),
             })?;

-        tracing::info!("Webhook server listening on {}", self.config.addr);
+        let addr = listener.local_addr().map_err(|e| ChannelError::StartupFailed {
+            name: "webhook_server".to_string(),
+            reason: format!("Failed to get local address: {}", e),
+        })?;
+        self.config.addr = addr;
+        tracing::info!("Webhook server listening on {}", addr);

         let (shutdown_tx, shutdown_rx) = oneshot::channel();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/webhook_server.rs` around lines 64 - 71, The log currently
prints self.config.addr (and current_addr()) which is wrong for ephemeral ports;
after binding with tokio::net::TcpListener::bind and obtaining listener, call
listener.local_addr() (or use listener.local_addr().ok()) to get the actual
bound SocketAddr, update self.config.addr (or the struct field used by
current_addr()) with that resolved address, and change the tracing::info message
to log the resolved address instead of the original configured value so
ephemeral ports are reported correctly; locate the logic around listener,
self.config.addr, tracing::info, and current_addr() to apply this change.
docs/database-integrations.md (1)

370-371: ⚠️ Potential issue | 🟡 Minor

Remove or update the stale caveat at line 370–371.

This section states "libSQL migration and schema comments still describe a brute-force vector fallback that the current code does not implement." However, this PR implements exactly that fallback. Either remove this bullet point or update it to reflect that the fallback is now implemented.

Proposed fix
 3. libSQL migration and schema comments still describe a brute-force vector
-   fallback that the current code does not implement.
+   fallback; this is now implemented and documented above.

Alternatively, remove the bullet entirely if it no longer serves as a caveat.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/database-integrations.md` around lines 370 - 371, Update the docs to
remove or revise the stale caveat that claims "libSQL migration and schema
comments still describe a brute-force vector fallback that the current code does
not implement": either delete that bullet point entirely from the document or
reword it to state that the brute-force vector fallback has been implemented in
this PR (mentioning the implemented fallback behavior), ensuring the
documentation matches the current implementation and removes the misleading
warning.
src/db/libsql/workspace.rs (1)

234-304: ⚠️ Potential issue | 🟠 Major

Distinguish missing-index errors from other database faults in vector_ranked_results.

Line 271–298: The function returns VectorSearchOutcome::IndexUnavailable for any indexed-search failure—query errors, row-fetch errors, and missing-index errors alike. This conflation masks genuine database faults (malformed SQL, connection drops, corrupted data) as "index unavailable", misaligns the fallback log messages, and diverges from PostgreSQL, which propagates vector search errors via the ? operator. Inspect libSQL error messages or types to distinguish a missing index from other failures, propagate non-index errors, and return IndexUnavailable only when the vector index is genuinely absent.

Additionally, line 816–821 (hybrid_search): brute-force errors are silently swallowed by .unwrap_or_else(), returning empty results with only a warn-level log. Propagate brute-force search failures instead of masking them.

The test at line 902 uses SearchConfig::default().vector_only() which disables FTS entirely, so it never exercises reciprocal rank fusion with both FTS and vector results. Add a test case that enables both FTS and vector search to verify hybrid fusion succeeds when the vector index is unavailable and brute-force fallback is applied.
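
One way to picture the requested discrimination is the sketch below. The types `VectorSearchOutcome` and `QueryError` are simplified, hypothetical stand-ins for the real ones in `workspace.rs`, and the matched substrings are the ones suggested in this review rather than verified libSQL error output. Matching on message text is brittle, so treat this as an illustration of the control flow only.

```rust
/// Hypothetical outcome type mirroring the shape described in the review.
#[derive(Debug, PartialEq)]
enum VectorSearchOutcome {
    Indexed(Vec<String>),
    IndexUnavailable,
}

/// Hypothetical error type standing in for the backend's real error.
#[derive(Debug, PartialEq)]
struct QueryError(String);

/// Returns true only when the message looks like a missing index or function.
/// The substrings are assumptions taken from the review text.
fn is_missing_vector_index(message: &str) -> bool {
    let m = message.to_lowercase();
    m.contains("no such function")
        || m.contains("vector_top_k")
        || m.contains("idx_memory_chunks_embedding")
}

/// Maps only missing-index failures to IndexUnavailable; other errors propagate.
fn classify(result: Result<Vec<String>, QueryError>) -> Result<VectorSearchOutcome, QueryError> {
    match result {
        Ok(rows) => Ok(VectorSearchOutcome::Indexed(rows)),
        Err(e) if is_missing_vector_index(&e.0) => Ok(VectorSearchOutcome::IndexUnavailable),
        Err(e) => Err(e),
    }
}

fn main() {
    let missing = classify(Err(QueryError("no such function: vector_top_k".into())));
    let broken = classify(Err(QueryError("connection reset".into())));
    println!("{missing:?} / {broken:?}");
}
```

If the backend exposes structured error variants, matching on those would be preferable to string inspection.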

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/db/libsql/workspace.rs` around lines 234 - 304, vector_ranked_results
currently treats any query or row-fetch error as
VectorSearchOutcome::IndexUnavailable; change it to inspect the libSQL error
(from conn.query and rows.next) and only return IndexUnavailable when the error
clearly indicates the missing vector index/function (e.g., error message
contains "vector_top_k", "no such function", or the specific index name
'idx_memory_chunks_embedding'); for all other errors propagate them as a
WorkspaceError (i.e., return Err(...)/use ?). Apply the same discrimination to
the rows.next() Err branch. In hybrid_search replace the .unwrap_or_else() that
swallows brute-force failures with proper error propagation so brute-force
search errors bubble up. Finally, add a test (instead of
SearchConfig::default().vector_only()) that enables both FTS and vector search
to verify reciprocal rank fusion behavior when the vector index is absent and
brute-force fallback runs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md`:
- Around line 858-861: The markdown entry incorrectly attributes the three new
tests to remote_tools.rs; update the inventory line to reference
catalogue_fidelity.rs instead (replace the filename token `remote_tools.rs` with
`catalogue_fidelity.rs`) while keeping the three test names
(`remote_tool_catalog_preserves_full_tool_definition_payload`,
`remote_tool_catalog_version_is_deterministic_and_sensitive_to_content`,
`orchestrator_responses_deserialize_into_worker_shared_types`) intact so the
ExecPlan accurately points to the changed test file.
- Around line 907-934: Update the "Validation evidence" paragraph so it no
longer claims "all validation gates have been run and passed successfully" while
also noting the documented Markdown lint failures in docs/roadmap.md; edit the
Validation evidence section (the paragraph and the Markdown linting bullet) to
either mark the markdown gate as "partial/blocked" or to explicitly narrow the
success claim to the specific checks that passed (format check, git whitespace,
and full test suite) and state that Markdown linting reported pre-existing
issues in docs/roadmap.md which prevent the docs gate from being fully green.

In `@src/channels/webhook_server.rs`:
- Around line 255-258: The current assertion on old_result is hard to read due
to nested .ok().and_then(...). Replace it with a matches!-based assertion that
directly matches either an Err variant or an Ok variant whose inner result
yields no value (i.e., Err(_) OR Ok(inner) where inner.ok().is_none()), keeping
the original assertion message; update the assert! call that references
old_result to use matches! for clarity.
- Around line 173-176: The current test fixture performs a TOCTOU by binding a
StdTcpListener to "127.0.0.1:0", dropping it, and then starting the server which
can lead to the port being taken; change the approach so the server is the one
that binds to port 0 and then you read back the actual bound port after start()
(or make bind_and_spawn return the actual bound SocketAddr when it creates the
listener), then use server.current_addr() to get the real port for assertions;
update both the code block that creates port with StdTcpListener and the similar
block at lines 222–225 to stop extracting the port from a dropped listener and
instead obtain the bound address from the running server (or have bind_and_spawn
store/return the listener’s local_addr()).

In `@src/db/libsql/workspace.rs`:
- Around line 822-829: In the VectorSearchOutcome::IndexUnavailable branch,
don't swallow errors from self.brute_force_vector_search — replace the current
unwrap_or_else(...) that returns Vec::new() with propagating the error (e.g.,
call .await, map_err to log the warning with the error, then use ? to return the
error) so brute_force_vector_search failures bubble up instead of silently
returning an empty Vec; this touches the VectorSearchOutcome::IndexUnavailable
match arm and the self.brute_force_vector_search(...) call.
- Around line 931-945: The test currently disables FTS by using
SearchConfig::default().vector_only(), so it never hits the hybrid/RRF path in
LibSqlBackend::hybrid_search; change the config to enable both vector and FTS
(e.g., use SearchConfig::default().with_limit(5) instead of .vector_only()), run
hybrid_search, and update the assertions to verify the fused result exposes both
ranks (check results[0].vector_rank and results[0].fts_rank are
present/expected) so the reciprocal-rank-fusion branch is covered.

In `@src/orchestrator/api/tests/catalogue_fidelity.rs`:
- Around line 65-93: The test
remote_tool_catalog_version_is_deterministic_and_sensitive_to_content currently
changes tool identity when comparing versions; instead ensure the tool name
remains identical while only the payload differs: keep registry_a and registry_b
registering the same tool fixture (ToolFixture::CatalogAlpha) and change
registry_c to register a variant that preserves the same tool name but mutates
its description/parameters (e.g., create or use a fixture like
CatalogAlphaWithDifferentPayload or modify build_tool_fixture to accept an
override for description/parameters) before calling hosted_remote_tool_catalog
for version_c; assert version_a == version_b and version_a != version_c so the
catalog version is sensitive to payload changes but not tool name changes.

In `@src/test_support.rs`:
- Line 12: Replace the US spelling "serialization" with en-GB "serialisation" in
the comment that reads "JSON Schema features to validate that tool definitions
survive serialization" so it becomes "JSON Schema features to validate that tool
definitions survive serialisation" to conform to the en-GB-oxendict spelling
guideline; update that comment text wherever the exact phrase appears in the
file (e.g., above the JSON Schema features description).

In `@src/workspace/README.md`:
- Line 87: The markdown bullet "libSQL: FTS5 plus vector search; uses
`vector_top_k(...)` when a compatible fixed-dimension index exists, otherwise
brute-force cosine similarity in Rust" exceeds the 80-column wrapping rule; edit
the README bullet (the line referencing libSQL, FTS5, and `vector_top_k(...)`)
to insert line breaks so the paragraph is wrapped to 80 columns (split the
sentence into multiple wrapped lines at natural breaks, e.g., after "FTS5 plus
vector search;" and before "otherwise brute-force...") while preserving the same
text and inline code.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 63144539-4d98-48f0-a98b-d8ad356f84d1

📥 Commits

Reviewing files that changed from the base of the PR and between 7e4ec1b and 36a8cf6.

📒 Files selected for processing (33)
  • FEATURE_PARITY.md
  • docs/configuration-guide.md
  • docs/contents.md
  • docs/database-integrations.md
  • docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md
  • docs/rfcs/0001-expose-mcp-tool-definitions.md
  • docs/roadmap.md
  • src/channels/webhook_server.rs
  • src/db/CLAUDE.md
  • src/db/libsql/workspace.rs
  • src/lib.rs
  • src/orchestrator/api/tests.rs
  • src/orchestrator/api/tests/catalogue_fidelity.rs
  • src/orchestrator/api/tests/fixtures/remote_tool_mocks.rs
  • src/orchestrator/api/tests/remote_tools.rs
  • src/orchestrator/api/tests/transport_parity.rs
  • src/test_support.rs
  • src/tools/builtin/worker_remote_tool_proxy.rs
  • src/tools/builtin/worker_remote_tool_proxy/tests.rs
  • src/tools/tool/traits.rs
  • src/worker/api/tests.rs
  • src/worker/api/tests/finish_reason.rs
  • src/worker/api/tests/fixtures.rs
  • src/worker/api/tests/mod.rs
  • src/worker/api/tests/remote_tool_catalog.rs
  • src/worker/api/tests/remote_tool_execute.rs
  • src/worker/api/tests/transport_types.rs
  • src/worker/api/tests/url_construction.rs
  • src/worker/api/types.rs
  • src/worker/container/tests/hosted_fidelity.rs
  • src/worker/container/tests/mod.rs
  • src/worker/container/tests/remote_tools.rs
  • src/workspace/README.md
💤 Files with no reviewable changes (1)
  • src/worker/api/tests.rs

Comment thread docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md Outdated
Comment thread docs/execplans/1-1-4-tests-for-schema-fidelity-and-execution-routing.md Outdated
Comment thread src/channels/webhook_server.rs Outdated
Comment thread src/channels/webhook_server.rs
Comment thread src/db/libsql/workspace.rs Outdated
Comment thread src/orchestrator/api/tests/catalogue_fidelity.rs
Comment thread src/test_support.rs Outdated
Comment thread src/workspace/README.md Outdated
@github-actions github-actions Bot added scope: tool/builtin Built-in tools scope: orchestrator Container orchestrator scope: worker Container worker size: XL 500+ changed lines and removed size: M 50-199 changed lines labels Apr 13, 2026
codescene-delta-analysis[bot]

This comment was marked as outdated.

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/channels/webhook_server.rs (2)

98-105: ⚠️ Potential issue | 🟠 Major

Add direct behavioural coverage for the start() ephemeral-bind path.

Exercise the exact change at Line 100 through start() with 127.0.0.1:0, then assert current_addr().port() != 0 and /health responds on that resolved address. Keep this separate from start_with_listener tests so the production bind path is validated.

Patch sketch
+    #[rstest]
+    #[tokio::test]
+    async fn test_start_resolves_ephemeral_addr() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
+        let mut server = WebhookServer::new(WebhookServerConfig {
+            addr: "127.0.0.1:0".parse()?,
+        });
+        server.add_routes(Router::new().route(
+            "/health",
+            axum::routing::get(|| async { Json(json!({"status": "ok"})) }),
+        ));
+
+        server.start().await?;
+        let addr = server.current_addr();
+        assert_ne!(addr.port(), 0, "Bound port must be resolved from :0");
+
+        let response = reqwest::Client::new()
+            .get(format!("http://{}/health", addr))
+            .send()
+            .await?;
+        assert_eq!(response.status(), 200);
+
+        server.shutdown().await;
+        Ok(())
+    }

As per coding guidelines, "All new functionality or changes must be guarded by unit tests" and "All new functionality or changes must be guarded by behavioural tests where applicable".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/webhook_server.rs` around lines 98 - 105, Add a behavioural test
that calls WebhookServer::start() with config.addr set to "127.0.0.1:0"
(ephemeral bind), then assert that server.current_addr().port() != 0 and that an
HTTP GET to "{current_addr()}/health" returns a healthy response; keep this test
separate from existing start_with_listener tests and ensure it actually runs the
production bind path exercised by the code that resolves listener.local_addr()
in start().

277-313: ⚠️ Potential issue | 🟡 Minor

Align test naming and assertion text with the exercised API.

Rename the test and message to restart_with_listener semantics. Keep diagnostics precise.

Patch sketch
-    async fn test_restart_with_addr_rebinds_listener(
+    async fn test_restart_with_listener_rebinds_listener(
@@
-            "Address should change after restart_with_addr"
+            "Address should change after restart_with_listener"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/channels/webhook_server.rs` around lines 277 - 313, Rename the test
function test_restart_with_addr_rebinds_listener to reflect the exercised API
(e.g., test_restart_with_listener) and update the assertion messages to
reference restart_with_listener semantics: change the initial address assertion
message to mention "before restart_with_listener", update the health-check
assertion message to "Server should respond to health check before
restart_with_listener", and change the final address-change assertion message
from "Address should change after restart_with_addr" to "Address should change
after restart_with_listener" so diagnostics match the actual method under test
(server.restart_with_listener).
♻️ Duplicate comments (2)
src/db/libsql/workspace.rs (2)

823-829: ⚠️ Potential issue | 🟠 Major

Propagate brute-force fallback failures.

Line 826 swallows fallback errors and returns an empty vector, which reintroduces silent semantic degradation and hides operational faults.

🔧 Bubble the fallback error up
                     VectorSearchOutcome::IndexUnavailable => {
                         tracing::info!("Using brute-force vector search (no vector index)");
                         self.brute_force_vector_search(user_id, agent_id, emb, pre_limit as usize)
                             .await
-                            .unwrap_or_else(|e| {
-                                tracing::warn!("Brute-force vector search failed: {e}");
-                                Vec::new()
-                            })
+                            .map_err(|e| {
+                                tracing::warn!("Brute-force vector search failed: {e}");
+                                e
+                            })?
                     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/db/libsql/workspace.rs` around lines 823 - 829, The fallback branch
currently swallows errors from self.brute_force_vector_search and returns an
empty Vec, masking failures; instead propagate the error to the caller by
removing the unwrap_or_else that returns Vec::new and use proper error
propagation (e.g., await and ? or map_err) so that brute_force_vector_search
failures bubble up from the current function (the call site using
self.brute_force_vector_search(user_id, agent_id, emb, pre_limit as usize)).
Ensure the surrounding function's signature returns a compatible Result so the
propagated error is handled by callers.
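
The difference between the two behaviours can be sketched in isolation. `brute_force`, `SearchError`, and both wrapper functions are hypothetical stand-ins, not the real `brute_force_vector_search` signature; logging is reduced to `eprintln!` so the example needs no external crates.

```rust
#[derive(Debug, PartialEq)]
struct SearchError(String);

/// Hypothetical stand-in for the brute-force search; fails on demand.
fn brute_force(fail: bool) -> Result<Vec<u32>, SearchError> {
    if fail {
        Err(SearchError("scan failed".into()))
    } else {
        Ok(vec![1, 2])
    }
}

/// Swallowing: a failure becomes indistinguishable from "no matches".
fn search_swallowing(fail: bool) -> Vec<u32> {
    brute_force(fail).unwrap_or_else(|e| {
        eprintln!("brute-force vector search failed: {e:?}");
        Vec::new()
    })
}

/// Propagating: log, then let `?` hand the error to the caller.
fn search_propagating(fail: bool) -> Result<Vec<u32>, SearchError> {
    let results = brute_force(fail).map_err(|e| {
        eprintln!("brute-force vector search failed: {e:?}");
        e
    })?;
    Ok(results)
}

fn main() {
    println!("swallowed:  {:?}", search_swallowing(true));
    println!("propagated: {:?}", search_propagating(true));
}
```

With the swallowing variant a caller sees an empty result set and cannot tell a fault from a genuine miss, which is the silent degradation this comment is about.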

940-948: 🛠️ Refactor suggestion | 🟠 Major

Exercise the hybrid/RRF branch in this fallback test.

Line 940 uses vector_only(), so the test never validates fused ranking behaviour after fallback. Enable both FTS and vector in this test and assert both rank channels on the fused result.

🧪 Extend coverage to fusion semantics
                 embedding: Some(&[1.0, 0.0, 0.0]),
-                config: &SearchConfig::default().vector_only().with_limit(5),
+                config: &SearchConfig::default().with_limit(5),
             })
             .await
             .expect("failed to execute hybrid search");
@@
         assert_eq!(results.len(), 1);
         assert_eq!(results[0].document_path, "notes/search.md");
         assert_eq!(results[0].vector_rank, Some(1));
-        assert!(results[0].fts_rank.is_none());
+        assert_eq!(results[0].fts_rank, Some(1));
As per coding guidelines: "All new functionality or changes must be guarded by behavioural tests where applicable."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/db/libsql/workspace.rs` around lines 940 - 948, The test currently forces
vector-only by calling SearchConfig::default().vector_only(), so it never
exercises the hybrid/RRF fusion path; remove or replace vector_only() so the
SearchConfig enables both FTS and vector (e.g., use
SearchConfig::default().with_limit(5) or an explicit hybrid config), then update
the assertions on results (the results variable) to check that the fused result
contains both rank channels (assert results[0].vector_rank.is_some() and assert
results[0].fts_rank.is_some(), and optionally check their expected values) so
the fused ranking behaviour after fallback is validated.
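
For context, a minimal sketch of reciprocal rank fusion shows why a hybrid test can assert on both rank channels. The `Fused` struct, the `rrf` function, and the constant `k = 60` are assumptions for illustration; the project's actual fusion code is not shown in this thread and may differ.

```rust
use std::collections::HashMap;

/// Hypothetical fused result carrying both rank channels.
#[derive(Debug, Default, Clone, PartialEq)]
struct Fused {
    score: f64,
    fts_rank: Option<usize>,
    vector_rank: Option<usize>,
}

/// Fuses two ranked id lists: each list contributes 1 / (k + rank), rank starting at 1.
fn rrf(fts: &[&str], vector: &[&str], k: f64) -> Vec<(String, Fused)> {
    let mut by_id: HashMap<String, Fused> = HashMap::new();
    for (i, id) in fts.iter().enumerate() {
        let rank = i + 1;
        let entry = by_id.entry((*id).to_string()).or_default();
        entry.fts_rank = Some(rank);
        entry.score += 1.0 / (k + rank as f64);
    }
    for (i, id) in vector.iter().enumerate() {
        let rank = i + 1;
        let entry = by_id.entry((*id).to_string()).or_default();
        entry.vector_rank = Some(rank);
        entry.score += 1.0 / (k + rank as f64);
    }
    let mut fused: Vec<_> = by_id.into_iter().collect();
    // Highest score first; ties broken by id so the order is deterministic.
    fused.sort_by(|a, b| b.1.score.total_cmp(&a.1.score).then_with(|| a.0.cmp(&b.0)));
    fused
}

fn main() {
    let fused = rrf(&["notes/search.md"], &["notes/search.md"], 60.0);
    println!("{fused:?}");
}
```

A document found by both channels carries `Some(rank)` in each field, whereas a vector-only configuration leaves `fts_rank` as `None`; that is the distinction the suggested assertions rely on.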
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/channels/webhook_server.rs`:
- Around line 176-179: Update the Rustdoc for the current_addr method on the
webhook server to explicitly state its contract: before the first successful
start/restart it returns the configured address (self.config.addr) and not a
live bound address; after a successful start/restart it will reflect the actual
bound address if different. Edit the doc comment above pub fn
current_addr(&self) -> SocketAddr to remove ambiguity and make this behavior
explicit, referencing self.config.addr and the start/restart lifecycle so
readers know when the value is only the configured address versus a bound
address.

In `@src/db/libsql/workspace.rs`:
- Around line 299-304: The current catch-all converts any vector query error
into VectorSearchOutcome::IndexUnavailable (seen at the tracing::debug(...)
block), which masks real DB failures; change the error handling to inspect and
match the query error's concrete kind/variant (e.g., index-not-present /
capability-missing variants returned by the underlying query call) and only map
those specific cases to VectorSearchOutcome::IndexUnavailable, while propagating
all other errors (return Err(e) or use ? with context) so genuine DB faults
surface instead of being downgraded; update the tracing::debug call to run only
for the index-unavailable branch.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 931f61b6-ce53-45d7-8bef-74546c405b79

📥 Commits

Reviewing files that changed from the base of the PR and between 36a8cf6 and 535a2a0.

📒 Files selected for processing (2)
  • src/channels/webhook_server.rs
  • src/db/libsql/workspace.rs

Comment thread src/channels/webhook_server.rs Outdated
Comment thread src/db/libsql/workspace.rs Outdated
@lodyai lodyai Bot force-pushed the libsql-brute-force-vector-fallback-63cr2c branch from 535a2a0 to 2083989 Compare April 13, 2026 19:05
@github-actions github-actions Bot added size: L 200-499 changed lines and removed size: XL 500+ changed lines labels Apr 13, 2026
codescene-delta-analysis[bot]

This comment was marked as outdated.

@coderabbitai coderabbitai Bot added the Issue label Apr 13, 2026
leynos added 5 commits April 15, 2026 23:51
Add colocated module tests for the libSQL workspace helpers and expand the developer guide with the internal patterns that now shape this backend. This closes the coverage gap called out in review without growing the central integration test file further, and it documents the shared database handle, workspace module layout, and parameter-object pattern that recent refactors introduced.
Add the missing central workspace-store tests for invalid chunk UUID handling, negative chunk indices, and document-not-found lookups. This keeps the review-requested coverage in workspace/tests.rs without changing production code, and documents that the existing vector-index fallback test also covers the IndexUnavailable path.
Reduce large assertion blocks in the central libSQL workspace tests by extracting shared assertion helpers for embedding comparisons and single-result hybrid-search checks. This keeps the test behaviour unchanged while addressing the CodeScene finding.
Reduce duplication in the central libSQL workspace tests by extracting shared backend setup and DocumentNotFound assertion helpers. This keeps the existing test coverage and behaviour intact while addressing the repeated test boilerplate.
Restore the libsql backend module shape that this branch expects after rebasing onto origin/main, and add the missing terminal persistence forwarder required by the updated NativeDatabase trait. Also expose the webhook listener test helpers under the test-helpers feature so integration tests can keep using the pre-bound listener path after the rebase.
@lodyai lodyai Bot force-pushed the libsql-brute-force-vector-fallback-63cr2c branch from 4ae5d07 to ee23d0d Compare April 15, 2026 21:59
codescene-delta-analysis[bot]

This comment was marked as outdated.

@leynos
Owner Author

leynos commented Apr 15, 2026

@coderabbitai Please investigate the following test failure in the Windows branch of the CI matrix, identify a fix and provide an AI coding agent prompt for the fix:

error[E0599]: no method named `start_with_listener` found for struct `ironclaw::channels::WebhookServer` in the current scope
  --> tests\support\webhook_helpers.rs:45:12
   |
45 |     server.start_with_listener(listener).await?;
   |            ^^^^^^^^^^^^^^^^^^^ method not found in `ironclaw::channels::WebhookServer`

For more information about this error, try `rustc --explain E0599`.
error: could not compile `ironclaw` (test "trace_llm_tests") due to 1 previous error
warning: build failed, waiting for other jobs to finish...
error: could not compile `ironclaw` (test "e2e_traces") due to 1 previous error
error: could not compile `ironclaw` (test "tools_and_config") due to 1 previous error

@coderabbitai

This comment was marked as resolved.

Make start_with_listener and restart_with_listener available without the test-helpers cfg gate so integration-test support code can call them reliably when ironclaw is compiled as a library dependency, including on the Windows CI matrix.
codescene-delta-analysis[bot]

This comment was marked as outdated.

@leynos
Owner Author

leynos commented Apr 16, 2026

@coderabbitai Please suggest a fix for the following concern and provide an AI coding agent prompt for the fix:

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Testing | ⚠️ Warning | New functionality across vector_search.rs (300 lines), document_ops.rs (408 lines), chunk_ops.rs (163 lines), and fts.rs lacks comprehensive unit test coverage within respective modules; only 177 lines of centralised tests.rs exists; critical error paths untested. | Add module-level unit tests for vector_search.rs, document_ops.rs, chunk_ops.rs, and fts.rs covering success and error paths; create integration tests for hybrid_search pipeline; add tests for embedding serialisation round-trips and UUID parsing edge cases. |
| Developer Documentation | ⚠️ Warning | The PR introduces significant undocumented internal APIs including LibSqlDatabase wrapper, changed shared_db() return types, workspace store module architecture, and parameter-object structs across multiple store implementations. | Expand docs/developers-guide.md with subsections documenting LibSqlDatabase wrapper usage, workspace store implementation patterns, parameter-object structs for function arity reduction, and concrete examples of type changes propagating through store constructors. |

@coderabbitai

This comment was marked as resolved.

Expand the developers guide with concrete examples showing how the Arc<libsql::Database> to Arc<LibSqlDatabase> change propagates through store constructors and shared_db() consumers.
codescene-delta-analysis[bot]

This comment was marked as outdated.

Expand the central libsql workspace test module with round-trip chunk operations and document-store success-path coverage, using setup_backend for all async cases.
codescene-delta-analysis[bot]

This comment was marked as outdated.

@leynos
Owner Author

leynos commented Apr 16, 2026

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. Ensure that this is validated against the current version of the codegraph.

If further refinement to address this finding would be deleterious, please supply a clear explanatory one to two paragraph markdown message I can paste into the CodeScene web ui's diagnostic suppression function so this diagnostic can be silenced.

src/db/libsql/workspace/tests.rs

Comment on lines +339 to +379

async fn insert_chunk_and_delete_chunks_round_trip() {
    let backend = setup_backend().await;

    let document = backend
        .get_or_create_document_by_path("default", None, "notes/chunks.md")
        .await
        .expect("failed to create document");

    let chunk_id = backend
        .insert_chunk(InsertChunkParams {
            document_id: document.id,
            chunk_index: 0,
            content: "round-trip chunk",
            embedding: None,
        })
        .await
        .expect("failed to insert chunk");

    let before = backend
        .get_chunks_without_embeddings("default", None, 10)
        .await
        .expect("failed to list chunks before delete");
    assert!(
        before.iter().any(|c| c.id == chunk_id),
        "inserted chunk must appear in get_chunks_without_embeddings"
    );

    backend
        .delete_chunks(document.id)
        .await
        .expect("failed to delete chunks");

    let after = backend
        .get_chunks_without_embeddings("default", None, 10)
        .await
        .expect("failed to list chunks after delete");
    assert!(
        after.iter().all(|c| c.id != chunk_id),
        "deleted chunk must not appear after delete_chunks"
    );
}

❌ New issue: Code Duplication
The module contains 4 functions with similar structure: insert_chunk_and_delete_chunks_round_trip, list_all_paths_returns_inserted_document_path, list_documents_returns_inserted_document, update_chunk_embedding_is_reflected_in_chunks_list

@coderabbitai

This comment was marked as resolved.

Reduce duplicated document setup in the central libsql workspace tests by routing the repeated default-scope creation path through a private create_test_document helper.
codescene-delta-analysis[bot]

This comment was marked as outdated.

@leynos
Owner Author

leynos commented Apr 16, 2026

@coderabbitai Please suggest a fix for this issue and supply a prompt for an AI coding agent to enable it to apply the fix. Include the file and symbol names indicated in the issue at the head of your response. Ensure that this is validated against the current version of the codegraph.

If further refinement to address this finding would be deleterious, please supply a clear explanatory one to two paragraph markdown message I can paste into the CodeScene web ui's diagnostic suppression function so this diagnostic can be silenced.

Code Duplication

src/db/libsql/workspace/tests.rs:

What led to degradation?

The module contains 4 functions with similar structure: insert_chunk_and_delete_chunks_round_trip, list_all_paths_returns_inserted_document_path, list_documents_returns_inserted_document, update_chunk_embedding_is_reflected_in_chunks_list

Why does this problem occur?

Duplicated code often leads to code that's harder to change since the same logical change has to be done in multiple functions. More duplication gives lower code health.

How to fix it?

A certain degree of duplicated code might be acceptable. The problems start when it is the same behavior that is duplicated across the functions in the module, i.e. a violation of the Don't Repeat Yourself (DRY) principle. DRY violations lead to code that is changed together in predictable patterns, which is both expensive and risky. DRY violations can be identified using CodeScene's X-Ray analysis to detect clusters of change coupled functions with high code similarity.
Once you have identified the similarities across functions, look to extract and encapsulate the concept that varies into its own function(s). These shared abstractions can then be re-used, which minimizes the amount of duplication and simplifies change.
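
As a rough illustration of that extraction applied to the four flagged tests, the repeated "create a document in the default scope" step can be pulled into one helper. The sketch below is an assumption-laden stand-in: `FakeBackend` and `Document` are hypothetical synchronous types invented so the example is self-contained, whereas the real tests use the async backend returned by `setup_backend()`. Only the helper name `create_test_document` and the call shape `get_or_create_document_by_path("default", None, path)` come from this conversation.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the libSQL workspace backend.
#[derive(Default)]
struct FakeBackend {
    documents: HashMap<String, u64>,
    next_id: u64,
}

#[derive(Debug, Clone, PartialEq)]
struct Document {
    id: u64,
    path: String,
}

impl FakeBackend {
    /// Returns the existing document for `path`, or creates one.
    /// The parameter types are assumptions made for this sketch.
    fn get_or_create_document_by_path(
        &mut self,
        _user: &str,
        _agent: Option<&str>,
        path: &str,
    ) -> Result<Document, String> {
        if path.is_empty() {
            return Err("empty path".to_string());
        }
        let id = if let Some(id) = self.documents.get(path).copied() {
            id
        } else {
            self.next_id += 1;
            self.documents.insert(path.to_string(), self.next_id);
            self.next_id
        };
        Ok(Document { id, path: path.to_string() })
    }

    /// Lists every stored path in sorted order.
    fn list_all_paths(&self) -> Vec<String> {
        let mut paths: Vec<String> = self.documents.keys().cloned().collect();
        paths.sort();
        paths
    }
}

/// Shared setup step: each test calls this instead of repeating the
/// default-scope creation call and its `expect` message.
fn create_test_document(backend: &mut FakeBackend, path: &str) -> Document {
    backend
        .get_or_create_document_by_path("default", None, path)
        .expect("failed to create document")
}
```

Each test then keeps only the part that varies (inserting chunks, listing paths, updating embeddings), so a change to how test documents are created touches one function rather than four.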

@coderabbitai

This comment was marked as resolved.

codescene-delta-analysis[bot]

This comment was marked as outdated.

@leynos

This comment was marked as resolved.

@coderabbitai

This comment was marked as resolved.

Add a unit test for embedding_to_vector_json and expose the helper to
sibling workspace tests via pub(super).

The requested cargo test target names a non-existent package in this
workspace, so I also validated the focused test against the real ironclaw
crate.
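
A minimal sketch of what such a helper and its `pub(super)` visibility might look like is given below. The body is an assumption: the real `embedding_to_vector_json` in vector_search.rs may take different parameters or format floats differently. The module layout here only mirrors the idea of exposing the helper to sibling code under the same parent.

```rust
mod vector_search {
    /// Hypothetical implementation: renders an embedding as a compact
    /// JSON array of numbers. Non-finite values (NaN, infinity) are not
    /// valid JSON and are not handled in this sketch.
    pub(super) fn embedding_to_vector_json(embedding: &[f32]) -> String {
        let parts: Vec<String> = embedding.iter().map(|v| v.to_string()).collect();
        format!("[{}]", parts.join(","))
    }
}

// `pub(super)` makes the helper visible to the parent module, so sibling
// code (for example a tests module under the same parent) can import it.
#[allow(unused_imports)]
use vector_search::embedding_to_vector_json;
```

With this sketch, `embedding_to_vector_json(&[1.0, 2.5, -0.5])` returns `[1,2.5,-0.5]` and an empty slice returns `[]`, which is the kind of property a test named like the one reported later in this thread would assert.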

@codescene-delta-analysis codescene-delta-analysis Bot left a comment

Gates Passed
6 Quality Gates Passed

See analysis details in CodeScene

Absence of Expected Change Pattern

  • axinite/src/db/mod.rs is usually changed with: axinite/src/db/postgres.rs

Quality Gate Profile: Pay Down Tech Debt

@coderabbitai
Contributor

coderabbitai Bot commented Apr 16, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== tests.rs: embedding_to_vector_json test presence ==="
grep -n "embedding_to_vector_json\|vector_json" src/db/libsql/workspace/tests.rs | head -20

echo ""
echo "=== tests.rs: all test names and line count ==="
grep -n "#\[test\]\|#\[tokio::test\]" src/db/libsql/workspace/tests.rs
wc -l src/db/libsql/workspace/tests.rs

echo ""
echo "=== vector_search.rs: visibility of embedding_to_vector_json ==="
grep -n "embedding_to_vector_json\|pub.*embedding_to_vector" src/db/libsql/workspace/vector_search.rs | head -10

echo ""
echo "=== developers-guide.md: Type-change propagation section ==="
grep -n "Type-change\|type-change\|LibSqlSecretsStore\|LibSqlWasmChannelStore\|LibSqlWasmToolStore\|busy_timeout\|store constructor" docs/developers-guide.md | head -20

echo ""
echo "=== developers-guide.md: all section headings ==="
grep -n "^###\|^##\|^#" docs/developers-guide.md | head -60

Length of output: 3102


@leynos — both previously-failed checks are now fully resolved:

| Check | Status | Evidence |
| --- | --- | --- |
| Testing | ✅ Resolved | test_embedding_to_vector_json_formats_floats_as_json_array is present at line 127 of tests.rs; embedding_to_vector_json is correctly pub(super) in vector_search.rs (line 207); tests.rs now totals 600 lines with 21 tests covering all required paths across vector_search.rs, document_ops.rs, chunk_ops.rs, and the hybrid pipeline |
| Developer Documentation | ✅ Resolved | docs/developers-guide.md contains all required sections: ### libSQL test databases (line 322), ### LibSqlDatabase shared handles (line 336), ### Type-change propagation through store constructors (line 353, including the three-store constructor table and the busy_timeout centralisation note), ### Workspace store module structure (line 806), ### Parameter-object structs in store helpers (line 861), and ### WebhookServer test helpers (line 790) |

No further work is required for either check.

@leynos leynos merged commit d3be107 into main Apr 16, 2026
24 checks passed

Labels

contributor: core (20+ merged PRs)
risk: high (Safety, secrets, auth, or critical infrastructure)
scope: channel/wasm (WASM channel runtime)
scope: db (Database trait / abstraction)
scope: docs (Documentation)
scope: orchestrator (Container orchestrator)
scope: secrets (Secrets management)
scope: tool/builtin (Built-in tools)
scope: tool/wasm (WASM tool sandbox)
scope: worker (Container worker)
scope: workspace (Persistent memory / workspace)
size: XL (500+ changed lines)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

libSQL workspace search silently degrades to FTS-only after flexible-dimension migration while PostgreSQL still performs hybrid FTS plus vector search

1 participant