Merged
40 commits
a9dbece
feat(db/libsql): fallback to brute-force cosine similarity if vector …
leynos Mar 29, 2026
a32cb77
fix: restore webhook listener test safety
leynos Apr 13, 2026
6b13345
fix: restore webhook test listener import
leynos Apr 13, 2026
b95bf7d
Fix verified review findings
leynos Apr 13, 2026
a4f99cc
Clarify webhook address contract
leynos Apr 13, 2026
2976267
Cover production webhook restart path
leynos Apr 13, 2026
97e2e85
Fix libsql in-memory workspace tests
leynos Apr 13, 2026
51bf6b9
Extract libsql vector search helpers
leynos Apr 14, 2026
090aa6f
Extract libsql backend constructor helpers
leynos Apr 14, 2026
a44907a
Fix database integrations wording
leynos Apr 14, 2026
5d74e76
Fix webhook bind state and libsql cleanup ownership
leynos Apr 14, 2026
dad7c16
Test libsql temp-file cleanup
leynos Apr 14, 2026
755993e
Document workspace memory search modes
leynos Apr 14, 2026
82b0bd4
Clarify temp-file-backed libsql testing docs
leynos Apr 14, 2026
63919d7
Fix workspace search docs wording
leynos Apr 14, 2026
315913e
Unify libsql shared-handle connection setup
leynos Apr 14, 2026
2bbdc2d
Split libsql workspace module
leynos Apr 14, 2026
c344d3b
Introduce AgentScope for libsql documents
leynos Apr 14, 2026
24fb21b
Extract libsql document row helpers
leynos Apr 14, 2026
5c19d39
Introduce FtsSearchParams
leynos Apr 14, 2026
920f5d4
Introduce VectorSearchQuery
leynos Apr 14, 2026
159f419
Extract list_directory helpers
leynos Apr 14, 2026
1f6965d
Introduce VectorIndexQuery
leynos Apr 14, 2026
f214960
Clarify libsql temp-file cleanup docs
leynos Apr 14, 2026
6d7aff6
Fix libsql workspace search edge cases
leynos Apr 14, 2026
afa9d8f
Skip invalid UUID rows in vector index results
leynos Apr 14, 2026
4f1a4a6
Extract document row mapping helper
leynos Apr 14, 2026
1e9d35d
Extract libsql hybrid search helpers
leynos Apr 14, 2026
70384ff
Harden libsql workspace row parsing
leynos Apr 14, 2026
7b269bf
Extract libsql chunk row parser
leynos Apr 15, 2026
a54340e
Add libsql workspace module test coverage
leynos Apr 15, 2026
54ad178
Add central libsql workspace error-path tests
leynos Apr 15, 2026
d6f0e13
Extract libsql workspace assertion helpers
leynos Apr 15, 2026
32cd332
Extract libsql workspace test setup helpers
leynos Apr 15, 2026
ee23d0d
Fix rebased libsql and webhook helpers
leynos Apr 15, 2026
d2a7a0b
Remove webhook listener cfg gates
leynos Apr 16, 2026
c779cfb
Document libsql shared-handle constructor changes
leynos Apr 16, 2026
a573dda
Add libsql workspace success-path tests
leynos Apr 16, 2026
32cadb2
Extract libsql workspace test document helper
leynos Apr 16, 2026
71fb9bd
Test libsql embedding JSON helper
leynos Apr 16, 2026
2 changes: 1 addition & 1 deletion FEATURE_PARITY.md
@@ -336,7 +336,7 @@ This document tracks feature parity between IronClaw (Rust implementation) and O

 | Feature | OpenClaw | IronClaw | Notes |
 |---------|----------|----------|-------|
-| Vector memory | ✅ | ✅ | pgvector |
+| Vector memory | ✅ | ✅ | PostgreSQL uses pgvector; libSQL uses indexed vector search when available and brute-force cosine fallback after V9 |
 | Session-based memory | ✅ | ✅ | |
 | Hybrid search (BM25 + vector) | ✅ | ✅ | RRF algorithm |
 | Temporal decay (hybrid search) | ✅ | ❌ | Opt-in time-based scoring factor |
7 changes: 7 additions & 0 deletions docs/configuration-guide.md
@@ -268,6 +268,13 @@ Table 18. Database and secrets environment variables.
| `LIBSQL_AUTH_TOKEN` | Auth token for `LIBSQL_URL`. | Required when `LIBSQL_URL` is set. |
| `SECRETS_MASTER_KEY` | Master key for encrypted secrets storage. | Optional, but must be at least 32 bytes when set. If omitted, axinite falls back to the operating-system keychain when available. |

When workspace memory search is enabled, backend choice affects how semantic
retrieval runs. PostgreSQL uses pgvector cosine-distance queries. libSQL uses
indexed `vector_top_k(...)` only when a compatible fixed-dimension vector
index exists; otherwise it falls back to brute-force cosine similarity in
Rust. `ironclaw doctor` and `ironclaw status` surface the active search mode.
See [database integrations](database-integrations.md) for the backend trade-offs.

### 4.3 Agent runtime, safety, routines, heartbeat, hygiene, skills, and builder mode

Table 19. Core runtime behaviour variables.
52 changes: 29 additions & 23 deletions docs/database-integrations.md
@@ -226,8 +226,9 @@ explicit `BEGIN IMMEDIATE` block.

 ### 4.4 Workspace search path

-libSQL mirrors the same workspace concepts as PostgreSQL, but the implemented
-search path is narrower.
+libSQL mirrors the same workspace concepts as PostgreSQL, but the vector path
+uses a different implementation strategy after the flexible-dimension
+migration.

 Table 3. libSQL workspace search components.

@@ -236,28 +237,32 @@ Table 3. libSQL workspace search components.
 | Document store | `memory_documents` table |
 | Chunk store | `memory_chunks` table with `embedding BLOB` |
 | Full-text search | `memory_chunks_fts` FTS5 virtual table plus maintenance triggers |
-| Semantic search | Best-effort `vector_top_k(...)` query when a compatible vector index exists |
+| Semantic search | `vector_top_k(...)` when a compatible index exists, otherwise brute-force cosine similarity in Rust |
 | Fusion strategy | RRF in Rust, same as PostgreSQL |

-The current implementation quirk is important:
+The important implementation detail is how libSQL behaves after the V9
+flexible-dimension migration:

-- the libSQL schema and V9 migration comments describe a brute-force vector
-fallback after flexible dimensions remove the fixed-dimension index
-- the live code in `src/db/libsql/workspace.rs` does not implement that
-brute-force fallback
-- instead it attempts `vector_top_k('idx_memory_chunks_embedding', ...)`
-and, when that query fails as expected after V9 drops the index, it logs a
-debug message and returns no vector results
+- the fixed-dimension `libsql_vector_idx` index is dropped because it cannot
+support arbitrary embedding lengths
+- the live code first attempts `vector_top_k('idx_memory_chunks_embedding', ...)`
+- when that indexed query is unavailable, the repository logs that it is using
+brute-force vector search and computes cosine similarity in Rust across the
+stored embedding blobs
+- the result stream still feeds the same Reciprocal Rank Fusion (RRF) path as
+PostgreSQL

 In practical terms, a migrated or freshly bootstrapped libSQL workspace
 currently behaves as:

 - FTS5 keyword search always available
-- vector results only when a compatible vector index exists
-- FTS-only search after the normal flexible-dimension migration path
+- semantic retrieval still available after V9 through brute-force cosine
+similarity
+- hybrid search still combines keyword and semantic results, but libSQL pays a
+linear scan cost where PostgreSQL can use pgvector operations

-That is the most significant behavioural gap between the two backends in the
-current code.
+That means the main backend difference is now performance and implementation
+strategy, not silent loss of semantic recall.

 ### 4.5 Satellite stores on libSQL

@@ -330,14 +335,15 @@ Table 4. Current backend comparison.
 | Migration engine | `refinery` over numbered SQL files | Consolidated schema plus `_migrations`-tracked incremental Rust-side runner |
 | Secrets and WASM satellite stores | Reuse cloned pool handles | Reuse shared database handle, then open fresh connections |
 | Keyword search | PostgreSQL `tsvector` plus GIN | FTS5 virtual table plus triggers |
-| Vector search | pgvector cosine distance, now without the old HNSW index after V9 | Best-effort only; current migrated path is effectively FTS-only |
+| Vector search | pgvector cosine distance, now without the old HNSW index after V9 | Indexed `vector_top_k(...)` when available, otherwise brute-force cosine similarity in Rust |
 | Best fit | Full default deployment with richer search parity | Embedded, local-first, or low-ops deployment where external PostgreSQL is undesirable |

 ### 6.1 When PostgreSQL is the safer choice

 Choose PostgreSQL when:

-- full hybrid workspace search quality matters
+- full hybrid workspace search quality and search latency under larger
+workspaces matter
 - the deployment already has PostgreSQL 15+ with pgvector available
 - query behaviour should match the default and most-tested path as closely as
 possible
@@ -350,10 +356,9 @@ Choose libSQL when:
 - the deployment is local-first or edge-style
 - Turso replica mode is desirable, but a full PostgreSQL service is not

-The main caveat is the current workspace-search trade-off. libSQL is not
-merely "the same database API with a different wire protocol". In the current
-implementation it is a simpler persistence backend with weaker semantic-search
-behaviour after the flexible-dimension migration path.
+The main caveat is now performance rather than capability. libSQL still offers
+hybrid retrieval, but its post-V9 semantic path can require a brute-force scan
+over all candidate embeddings for that workspace scope.

 ## 7. Current implementation caveats

@@ -362,8 +367,9 @@ behaviour after the flexible-dimension migration path.
 handshake.
 2. PostgreSQL still supports vector search after V9, but no longer through the
 old fixed-dimension HNSW index.
-3. libSQL migration and schema comments still describe a brute-force vector
-fallback that the current code does not implement.
+3. libSQL falls back to brute-force cosine similarity in Rust when
+`vector_top_k(...)` cannot run because the fixed-dimension vector index is
+absent after the V9 migration.
 4. Workspace memory and memory tools are absent in `--no-db` mode because the
 host does not build a workspace without a database.
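
Reviewer sketch (not part of the diff): the brute-force fallback described in this file amounts to a linear scan that scores every stored embedding against the query. The code below is a minimal illustration under stated assumptions, not the code in `vector_search.rs`. It assumes embeddings are packed little-endian `f32` values, and the names `decode_embedding`, `cosine_similarity`, and `brute_force_top_k` are hypothetical.

```rust
/// Decode an embedding BLOB into f32 values.
/// Assumption: packed little-endian f32; the real storage format may differ.
fn decode_embedding(blob: &[u8]) -> Option<Vec<f32>> {
    if blob.len() % 4 != 0 {
        return None;
    }
    Some(
        blob.chunks_exact(4)
            .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
            .collect(),
    )
}

/// Cosine similarity; None for mismatched lengths or zero-norm vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Linear scan: score every (chunk id, blob) row and keep the `limit` best.
/// Rows that fail to decode or compare are skipped rather than failing the scan.
fn brute_force_top_k(query: &[f32], rows: &[(u64, Vec<u8>)], limit: usize) -> Vec<(u64, f32)> {
    let mut scored: Vec<(u64, f32)> = rows
        .iter()
        .filter_map(|(id, blob)| {
            let embedding = decode_embedding(blob)?;
            let score = cosine_similarity(query, &embedding)?;
            Some((*id, score))
        })
        .collect();
    // Highest similarity first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(limit);
    scored
}
```

The cost grows linearly with the number of stored chunks in scope, which is the performance trade-off the revised caveats describe.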

107 changes: 107 additions & 0 deletions docs/developers-guide.md
@@ -319,6 +319,68 @@ Meaning: PostgreSQL connection URL used by the app.
Default or rule:


### libSQL test databases

Unit tests that exercise the libSQL backend call
`LibSqlBackend::new_memory()` rather than `new_local()`. `new_memory()`
creates a UUID-named file in the OS temp directory so that multiple
connections within a single test share state, matching production semantics.
The shared database handle removes that file and its `-wal`/`-shm` sidecars
automatically when the final clone is dropped, so tests should not leave
artefacts behind on disk.

Do **not** use `new_local()` in unit tests; reserve it for integration tests
or tests that specifically require filesystem-path behaviour.
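
Reviewer sketch (not part of the diff): the cleanup-on-final-drop behaviour described above can be pictured as a `Drop` implementation on a value shared through `Arc`. This is a minimal illustration, not the `LibSqlDatabase` implementation. `TempDbHandle` and `open_temp_db` are hypothetical names, and the file name is supplied by the caller here, whereas the guide says `new_memory()` generates a UUID-based name.

```rust
use std::fs;
use std::path::PathBuf;
use std::sync::Arc;

/// Hypothetical stand-in for a shared handle that owns a temp database file.
struct TempDbHandle {
    path: PathBuf,
}

impl Drop for TempDbHandle {
    fn drop(&mut self) {
        // Best-effort removal of the main file and its `-wal`/`-shm` sidecars.
        for suffix in ["", "-wal", "-shm"] {
            let mut name = self.path.clone().into_os_string();
            name.push(suffix);
            let _ = fs::remove_file(PathBuf::from(name));
        }
    }
}

/// Create the backing file and wrap the handle so that clones share ownership.
fn open_temp_db(file_name: &str) -> std::io::Result<Arc<TempDbHandle>> {
    let path = std::env::temp_dir().join(file_name);
    fs::write(&path, b"")?;
    Ok(Arc::new(TempDbHandle { path }))
}
```

Because `Drop` runs only when the last `Arc` clone goes away, any store holding a clone keeps the file alive for as long as it needs it.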


### LibSqlDatabase shared handles

`LibSqlBackend` owns an `Arc<LibSqlDatabase>` rather than a raw libSQL
database handle. That wrapper exists for two reasons:

- satellite stores such as the secrets and WASM stores can call
`shared_db()` and open their own per-operation connections without
reopening a different database
- temp-file-backed test databases created by `new_memory()` keep their
cleanup metadata on the shared handle, so the `.db`, `-wal`, and `-shm`
files live until the final shared owner is dropped

If a constructor or store used to accept a backend directly and now accepts
`Arc<LibSqlDatabase>`, that is usually a signal that it should share the same
underlying file while creating its own connections via
`LibSqlDatabase::connect()`.

### Type-change propagation through store constructors

The `Arc<libsql::Database>` → `Arc<LibSqlDatabase>` change propagates to
every store that previously held a raw `Arc<libsql::Database>`. Each
affected constructor now accepts `Arc<LibSqlDatabase>`:

| Store | Field | Constructor parameter |
| --- | --- | --- |
| `LibSqlSecretsStore` | `db: Arc<LibSqlDatabase>` | `new(db: Arc<LibSqlDatabase>, …)` |
| `LibSqlWasmChannelStore` | `db: Arc<LibSqlDatabase>` | `new(db: Arc<LibSqlDatabase>)` |
| `LibSqlWasmToolStore` | `db: Arc<LibSqlDatabase>` | `new(db: Arc<LibSqlDatabase>)` |

The shared handle is obtained at startup via `LibSqlBackend::shared_db()`,
which now returns `Arc<LibSqlDatabase>` instead of
`Arc<libsql::Database>`:

```rust
// Obtaining the shared handle (unchanged call site):
let db: Arc<LibSqlDatabase> = backend.shared_db();

// Constructing a store with the shared handle:
let secrets_store = LibSqlSecretsStore::new(Arc::clone(&db), crypto);
let channel_store = LibSqlWasmChannelStore::new(Arc::clone(&db));
let tool_store = LibSqlWasmToolStore::new(Arc::clone(&db));
```

The `busy_timeout` PRAGMA that each store previously ran after connecting
is now applied once inside `LibSqlDatabase::connect()`, so it is no longer
necessary — and must not be duplicated — in individual store
`connect()` methods.

## Dispatcher Architecture

The dispatcher orchestrates interactive chat turns by preparing an LLM
@@ -725,6 +787,40 @@ When those changes land, this guide must be updated in the same branch
so local setup instructions stay truthful.


### WebhookServer test helpers

`WebhookServer` exposes two `#[cfg(test)]`-only methods to eliminate
port-allocation races:

- `start_with_listener(listener: TcpListener)` — accepts a pre-bound
listener, merges queued route fragments, resolves the live listener
address, and spawns the server.
- `restart_with_listener(listener: TcpListener)` — shuts the current server
down, resolves the new listener's address, and spawns a fresh server.

Tests should pre-bind via `TcpListener::bind("127.0.0.1:0")` and pass the
result to these helpers instead of relying on `start()` /
`restart_with_addr()` to pick a free port.
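
Reviewer sketch (not part of the diff): pre-binding to port 0 removes the race because the OS picks a free port and the listener holds it from that point on. The example uses `std::net::TcpListener` so it stays self-contained. The diff does not show which `TcpListener` type the helpers accept, and `bind_ephemeral` is a hypothetical name.

```rust
use std::net::{SocketAddr, TcpListener};

/// Bind to an OS-assigned port and report the address that was actually chosen.
fn bind_ephemeral() -> std::io::Result<(TcpListener, SocketAddr)> {
    // Port 0 asks the OS for any free port; the open listener keeps it reserved.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    Ok((listener, addr))
}
```

A test would hand the listener to the server helper and use the returned address for client requests, so no second lookup of a "free" port is needed.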


### Workspace store module structure

The libSQL workspace store is split by concern under
`src/db/libsql/workspace/`:

- `mod.rs` owns the `NativeWorkspaceStore` implementation and hybrid-search
orchestration
- `document_ops.rs` owns document CRUD and directory-style listing helpers
- `chunk_ops.rs` owns chunk insertion, embedding updates, and chunk polling
- `fts.rs` owns FTS-only ranking queries
- `vector_search.rs` owns vector-index and brute-force similarity helpers
- `tests.rs` keeps cross-module integration coverage for the hybrid pipeline

Prefer adding logic beside the feature it serves rather than growing
`mod.rs`. Module-local tests should live with the module they exercise, while
pipeline tests belong in `workspace/tests.rs`.


### Key APIs

- `RunLoopCtx`: per-run container that carries the session handle,
@@ -761,3 +857,14 @@ export DATABASE_URL=postgres://localhost/ironclaw

Adjust the connection string if the local PostgreSQL instance requires a
different host, user, or password.

### Parameter-object structs in store helpers

The workspace helpers use small parameter structs such as `AgentScope`,
`FtsSearchParams`, `VectorSearchQuery`, and `VectorIndexQuery` to keep helper
arity below the repository limit and to make call sites describe intent.

Use this pattern when a helper repeatedly threads the same related values
through several internal calls. Keep these structs private or `pub(super)`
unless a wider API boundary genuinely needs them, and prefer names that
describe the query or scope they model instead of generic `Options` suffixes.
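
Reviewer sketch (not part of the diff): the parameter-object pattern groups values that travel together into one struct. The types below are hypothetical and do not reproduce the real `AgentScope` or `FtsSearchParams` definitions, which this page does not show.

```rust
/// Hypothetical scope struct; field names are illustrative only.
#[derive(Debug, Clone, Copy)]
struct ExampleScope<'a> {
    user_id: &'a str,
    agent_id: Option<&'a str>,
}

/// Hypothetical query parameters grouped into a single value.
#[derive(Debug, Clone, Copy)]
struct ExampleSearchParams<'a> {
    scope: ExampleScope<'a>,
    text: &'a str,
    limit: usize,
}

/// One struct argument replaces four positional ones at every call site.
fn describe_query(params: ExampleSearchParams<'_>) -> String {
    let agent = params.scope.agent_id.unwrap_or("<shared>");
    format!(
        "user={} agent={} text={:?} limit={}",
        params.scope.user_id, agent, params.text, params.limit
    )
}
```

Named fields at the call site also make it harder to swap two arguments of the same type by accident.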
Original file line number Diff line number Diff line change
@@ -855,7 +855,7 @@ addressed in a subsequent pass (see progress checklist above).
 - `src/orchestrator/api/tests/fixtures/remote_tool_mocks.rs`: added
 `complex_tool_definition()` and `complex_tool_stub()` fixtures for testing
 full payload fidelity with nested JSON Schema and special characters.
-- `src/orchestrator/api/tests/remote_tools.rs`: added three new tests:
+- `src/orchestrator/api/tests/catalogue_fidelity.rs`: added three new tests:
 `remote_tool_catalog_preserves_full_tool_definition_payload`,
 `remote_tool_catalog_version_is_deterministic_and_sensitive_to_content`, and
 `orchestrator_responses_deserialize_into_worker_shared_types`.
@@ -903,9 +903,11 @@ The implementation added 9 new test functions covering:
 (milestone 4).

 All tests use in-process mock servers and fixtures, avoiding external
-dependencies. All tests follow existing `rstest` patterns and naming conventions.
-The format check (`make check-fmt`) passed after running `cargo fmt --all`. All
-validation gates have been run and passed successfully.
+dependencies. All tests follow existing `rstest` patterns and naming
+conventions. The format check (`make check-fmt`) passed after running
+`cargo fmt --all`. The format check, git whitespace check, and full test suite
+passed successfully. Markdown linting remained partially blocked by
+pre-existing issues in `docs/roadmap.md`.

 ### Validation evidence

@@ -928,10 +930,10 @@ Full test suite passed: 3076 tests passed; 0 failed; 2 ignored (webhook server
 test fixed to use already-bound address instead of privileged port; worker API
 types test split into three focused tests per code review).

-Markdown linting revealed pre-existing issues in `docs/roadmap.md` unrelated to
-this implementation (multiple consecutive blank lines at lines 1342, 1408, 1450,
-1489, 1512). The ExecPlan, RFC 0001, and `docs/contents.md` changes introduced
-no new Markdown issues.
+Markdown linting was only partially green because `docs/roadmap.md` still had
+pre-existing issues unrelated to this implementation (multiple consecutive
+blank lines at lines 1342, 1408, 1450, 1489, 1512). The ExecPlan, RFC 0001,
+and `docs/contents.md` changes introduced no new Markdown issues.

 ### Retrospective observations

26 changes: 26 additions & 0 deletions docs/users-guide.md
@@ -91,3 +91,29 @@ These notices are advisory. A success message means the runtime moved the job
back into its normal retry path. A permanent failure or manual-intervention
message means the runtime could not finish recovery automatically, and the
operator should inspect the job or tool state before retrying work.


## Workspace memory search

When workspace memory is enabled, the search backend differs by database:

- **PostgreSQL** — performs pgvector cosine-distance queries directly.
- **libSQL / Turso** — attempts an indexed `vector_top_k(...)` query when a
compatible fixed-dimension vector index exists. After the V9 migration
(which removed the fixed-dimension index in favour of flexible-dimension
vector storage, with `memory_chunks.embedding` stored as a
flexible vector), the backend automatically falls back to brute-force
cosine similarity computed in Rust. Results from both paths feed into the
same Reciprocal Rank Fusion (RRF) pipeline, so hybrid full-text search
(FTS) + vector retrieval is preserved.

To determine which search mode is active for a workspace, run:

```text
ironclaw doctor
ironclaw status
```

Both commands report whether indexed or brute-force vector retrieval is
currently in use. See `docs/database-integrations.md` for backend trade-offs
and performance considerations.
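
Reviewer sketch (not part of the diff): Reciprocal Rank Fusion merges the keyword and vector result lists by summing `1 / (k + rank)` for each document across the lists it appears in. The function below is a minimal illustration, not the repository's fusion code. The name `reciprocal_rank_fusion` is hypothetical, and `k` is left as a parameter because the page does not show the constant the project uses.

```rust
use std::collections::HashMap;

/// Fuse ranked result lists with Reciprocal Rank Fusion.
/// Each input list is ordered best-first; `k` dampens the weight of top ranks.
fn reciprocal_rank_fusion(lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (index, id) in list.iter().enumerate() {
            // Ranks are 1-based, so the best hit contributes 1 / (k + 1).
            let rank = (index + 1) as f64;
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties broken by id for deterministic output.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Because only ranks are used, the fusion step does not care whether the vector list came from an index or from a brute-force scan.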
15 changes: 5 additions & 10 deletions src/channels/wasm/storage.rs
@@ -351,24 +351,19 @@ fn pg_row_to_channel(
 /// matching the connection-per-request pattern used by the main `LibSqlBackend`.
 #[cfg(feature = "libsql")]
 pub struct LibSqlWasmChannelStore {
-    db: std::sync::Arc<libsql::Database>,
+    db: std::sync::Arc<crate::db::libsql::LibSqlDatabase>,
 }

 #[cfg(feature = "libsql")]
 impl LibSqlWasmChannelStore {
-    pub fn new(db: std::sync::Arc<libsql::Database>) -> Self {
+    pub fn new(db: std::sync::Arc<crate::db::libsql::LibSqlDatabase>) -> Self {
         Self { db }
     }

     async fn connect(&self) -> Result<libsql::Connection, WasmChannelStoreError> {
-        let conn = self
-            .db
-            .connect()
-            .map_err(|e| WasmChannelStoreError::Database(format!("Connection failed: {}", e)))?;
-        conn.query("PRAGMA busy_timeout = 5000", ())
-            .await
-            .map_err(|e| {
-                WasmChannelStoreError::Database(format!("Failed to set busy_timeout: {}", e))
+        let conn =
+            self.db.connect().await.map_err(|e| {
+                WasmChannelStoreError::Database(format!("Connection failed: {}", e))
             })?;
         Ok(conn)
     }