Notes / reference-text store: deterministic filing cabinet on the document/RAG layer #19

Merged
KerseyFabrications merged 8 commits into main from notes_reference_store on Jun 12, 2026
@KerseyFabrications

Contributor

Summary

Adds a notes / reference-text store — a deterministic "filing cabinet" on top
of the existing document/RAG store — so DAWN can give back exactly what the user
filed under a label, instead of a fuzzy top-K neighbor. Closes the conversation-809
gap where three canonical bios filed under distinct labels were buried under their
own semantic near-twins and couldn't be retrieved cleanly.

A note is just a single-chunk document whose filename is the user's label. The
capability is delivered by a column-weighted BM25 lexical channel (its own
candidate set, fused with semantic) plus surgical in-place editing and full version
history with undo/recover — all folded into the existing document tools, library
UI, and config rather than a new subsystem.

What's included

  • Hybrid lexical + semantic document search (v61). New contentless FTS5 index
    (document_chunks_fts) with separately-weighted label/body columns; BM25 runs as
    an independent candidate set fused with cosine + an ordered/contiguous phrase
    bonus, so an exact-label note ranks first even with weak embedding similarity.
  • Notes as documents. Direct single-chunk insert (bypasses the chunker, gated
    on tokens ≤ max), so the num_chunks == 1 invariant holds by construction on
    both the WebUI and tool paths.
  • Surgical editing (B1a) + multi-chunk doc editing (B1b, v63). Find/replace
    edit and append via a JSON-object change param; multi-chunk documents store
    canonical full text (document_full_text) and edit in place (re-chunk/re-embed →
    atomic swap, stable doc_id).
  • Version history + undo/recover (B3, v62). Every edit/overwrite/delete first
    archives the prior content (document_versions); recover undoes the last change
    or brings a deleted item back. Retention by age + per-doc cap (both config).
  • Memory↔note bridge + extraction guard (Phase 9). A thin embedded "gloss" fact
    redirects fuzzy "what's my bio?" to the exact note; the canonical body is kept out
    of memory_facts, and the extraction guard redacts filed bodies from session-end
    extraction so reference text isn't duplicated and re-mined.
  • WebUI. Documents | Notes tabbed library with one cross-scope search,
    read-only viewer, inline note editor, version History/Restore, "Recently deleted"
    recovery, and search-weight settings.
  • Admin. dawn-admin FTS rebuild command (v61 recovery path).
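
The fusion described in the first bullet can be illustrated with a minimal sketch. It is not the PR's scoring code: the struct names, the three weights, and the linear combination are assumptions chosen only to show how an independent BM25 candidate plus a phrase bonus can lift an exact-label note above a semantically closer near-twin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder weights: the PR exposes six [documents] search weights, but their
 * names and defaults are not listed here, so these three are assumptions. */
typedef struct {
   float w_semantic;   /* multiplier on cosine similarity */
   float w_lexical;    /* multiplier on normalized BM25 */
   float phrase_bonus; /* added for an ordered, contiguous phrase match */
} fusion_weights_t;

/* One chunk that appeared in the semantic set, the lexical set, or both. */
typedef struct {
   long doc_id;
   float cosine;    /* 0..1; 0 when the chunk was not a semantic candidate */
   float bm25_norm; /* 0..1; 0 when the chunk was not a lexical candidate */
   bool phrase_match;
} candidate_t;

float fused_score(const candidate_t *c, const fusion_weights_t *w) {
   assert(c != NULL && w != NULL);
   float s = w->w_semantic * c->cosine + w->w_lexical * c->bm25_norm;
   if (c->phrase_match)
      s += w->phrase_bonus;
   return s;
}

/* Index of the highest-scoring candidate, or -1 for an empty set. */
int best_candidate(const candidate_t *cands, size_t n, const fusion_weights_t *w) {
   int best = -1;
   float best_score = 0.0f;
   for (size_t i = 0; i < n; i++) {
      float s = fused_score(&cands[i], w);
      if (best < 0 || s > best_score) {
         best = (int)i;
         best_score = s;
      }
   }
   return best;
}
```

With hypothetical weights of 0.5 / 0.4 / 0.2, a near-twin at cosine 0.90 with a weak lexical hit scores below an exact-label note at cosine 0.30 with a full BM25 score and a phrase match, which is the rank-1 behavior the bullet describes.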

Schema migrations

v61 (hybrid FTS index + note_doc_id bridge column) → v62 (document_versions) →
v63 (document_full_text). All idempotent, gate-flagged, with fresh-install
SCHEMA_SQL parity. auth_db_schema.c was split first (per the standing TODO):
per-version migration blocks extracted into auth_db_migrations.c.

Review & hardening

The full branch went through per-commit big-three reviews plus a holistic
master-code-review before merge. The final commit (ef9ee51) applies that pass:

  • Security: attribute-context XSS fix on LLM/cross-user-settable labels.
  • Data integrity: ROLLBACK on failed deletes (no orphan FTS/version rows);
    save_text overwrites in place so the version chain + undo survive; recover
    re-points version rows onto the re-created doc (no "Recently deleted" ghost or
    duplicate); skip empty/junk snapshots; correct num_chunks on partial embeds.
  • Crash safety: NULL-check JSON-null type in the extraction guard.
  • Config: wired the note_extraction_guard round-trip that a WebUI save was
    silently dropping; config-example + settings-schema parity.

Owner-scoping verified on every new SQL surface (no IDOR); auth_db leaf-lock
discipline held across all stem-outside-lock sites.

Testing

  • 78/78 CI unit tests green; new/expanded coverage in test_document_search_bm25,
    test_document_manage, test_memory_note_guard, test_memory_note_bridge
    (exact-label rank-1, stable-id edit, version archive/survive-delete/owner-scope/
    cap, full-text round-trip, atomic doc-replace, recover/undo, version reattach,
    num_chunks correction, guard coverage across both provider shapes + all actions).
  • Live-verified end-to-end: the 809 bios reproduction, in-place edits, multi-chunk
    doc edit/restore, undo-a-delete, and the memory→note redirect.

Notes

68 files, +11,226 / −2,724. Design doc archived at
atlas/dawn/archive/NOTES_REFERENCE_STORE_DESIGN.md. Only the B4 structured-records
follow-up is shelved (B1 surgical edit absorbed most of its motivation).

… (v61)

Fixes conv-809: exact reference text (bios, pitches) filed under a label now
retrieves rank-1 instead of being buried by semantic recall. Adds a column-
weighted FTS5/BM25 lexical channel (its own candidate set, fused with semantic
+ ordered/contiguous phrase bonus), single-chunk notes with stable-id edit,
orphan-free indexed delete, a document_manage LLM tool (save/list/delete with
a two-step user-approved deletion), and a WebUI Notes tab. Splits the auth_db
migration ladder into auth_db_migrations.c first. Schema v61 (document_chunks_fts
+ note_doc_id). Bundles a memory get-by-ID action.

Test: tests/test_document_search_bm25.c (the 809 repro) + 74/74 CI green; full
5-agent review applied; live-verified end-to-end (save/overwrite/list/two-step
delete/save_text + WebUI).
…tool hardening

Phase 9 of the notes/reference-text store, plus pre-existing doc/tool-loop bugs
the live test surfaced. No schema change (note_doc_id + FTS shipped in v61).

Extraction guard (memory_note_guard): keeps note-filed reference text — and its
document_read/document_search tool-result echoes — out of session-end fact
extraction, closing a live-proven leak where filed bios were re-mined into facts
and drifted stale. Both provider tool-call shapes; copy-on-redact; config
[memory] note_extraction_guard (default on).

Bridge (memory_note_bridge): one self-directing gloss fact per note, note_doc_id-
linked, so a fuzzy "what's my bio" routes to the verbatim note. Glosses stay in
the embedding cache (retrievable) but are skipped by paraphrase-dedup/find-
duplicates via an s_cache flag, and exempt from prune/decay/find_similar/pattern-
delete. set_note_doc_id ownership-validated; gloss label injection-filtered.
Admin: dawn-admin memory backfill-note-glosses.

Doc/tool-loop fixes: save buffer 4 KB→16 KB (DOCMGMT_SAVE_TEXT_MAX), save_text
overwrite-by-label, dup-tool-call guard scoped to the current turn,
tool_call_t.args_truncated refuses partial-arg execution, document_read/delete
by id.

Tests: test_memory_note_guard (8), test_memory_note_bridge (6), test_llm_dup_check
(4). 77/77 CI, format clean. Big-three reviewed (1 critical cache-flag init +
2 security-medium fixed). Live-verified end-to-end.
…mory, search-weight settings

Document Library follow-ups (notes/reference store backlog A1–A3 + A5):
- Remember active tab across opens (persist state.scope to localStorage).
- Click a note name to open an inline read-only viewer (Close/Edit/Delete),
  keyboard-accessible, nested-Esc aware.
- Surface the 6 [documents] hybrid-search weights in WebUI settings (advanced).
  Wired the missing backend round-trip: webui_config.c set path, config_env.c
  JSON getter + TOML writer (parser + defaults already existed).
- Sticky panel: drop outside-click auto-close (mirrors the music panel) so an
  in-panel action spawning the confirm modal no longer closes the panel. The
  top-right trio stays mutually exclusive — scheduler now closes doc-library on
  open, matching memory. Drops to z-index 997 (like music) so the Settings panel
  slides out over it.

Test: debug build clean (no warnings); format --check --changed clean; the 3 JS
files pass node --check; test_config_validate 18/18.
Add `dawn-admin memory rebuild-document-fts` — rebuilds document_chunks_fts
from scratch: recovery path for a partial v61 migration backfill or for FTS
orphans left by delete_indexed's OOM plain-delete fallback.

document_db_rebuild_fts() clears the contentless index ('delete-all', so no
original stems needed) then keyset-re-indexes every live chunk one per
lock-cycle (stem outside the leaf lock, reuse document_db_chunk_index_fts).
Global op (the FTS index spans users); admin opcode 0x92, no payload.

Test: 2 new cases in test_document_search_bm25 (from-scratch reindex +
orphan-scoping); 77/77 CI; build + format clean. Live-verified end-to-end:
wiped the FTS index to 0, rebuild restored all 285 chunks (~0.55s).
…uidance (B2)

document_manage gains 'edit' (find/replace) and 'append' so the LLM updates a
note by sending only a diff instead of resaving the whole text (the cause of the
earlier truncation incident). edit carries {find,replace} as a JSON-object tail
param (closes the ::-in-find silent-corruption vector); unique-match contract
(0/>1 occurrences refuse, never guess); in-place via document_note_update
(stable doc_id/FTS/gloss). Pure splice logic split into document_edit_ops.c +
unit-tested (13 cases).

Extraction guard extended to redact edit/append content from history (collector
+ structural redactor) so dictated note text isn't re-mined into memory — live-
verified: 4 bodies redacted, zero content leaked to memory_facts. B2: one-home-
per-datum guidance in document_manage + memory remember descriptions. Also fixes
a pre-existing param_count off-by-one in the tool metadata.

Test: 78/78 CI; live-verified edit/append/not-found/B2 end-to-end on Sonnet 4.6.
…diting (B1b, v63)

B3 versioning: every destructive change (overwrite/edit/append/delete) snapshots
the prior content to document_versions first, atomic with the mutation. Bounded
by age (version_retention_days, default 14, swept in auth_maintenance) and a
per-doc cap (version_keep_per_doc, default 10) — both config. Surfaces: WebUI
note/doc viewer "History" + Restore, a "Recently deleted" list, and the LLM
document_manage actions list_deleted + recover. `recover <label>` is a full UNDO
— restores an existing item's previous version in place (toggles undo/redo), or
re-creates a deleted one from its snapshot.

B1b multi-chunk editing: document_full_text stores canonical text on ingest (not
notes, not globals); edit/append now accept multi-chunk docs and document_doc_-
update re-chunks + re-embeds outside the auth_db leaf lock, then swaps all chunks
+ FTS + full_text in one transaction (doc_id stable). Pre-v63 uploads prompt a
re-save; no lossy backfill.

Schema v61→v63 (two idempotent gate-flagged migrations). Big-three reviewed (0
critical; per-doc cap→config, full_text skipped for globals, id guards, owner
scoping). Versioning surfaced in tool descriptions; save_text steering sharpened.
Test: 78/78 CI (10 new doc/version cases incl. the destructive swap + undo
round-trip). Live-verified end-to-end on real data incl. multi-chunk doc undo.
Holistic master-code-review of the notes_reference_store branch:

- XSS: escape quotes in title/aria-label attrs (LLM/cross-user labels)
- delete_indexed: ROLLBACK on failed delete (no orphan FTS/version rows)
- save_text overwrite goes in place (stable doc_id → undo/version chain)
- WebUI deleted-restore: document_index_text fallback for multi-chunk
- recover re-points version rows onto the new doc (no deleted-ghost/dup)
- guard: NULL-check JSON-null "type" (extraction-thread crash)
- skip empty/junk version snapshot; correct num_chunks on partial embed
- note_extraction_guard: wire missing config round-trip (WebUI save dropped it)
- config parity (version_* example, guard in settings schema)

Tests: +2 cases (version_reattach, set_num_chunks); 20/20 + 78/78 CI green.
@qodo-code-review

qodo-code-review Bot commented Jun 12, 2026

Code Review by Qodo

🐞 Bugs (0) 📘 Rule violations (2)

Context used
✅ Compliance rules (platform): 27 rules


Action required

1. docmgmt_find_replace_once() returns 1 on OOM 📘 Rule violation ≡ Correctness
Description
docmgmt_find_replace_once() returns the magic value 1 for an out-of-memory condition, which is a
specific error case but uses a value reserved for generic failure and is not a named error code > 1.
This makes error handling ambiguous and violates the error-code standard.
Code

src/tools/document_edit_ops.c[R43-50]

+   char *n = malloc(strlen(src) - find_len + rep_len + 1);
+   if (!n)
+      return 1; /* unique match but OOM — *out stays NULL */
+   memcpy(n, src, prefix);
+   memcpy(n + prefix, replace, rep_len);
+   strcpy(n + prefix + rep_len, first + find_len);
+   *out = n;
+   return 1;
Evidence
PR Compliance ID 278941 requires specific error conditions to have their own named constants with
integer values > 1 and flags magic 0/1 values used for detailed error cases. The added code
explicitly returns 1 to signal OOM, which is a specific failure mode.

Rule 278941: Define specific error codes with integer values greater than 1
src/tools/document_edit_ops.c[43-50]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`docmgmt_find_replace_once()` uses raw integer return values and returns `1` for the specific error case "OOM" (`return 1; /* unique match but OOM ... */`). Specific error conditions must use dedicated named constants with integer values > 1.

## Issue Context
Callers interpret the return as a small status domain; the OOM case should be distinguishable without overloading `1`.

## Fix Focus Areas
- src/tools/document_edit_ops.c[31-50]
- src/tools/document_manage_tool.c[492-507]
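
One possible shape for the remediation, as a sketch only: the enum names and the function name `find_replace_once_sketch` are hypothetical, and this version replaces the occurrence-count return with a status code, so callers in document_manage_tool.c would need to be updated to match. Counting overlapping matches as ambiguous is also an assumption here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes: 0 stays "success", and each specific failure gets
 * its own named value greater than 1, as rule 278941 asks. */
enum {
   DOCMGMT_EDIT_OK = 0,        /* unique match; *out holds the new text */
   DOCMGMT_EDIT_NOT_FOUND = 2, /* `find` does not occur in `src` */
   DOCMGMT_EDIT_AMBIGUOUS = 3, /* `find` occurs two or more times */
   DOCMGMT_EDIT_OOM = 4,       /* unique match, but allocation failed */
   DOCMGMT_EDIT_BAD_ARGS = 5   /* NULL input or empty `find` */
};

int find_replace_once_sketch(const char *src, const char *find, const char *replace,
                             char **out) {
   assert(out != NULL); /* caller must supply somewhere to put the result */
   *out = NULL;
   if (!src || !find || !replace || find[0] == '\0')
      return DOCMGMT_EDIT_BAD_ARGS;

   const char *first = strstr(src, find);
   if (!first)
      return DOCMGMT_EDIT_NOT_FOUND;
   /* Resume one byte past the first hit so overlapping matches also count. */
   if (strstr(first + 1, find))
      return DOCMGMT_EDIT_AMBIGUOUS;

   size_t find_len = strlen(find);
   size_t rep_len = strlen(replace);
   size_t prefix = (size_t)(first - src);
   size_t tail_len = strlen(first + find_len);

   char *n = malloc(prefix + rep_len + tail_len + 1);
   if (!n)
      return DOCMGMT_EDIT_OOM;
   memcpy(n, src, prefix);
   memcpy(n + prefix, replace, rep_len);
   memcpy(n + prefix + rep_len, first + find_len, tail_len + 1); /* copies the NUL */
   *out = n;
   return DOCMGMT_EDIT_OK;
}
```

Because the OOM case has its own value, a caller can tell "the edit was valid but could not be applied" apart from "nothing matched" without inspecting `*out`.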



2. auth_db_migrations.c exceeds 2,500 lines 📘 Rule violation ⌂ Architecture
Description
src/auth/auth_db_migrations.c is introduced/expanded to 2,695 lines while adding new migration
behavior, exceeding the 2,500-line limit without further splitting. This increases maintenance risk
and violates the file-size discipline requirement for new feature work.
Code

src/auth/auth_db_migrations.c[R29-34]

+ * FILE SIZE (deliberate): this ladder exceeds the 2,500-line hard limit by
+ * design and grows monotonically — it is an append-only historical record (one
+ * block per shipped schema version, never edited once shipped), so it reads as a
+ * changelog rather than active logic.  When it next needs cutting, group the
+ * blocks into era helpers (e.g. apply_v1_v40() / apply_v41_vN()) called from
+ * auth_db_apply_migrations(); do NOT reflow or merge the historical blocks.
Evidence
PR Compliance ID 278932 requires splitting .c files that exceed 2,500 lines when adding new
behavior. The diff shows this file is expanded to 2,695 lines and even documents that it exceeds the
hard limit, indicating it was not split further despite crossing the threshold.

Rule 278932: Split oversized C source files before adding new features
src/auth/auth_db_migrations.c[29-34]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`src/auth/auth_db_migrations.c` is now ~2,695 lines and continues to grow, which violates the requirement to split oversized `.c` files (>2,500 lines) before adding new behavior.

## Issue Context
The PR adds/extends the auth DB migration ladder and explicitly notes the file exceeds 2,500 lines.

## Fix Focus Areas
- src/auth/auth_db_migrations.c[1-2695]



3. document_db_set_num_chunks() missing @param ✓ Resolved 📘 Rule violation ✧ Quality
Description
The new public API declaration document_db_set_num_chunks(int64_t doc_id, int num_chunks) has a
Doxygen block but omits @param entries for both parameters. This violates the requirement that
public API documentation enumerate all parameters.
Code

include/tools/document_db.h[R131-135]

+/**
+ * @brief Correct a document's stored chunk count (e.g. after embed failures).
+ * @return SUCCESS (0) / FAILURE (1 = invalid args / DB error).
+ */
+int document_db_set_num_chunks(int64_t doc_id, int num_chunks);
Evidence
PR Compliance ID 278940 requires @param entries for each parameter on public API declarations. The
added comment for document_db_set_num_chunks() includes only @brief and @return, with no
parameter documentation.

Rule 278940: Require Doxygen-style comments for all public API functions
include/tools/document_db.h[131-135]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The Doxygen comment for `document_db_set_num_chunks()` is missing `@param` documentation for `doc_id` and `num_chunks`.

## Issue Context
This is a public function declared in a public header (`include/tools/document_db.h`).

## Fix Focus Areas
- include/tools/document_db.h[131-135]



4. document_manage.h Doxygen incomplete ✓ Resolved 📘 Rule violation ✧ Quality
Description
Public API declarations in include/tools/document_manage.h lack required Doxygen fields:
document_manage_tool_register() has no @return, and docmgmt_append_text() has no @param
entries for its parameters. This breaks the requirement for Doxygen-complete public headers.
Code

include/tools/document_manage.h[R30-55]

+/** Register the document_manage tool with the tool registry. */
+int document_manage_tool_register(void);
+
+/* Pure text-edit helpers (no DB / embedding deps) — exposed for unit testing the
+ * find/replace + append contract independent of the storage layer. */
+
+/**
+ * @brief Replace the single occurrence of `find` in `src` with `replace`.
+ *
+ * Unique-match contract (mirrors str_replace): the edit only applies when `find`
+ * occurs EXACTLY once.  `replace` may be "" (a deletion).
+ *
+ * @param src     Source text (NUL-terminated).
+ * @param find    Exact substring to locate (must be non-empty).
+ * @param replace Replacement text ("" deletes the match).
+ * @param out     On a unique match, set to a heap result (caller frees); else NULL.
+ * @return Occurrence count clamped to {0, 1, 2} (2 = "two or more"); on a unique
+ *         match (1), *out is the result, or NULL on allocation failure.
+ */
+int docmgmt_find_replace_once(const char *src, const char *find, const char *replace, char **out);
+
+/**
+ * @brief Append `text` to `src`, newline-joined if `src` doesn't already end in '\n'.
+ * @return Heap result (caller frees), or NULL on allocation failure.
+ */
+char *docmgmt_append_text(const char *src, const char *text);
Evidence
PR Compliance ID 278940 requires a Doxygen-style comment block with @param for each parameter and
@return for non-void functions. The added header shows document_manage_tool_register() only has
a short sentence comment and docmgmt_append_text() lacks @param tags.

Rule 278940: Require Doxygen-style comments for all public API functions
include/tools/document_manage.h[30-55]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Public API functions in headers must have Doxygen blocks that include `@param` entries for each parameter and `@return` for non-void return types. The new declarations in `include/tools/document_manage.h` are missing these required tags.

## Issue Context
These functions are exported via a public header under `include/tools/`.

## Fix Focus Areas
- include/tools/document_manage.h[30-55]



5. Migration lacks rollback ✓ Resolved 🐞 Bug ☼ Reliability
Description
In the v61 migration FTS backfill, the code begins a transaction and attempts COMMIT, but on COMMIT
failure it never issues a ROLLBACK, potentially leaving the SQLite connection in an open transaction
and affecting subsequent schema/version updates and runtime DB operations. This can manifest as a
stuck write lock or later statements running in an unintended transaction state during startup.
Code

src/auth/auth_db_migrations.c[R2397-2448]

+            (void)sqlite3_exec(s_db.db, "BEGIN IMMEDIATE", NULL, NULL, NULL);
+            while (sqlite3_step(select_stmt) == SQLITE_ROW) {
+               int64_t cid = sqlite3_column_int64(select_stmt, 0);
+               const unsigned char *body = sqlite3_column_text(select_stmt, 1);
+               const unsigned char *label = sqlite3_column_text(select_stmt, 2);
+               char label_stems[MEMORY_FACT_STEMS_MAX];
+               char body_stems[MEMORY_FACT_STEMS_MAX];
+               (void)memory_stem_string((const char *)(label ? label : (const unsigned char *)""),
+                                        label_stems, sizeof(label_stems));
+               (void)memory_stem_string((const char *)(body ? body : (const unsigned char *)""),
+                                        body_stems, sizeof(body_stems));
+               sqlite3_reset(insert_stmt);
+               sqlite3_bind_int64(insert_stmt, 1, cid);
+               sqlite3_bind_text(insert_stmt, 2, label_stems, -1, SQLITE_TRANSIENT);
+               sqlite3_bind_text(insert_stmt, 3, body_stems, -1, SQLITE_TRANSIENT);
+               if (sqlite3_step(insert_stmt) != SQLITE_DONE) {
+                  backfill_errors++;
+               } else {
+                  backfill_count++;
+               }
+            }
+            commit_rc = sqlite3_exec(s_db.db, "COMMIT", NULL, NULL, NULL);
+         }
+         if (select_stmt)
+            sqlite3_finalize(select_stmt);
+         if (insert_stmt)
+            sqlite3_finalize(insert_stmt);
+         if (backfill_errors > 0) {
+            OLOG_WARNING("auth_db: v61 document FTS backfill PARTIAL: %d indexed, %d failed — "
+                         "some chunks won't surface via lexical search until a manual rebuild",
+                         backfill_count, backfill_errors);
+         } else {
+            OLOG_INFO("auth_db: v61 document FTS backfill complete: %d chunks indexed",
+                      backfill_count);
+         }
+         if (prep_rc == SQLITE_OK && commit_rc == SQLITE_OK) {
+            fts_ok = true;
+         } else {
+            OLOG_ERROR("auth_db: v61 migration COMMIT or statement prep failed — "
+                       "leaving schema_version at %d so next boot retries",
+                       current_version);
+         }
+      }
+
+      v61_ok = alter_ok && fts_ok;
+   }
+
+   /* v62 migration: document_versions — soft-archive of pre-mutation content for
+    * undo/restore.  Plain CREATE (idempotent), no backfill (snapshots accrue
+    * going forward).  Fresh installs get it from SCHEMA_SQL.  No FK to documents
+    * by design (a version must outlive a deleted document). */
+   bool v62_ok = (current_version >= 62) || (current_version == 0);
Evidence
The migration starts BEGIN IMMEDIATE and always attempts COMMIT, but the failure path only logs
and sets fts_ok false—there is no rollback/cleanup, and the function later continues to schema
bump logic without any transaction reset.

src/auth/auth_db_migrations.c[2377-2438]
src/auth/auth_db_migrations.c[2655-2694]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The v61 migration backfill opens a transaction (`BEGIN IMMEDIATE`) and then calls `COMMIT`. If `COMMIT` fails (e.g., SQLITE_BUSY / IO error), the code logs and proceeds without an explicit `ROLLBACK`, which can leave the connection in an active transaction and hold locks / taint subsequent startup SQL.

## Issue Context
This code runs during `auth_db_init()` startup, before statement preparation. A transactional failure here should leave the DB connection in a clean state (autocommit restored) even if the migration is held for retry.

## Fix Focus Areas
- src/auth/auth_db_migrations.c[2385-2438]
- src/auth/auth_db_migrations.c[2655-2694]

### Concrete fix sketch
- Capture the return code of `BEGIN IMMEDIATE` and only attempt `COMMIT`/`ROLLBACK` if `BEGIN` succeeded.
- If `commit_rc != SQLITE_OK`, execute `sqlite3_exec(..., "ROLLBACK", ...)` (best-effort) before continuing.
- Consider setting `fts_ok = false` on any `backfill_errors > 0` only if you want to make the schema bump strictly atomic; otherwise keep the current partial-advance policy but still rollback on commit failure.
- Ensure any later schema_version updates are not executed while still inside the failed transaction.
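
The first two points of the fix sketch can be expressed as a small control-flow example. To keep it runnable without a database, the SQL call goes through a hypothetical `sql_exec_fn` callback; real code would pass a thin wrapper around `sqlite3_exec()`. The function name, the callback types, and the recording fake are all illustrative assumptions, not code from the PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical indirection: returns 0 on success, like SQLITE_OK. */
typedef int (*sql_exec_fn)(void *ctx, const char *sql);
/* Hypothetical backfill step: returns the number of rows that failed. */
typedef int (*backfill_fn)(void *ctx);

/* True only when BEGIN and COMMIT both succeeded. A failed COMMIT is followed
 * by a best-effort ROLLBACK so the connection does not stay in a transaction. */
bool run_backfill_txn(sql_exec_fn exec, backfill_fn backfill, void *ctx, int *failed_rows) {
   assert(exec != NULL && backfill != NULL);
   if (failed_rows)
      *failed_rows = 0;
   if (exec(ctx, "BEGIN IMMEDIATE") != 0)
      return false; /* no transaction was opened: nothing to commit or undo */
   int failed = backfill(ctx);
   if (failed_rows)
      *failed_rows = failed;
   if (exec(ctx, "COMMIT") != 0) {
      (void)exec(ctx, "ROLLBACK");
      return false;
   }
   return true;
}

/* Recording fake, for demonstration only: logs each statement and can be told
 * to fail on one of them. */
typedef struct {
   const char *fail_on; /* statement that should fail, or NULL */
   char log[128];       /* statements seen, each followed by ';' */
} fake_db_t;

int fake_exec(void *ctx, const char *sql) {
   fake_db_t *db = ctx;
   strncat(db->log, sql, sizeof(db->log) - strlen(db->log) - 1);
   strncat(db->log, ";", sizeof(db->log) - strlen(db->log) - 1);
   return (db->fail_on && strcmp(db->fail_on, sql) == 0) ? 1 : 0;
}

int fake_backfill(void *ctx) {
   (void)ctx;
   return 0;
}
```

Whether partial backfill errors should also block the schema bump is a separate policy choice, as the third point notes; this sketch only reports the failed-row count and leaves that decision to the caller.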




Remediation recommended

6. show_all ignored on query ✓ Resolved 🐞 Bug ≡ Correctness
Description
handle_doc_library_list() ignores the admin show_all flag whenever payload.query is set,
because the BM25 search path always queries using conn->auth_user_id and never uses
document_db_list_all. If show_all is true during a query, the response can also include
incorrect user_id/owner_name fields because docs[] entries built from BM25 hits don’t populate
those fields but the JSON response adds them anyway when show_all is set.
Code

src/webui/webui_doc_library.c[R120-163]

+   /* v61 search path: label/body BM25 (lexical) over the user's docs, deduped to
+    * one row per document, then scope-filtered.  Single page (no offset). */
+   if (query) {
+      int max_hits = DOC_LIBRARY_DEFAULT_LIMIT * DOC_LIBRARY_BM25_CAND_FACTOR;
+      doc_bm25_hit_t *hits = calloc((size_t)max_hits, sizeof(doc_bm25_hit_t));
+      float *scores = calloc((size_t)max_hits, sizeof(float));
+      int hit_count = 0;
+      if (hits && scores &&
+          document_db_chunk_search_bm25(conn->auth_user_id, query,
+                                        g_config.documents.fts_label_weight,
+                                        g_config.documents.fts_body_weight, hits, scores, max_hits,
+                                        &hit_count) == SUCCESS) {
+         for (int i = 0; i < hit_count && count < limit; i++) {
+            bool is_note = (strcmp(hits[i].filetype, "note") == 0);
+            if ((kind == DOC_KIND_DOCS && is_note) || (kind == DOC_KIND_NOTES && !is_note))
+               continue;
+            bool dup = false; /* dedup to one row per document */
+            for (int j = 0; j < count; j++)
+               if (docs[j].id == hits[i].document_id) {
+                  dup = true;
+                  break;
+               }
+            if (dup)
+               continue;
+            memset(&docs[count], 0, sizeof(docs[count]));
+            docs[count].id = hits[i].document_id;
+            snprintf(docs[count].filename, sizeof(docs[count].filename), "%s", hits[i].filename);
+            snprintf(docs[count].filetype, sizeof(docs[count].filetype), "%s", hits[i].filetype);
+            docs[count].num_chunks = hits[i].num_chunks;
+            docs[count].created_at = hits[i].created_at;
+            count++;
+         }
+      } else {
+         ok = false;
+      }
+      free(hits);
+      free(scores);
+   } else {
+      int list_rc = show_all ? document_db_list_all(docs, limit, offset, &count)
+                             : document_db_list_filtered(conn->auth_user_id, kind, docs, limit,
+                                                         offset, &count);
+      ok = (list_rc == SUCCESS);
+      has_more = (count == limit);
+   }
Evidence
Server BM25 query path always searches as the authenticated user and never applies show_all, while
the client still sends show_all during searches; the server later conditionally emits owner fields
based on show_all, which can be incorrect in the BM25 branch.

src/webui/webui_doc_library.c[74-163]
src/webui/webui_doc_library.c[188-200]
www/js/ui/doc-library.js[382-388]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The WebUI list handler has two modes: query (BM25) and non-query (paged list). The query mode hardcodes `conn->auth_user_id` and does not implement admin `show_all`, yet the client sends `show_all` even during searches. Additionally, when `show_all` is true, the server emits `user_id/owner_name` from `docs[]` that are never filled in the BM25 branch, producing wrong metadata.

## Issue Context
Client always includes `show_all` when toggled, even when `query` is set. Backend should either:
1) fully support `show_all` search, or
2) explicitly disable/ignore `show_all` during query *and* avoid emitting owner fields.

## Fix Focus Areas
- src/webui/webui_doc_library.c[74-200]
- www/js/ui/doc-library.js[382-388]

### Concrete fix options
- Minimal correctness fix: if `query != NULL`, force `show_all = false` (or return an explicit error), and do not add owner fields in the response.
- Full feature fix: add an admin-capable BM25 search variant (new SQL/statement + API) that searches across all docs and returns `d.user_id` (and optionally username) so `docs[]` and the JSON response can populate correct owner metadata when `show_all` is enabled.
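
The minimal correctness fix could look roughly like the sketch below, which decides the effective mode once, before either branch runs. The `list_request_t` / `list_mode_t` types and `resolve_list_mode()` are hypothetical, and treating an empty query string the same as no query is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of the parsed payload. */
typedef struct {
   const char *query;       /* NULL or "" means "no search" */
   bool is_admin;
   bool show_all_requested; /* raw flag as sent by the client */
} list_request_t;

/* Effective behavior for this request. */
typedef struct {
   bool use_bm25;          /* take the lexical search path */
   bool show_all;          /* list across all users */
   bool emit_owner_fields; /* include user_id/owner_name in the JSON */
} list_mode_t;

list_mode_t resolve_list_mode(const list_request_t *req) {
   assert(req != NULL);
   bool has_query = (req->query != NULL && req->query[0] != '\0');
   list_mode_t m;
   m.use_bm25 = has_query;
   /* The search path is owner-scoped, so show_all is dropped whenever it runs. */
   m.show_all = !has_query && req->is_admin && req->show_all_requested;
   /* Owner fields are only populated by the list-all path. */
   m.emit_owner_fields = m.show_all;
   return m;
}
```

Tying `emit_owner_fields` to the resolved `show_all` rather than the raw client flag is what prevents unpopulated owner metadata from reaching the response.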



7. free(notes) not nullified ✓ Resolved 📘 Rule violation ≡ Correctness
Description
notes is freed without being immediately set to NULL in handle_memory_backfill_note_glosses().
This violates the project requirement to nullify freed pointers to reduce use-after-free risk during
later edits or refactors.
Code

src/auth/admin_socket_memory.c[571]

+   free(notes);
Evidence
PR Compliance ID 278935 requires nullifying pointers immediately after free() for
modified/introduced pointer variables. The new code frees notes and continues execution without
setting it to NULL.

Rule 278935: Nullify pointers immediately after free
src/auth/admin_socket_memory.c[571-572]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
After `free(notes);`, the pointer should be set to `NULL` immediately per project memory-safety rules.

## Issue Context
While the pointer is not currently used after the free, the coding standard requires immediate nullification to prevent accidental future use.

## Fix Focus Areas
- src/auth/admin_socket_memory.c[571-572]



Copilot AI left a comment

Pull request overview

Adds a deterministic notes / reference-text store on top of the existing document/RAG system so users can retrieve exactly what they saved under a label, backed by hybrid BM25 lexical + embedding search, in-place editing, and version history/restore. This also introduces a memory↔note bridge (gloss facts pointing to notes) and a note-extraction guard to prevent verbatim filed reference text from being re-mined into semantic memory.

Changes:

  • Implement hybrid document search (BM25 candidate set + semantic fusion + phrase bonus) and expose tuning controls in config + WebUI.
  • Add note save/update + document/note version history (undo/recover) with admin recovery commands (FTS rebuild, gloss backfill).
  • Add Phase 9 memory integration: gloss facts linked by note_doc_id, extraction-time redaction guard, and improved duplicate-tool-call scoping + tool-arg truncation safety.
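
To illustrate the fusion described in the first bullet, here is a rough sketch of how an independent BM25 candidate set could be combined with cosine similarity and a phrase bonus. All type names, field names, the normalization, and the weights are assumptions made for this example; they are not the PR's actual implementation or config keys. The only external fact relied on is that SQLite FTS5's `bm25()` returns lower (more negative) values for better matches.

```c
#include <stddef.h>

/* Hypothetical tuning knobs (the PR exposes similar controls via config). */
typedef struct {
   float w_lexical;    /* weight on the normalized BM25 score */
   float w_semantic;   /* weight on cosine similarity */
   float phrase_bonus; /* added when query words appear ordered and contiguous */
} hybrid_weights_t;

/* Hypothetical fused candidate: a chunk may come from either set or both. */
typedef struct {
   int in_lexical;  /* 1 if the BM25 candidate set returned this chunk */
   float bm25_raw;  /* FTS5 bm25() value: more negative = better match */
   int in_semantic; /* 1 if the embedding search returned this chunk */
   float cosine;    /* cosine similarity */
   int phrase_hit;  /* 1 if the ordered/contiguous phrase was found */
} hybrid_candidate_t;

/* Map a raw bm25() value onto [0, 1): assumed normalization, for illustration. */
static float bm25_normalize(float raw) {
   float mag = raw < 0.0f ? -raw : 0.0f;
   return mag / (mag + 1.0f);
}

/* Weighted sum over whichever channels produced the candidate. */
static float hybrid_score(const hybrid_weights_t *w, const hybrid_candidate_t *c) {
   float score = 0.0f;
   if (c->in_lexical) {
      score += w->w_lexical * bm25_normalize(c->bm25_raw);
   }
   if (c->in_semantic) {
      score += w->w_semantic * c->cosine;
   }
   if (c->phrase_hit) {
      score += w->phrase_bonus;
   }
   return score;
}
```

Because the lexical channel contributes its own candidates, an exact-label note with weak embedding similarity can still outrank a semantic near-twin that never matched the label.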

Reviewed changes

Copilot reviewed 66 out of 68 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| www/js/ui/settings/schema.js | Adds UI schema fields for note-extraction guard and hybrid-search/versioning tuning. |
| www/js/ui/scheduler-queue.js | Closes doc-library panel when scheduler popover opens to avoid slot conflicts. |
| www/js/dawn.js | Adds WebSocket message handlers for note save/update, version list/restore, deleted list. |
| www/index.html | Extends Document Library UI with search, tabs (Documents/Notes), note viewer/editor, deleted list toggle. |
| www/css/components/doc-library.css | Styles notes UI, adds scoped .hidden toggles, adjusts z-index layering. |
| www/css/base/components.css | Introduces shared .dawn-input styling and warning hint color. |
| tests/test_memory_provenance.c | Updates test DDL for note_doc_id bridge column. |
| tests/test_memory_note_guard.c | New tests for extraction redaction of filed note bodies (both provider shapes). |
| tests/test_memory_note_bridge.c | New lifecycle tests for memory→note gloss creation/rename/delete and owner scoping. |
| tests/test_llm_dup_check.c | New tests for turn-scoped duplicate-tool-call detection (OpenAI + Claude). |
| tests/test_llm_dup_check_stub.c | Link-time stubs to isolate llm_tools duplicate-check logic for testing. |
| tests/test_document_search_bm25.c | New tests asserting exact-label BM25 rank-1 behavior + versioning/full-text edit flows. |
| tests/test_document_manage.c | New tests for document_manage edit/append pure string ops contract. |
| tests/test_document_db_stub.c | Adds g_config stub for document_db versioning-dependent paths in tests. |
| tests/CMakeLists.txt | Wires new unit tests and adds required sources for BM25/stemming paths. |
| src/webui/webui_message_dispatch.c | Dispatches new doc-library note/version/deleted message types. |
| src/webui/webui_doc_library.c | Adds scoped list/search, note save/update endpoints, version list/restore, deleted list; uses delete_indexed and gloss deletion. |
| src/webui/webui_config.c | Parses/round-trips new memory/documents config keys from WebUI config JSON. |
| src/tools/tools_init.c | Registers the new document_manage tool. |
| src/tools/memory_tool.c | Adds memory tool action get and guidance to use notes for verbatim reference text. |
| src/tools/document_search.c | Reworks document_search into hybrid lexical+semantic with phrase bonus and configurable thresholds. |
| src/tools/document_read.c | Adds optional id parameter for deterministic read targeting; makes document optional. |
| src/tools/document_index_pipeline.c | Adds FTS indexing on ingest, note save/update, partial-embed num_chunks correction, full-text storage, and multi-chunk doc replace pipeline. |
| src/tools/document_edit_ops.c | New isolated translation unit for edit/append pure string operations. |
| src/memory/memory_note_bridge.c | New bridge to create/delete “gloss” facts linked to notes via note_doc_id. |
| src/memory/memory_extraction.c | Integrates note-extraction guard redaction into extraction input assembly. |
| src/memory/memory_embeddings.c | Extends embedding cache with note_doc_id and excludes glosses from dedup clustering/nearest-fact merge targets. |
| src/memory/memory_db.c | Exempts gloss facts from decay and low-confidence pruning. |
| src/memory/memory_db_facts.c | Factors BM25 MATCH expr builder out, adds note_doc_id setters/finders, excludes glosses from pattern bulk delete, returns note_doc_id in embeddings query. |
| src/memory/memory_callback.c | Adds memory.get action and shared numeric-ID list parser. |
| src/memory/memory_bm25.c | Adds shared memory_bm25_build_match_expr helper for FTS5 consumers. |
| src/llm/llm_tools.c | Blocks execution on truncated tool args; makes duplicate-tool-call detection turn-scoped. |
| src/llm/llm_streaming.c | Tracks tool-argument overflow during streaming and marks calls as truncated. |
| src/config/config_parser.c | Adds TOML parsing + key validation for new documents/memory settings. |
| src/config/config_env.c | Adds JSON/TOML round-trip support for new documents/memory settings. |
| src/config/config_defaults.c | Sets defaults for hybrid search, versioning, and note-extraction guard. |
| src/auth/auth_maintenance.c | Adds retention pruning sweep for document/note version history. |
| src/auth/auth_db_statements.c | Prepares new FTS/index/search statements and extends embeddings select with note_doc_id. |
| src/auth/admin_socket.c | Adds admin opcodes for gloss backfill and document FTS rebuild. |
| src/auth/admin_socket_memory.c | Implements gloss backfill and document FTS rebuild admin handlers. |
| include/webui/webui_doc_library.h | Declares new doc-library WebSocket handlers. |
| include/tools/document_manage.h | Declares document_manage tool register and edit/append helpers. |
| include/tools/document_index_pipeline.h | Declares note index/update and multi-chunk doc update APIs. |
| include/tools/document_db.h | Adds BM25 hit structs and versioning/full-text/edit APIs and constants. |
| include/memory/memory_note_guard.h | Declares extraction redaction guard API. |
| include/memory/memory_note_bridge.h | Declares memory→note bridge API. |
| include/memory/memory_db.h | Declares note_doc_id setter/finder for bridge. |
| include/memory/memory_db_embeddings.h | Extends embeddings retrieval API to include note_doc_id. |
| include/memory/memory_bm25.h | Declares memory_bm25_build_match_expr. |
| include/llm/llm_tools.h | Adds args_truncated marker to tool_call_t. |
| include/llm/llm_streaming.h | Adds streaming overflow tracking for tool args. |
| include/config/dawn_config.h | Adds documents hybrid-search/versioning config and memory note-extraction guard. |
| include/auth/auth_db_migrations.h | Declares migration ladder entry point (schema split). |
| include/auth/auth_db_internal.h | Bumps schema version and adds prepared statement slots for doc FTS + note edit. |
| include/auth/admin_socket.h | Adds admin opcodes for gloss backfill and FTS rebuild. |
| include/auth/admin_socket_internal.h | Declares new admin handler prototypes. |
| dawn.toml.example | Documents new config options for hybrid search, versioning, and extraction guard. |
| dawn-admin/socket_client.h | Adds client APIs for gloss backfill and FTS rebuild. |
| dawn-admin/socket_client.c | Implements new admin socket client calls. |
| dawn-admin/main.c | Adds CLI commands for gloss backfill and document FTS rebuild. |
| CMakeLists.txt | Adds new sources (migrations, guard, bridge) to the build. |
| cmake/DawnTools.cmake | Includes document_manage tool sources when document search tool is enabled. |

@qodo-code-review


PR Summary by Qodo

Notes / reference-text store: deterministic filing cabinet on the document/RAG layer
✨ Enhancement 🧪 Tests ⚙️ Configuration changes 🕐 40+ Minutes


Walkthroughs

Description
• Adds label-keyed **Notes** as single-chunk documents, retrievable deterministically; fixes
  conv-809.
• Implements **hybrid BM25 + semantic document search** with phrase bonus via new FTS5 index.
• Adds **document_manage** tool + WebUI Notes tab with editing, version history, and recovery.
Diagram
graph TD
    User(["User / LLM"])

    subgraph Tools["LLM Tools"]
        DM["document_manage"]
        DS["document_search"]
        DR["document_read"]
        MT["memory (+get)"]
    end

    subgraph Doc["Document / RAG"]
        DIP["index pipeline"] --> DDB[("document_db")]
        DDB --> FTS[("document_chunks_fts")]
        DDB --> VER[("document_versions")]
        DDB --> FTX[("document_full_text")]
    end

    subgraph Mem["Memory"]
        BR["note bridge"] --> MF[("memory_facts\n(note_doc_id)")]
        EXT["extraction"] --> NG["note guard"]
    end

    WebUI["WebUI Doc Library"] --> DDB
    Admin["dawn-admin"] --> DDB

    User --> DM --> DIP
    User --> DS --> DDB
    User --> DR --> DDB
    DM --> BR
    EXT --> NG

    subgraph Legend
      direction LR
      _api(["API/Tool"]) ~~~ _db[("DB/Index")] ~~~ _mod["Module"]
    end
High-Level Assessment

The following are alternative approaches to this PR:

1. Dedicated notes table separate from documents
  • ➕ Cleaner separation of note vs document semantics
  • ➕ Avoids filetype/num_chunks invariants bleeding across code
  • ➕ Potentially simpler edit/restore flows for notes
  • ➖ Duplicates indexing/search/versioning/UI plumbing for a second entity type
  • ➖ More surfaces to harden for auth/IDOR and migrations
  • ➖ Loses benefits of reusing existing document pipeline end-to-end
2. Pure semantic retrieval with a label boost heuristic
  • ➕ No schema or FTS5 changes
  • ➕ Simpler implementation
  • ➖ Does not guarantee exact-label retrieval; conv-809 shows near-twins still bury each other
  • ➖ Still fuzzy/non-deterministic under embedding similarity drift

Recommendation: The PR’s unified approach (notes as documents + an independent BM25 candidate set fused with semantic + phrase bonus) is the right strategy for deterministic “exact label” retrieval while reusing the existing document infrastructure. A separate notes table is the only substantial alternative, but the added subsystem duplication likely outweighs the modest complexity of the current filetype/invariant checks.


File Changes

Enhancement (43)
main.c Add admin subcommands for FTS rebuild and gloss backfill +58/-0

• Adds 'memory backfill-note-glosses <user>' and 'memory rebuild-document-fts' commands and usage text.

dawn-admin/main.c


socket_client.c Client support for new admin socket requests +23/-0

• Implements request/response helpers for the new memory admin commands (gloss backfill, FTS rebuild).

dawn-admin/socket_client.c


socket_client.h Declare new admin socket client APIs +30/-0

• Adds function declarations and request enums for note gloss backfill and document FTS rebuild.

dawn-admin/socket_client.h


auth_db_internal.h Extend internal DB statement struct for document FTS + chunk update +6/-1

• Adds new prepared statement fields (FTS insert/delete/search, chunk update) used by document_db and hybrid search.

include/auth/auth_db_internal.h


llm_streaming.h Streaming interface expansion for new tool surfaces +2/-0

• Adds small API surface to support additional tool-driven behaviors introduced in this PR.

include/llm/llm_streaming.h


llm_tools.h Tool layer tweaks to support new document_manage tool +2/-0

• Adds small interface changes used by document_manage for size/arg limits and tool integration.

include/llm/llm_tools.h


memory_db.h Add note_doc_id link helpers to memory DB API +25/-0

• Declares 'memory_db_fact_set_note_doc_id' and 'memory_db_fact_find_by_note_doc_id'.

include/memory/memory_db.h


memory_db_embeddings.h Plumb note_doc_id through embedding DB read APIs +4/-0

• Extends embedding-related interfaces to carry note_doc_id data alongside embeddings.

include/memory/memory_db_embeddings.h


memory_note_bridge.h New API for memory↔note gloss bridge +76/-0

• Declares upsert/delete functions for gloss facts that redirect fuzzy queries to exact notes.

include/memory/memory_note_bridge.h


memory_note_guard.h New API for note-extraction guard +95/-0

• Declares creation/redaction/free APIs for extraction-time redaction of filed note bodies and tool echoes.

include/memory/memory_note_guard.h


document_db.h Declare v61–v63 document DB APIs (FTS, versions, full-text, atomic replace) +281/-0

• Adds structs and function declarations for BM25 hits, version metadata, FTS indexing/rebuild, versioning/retention, full-text storage, and atomic doc replacement.

include/tools/document_db.h


document_index_pipeline.h Add note and in-place edit/update pipeline entry points +61/-0

• Declares 'document_index_note', 'document_note_update', and 'document_doc_update' for note saves and stable-id updates.

include/tools/document_index_pipeline.h


document_manage.h New header for document_manage tool and pure edit ops +61/-0

• Declares tool register function plus 'docmgmt_find_replace_once' and 'docmgmt_append_text' for isolated testing.

include/tools/document_manage.h


webui_doc_library.h Add WebUI handlers for notes, versions, and recovery +34/-0

• Declares new websocket handlers for note CRUD, version history, restore, and deleted recovery.

include/webui/webui_doc_library.h


admin_socket_memory.c Implement gloss backfill and document FTS rebuild handlers +87/-0

• Adds idempotent backfill of gloss facts for existing notes and global rebuild of document_chunks_fts.

src/auth/admin_socket_memory.c


auth_db_schema.c Add v61–v63 base schema objects and delegate migrations +64/-2445

• Adds 'document_chunks_fts', 'document_versions', 'document_full_text', and 'memory_facts.note_doc_id' to SCHEMA_SQL; removes inline migration ladder.

src/auth/auth_db_schema.c


auth_db_statements.c Prepare statements for document FTS and gloss-safe memory operations +87/-8

• Adds prepared statements for document_chunks_fts insert/delete/search and chunk update; excludes gloss facts from like-match and pruning; selects note_doc_id in embeddings query.

src/auth/auth_db_statements.c


auth_maintenance.c Prune document version history by retention +14/-0

• Adds periodic sweep to delete expired document_versions rows based on config retention days.

src/auth/auth_maintenance.c


llm_streaming.c Streaming behavior adjustments for tool-driven note/document flows +14/-0

• Adds small streaming-layer changes to support expanded tool usage patterns introduced here.

src/llm/llm_streaming.c


llm_tools.c Tool executor updates for document_manage integration +69/-1

• Adds minimal changes to tool execution plumbing to support document_manage behaviors and argument limits.

src/llm/llm_tools.c


memory_callback.c Add deterministic memory get-by-id action +102/-14

• Adds 'get' action with shared ID list parsing (also used by forget) for exact retrieval of specific facts.

src/memory/memory_callback.c


memory_db.c Memory DB adjustments for note_doc_id surfaces +8/-2

• Minor updates to support note_doc_id plumbing and new bridge-related queries.

src/memory/memory_db.c


memory_db_facts.c Add note_doc_id link and gloss exclusions in bulk operations +89/-68

• Adds set/find helpers for note_doc_id, excludes gloss facts from pattern deletes, and switches BM25 match builder to shared helper.

src/memory/memory_db_facts.c


memory_embeddings.c Track note_doc_id in embedding cache and exclude glosses from dedup workflows +44/-6

• Adds note_doc_ids array to cache; skips glosses in nearest_fact and duplicate clustering to prevent merges/pruning from breaking the bridge.

src/memory/memory_embeddings.c


memory_extraction.c Apply note-extraction guard redaction to extraction input +18/-2

• Creates a note guard from full history and redacts filed note bodies and tool echoes on the extracted message copies.

src/memory/memory_extraction.c


memory_note_bridge.c Implement memory↔note gloss fact lifecycle +143/-0

• Creates/refreshes/deletes per-note gloss facts linked via note_doc_id; sanitizes labels and gates via injection filter.

src/memory/memory_note_bridge.c


memory_note_guard.c Implement extraction-time redaction of filed note bodies and tool echoes +780/-0

• Collects filed note bodies from document_manage calls and redacts them (plus document_read/search echoes) from extraction prompts across provider message shapes.

src/memory/memory_note_guard.c


document_db.c Add document FTS sync, indexed delete, versioning, full-text, and atomic replace +1171/-0

• Implements contentless FTS insert/delete, orphan-free indexed deletes, admin FTS rebuild, version history storage/retention, full-text storage, and multi-chunk atomic replace logic.

src/tools/document_db.c


document_edit_ops.c New pure text edit helpers for document_manage edit/append +73/-0

• Adds find/replace-once (unique-match) and newline-aware append helpers decoupled from DB for unit tests.

src/tools/document_edit_ops.c


document_index_pipeline.c Index BM25 stems during ingest; add note save/update and doc in-place update +325/-0

• Stems filename/body for FTS indexing during chunk ingest; adds single-chunk note save path and stable-id update APIs; stores canonical full text for multi-chunk edits.

src/tools/document_index_pipeline.c


document_manage_tool.c New document_manage LLM tool with save/edit/append/versioning/delete confirm flow +818/-0

• Introduces the LLM-facing write/delete API for documents and notes, including two-step deletion confirmation and concurrency guards.

src/tools/document_manage_tool.c


document_read.c Allow document_read by numeric id (owner-scoped) +44/-12

• Adds optional 'id' parameter for exact targeting; makes 'document' optional when id is present and enforces owner/global gating.

src/tools/document_read.c


document_search.c Hybrid search: fuse independent BM25 candidate set with cosine + phrase bonus +220/-112

• Adds BM25 lexical candidate retrieval and fusion with semantic candidates; computes ordered/contiguous phrase bonus and uses config-driven weights and thresholds.

src/tools/document_search.c


memory_tool.c Add memory tool get action and guidance to use notes for reference text +14/-3

• Adds 'get' action to enum/description and updates guidance to store verbatim reference text as notes via document_manage instead of memory facts.

src/tools/memory_tool.c


tools_init.c Register document_manage tool +4/-0

• Adds document_manage tool registration alongside document_index/read/search.

src/tools/tools_init.c


webui_doc_library.c Add Notes scope, BM25 search listing, inline note preview, and indexed delete +385/-11

• Extends document library websocket handlers for documents vs notes scopes, BM25 search over document_chunks_fts, note body preview, version/history/recover flows, and FTS-clean deletes with bridge gloss cleanup.

src/webui/webui_doc_library.c


webui_message_dispatch.c Dispatch new doc library message types +18/-0

• Routes additional websocket message types used by Notes/version/recovery UI features.

src/webui/webui_message_dispatch.c


components.css Add base styles used by note editor/error hints +29/-0

• Adds shared '.form-hint' and '.form-error' styles used by the note editor UI.

www/css/base/components.css


doc-library.css Style notes tab, search, editor/viewer, history, and recovery panels +258/-2

• Adds significant CSS for new doc library UI elements: tabs, search bar, note editor/viewer, versions, and recently deleted list.

www/css/components/doc-library.css


index.html Add Documents/Notes tabs, search box, note viewer/editor, and recovery UI +115/-1

• Adds markup for cross-scope search, WAI-ARIA tabs, note create/edit UI, read-only viewer, version history panel, and recently deleted toggle.

www/index.html


dawn.js WebUI wiring for new document library interactions +20/-0

• Adds small client wiring changes to support new websocket message types and UI triggers for notes features.

www/js/dawn.js


doc-library.js Implement Notes UX: tab state, debounced search, editor/viewer, history, recover +624/-30

• Implements notes tab behavior including localStorage persistence, debounced search, create/edit flows, read-only viewer, version list/restore, and deleted recovery interactions.

www/js/ui/doc-library.js


scheduler-queue.js Minor queue support for additional UI async operations +3/-0

• Adds small scheduler enhancements used by the expanded doc library UI operations.

www/js/ui/scheduler-queue.js


Bug fix (2)
config_env.c Round-trip new config fields to JSON and TOML +29/-0

• Adds new fields to config JSON serialization and TOML writer so WebUI saves persist hybrid search and extraction guard settings.

src/config/config_env.c


webui_config.c Persist note_extraction_guard and hybrid-search/versioning settings via WebUI +21/-0

• Fixes WebUI config save to round-trip 'note_extraction_guard' and document hybrid search/versioning weights into g_config.

src/webui/webui_config.c


Refactor (4)
auth_db_migrations.h New header for migration ladder entry point +47/-0

• Introduces 'auth_db_apply_migrations()' interface extracted from auth_db_schema.c.

include/auth/auth_db_migrations.h


memory_bm25.h New shared BM25 match-expression builder API +21/-0

• Declares 'memory_bm25_build_match_expr()' for reuse by document BM25 search.

include/memory/memory_bm25.h


auth_db_migrations.c New file: migration ladder extracted from auth_db_schema.c +2694/-0

• Moves per-version schema migrations into a dedicated file and adds v61–v63 migration steps.

src/auth/auth_db_migrations.c


memory_bm25.c Shared BM25 MATCH expression builder implementation +59/-0

• Introduces reusable OR-of-quoted-stems MATCH builder extracted from memory_db_facts for multi-consumer FTS usage.

src/memory/memory_bm25.c
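
As a rough illustration of what an "OR-of-quoted-stems" MATCH builder can look like, the sketch below joins already-stemmed terms into an FTS5 query string. The function name `build_match_expr`, its signature, and its return codes are assumptions for this example and do not describe the real `memory_bm25_build_match_expr()`. The quoting follows FTS5's documented string syntax, where an embedded double quote is escaped by doubling it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical builder: writes "stem1" OR "stem2" ... into out.
 * Returns 0 on success, -1 if there is nothing to emit or out is too small. */
static int build_match_expr(const char *const *stems, size_t n, char *out, size_t out_size) {
   size_t pos = 0;
   size_t emitted = 0;

   if (out == NULL || out_size == 0) {
      return -1;
   }
   out[0] = '\0';

   for (size_t i = 0; i < n; i++) {
      const char *s = stems[i];
      if (s == NULL || *s == '\0') {
         continue; /* skip empty stems rather than emit "" */
      }

      /* Count embedded quotes so the doubled form fits. */
      size_t extra = 0;
      for (const char *p = s; *p; p++) {
         if (*p == '"') {
            extra++;
         }
      }
      size_t need = (emitted ? 4 : 0) + 2 + strlen(s) + extra;
      if (pos + need + 1 > out_size) {
         out[0] = '\0';
         return -1; /* refuse to truncate a query expression */
      }

      if (emitted) {
         memcpy(out + pos, " OR ", 4);
         pos += 4;
      }
      out[pos++] = '"';
      for (const char *p = s; *p; p++) {
         if (*p == '"') {
            out[pos++] = '"'; /* escape by doubling */
         }
         out[pos++] = *p;
      }
      out[pos++] = '"';
      emitted++;
   }

   out[pos] = '\0';
   return emitted ? 0 : -1;
}
```

Quoting every stem keeps user text from being parsed as FTS5 operators, which is one reason to share a single builder between the memory and document search paths.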


Documentation (1)
dawn.toml.example Document hybrid search, versioning, and extraction-guard config knobs +31/-0

• Adds commented config sections for v61 hybrid search weights, v62 version retention, and memory note-extraction guard.

dawn.toml.example


Other (18)
CMakeLists.txt Add new source files to the build +3/-0

• Adds new modules (auth_db_migrations, document_manage, memory_note_bridge/guard, memory_bm25) to the build.

CMakeLists.txt


DawnTools.cmake Adjust tool build wiring for new components +1/-1

• Minor cmake wiring updates to include the new tool/module sources in builds.

cmake/DawnTools.cmake


admin_socket.h Expose new admin socket message types +9/-1

• Adds new admin socket command IDs for note gloss backfill and document FTS rebuild.

include/auth/admin_socket.h


admin_socket_internal.h Internal admin socket additions for new handlers +2/-0

• Adds internal handler prototypes for the new memory admin commands.

include/auth/admin_socket_internal.h


dawn_config.h Add hybrid-search weights, version retention, and note_extraction_guard +30/-0

• Extends config structs for v61 hybrid search tuning, v62 versioning retention, and Phase 9 note extraction guard.

include/config/dawn_config.h


admin_socket.c Wire admin socket routing for new memory/document maintenance commands +8/-0

• Adds routing entries for new admin operations (FTS rebuild, gloss backfill).

src/auth/admin_socket.c


config_defaults.c Set defaults for hybrid search, versioning, and extraction guard +13/-0

• Adds default weights for BM25/cosine/phrase bonus and version history retention plus note_extraction_guard default true.

src/config/config_defaults.c


config_parser.c Parse new document search/versioning and memory guard settings +27/-2

• Parses and clamps v61/v62 document config keys and 'memory.note_extraction_guard'.

src/config/config_parser.c


CMakeLists.txt Add new unit test targets for notes store and hybrid search +63/-2

• Registers new tests for BM25 search, document_manage edit ops, note bridge, note guard, and LLM duplicate checking stubs.

tests/CMakeLists.txt


test_document_db_stub.c Test stub support for document DB changes +6/-0

• Adds minimal stubbing hooks needed by new tests interacting with document DB interfaces.

tests/test_document_db_stub.c


test_document_manage.c Unit tests for find/replace-once and append contract +164/-0

• Adds isolated tests verifying unique-match find/replace behavior and newline-aware append semantics.

tests/test_document_manage.c


test_document_search_bm25.c Search quality regression for conv-809: exact-label ranks #1 +538/-0

• Seeds near-twin note labels/bodies and asserts BM25 label weighting returns exact-label note rank-1 for label queries.

tests/test_document_search_bm25.c


test_llm_dup_check.c Add duplicate-check tests for updated LLM/tool behaviors +194/-0

• Introduces tests covering duplicate checking functionality that interacts with expanded tool/message flows.

tests/test_llm_dup_check.c


test_llm_dup_check_stub.c Provide stubs for duplicate-check tests +114/-0

• Adds stubs to isolate duplicate-check tests from external dependencies.

tests/test_llm_dup_check_stub.c


test_memory_note_bridge.c Lifecycle tests for memory→note gloss bridge +221/-0

• Tests gloss create/link/delete/rename behavior and memory-disabled no-op behavior using in-memory DB and embedding stubs.

tests/test_memory_note_bridge.c


test_memory_note_guard.c Tests for note-extraction guard redaction across provider shapes +456/-0

• Covers OpenAI/Claude tool-call shapes, redaction of filed bodies and tool echoes, whitespace-normalized matching, short-body floor, and config pass-through.

tests/test_memory_note_guard.c


test_memory_provenance.c Minor provenance test adjustment for new memory fields +1/-0

• Adds/adjusts a small assertion for memory provenance with new note_doc_id-enabled schema.

tests/test_memory_provenance.c


schema.js Expose hybrid search weights, versioning, and note extraction guard in settings UI +76/-0

• Adds advanced settings schema entries for BM25/hybrid/phrase weights, relevance threshold, version retention, and note_extraction_guard.

www/js/ui/settings/schema.js



- webui doc search: pull authoritative metadata (is_global, created_at) from
  the document row instead of the BM25 hit; force show_all off on a query so
  the user-scoped search can't emit bogus owner fields
- migrations: ROLLBACK on COMMIT failure in the v61 FTS backfill (+ the
  identical pre-existing v48 site) so a failed commit can't leave an open
  transaction during startup
- admin gloss backfill: NULL the freed notes buffer
- doxygen: complete @param/@return on document_manage.h + document_db.h

Skipped (documented intentional): find_replace_once {0,1,2} count contract
(OOM via *out==NULL); auth_db_migrations.c append-only ladder size exception.

78/78 CI green, format clean.
@KerseyFabrications

Contributor Author

Review dispositions (Copilot + Qodo) — applied in 96ab160.

Most findings had inline threads (replied there). One Qodo summary-only item had no inline anchor:

  • free(notes) not nullified (admin_socket_memory.c, handle_memory_backfill_note_glosses) — Fixed: the freed notes buffer is now set to NULL per the project's free-and-null convention.

Summary: 6 fixed, 3 skipped with reason. The find_replace_once {0,1,2} match-count contract (OOM via *out == NULL, documented + caller-correct) and the auth_db_migrations.c append-only ladder size exception (banner-documented) are intentional designs, not drift. Build clean / 78 CI tests green / format clean.
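
For readers unfamiliar with the {0,1,2} match-count contract mentioned above, here is one possible reading of it as a sketch. The function name `find_replace_once_sketch` and every detail of the body are assumptions for illustration, not the project's `docmgmt_find_replace_once`. The reading assumed here: return 0 for no match, 1 for a unique match (with the edited text in `*out`), 2 for an ambiguous match; an allocation failure surfaces as a return of 1 with `*out == NULL`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical unique-match find/replace.
 * Returns 0 (no match), 1 (exactly one match, replaced), or 2 (more than one
 * match, no edit made). On return 1 the caller must check *out: NULL means
 * the allocation failed. The caller owns and frees *out. */
static int find_replace_once_sketch(const char *text, const char *find, const char *repl,
                                    char **out) {
   *out = NULL;

   size_t find_len = strlen(find);
   if (find_len == 0) {
      return 0; /* an empty needle matches nothing in this sketch */
   }

   const char *first = strstr(text, find);
   if (first == NULL) {
      return 0;
   }
   /* Search again one byte later so overlapping repeats also count. */
   if (strstr(first + 1, find) != NULL) {
      return 2;
   }

   size_t prefix = (size_t)(first - text);
   size_t repl_len = strlen(repl);
   size_t tail_len = strlen(first + find_len);

   char *buf = malloc(prefix + repl_len + tail_len + 1);
   if (buf == NULL) {
      return 1; /* count is still 1; *out == NULL signals OOM */
   }
   memcpy(buf, text, prefix);
   memcpy(buf + prefix, repl, repl_len);
   memcpy(buf + prefix + repl_len, first + find_len, tail_len + 1); /* includes NUL */

   *out = buf;
   return 1;
}
```

Refusing to edit on an ambiguous match is what makes the operation "surgical": the caller has to supply enough surrounding text to pin down a single location.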

@KerseyFabrications KerseyFabrications merged commit 2fad5e0 into main Jun 12, 2026
5 checks passed