Coding harness: let the assistant reason about your code #20
… can't kill the daemon A peer-closed socket during the Telegram long-poll (curl on the listener thread) raised SIGPIPE inside OpenSSL's write(), and the default disposition terminated the daemon. Ignore SIGPIPE process-wide at startup (covers curl on every thread, libwebsockets, mosquitto) and add CURLOPT_NOSIGNAL to the shared curl_apply_dawn_defaults() preamble (required for multi-threaded libcurl; covers all messaging drivers + web/oauth/image curl users).
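A minimal sketch of the process-wide half of this fix, assuming a POSIX system. The helper names `ignore_sigpipe` and `errno_of_write_to_closed_pipe` are hypothetical and are not taken from the DAWN source. With SIGPIPE ignored, a write to a peer-closed descriptor fails with EPIPE instead of terminating the process. The libcurl half is a per-handle option, `curl_easy_setopt(handle, CURLOPT_NOSIGNAL, 1L)`, and is left out here so the sketch needs no external library.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Ignore SIGPIPE for the whole process. Intended to be called once at
 * startup, before any worker thread exists. Returns 0 on success. */
static int ignore_sigpipe(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGPIPE, &sa, NULL);
}

/* Demonstration helper (hypothetical): write to a pipe whose read end is
 * already closed and return the errno reported by write(). */
static int errno_of_write_to_closed_pipe(void) {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    close(fds[0]); /* the "peer" goes away */
    errno = 0;
    ssize_t n = write(fds[1], "x", 1);
    int saved = errno;
    close(fds[1]);
    assert(n == -1); /* the write must fail, not kill the process */
    return saved;
}
```

Calling `ignore_sigpipe()` first and then the demonstration helper yields EPIPE; without the first call the default disposition would end the process at the `write()`.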
…ve the WS thread A long synchronous local-ONNX embed burst (hundreds of chunks, ~90s) starved the single WebSocket service thread enough to lapse a connected satellite's app-level keepalive, bouncing it mid-index. usleep briefly every 8 chunks (no lock held) to give latency-sensitive threads a scheduling window — ~200ms total across a multi-second index.
…djacent context Semantic+BM25 search fuzzes exact strings (IDs, field values like "birthday: '1965") and returns isolated mid-record fragments. Add a deterministic literal-grep tool: every match leads with its matched LINE (always shown, never budget-truncated) plus optional surrounding chunks. New gap-safe DB primitives: chunk_read_range (windows by chunk_index value, not OFFSET) and chunk_grep (permission-scoped JOIN; LIKE/instr; paginated). Tool: query + context(0-2) + case_sensitive + offset; range-union dedup, token budget, pagination footer (page size 50). is_available gated on the DB, not the embedder. Tests cover gap-safe range, cross-user scoping, case, wildcard-as-literal, and pagination. Five-agent reviewed (params .field_name, snprintf-overflow guard, clamps). Live-verified on a 536-record YAML; 79/79 CI.
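The "wildcard-as-literal" behavior can be illustrated with a small escaping helper. This is a sketch under assumptions, not the code in document_db.c: the function name `like_escape_literal` is hypothetical, and it assumes the query uses a backslash as the LIKE escape character. It returns failure when the escaped needle does not fit, so a caller can reject the query rather than match a silently truncated prefix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Escape the LIKE metacharacters '%' and '_' plus the escape character
 * itself, so the needle matches literally when the query declares a
 * backslash escape. Returns false and writes an empty string if the
 * escaped needle does not fit in dst. */
static bool like_escape_literal(const char *needle, char *dst, size_t dst_size) {
    assert(needle != NULL && dst != NULL && dst_size > 0);
    size_t out = 0;
    for (size_t i = 0; needle[i] != '\0'; i++) {
        bool special = (needle[i] == '%' || needle[i] == '_' || needle[i] == '\\');
        size_t need = special ? 2 : 1;
        if (out + need + 1 > dst_size) { /* +1 keeps room for the NUL */
            dst[0] = '\0';
            return false;
        }
        if (special)
            dst[out++] = '\\';
        dst[out++] = needle[i];
    }
    dst[out] = '\0';
    return true;
}
```

With a 32-byte buffer, the needle `100%_done` becomes `100\%\_done`; with a 4-byte buffer, `a%b` is rejected because its escaped form needs 5 bytes.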
…r chunk) Size-based chunking split structured docs mid-record, so a retrieval/grep hit landed on a fragment with surrounding fields in other chunks. Content-sniff (not filetype; a .yaml URL can be stored as .txt) for top-level YAML sequences and CSV tables and split per record (CSV chunks carry the header so each is self-describing). Conservative thresholds; prose falls through unchanged. Oversized records split at line boundaries. Tests: YAML one-record-per-chunk, CSV header+row, prose-not-mis-split guard. Live-verified: re-indexed legislators YAML chunks at record boundaries (536 chunks / ~538 records); grep returns clean per-person records. 79/79 CI.
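The header-carrying CSV split can be sketched as follows. This is an illustration under assumptions rather than the document_chunker.c implementation: `csv_chunk_per_row`, `chunk_cb`, and the `remember_last` sink are hypothetical names, the buffer size is arbitrary, and content sniffing and thresholds are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef void (*chunk_cb)(const char *chunk, void *user);

/* Emit one chunk per non-empty data row, each prefixed with the header line.
 * Returns the number of chunks, or -1 if a header+row pair does not fit the
 * buffer (a caller could then fall back to prose chunking). */
static int csv_chunk_per_row(const char *text, chunk_cb emit, void *user) {
    assert(text != NULL && emit != NULL);
    const char *hdr_end = strchr(text, '\n');
    if (hdr_end == NULL)
        return 0; /* header only, no rows */
    int hdr_len = (int)(hdr_end - text);
    int count = 0;
    char buf[1024];
    const char *p = hdr_end + 1;
    while (*p != '\0') {
        const char *le = strchr(p, '\n');
        size_t row_len = le ? (size_t)(le - p) : strlen(p);
        if (row_len > 0) {
            int n = snprintf(buf, sizeof(buf), "%.*s\n%.*s", hdr_len, text, (int)row_len, p);
            if (n < 0 || (size_t)n >= sizeof(buf))
                return -1; /* never emit a truncated record */
            emit(buf, user);
            count++;
        }
        p = le ? le + 1 : p + row_len;
    }
    return count;
}

/* Example sink (hypothetical): remembers the last chunk and counts calls. */
struct last_chunk {
    char text[1024];
    int calls;
};

static void remember_last(const char *chunk, void *user) {
    struct last_chunk *lc = user;
    snprintf(lc->text, sizeof(lc->text), "%s", chunk);
    lc->calls++;
}
```

Because every chunk repeats the header, a grep hit on a single row still shows which column each value belongs to.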
…unless dangerous) Add optional [llm.tools] local_disabled/remote_disabled blocklists alongside the legacy enable whitelists. Per surface: disable-list set → default-on except listed; enable-list only → whitelist (legacy, unchanged); neither → all-on. TOOL_CAP_DANGEROUS tools always require explicit enable-list opt-in. Lets a newly-added tool work without editing every deployment's allowlist, while existing whitelist configs are byte-identical after upgrade (back-compat by key, not rename). apply_config takes the config struct (caller -18 lines); WebUI tool-toggle persistence is blocklist-aware (writes *_disabled, keeps dangerous opt-ins in *_enabled) so toggles don't silently revert. Security-reviewed: dangerous-tool auto-enable invariant airtight under all combinations. 79/79 CI.
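The per-surface resolution rules read naturally as a small decision function. The sketch below is an interpretation of the commit message, not the llm_tools.c code: `tool_surface_cfg_t` and `tool_is_enabled` are hypothetical names, a "set" list is approximated as a non-empty list, and treating a dangerous tool that appears in both lists as disabled is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *const *enabled;  /* legacy whitelist */
    size_t n_enabled;
    const char *const *disabled; /* new blocklist */
    size_t n_disabled;
} tool_surface_cfg_t;

static bool name_in(const char *const *list, size_t n, const char *name) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i], name) == 0)
            return true;
    return false;
}

/* Resolve whether a tool is on for one surface (local or remote). */
static bool tool_is_enabled(const tool_surface_cfg_t *cfg, const char *name, bool dangerous) {
    assert(cfg != NULL && name != NULL);
    bool in_enabled = name_in(cfg->enabled, cfg->n_enabled, name);
    bool in_disabled = name_in(cfg->disabled, cfg->n_disabled, name);
    if (dangerous)
        return in_enabled && !in_disabled; /* explicit opt-in only (assumed tie-break) */
    if (cfg->n_disabled > 0)
        return !in_disabled; /* blocklist mode: default-on */
    if (cfg->n_enabled > 0)
        return in_enabled; /* legacy whitelist mode */
    return true; /* neither list: all-on */
}
```

The order of the checks carries the invariant: the dangerous-tool test runs first, so no combination of lists can switch such a tool on without an enable-list entry.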
…nfig The file-local static shared the name of the global dawn_config_t g_config from dawn_config.h. Harmless only because this TU didn't include that header; the tool-blocklist change surfaced it when a new llm_tools.h include pulled dawn_config.h in transitively (conflicting types for g_config). Rename to the s_ static convention so a future header reorg can't reintroduce the collision. Pure rename, no behavior change.
CI no-process-mgmt grep, HTTP+SSE transport, JSON-RPC client+FSM, hardened JSON-Schema->treg_param translator, schema v55 mcp_user_access + auth_db_mcp allowlist, and bridge registration/dispatch (trampolines, per-call auth, dangerous-tool admin denylist). 4 new test binaries; full CI suite 69/69. Bridge tool/options default OFF; not yet wired to config.
…rdening
Builds on 3905c44 (MCP bridge foundation); completes the daemon/CLI side of Phase 1. WebUI (Steps 17-19) still pending.
- config: [mcp]/[[mcp.server]] + [code_projects] parse/validate/defaults; executor raw-args hook so typed JSON survives (action,value) dispatch
- bridge: config-driven mcp_bridge_init, per-call fail-closed auth, cbm project auto-fill; admin 0xB0-0xB8 + dawn-admin mcp/code-project CLI
- code projects: schema v56 table, code_project_db CRUD/visibility, libgit2 in-process clone (size/file/depth caps, symlink sweep, SSRF+allowlist, redirect-off), nice-10 orchestrator worker, cbm code_graph provider, native code_project tool, per-session active project
- libgit2 1.8.1 build in scripts/lib/libs.sh (opt-in); CMake options DAWN_ENABLE_MCP_BRIDGE_TOOL / DAWN_ENABLE_CODE_PROJECTS (OFF default)
- check_no_process_mgmt.sh CI grep enforces the no-subprocess invariant
- review hardening (9-agent pass): fail-closed dispatch, libgit2 redirect/userinfo/allowlist/redact, server-table locking, post-clone size tally, libgit2 init lifecycle, shutdown wiring, standards/Doxygen/null-checks
Verified: dawn + dawn-admin + tests-ci build 0 warnings; 71/71 CI; format + no-process-mgmt grep clean. Migrations v55/v56 unconditional (main still v54).
Settings panel + a Coding header popover for the code-projects subsystem,
on top of the daemon/CLI side already on this branch. Frontend not yet
browser-tested; backend is compile-verified (86/86 CI, 0 warnings).
Settings:
- config_to_json (config_env.c) serializes [mcp] (+ servers[] read-only)
and [code_projects]; webui_config.c apply-parses the editable scalars
(servers stay TOML-managed); schema.js adds MCP Bridge + Code Projects
sections under a new Coding category.
WS backend (the WebUI uses WS dispatch, not REST):
- webui_code_projects.{c,h}: list/import/refresh/delete handlers, scoped to
conn->auth_user_id; import honors import_user_required=admin and only
admins set global or act on others' projects. Dispatched in
webui_message_dispatch.c (#ifdef DAWN_ENABLE_CODE_PROJECTS); built via
DawnTools.cmake under code-projects + ENABLE_WEBUI.
- Strong override of code_project_broadcast_status_changed in
webui_broadcasts.c pushes code_project_status_changed for live re-fetch.
Frontend:
- code-projects.js (DawnCodeProjects), code-projects.css (@imported in
main.css), #coding-btn + #code-projects-popover in index.html, dawn.js
message cases + init + auth-gated reveal. XSS-escaped rendering.
- ui-design-architect review applied: real showConfirmModal API + focus
trap/return, four-way popover mutual-close, aria-labels + focus rings,
themed badge tokens, elevation/backdrop, narrow-viewport bottom sheet.
Verified: dawn + dawn-admin + tests-ci build 0 warnings; 86/86 CI; C
format clean; all JS passes node --check. Browser test pending.
Stabilization fixes from live WebUI testing of the code-projects panel:
- Import validates the repo exists before creating a DB row. The remote probe (in-process libgit2 ref negotiation, redirects off, bounded by server timeouts) runs on the worker thread, not the audio-carrying lws service thread. Nonexistent/unreachable URL -> no row + failure toast; repo exists but clone later fails (size/depth caps) -> error row kept (refreshable settings issue). cp_job_t now carries import params.
- valid_name: allow uppercase (isalnum) so e.g. Hello-World imports.
- Coding popover button hidden unless [code_projects].enabled (JS gate via get_config_response + #coding-btn.hidden CSS specificity override).
- [mcp]/[code_projects] now written by config_write_toml, not just config_to_json, so the WebUI "Enable Code Projects" toggle persists across restart (also fixes latent wipe of a hand-added [mcp]).
Build clean, format clean, JS syntax checked. Live-verified: bad URL toasts failure with no phantom row/clone dir; valid URL imports.
A clone with cbm-mcp absent reported a bare "indexing failed" that reads like a bug rather than a missing backend. worker_do_index now pre-checks backend availability and reports "clone ready, but no code server connected — start cbm-mcp, then re-index". Added via the provider abstraction (no layering break): new optional is_available() on the code_graph_provider vtable, backed by a new mcp_bridge_server_connected() that reports MCP_STATE_CONNECTED without triggering a call or reconnect. Build clean, format clean.
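An optional vtable slot with a pre-check might look like the sketch below. The type and function names (`graph_provider_t`, `index_with_precheck`, the stub backends) are hypothetical and do not reproduce the real code_graph_provider interface. The point is that a NULL `is_available` keeps older providers working unchanged, while a provider that reports "down" yields a distinct result the caller can turn into the clearer message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    const char *name;
    bool (*is_available)(void);                /* optional: NULL means "not reported" */
    int (*index_project)(const char *project); /* 0 on success */
} graph_provider_t;

typedef enum { INDEX_OK, INDEX_FAILED, INDEX_NO_BACKEND } index_result_t;

/* Check backend availability first, so "no server connected" is reported
 * separately from a genuine indexing failure. */
static index_result_t index_with_precheck(const graph_provider_t *p, const char *project) {
    assert(p != NULL && p->index_project != NULL && project != NULL);
    if (p->is_available != NULL && !p->is_available())
        return INDEX_NO_BACKEND;
    return p->index_project(project) == 0 ? INDEX_OK : INDEX_FAILED;
}

/* Stub backends, for illustration only. */
static bool backend_down(void) { return false; }
static bool backend_up(void) { return true; }
static int index_ok(const char *project) { (void)project; return 0; }
static int index_fail(const char *project) { (void)project; return -1; }
```

In this sketch the availability probe is a plain state read, matching the commit's description of a check that does not trigger a call or reconnect.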
Operator service that runs codebase-memory-mcp (stdio) behind mcp-proxy, re-exposing it over SSE for DAWN's MCP bridge. Unit + EnvironmentFile + logrotate + install.sh + README, modeled on services/llama-server. Runs as the dawn user (reads /var/lib/dawn/source, writes its graph cache). cbm built with libgit2 disabled (fallback to git log).
Two cbm-bridge fixes (entangled in mcp_bridge_tool.c, so committed together): Admin-grant bootstrap: auth_db_mcp_grant_all_admins ran inside mcp_bridge_init (during tools_register_all), ~460 lines before auth_db_init — it hit a closed DB and no-op'd, leaving mcp_user_access empty so even admin was denied every cbm_* tool. Moved the grant to dawn.c after auth_db_init, keyed on every configured [[mcp.server]] alias. Name-translation boundary: cbm names projects by slugifying the absolute repo path and prefixes qualified_name/file_path with it, leaking the filesystem layout to the LLM and baking the path into conversation history. New code_project_namemap translates the LLM's clean identifiers to cbm's namespace outbound and strips the slug + source_root paths from results inbound; the prefix is captured from cbm's own list_projects (no schema change). LLM now sees only clean names + project-relative paths; stored conversations survive a directory move + reindex. Also: link code_project_namemap.c into test_mcp_bridge. Build clean, 86/86 CI tests pass.
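The inbound half of the name-translation boundary amounts to scrubbing a known prefix out of result text. The sketch below assumes a slug followed by a dot as the prefix, which is a made-up format for illustration; the real slug layout comes from cbm and is not shown in this PR text. `scrub_prefix` is a hypothetical name, not the code_project_namemap API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy `in` to `out`, dropping every occurrence of `prefix`.
 * Returns false and writes an empty string if the result does not fit. */
static bool scrub_prefix(const char *in, const char *prefix, char *out, size_t out_size) {
    assert(in != NULL && prefix != NULL && out != NULL && out_size > 0);
    size_t plen = strlen(prefix);
    size_t o = 0;
    while (*in != '\0') {
        if (plen > 0 && strncmp(in, prefix, plen) == 0) {
            in += plen; /* skip the path-derived slug */
            continue;
        }
        if (o + 1 >= out_size) {
            out[0] = '\0';
            return false;
        }
        out[o++] = *in++;
    }
    out[o] = '\0';
    return true;
}
```

The outbound direction would do the reverse lookup (clean project name to slug) before the call is sent, so neither the LLM nor the stored conversation ever contains the filesystem-derived name.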
cbm's list_projects duplicated the native code_project list tool (which has clean names + per-user visibility), and the two returned different formats — a DX wart Friday flagged during testing. register_server_tools now skips registering cbm/list_projects as an LLM-facing tool (gated on DAWN_ENABLE_CODE_PROJECTS so it only hides when code_project exists). The namemap capture is unaffected: it calls list_projects via mcp_bridge_call_tool, which bypasses the registry. Build clean, 86/86 CI tests pass.
Extends the code-projects harness across schema/git/namemap/service/surfaces:
- branch: import/track/switch a branch (libgit2 fetch+checkout); schema v66 adds
branch/kind/graph_name to code_projects via an idempotent (PRAGMA-probed) ALTER.
- link-local: register an existing local checkout (kind=local, admin-only, gated
by [code_projects] allowed_local_roots) — never cloned or removed. cbm-mcp.service
gains an opt-in ProtectHome=tmpfs + BindReadOnlyPaths sandbox block.
- refresh (fetch + incremental) vs rebuild (delete graph + reindex); startup
reconciliation heals rows interrupted mid-index.
- fix: cbm delete sent the wrong arg key ("project_name") and the clean name, so
the on-disk graph was never removed; now resolves the persisted path-derived
slug. namemap reworked from a single source_root prefix to a per-project map
(multi-repo + shared-cbm safe).
- surfaces: WebUI (Import/Link tabs, branch field, rebuild/set-branch, full-width
URLs, conversation-style hover actions); admin opcodes 0xD0-0xD6; dawn-admin
link/rebuild/set-branch + import --branch; admins now see all projects.
- docs: CODING_PROJECTS.md (user/operator guide)
Reviewed by 5 agents (0 critical); findings fixed. Debug build clean (0 warnings);
test_code_project_db 8/8, test_code_project_git 5/5; format clean.
- README: Code Projects rows in the Optional Features + Documentation tables.
- GETTING_STARTED: Code Projects pointer under Optional Components.
- CODING_PROJECTS.md §5: point the cbm-sharing deep-dive at the atlas archive; CODING_HARNESS_CBM_SHARING.md stays untracked (moves to atlas after the PR).
Pull request overview
Adds an opt-in “coding harness” that lets DAWN index Git repositories into a code graph via an operator-run cbm MCP server, and exposes UI/admin/LLM surfaces to manage/query those projects (feature-gated and off by default).
Changes:
- Introduces MCP bridge client (HTTP+SSE) with per-user access gating and admin tooling.
- Adds code-projects subsystem (DB schema + libgit2 clone/link + indexing orchestration) plus WebUI/admin-socket/dawn-admin surfaces.
- Improves document tooling (structured chunking + literal grep + indexing yield) and extends tool enable/disable config (blocklist + whitelist models).
Reviewed changes
Copilot reviewed 96 out of 96 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| www/js/ui/settings/schema.js | Adds Settings panels for MCP Bridge and Code Projects + “Coding” category. |
| www/js/ui/scheduler-queue.js | Ensures header popovers close Code Projects when opening Scheduler Queue. |
| www/js/ui/memory.js | Ensures Memory popover closes Code Projects if open. |
| www/js/ui/doc-library.js | Ensures Doc Library popover closes Code Projects if open. |
| www/js/dawn.js | Adds WebSocket dispatch + visibility gating for Code Projects UI. |
| www/index.html | Adds Coding header button + Code Projects popover markup and script include. |
| www/css/main.css | Adds Code Projects CSS component import. |
| tests/test_document_chunker.c | Adds unit tests for structure-aware YAML/CSV chunking. |
| tests/test_code_project_git.c | Adds libgit2-based clone/fetch/checkout/link validation tests. |
| tests/test_auth_db_mcp.c | Adds tests for per-user MCP access (grant/revoke/check/admin). |
| tests/smoke_test_harness.sh | Adds optional end-to-end smoke test harness for import/index flow. |
| tests/CMakeLists.txt | Wires new unit tests; gates libgit2 test behind DAWN_ENABLE_CODE_PROJECTS. |
| src/webui/webui_tools.c | Fixes persisted tool config round-tripping for blocklist vs whitelist setups. |
| src/webui/webui_message_dispatch.c | Adds WebSocket handlers for Code Projects messages (feature-gated). |
| src/webui/webui_config.c | Applies [mcp] and [code_projects] settings from WebUI JSON payloads. |
| src/webui/webui_broadcasts.c | Adds WebUI broadcasts for code-project status/import failures. |
| src/tools/tools_init.c | Registers MCP bridge + code-project service/tool; adds document_grep tool registration. |
| src/tools/search_summarizer.c | Renames module-static config variable for clarity. |
| src/tools/document_index_pipeline.c | Adds periodic CPU yield during embed loop to reduce thread starvation. |
| src/tools/document_db.c | Adds chunk-range read + literal grep query helpers. |
| src/tools/document_chunker.c | Adds conservative structure-aware YAML/CSV record chunking. |
| src/tools/code_project_tool.c | Adds native code_project LLM tool (list/set_active/status). |
| src/tools/code_graph_provider_cbm.c | Adds cbm-backed code-graph provider using MCP bridge tools. |
| src/llm/llm_tools.c | Adds blocklist/whitelist resolution; exposes thread-local raw tool-call JSON args. |
| src/llm/llm_interface.c | Applies new llm.tools config model (enable/disable lists). |
| src/dawn.c | Ignores SIGPIPE; adds MCP admin bootstrap grants; adds orderly shutdown of new subsystems. |
| src/config/config_validate.c | Validates MCP server config and code-project limits/regex. |
| src/config/config_parser.c | Parses MCP and code-projects config sections + new tool disable lists. |
| src/config/config_env.c | Serializes MCP/code-projects settings to JSON; round-trips MCP/code-projects + new disable lists to TOML. |
| src/config/config_defaults.c | Adds secure-by-default code-projects defaults. |
| src/auth/auth_db_statements.c | Adds prepared statements for chunk-range reads and literal grep. |
| src/auth/auth_db_migrations.c | Adds v64–v66 migrations to global schema ladder. |
| src/auth/auth_db_migrations_v66.c | Adds idempotent ALTERs for code_projects branch/kind/graph_name columns. |
| src/auth/auth_db_migrations_v65.c | Adds idempotent code_projects table creation. |
| src/auth/auth_db_migrations_v64.c | Adds idempotent mcp_user_access table creation. |
| src/auth/auth_db_mcp.c | Implements per-user MCP access allowlist CRUD. |
| src/auth/admin_socket.c | Dispatches new ADMIN_MSG_MCP_* and ADMIN_MSG_CODE_PROJ_* opcodes (feature-gated). |
| src/auth/admin_socket_mcp.c | Implements admin-socket MCP list/status/grant/revoke/reset handlers. |
| services/cbm-mcp/README.md | Documents running cbm behind mcp-proxy as a systemd service. |
| services/cbm-mcp/install.sh | Adds installer for cbm-mcp systemd service + dependencies. |
| services/cbm-mcp/cbm-mcp.service | Adds hardened systemd unit for mcp-proxy + cbm. |
| services/cbm-mcp/cbm-mcp.conf | Adds EnvironmentFile for cbm-mcp deployment. |
| services/cbm-mcp/cbm-mcp-logrotate | Adds logrotate config for cbm-mcp logs. |
| scripts/lib/libs.sh | Adds opt-in libgit2 build-from-source helper. |
| scripts/check_no_process_mgmt.sh | Adds CI invariant to forbid process-management calls in harness code. |
| README.md | Links new Code Projects documentation. |
| include/webui/webui_code_projects.h | Declares WebUI Code Projects WebSocket handlers. |
| include/tools/mcp_transport.h | Adds MCP transport abstraction interface. |
| include/tools/mcp_transport_http_sse.h | Declares HTTP+SSE transport factory. |
| include/tools/mcp_client.h | Declares MCP JSON-RPC client/FSM and call API. |
| include/tools/mcp_bridge.h | Declares MCP bridge API for tool registration/dispatch/status. |
| include/tools/mcp_bridge_schema.h | Declares MCP JSON Schema translation and description hardening helpers. |
| include/tools/document_grep.h | Declares document_grep tool registration entrypoint. |
| include/tools/document_db.h | Adds types/APIs for grep hits and chunk-range reads. |
| include/tools/code_project_tool.h | Declares native code_project tool registration. |
| include/tools/code_project_service.h | Declares code-projects orchestrator API and broadcast hooks. |
| include/tools/code_project_namemap.h | Declares cbm slug/name translation boundary API. |
| include/tools/code_project_git.h | Declares libgit2 clone/fetch/validate wrapper API. |
| include/tools/code_project_db.h | Declares code_projects DB CRUD/visibility API. |
| include/tools/code_graph_provider.h | Declares code-graph provider vtable (cbm-backed in Phase 1). |
| include/llm/llm_tools.h | Updates llm.tools config API and adds raw-args accessor. |
| include/core/session_manager.h | Adds per-session active code-project fields. |
| include/core/curl_buffer.h | Sets CURLOPT_NOSIGNAL in shared curl defaults for thread-safety. |
| include/config/dawn_config.h | Adds MCP/code-projects config structs and tool disable lists. |
| include/auth/auth_db_mcp.h | Declares MCP allowlist DB APIs. |
| include/auth/auth_db_internal.h | Bumps schema version to 66 and adds new stmt/migration declarations. |
| include/auth/admin_socket.h | Adds admin socket opcodes for MCP and code projects. |
| include/auth/admin_socket_internal.h | Declares MCP/code-project admin handler prototypes. |
| GETTING_STARTED.md | Adds “Code Projects (Coding Harness)” section linking to docs. |
| docs/CODING_PROJECTS.md | Adds user/operator guide for code projects feature. |
| DEPENDENCIES.md | Documents libgit2 dependency gating/installation for code projects. |
| dawn.toml.example | Documents MCP bridge and code-projects TOML configuration + new tool blocklist model. |
| dawn-admin/socket_client.h | Declares dawn-admin client helpers for MCP and code-project opcodes. |
| dawn-admin/main.c | Adds dawn-admin mcp … and dawn-admin code-project … commands. |
| CMakeLists.txt | Adds libgit2 detection/version gate; adds global schema helper sources; adds no-process-mgmt check target. |
| cmake/DawnTools.cmake | Adds DAWN_ENABLE_MCP_BRIDGE_TOOL and DAWN_ENABLE_CODE_PROJECTS build options and sources. |
- mcp transport (SSRF): reject a cross-origin or credentialed SSE `endpoint` event, which could redirect authenticated POSTs (bearer header) to another host. resolve_endpoint now requires the resolved URL to stay same-origin as the configured base URL.
- dawn.c: skip disabled / empty-alias servers when bootstrapping admin MCP access (was inserting empty-alias grant rows).
- check_no_process_mgmt.sh: fix the no-subprocess CI invariant's harness globs: webui_projects.* typo → webui_code_projects.*; drop dead v55/v56 + dawn_admin_* globs; add v64-v66 migrations and the dawn-admin client (main.*/socket_client.*); harden comment-stripping to skip multi-line block comments (so doc-comment prose like "daemon (...)" can't false-positive). Now scans 33 files (was missing the WebUI and dawn-admin harness handlers).
- code_project_git.c: return SUCCESS instead of literal 0 from the libgit2/nftw callbacks (named-constant convention; FAILURE already used alongside).
- docs/code_project_db.h: schema comment v56 → v65/v66; add @param Doxygen to the code_project_db_* and mcp_client_* public APIs; CODING_PROJECTS.md: cbm link is HTTP+SSE on localhost, not "a local socket".
Skipped (false positive): `_fn` typedef-suffix flag; `_fn` is the codebase convention for function-pointer typedefs (19 existing uses).
Build clean (0 warnings); test_code_project_db 8/8, test_code_project_git 5/5, test_mcp_bridge 6/6; check_no_process_mgmt passes; format clean.
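The same-origin requirement on the SSE `endpoint` event can be sketched with a small origin parser. This is an illustration under assumptions, not the resolve_endpoint implementation: `url_origin_t`, `parse_origin`, and `same_origin` are hypothetical names, only absolute http/https URLs are handled, and bracketed IPv6 hosts without an explicit port are rejected rather than parsed.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    char scheme[8];
    char host[256];
    int port;
    bool has_userinfo;
} url_origin_t;

/* Copy n bytes of src into dst as a lower-cased, NUL-terminated string. */
static bool copy_lower(char *dst, size_t dst_size, const char *src, size_t n) {
    if (n == 0 || n >= dst_size)
        return false;
    for (size_t i = 0; i < n; i++)
        dst[i] = (char)tolower((unsigned char)src[i]);
    dst[n] = '\0';
    return true;
}

/* Parse scheme://[userinfo@]host[:port]; anything unparseable fails closed. */
static bool parse_origin(const char *url, url_origin_t *o) {
    const char *sep = strstr(url, "://");
    if (sep == NULL || !copy_lower(o->scheme, sizeof(o->scheme), url, (size_t)(sep - url)))
        return false;
    const char *auth = sep + 3;
    size_t auth_len = strcspn(auth, "/?#");
    const char *at = memchr(auth, '@', auth_len);
    o->has_userinfo = (at != NULL);
    const char *hostport = at ? at + 1 : auth;
    size_t hp_len = auth_len - (size_t)(hostport - auth);
    const char *colon = NULL;
    for (size_t i = 0; i < hp_len; i++)
        if (hostport[i] == ':')
            colon = hostport + i; /* last ':' separates the port */
    size_t host_len = colon ? (size_t)(colon - hostport) : hp_len;
    if (!copy_lower(o->host, sizeof(o->host), hostport, host_len))
        return false;
    if (colon == NULL) { /* default ports for the two supported schemes */
        if (strcmp(o->scheme, "http") == 0)
            o->port = 80;
        else if (strcmp(o->scheme, "https") == 0)
            o->port = 443;
        else
            return false;
        return true;
    }
    long port = 0;
    const char *hp_end = hostport + hp_len;
    const char *d = colon + 1;
    if (d == hp_end)
        return false;
    for (; d < hp_end; d++) {
        if (!isdigit((unsigned char)*d))
            return false;
        port = port * 10 + (*d - '0');
        if (port > 65535)
            return false;
    }
    o->port = (int)port;
    return true;
}

/* True only if resolved_url has the same scheme, host, and port as base_url
 * and carries no credentials. */
static bool same_origin(const char *base_url, const char *resolved_url) {
    assert(base_url != NULL && resolved_url != NULL);
    url_origin_t base, res;
    if (!parse_origin(base_url, &base) || !parse_origin(resolved_url, &res))
        return false;
    if (res.has_userinfo)
        return false;
    return strcmp(base.scheme, res.scheme) == 0 && strcmp(base.host, res.host) == 0 &&
           base.port == res.port;
}
```

A server-supplied endpoint that fails this check would simply be refused, so the bearer header is never sent to a host other than the configured one.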
    /* ---- YAML sequence: many top-level "- " records, mostly structured lines ---- */
    if (top_dash >= STRUCT_YAML_MIN_ITEMS &&
        (double)(top_dash + indented) >= (double)nonblank * STRUCT_YAML_INDENT_FRAC) {
        if (result_init(out) != SUCCESS)

    p = (hdr_end < end) ? hdr_end + 1 : end;
    char buf[8192];
    while (p < end) {
        const char *le = line_end(p, end);
        if (!line_blank(p, le)) {
            int row_len = (int)(le - p);
            /* chunk = "header\nrow" so each record is self-describing. */
            int n = snprintf(buf, sizeof(buf), "%.*s\n%.*s", hdr_len, hdr, row_len, p);
            if (n > 0) {
                if (result_add(out, buf, n < (int)sizeof(buf) ? n : (int)sizeof(buf) - 1) !=
                    SUCCESS) {
                    chunk_result_free(out);
                    return false;
                }
            }
        }
        p = (le < end) ? le + 1 : end;
    }

    for (size_t i = 0; i < nlen && esc_len + 2 < sizeof(escaped); i++) {
        if (needle[i] == '%' || needle[i] == '_' || needle[i] == '\\')
            escaped[esc_len++] = '\\';
        escaped[esc_len++] = needle[i];
    }

    if (piece_end == p)
        piece_end = (p + max_chars < stop) ? p + max_chars : stop; /* one giant line */

    char anchored[256];
    snprintf(anchored, sizeof(anchored), "^(%s)$", allowed_host_pattern);
    regex_t re;
    int crc = regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB);
    if (crc == 0) {
        ok = (regexec(&re, host, 0, NULL, 0) == 0);
        regfree(&re);
    } else {
        /* Fail closed (ok stays false), but make the misconfiguration
         * diagnosable — otherwise every import silently "fails the allowlist". */
        OLOG_ERROR("code_project: allowed_host_pattern failed to compile (rc=%d) — "
                   "rejecting all imports until fixed",
                   crc);
    }
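A self-contained variant of the anchored allowlist match, assuming POSIX `<regex.h>`. The function name `host_allowed` is hypothetical, and the explicit check of the `snprintf` return value is an addition in this sketch rather than something taken from the reviewed code. Anchoring matters because an unanchored pattern such as `github\.com` would also match `github.com.evil.example`.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Return true only if `host` matches `pattern` in full. Any error (pattern
 * too long, pattern fails to compile) is treated as "not allowed". */
static bool host_allowed(const char *pattern, const char *host) {
    assert(pattern != NULL && host != NULL);
    char anchored[256];
    int n = snprintf(anchored, sizeof(anchored), "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof(anchored))
        return false; /* fail closed rather than compile a truncated pattern */
    regex_t re;
    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return false; /* fail closed on a malformed pattern */
    bool ok = (regexec(&re, host, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

Wrapping the pattern in a group before anchoring keeps alternations such as `a|b` from binding the `^` to the first branch and the `$` to the last one.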
5c42d29 to 3285dea
Copilot flagged four edge cases on PR #20; an architecture + efficiency review then caught that two first-pass fixes targeted the wrong layer:
- CSV path ignored max_chars and emitted records that read-back truncates (char text[DOC_CHUNK_TEXT_MAX]); now respects max_chars and falls through to prose for an oversized atomic row (parity with the YAML path).
- struct_emit's "giant line" branch was unreachable, so a single line over max_chars was emitted whole and byte-truncated mid-UTF-8 on read-back; reworked to hard-split at max_chars with UTF-8 lead-byte back-off.
- chunk_structured counted any indented line as a YAML signal; now requires a ':' mapping key or a nested '- ' item so indented prose can't trip it.
- the over-long-needle truncation was at the grep tool layer (silent prefix match); now fails closed there, with the DB-layer guard as a backstop.
New fixtures for the CSV-fallback and UTF-8 oversized-line paths. test_document_chunker 16/16, test_document_db 16/16, 0 warnings.
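The "hard-split at max_chars with UTF-8 lead-byte back-off" idea can be shown in a few lines. This is a sketch, not the struct_emit code: `utf8_safe_cut` is a hypothetical name and the limit is counted in bytes. The cut point moves backward while it sits on a continuation byte (bit pattern 10xxxxxx), so a piece never ends inside a multi-byte character.

```c
#include <assert.h>
#include <stddef.h>

/* Return how many bytes of text[0..len) to emit so the piece is at most
 * max_bytes long and does not end inside a multi-byte UTF-8 sequence. */
static size_t utf8_safe_cut(const char *text, size_t len, size_t max_bytes) {
    assert(text != NULL && max_bytes > 0);
    if (len <= max_bytes)
        return len;
    size_t cut = max_bytes;
    /* The byte at `cut` would start the next piece; if it is a continuation
     * byte, the cut is mid-character, so back off toward the lead byte. */
    while (cut > 0 && ((unsigned char)text[cut] & 0xC0) == 0x80)
        cut--;
    /* cut == 0 only when max_bytes is smaller than one character or the
     * input is malformed; hard-cut so the caller always makes progress. */
    return cut > 0 ? cut : max_bytes;
}
```

For the three-byte string "a" followed by a two-byte "é", a limit of 2 bytes yields a cut of 1, leaving the whole "é" for the next piece.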
Adds an opt-in coding harness — DAWN can index Git repositories into a code graph and answer questions about them ("what calls foo?", "trace from main", "what's on this branch?"), via the external cbm (codebase-memory-mcp) code-graph server. Off by default; zero impact on existing builds.
What's included
How it works
A voice/text query routes through the LLM, which calls the bridged cbm query tools (search_code, trace_path, get_architecture, …). DAWN auto-fills the active project, translates clean identifiers ↔ cbm's path-derived graph slug, and scrubs slugs/paths back out of results. Project management (import/link/branch/rebuild) is operator-facing (WebUI + dawn-admin); the LLM only queries.
Merge safety
`-DDAWN_ENABLE_CODE_PROJECTS=ON` (and `-DDAWN_ENABLE_MCP_BRIDGE_TOOL=ON`); the option defaults OFF and hard-errors without the MCP bridge. Existing/preset builds (`full`, `local`, `server`, CI) compile it out entirely: no new behavior, code, or runtime cost.
Security
Testing
Follow-ups (not in this PR)
`allowed_local_roots` in the WebUI settings panel, automate the cbm-mcp sandbox grant in the installer, and wire the feature into the `full` preset / `scripts/install.sh` (so it's buildable without manual `-D` flags) once that setup is automated. Today it's explicit-opt-in.