
release: develop → master (first stable release) #31

Merged: flupkede merged 13 commits into master from develop on May 1, 2026

@flupkede flupkede commented May 1, 2026

First stable release

This PR promotes develop → master as the first stable 1.0 release of codesearch.

What's included

Fixed

  • Issue #30 ("Fatal (?) crash when attempting to index miqt"): LMDB resize crash on large repos (MDB_MAP_FULL → "already opened with different options"). The fix closes and reopens the environment around the resize.
  • File-change tracking and reaper visibility in serve mode.

Added

  • Multi-repository serve mode (codesearch serve) with per-project, group, and cross-repo routing
  • Stdio proxy with client-side auto-reconnect — survives serve restarts and TCP keep-alive failures transparently for Claude Desktop
  • Full MCP tool surface: search, find, explore, get_chunk, status
  • Tree-sitter AST chunking for 9 languages
  • Persistent embedding cache (SHA-256 keyed, survives --force)
  • Git worktree support, long UNC path support on Windows
  • Repository groups for cross-repo search

Changed

  • Search quality: re-tuned RRF fusion (vector + BM25 + exact-identifier) to reduce agent grep-fallback
  • Idle eviction: touch only on direct query, not group fan-out
  • TUI CPU% normalised by core count

Removed

  • Server-side MCP reconnect middleware (leaked sessions, couldn't fix broken TCP — replaced by client-side retry in stdio proxy)

Smoke tests passed

  • ✅ 13/13 repos warm after serve start
  • ✅ search / find / explore routing (single project + UNC paths)
  • ✅ Serve restart resilience — Claude Desktop self-heals via proxy retry
  • ✅ CHANGELOG.md added

Known limitations

  • Remote MCP clients that don't handle 404 "Session not found" per spec (e.g. OpenCode 1.14.x) must be restarted after a serve restart. Tracked upstream.

flupkede added 11 commits April 30, 2026 20:30
- Add develop branch as default for new feature work
- master remains release branch (no rename to main)
- Document release process: develop → master + tag
- Add development workflow section to README
- Remove AGENTS.md (plan doc fulfilled)
- Focus on value proposition: multi-repo semantic search for AI agents
- Architecture diagram (mermaid) showing search + serve flow
- MCP Configuration section with per-agent configs
- Local/stdio vs Serve/multi-repo modes documented
- 5 MCP tools reference (search, find, explore, get_chunk, status)
- Serve mode with groups, lazy FSW, idle eviction, TUI
- CLI reference with index add/rm/list, groups management
- Add changes_count (AtomicU64) to SharedStores for tracking indexed/removed files
- Increment changes_count in perform_incremental_refresh_with_stores and process_batch_with_stores
- TUI reads changes_count directly from stores instead of unused repo_changes DashMap
- Add debug-level reaper logging showing idle ages for all tracked repos
- Session keepalive issue is client-side (rmcp returns 404 for stale sessions, clients must re-initialize)
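
The counter described in the bullets above can be sketched as follows. This is a minimal illustration, not the project's code: the `SharedStores` struct here is a trimmed-down stand-in, and the `record_changes` and `changes` helpers are hypothetical names introduced for the example.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical, trimmed-down stand-in for the real `SharedStores`.
pub struct SharedStores {
    /// Number of files indexed or removed since the stores were opened.
    pub changes_count: AtomicU64,
}

impl SharedStores {
    pub fn new() -> Self {
        Self { changes_count: AtomicU64::new(0) }
    }

    /// Hypothetical helper: what the refresh and batch paths would call
    /// after indexing or removing `n` files.
    pub fn record_changes(&self, n: u64) {
        // Relaxed ordering is enough for a statistics counter that no
        // other memory access depends on.
        self.changes_count.fetch_add(n, Ordering::Relaxed);
    }

    /// Hypothetical helper: what the TUI would read on each redraw.
    pub fn changes(&self) -> u64 {
        self.changes_count.load(Ordering::Relaxed)
    }
}
```

Because the counter lives on the stores themselves, the TUI needs no separate per-repo map to stay in sync with the indexer.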
When the server restarts or laptop suspends, all in-memory MCP sessions
are lost. Clients that still hold a stale session ID get a 404 from rmcp
which most MCP client libraries don't handle gracefully.

Add middleware that inspects POST /mcp requests with a session ID.
If the JSON-RPC method is "initialize", strip the stale session header
so rmcp creates a fresh session automatically. This enables seamless
reconnection after server restarts without manual client restart.
When the server restarts or laptop suspends, in-memory MCP sessions are
lost. Previously only initialize requests were handled; tools/call with
a stale session still got a 404 that clients couldn't recover from.

New approach: after rmcp returns 404 for a stale session, the middleware
automatically performs a full reconnect cycle:
1. Sends an internal initialize request to get a fresh session ID
2. Retries the original request with the new session ID
3. Returns the response with the new session ID header so the client
   updates its stored session ID transparently

Uses ReconnectState (reqwest::Client + mcp_url) via from_fn_with_state.
The client never sees the 404 — it gets the actual tool response.
get_or_open_stores now takes touch:bool. Fan-out paths (unscoped
get_chunk, group routing) pass false so they don't reset the idle
timer on every repo. Only direct single-repo queries pass true.

After get_chunk candidate detection resolves to a single repo,
touch_access is called explicitly for just that repo.
Warmup no longer resets idle timers (touch: false).
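
The effect of the `touch` flag can be sketched with a small idle tracker. Everything below is an assumption made for illustration: `IdleTracker`, `get_or_open`, and `idle_repos` are hypothetical names, and timestamps are passed in explicitly so the behaviour is easy to follow.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical idle tracker; the real code keeps equivalent state
/// next to the opened stores.
pub struct IdleTracker {
    last_access: HashMap<String, Instant>,
}

impl IdleTracker {
    pub fn new() -> Self {
        Self { last_access: HashMap::new() }
    }

    /// Mirrors the `touch: bool` parameter: fan-out callers pass `false`
    /// and leave the idle timer of an already-open repo untouched.
    pub fn get_or_open(&mut self, repo: &str, touch: bool, now: Instant) {
        // Assumption: the first open records a timestamp either way.
        let entry = self.last_access.entry(repo.to_string()).or_insert(now);
        if touch {
            *entry = now;
        }
    }

    /// Explicit touch, used once a fan-out has resolved to a single repo.
    pub fn touch_access(&mut self, repo: &str, now: Instant) {
        self.last_access.insert(repo.to_string(), now);
    }

    /// Repos whose last direct access is at least `max_idle` old.
    pub fn idle_repos(&self, now: Instant, max_idle: Duration) -> Vec<String> {
        let mut idle: Vec<String> = self
            .last_access
            .iter()
            .filter(|(_, t)| now.duration_since(**t) >= max_idle)
            .map(|(name, _)| name.clone())
            .collect();
        idle.sort();
        idle
    }
}
```

With this shape, a group query that fans out across every repo no longer keeps all of them warm; only the repo that actually served the result has its timer reset.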
- Increase retrieval pool: limit*3 -> limit*5 in semantic pipeline
- Stronger exact identifier boost: EXACT_MATCH_RRF_K 5.0 -> 2.0
- Increase search_exact pool: limit*2 -> limit*3
- Auto-fallback to literal FTS when semantic returns <3 results
  and query contains identifiers (has_identifiers check)
- Same limit*5 change in CLI search path (src/search/mod.rs)
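
To see why lowering `EXACT_MATCH_RRF_K` strengthens the exact-identifier boost, here is a minimal sketch of reciprocal rank fusion with a per-list constant. Only the value 2.0 comes from the commit message; `rrf_fuse`, `DEFAULT_RRF_K`, and the way the constant is applied are assumptions for illustration, not the project's implementation.

```rust
use std::collections::HashMap;

/// Value taken from the commit message above.
pub const EXACT_MATCH_RRF_K: f64 = 2.0;
/// Hypothetical constant for the vector and BM25 lists
/// (60 is the customary RRF default).
pub const DEFAULT_RRF_K: f64 = 60.0;

/// Fuses ranked lists: each list contributes 1 / (k + rank) to a
/// document's score, with rank starting at 1.
pub fn rrf_fuse(lists: &[(&[&str], f64)]) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for (ids, k) in lists {
        for (i, id) in ids.iter().enumerate() {
            let rank = i as f64 + 1.0;
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (*k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest score first; ties are broken by id so the order is stable.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Under this formula a top exact match contributes 1/(2+1) ≈ 0.33 with k = 2.0, against 1/(5+1) ≈ 0.17 with the previous k = 5.0, so an exact identifier hit outweighs several mid-ranked semantic hits.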
- fix(serve): idle eviction — touch only on direct query, not fan-out
- fix(search): improve search quality to reduce agent grep fallback
- feat(serve): transparent MCP session reconnect on stale 404
- fix(tui): normalize CPU% by core count
- fix(serve): track file changes + improve reaper visibility
Replace server-side reconnect middleware (which couldn't fix broken TCP)
with client-side reconnect in McpProxyService (--mode client).

When list_tools or call_tool hits a transport error (broken TCP, stale
session 404, keep-alive failure), the proxy:
1. Detects the transport error via is_transport_error_msg()
2. Calls force_reconnect() — clears peer + signals main loop
3. Retries up to PROXY_MAX_RETRY_ATTEMPTS (3) with backoff
4. The main loop receives the disconnect signal and reconnects to serve

This handles both server restarts and laptop suspend correctly since
the proxy controls the rmcp client connection directly.
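
The retry steps above can be sketched as a synchronous loop. This is a simplified illustration under stated assumptions: the real proxy is built on rmcp and is presumably async, the error strings matched in `is_transport_error_msg` are guesses, `call_with_reconnect` is a hypothetical name, and the linear backoff schedule is not taken from the source.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Value from the commit message.
pub const PROXY_MAX_RETRY_ATTEMPTS: u32 = 3;

/// Hypothetical classifier; the real `is_transport_error_msg` may
/// match different strings.
pub fn is_transport_error_msg(msg: &str) -> bool {
    let m = msg.to_ascii_lowercase();
    m.contains("connection") || m.contains("session not found") || m.contains("broken pipe")
}

/// Runs `call`; on a transport error it triggers a reconnect, backs off,
/// and retries, up to PROXY_MAX_RETRY_ATTEMPTS retries.
pub fn call_with_reconnect<T>(
    mut call: impl FnMut() -> Result<T, String>,
    mut force_reconnect: impl FnMut(),
    base_backoff: Duration,
) -> Result<T, String> {
    let mut attempt = 0;
    loop {
        match call() {
            Ok(value) => return Ok(value),
            Err(e) if is_transport_error_msg(&e) && attempt < PROXY_MAX_RETRY_ATTEMPTS => {
                attempt += 1;
                // Stand-in for clearing the peer and signalling the main loop.
                force_reconnect();
                // Linear backoff is an assumption made for this sketch.
                sleep(base_backoff * attempt);
            }
            // Non-transport errors and exhausted retries surface unchanged.
            Err(e) => return Err(e),
        }
    }
}
```

Errors that are not transport failures, such as an invalid tool argument, are returned immediately, so the retry loop never masks genuine request errors.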

Removed: ReconnectState + auto_reconnect_stale_sessions middleware from
serve/mod.rs (server-side middleware cannot fix dead TCP connections).
Comment thread src/serve/mod.rs Fixed
flupkede and others added 2 commits May 1, 2026 18:28
CODESEARCH_REPOS_CONFIG env var was used directly as a filesystem path
without validation or canonicalization. CodeQL flagged this as
'Uncontrolled data used in path expression'.

- repos.rs: validate env-var override has .json extension (fail-fast)
- serve/mod.rs: canonicalize config_path before fs::metadata/load_from
  to resolve symlinks and normalize .. components
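
A minimal sketch of the two checks, using only the standard library. The function names `validate_config_override` and `canonical_config_path` are hypothetical, and the case-insensitive extension match is an assumption; the actual code in repos.rs and serve/mod.rs may differ.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Fails fast unless the override path ends in `.json`.
/// (Case-insensitive matching is an assumption of this sketch.)
pub fn validate_config_override(raw: &str) -> Result<PathBuf, String> {
    let path = PathBuf::from(raw);
    let is_json = path
        .extension()
        .and_then(|ext| ext.to_str())
        .map_or(false, |ext| ext.eq_ignore_ascii_case("json"));
    if is_json {
        Ok(path)
    } else {
        Err(format!(
            "CODESEARCH_REPOS_CONFIG must point to a .json file, got {raw:?}"
        ))
    }
}

/// Resolves symlinks and `..` components before the path is handed to
/// metadata or load calls. Fails if the file does not exist.
pub fn canonical_config_path(path: &Path) -> io::Result<PathBuf> {
    fs::canonicalize(path)
}
```

Canonicalizing before use means every later filesystem call sees one normalized path, regardless of how the environment variable spelled it.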

Fixes: CodeQL alert on src/serve/mod.rs reload_if_changed()
fix(serve): validate config_path from env var (CodeQL path traversal)
@flupkede flupkede merged commit 5cf31be into master May 1, 2026
3 checks passed

2 participants