Merged
82 changes: 82 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,82 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

First stable release of codesearch — a Rust-based hybrid (vector + BM25 + AST)
code search MCP server, optimised for AI coding agents working across many
repositories.

### Added

- **Multi-repository serve mode** (`codesearch serve`): a long-running HTTP/SSE
process that holds many indexed repositories warm at the same time, with
per-project routing via `project=…`, group routing via `group=…`, and
cross-repository search using RRF fusion across project boundaries.
- **Stdio proxy with auto-reconnect**: `codesearch mcp` (stdio mode) detects a
running `serve` process and proxies tool calls to it. The proxy now performs
client-side retries with a forced reconnect when it sees a transport-level
failure (broken TCP keep-alive, stale session 404, server restart, laptop
suspend) so MCP clients like Claude Desktop self-heal transparently. After a
serve restart the first call returns a clear "reconnecting" message and the
next call succeeds.
- **MCP tool surface optimised for agents** to reduce grep-fallback behaviour:
  - `search` (semantic / hybrid / lexical / pure-literal regex modes)
  - `find` (definition / usages / imports / dependents)
  - `explore` (file outline / similar chunks)
  - `get_chunk` for cheap follow-up reads of a specific code chunk
  - `status` (index / projects)
- **Tree-sitter AST-aware chunking** for 9 languages: Rust, Python, JavaScript,
TypeScript, C, C++, C#, Go, Java.
- **Persistent embedding cache** keyed on SHA-256 of chunk content, surviving
`--force` rebuilds and per-file re-indexes.
- **Git worktree support**: when `.git` is a worktree marker file (not a
directory), the project root is correctly resolved to the worktree itself.
- **Long UNC-path support** on Windows for repositories under `\\?\C:\…` paths.
- **Repository groups** for cross-repo search across user-defined sets of
projects (e.g. all *.Aprimo* repos).
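
To illustrate the worktree entry above, here is a minimal sketch of how a `.git` marker file can be told apart from a `.git` directory. It is not the codesearch implementation: `GitEntry`, `classify_git_entry` and `find_project_root` are hypothetical names. It relies only on the documented Git behaviour that a linked worktree stores a `.git` file whose first line starts with `gitdir:` (submodules use the same file format, which this sketch does not distinguish).

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// How the `.git` entry under a directory looks on disk.
/// (Hypothetical type, introduced for illustration only.)
#[derive(Debug, PartialEq)]
pub enum GitEntry {
    /// Regular checkout: `.git` is a directory.
    Directory,
    /// Linked worktree: `.git` is a file whose first line is `gitdir: <path>`.
    WorktreeMarker { gitdir: PathBuf },
}

/// Classify the `.git` entry directly under `dir`, if there is one.
pub fn classify_git_entry(dir: &Path) -> Option<GitEntry> {
    let dot_git = dir.join(".git");
    let meta = fs::metadata(&dot_git).ok()?;
    if meta.is_dir() {
        return Some(GitEntry::Directory);
    }
    // A marker file holds a single `gitdir: <path>` line.
    let content = fs::read_to_string(&dot_git).ok()?;
    let target = content.lines().next()?.strip_prefix("gitdir:")?.trim();
    Some(GitEntry::WorktreeMarker {
        gitdir: PathBuf::from(target),
    })
}

/// Walk upwards from `start` and return the first directory that carries a
/// `.git` entry of either kind. For a worktree this is the worktree itself,
/// not the main repository that the marker file points to.
pub fn find_project_root(start: &Path) -> Option<PathBuf> {
    start
        .ancestors()
        .find(|dir| classify_git_entry(dir).is_some())
        .map(Path::to_path_buf)
}
```

The design point is that the marker file's `gitdir:` target is deliberately not followed when choosing the project root; it is only parsed to confirm that the file really is a Git marker.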

### Changed

- **Search quality**: re-tuned RRF fusion of the vector / BM25 / exact-identifier
signals so common tool names and exact strings are no longer drowned out by
semantic neighbours, reducing the rate at which agents fall back to external
grep.
- **Idle eviction**: only refreshes a project's "last accessed" timestamp on a
direct query against that project, not on fan-out queries that touch the
index merely because they routed through the same group.
- **TUI CPU%**: now normalised by core count.
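
For readers unfamiliar with RRF (Reciprocal Rank Fusion), the following is a minimal, unweighted sketch of the general technique. The function name `rrf_fuse`, the chunk ids, and the constant `k = 60` used in the usage note are assumptions for illustration; the re-tuned weights that codesearch actually applies to its vector, BM25 and exact-identifier signals are not described in this changelog.

```rust
use std::collections::HashMap;

/// Fuse several best-first ranked lists with Reciprocal Rank Fusion.
/// Each item scores the sum of 1 / (k + rank) over the lists it appears in,
/// with 1-based ranks. This is the textbook, unweighted form.
pub fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for ranking in rankings {
        for (idx, id) in ranking.iter().enumerate() {
            let rank = (idx + 1) as f64;
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_string(), score))
        .collect();
    // Highest score first; ties are broken by id so the order is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Calling `rrf_fuse(&[vec!["a", "b", "c"], vec!["b", "a", "d"], vec!["b"]], 60.0)` ranks `b` first: it appears in all three lists, so an exact-identifier hit can lift a result above its purely semantic neighbours.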

### Fixed

- **Security**: validate the `CODESEARCH_CONFIG` environment variable against
path traversal (CodeQL finding). The config path is now rejected if it contains
`..` segments, which prevents directory traversal via the environment variable.
- **Issue #30** ([LMDB resize crash on large repositories](https://github.com/flupkede/codesearch/issues/30)):
When the database grew beyond its initial allocation (`MDB_MAP_FULL`), the
resize failed with `"an environment is already opened with different options"`.
The fix closes and reopens the LMDB environment around the resize, allowing
codesearch to index large repositories (tested: 4400+ files, 89 MB) without
crashing.
- File-change tracking and reaper visibility in `serve` mode.
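
The `..` rejection described in the security fix can be sketched with the standard library alone. This is an illustrative version under assumptions, not the code from the fix: `has_parent_dir_segment` and `validate_config_path` are hypothetical names, and the error type is a plain `String`.

```rust
use std::path::{Component, Path};

/// True if any path component is a `..` segment.
/// A file name that merely contains dots (e.g. `my..file.toml`) is a normal
/// component and is not flagged.
pub fn has_parent_dir_segment(path: &Path) -> bool {
    path.components().any(|c| matches!(c, Component::ParentDir))
}

/// Accept a config path taken from an environment variable only if it has
/// no `..` segments. (Hypothetical helper; the error type is an assumption.)
pub fn validate_config_path(raw: &str) -> Result<&Path, String> {
    let path = Path::new(raw);
    if has_parent_dir_segment(path) {
        return Err(format!("config path must not contain `..` segments: {raw}"));
    }
    Ok(path)
}
```

Checking parsed components rather than searching the raw string for `".."` avoids rejecting legitimate file names that happen to contain two consecutive dots.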

### Removed

- Server-side transparent MCP session-reconnect middleware: replaced by the
client-side retry in the stdio proxy. The middleware could not reach
non-compliant remote MCP clients (their HTTP pool gives up at the TCP layer
before the request hits the server) and it introduced a session-counter leak.
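
The client-side retry that replaces the middleware can be pictured roughly as follows. This is a generic sketch under assumptions, not the proxy's actual code: the `Upstream` trait, `CallError`, `call_with_reconnect` and the `FlakyUpstream` test double are all hypothetical, and the real proxy may surface a "reconnecting" message to the client instead of retrying silently.

```rust
/// Failure classes a proxy might distinguish. (Hypothetical type.)
#[derive(Debug, PartialEq)]
pub enum CallError {
    /// Connection-level failure, e.g. a broken socket or a stale session.
    Transport(String),
    /// The server answered, but the tool call itself failed.
    Tool(String),
}

/// Minimal abstraction over the connection to a running server.
pub trait Upstream {
    fn call(&mut self, request: &str) -> Result<String, CallError>;
    fn reconnect(&mut self) -> Result<(), CallError>;
}

/// Retry a call after a forced reconnect, but only for transport errors and
/// at most `max_retries` times. Tool errors are returned unchanged.
pub fn call_with_reconnect<U: Upstream>(
    upstream: &mut U,
    request: &str,
    max_retries: usize,
) -> Result<String, CallError> {
    let mut attempt = 0;
    loop {
        match upstream.call(request) {
            Err(CallError::Transport(_)) if attempt < max_retries => {
                attempt += 1;
                upstream.reconnect()?;
            }
            other => return other,
        }
    }
}

/// Test double that fails with a transport error a fixed number of times.
pub struct FlakyUpstream {
    pub failures_left: usize,
    pub reconnects: usize,
}

impl Upstream for FlakyUpstream {
    fn call(&mut self, request: &str) -> Result<String, CallError> {
        if self.failures_left > 0 {
            self.failures_left -= 1;
            return Err(CallError::Transport("connection reset".to_string()));
        }
        Ok(format!("ok: {request}"))
    }

    fn reconnect(&mut self) -> Result<(), CallError> {
        self.reconnects += 1;
        Ok(())
    }
}
```

Keeping the retry on the client side means it also covers failures that never reach the server, which is the limitation of the removed middleware noted above.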

### Known limitations

- Remote MCP clients that do not handle 404 "Session not found" per the MCP
spec (e.g. OpenCode 1.14.x at the time of writing) need to be restarted after
a `codesearch serve` restart.
- `codesearch serve` allows only one writer per database (an LMDB invariant),
so a concurrent reindex from a second process is rejected.

[Unreleased]: https://github.com/flupkede/codesearch/compare/master...develop
2 changes: 1 addition & 1 deletion Cargo.lock


2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "codesearch"
-version = "1.0.72"
+version = "1.0.74"
edition = "2021"
authors = ["codesearch contributors"]
license = "Apache-2.0"
8 changes: 5 additions & 3 deletions src/serve/mod.rs
@@ -1731,10 +1731,12 @@ pub async fn run_serve(
.map_err(std::io::Error::other)
};

-// Build session manager with extended keep_alive (default is 5 min which kills
-// idle MCP sessions too aggressively). 30 minutes matches our repo idle eviction.
+// Build session manager without keep_alive timeout. The default rmcp timeout
+// (5 min) kills idle sessions too aggressively for a local long-running serve.
+// We run single-user local, so abandoned sessions cost nothing — let TCP
+// liveness determine when a session is truly dead.
 let mut session_manager = LocalSessionManager::default();
-session_manager.session_config.keep_alive = Some(std::time::Duration::from_secs(30 * 60));
+session_manager.session_config.keep_alive = None;
let session_manager = Arc::new(session_manager);
let config = StreamableHttpServerConfig::default();
