codesearch

Multi-repo semantic code search for AI agents — a Rust MCP server with vector + BM25 hybrid retrieval, symbol navigation, and cross-repository orchestration. Fully local, fully offline, no GPU, no Docker.

codesearch gives AI agents (OpenCode, Claude Code, Cursor, and any MCP client) deep codebase understanding through 6 unified MCP tools. Index once, then search semantically across multiple repositories simultaneously.

Why codesearch?

Multi-repo serve mode: Fan-out queries across repository groups with cross-repo RRF ranking
Hybrid retrieval: Vector embeddings + BM25 full-text search fused with Reciprocal Rank Fusion
Symbol navigation: Jump to definitions, find usages, trace imports and dependents — in the same tool
AST-aware chunking: Tree-sitter parsing for 15 languages — chunks align to functions/classes (and Markdown sections), not arbitrary line ranges
Token-efficient: Returns metadata by default; agents fetch full code only when needed via get_chunk
Lightweight footprint: Hundreds of MB on disk, runs on CPU only, no runtime model downloads (works behind enterprise proxies)
Zero config for single repos: codesearch index && codesearch mcp — done

How does this compare?

The MCP code-search ecosystem grew rapidly in late 2025 / early 2026 and many projects share the same baseline stack (Rust + tree-sitter + BM25 + embeddings + MCP). codesearch's deliberate focus is:

Focus area	codesearch	Typical alternative
Repository scope	Multi-repo serve with cross-repo RRF	Usually single repo at a time
Footprint	~hundreds of MB, CPU-only, no Docker	GB-scale, GPU, Docker, or cloud
Enterprise / offline	No runtime fetches; static binary	Often pulls models at first run
Symbol navigation	`find` (def/usages/imports/dependents) co-located with semantic search	Often a separate code-graph tool
Token cost per call	`compact=true` by default; chunks fetched on demand	Frequently dumps full snippets

Other projects in the same niche may go deeper on call-graph traversal, polished standalone CLIs, or memory/knowledge-graph features. codesearch is intentionally narrower: it picks "lightweight, multi-repo, MCP-native, fully offline" and stays in that lane.

Architecture

graph TB
    Agent[AI Agent / MCP Client] -->|MCP stdio or HTTP| Router{MCP Router}

    Router --> Search[search tool]
    Router --> Find[find tool]
    Router --> Explore[explore tool]
    Router --> GetChunk[get_chunk tool]
    Router --> FindImpact[find_impact tool]
    Router --> Status[status tool]

    Search -->|mode=semantic| Semantic[Vector ANN + BM25 + RRF Fusion]
    Search -->|mode=literal| Literal[Tantivy FTS / Regex]

    Find -->|definition/usages| SymbolIndex[Symbol Index]
    Find -->|imports/dependents| DepGraph[Dependency Graph]

    Explore -->|outline| TreeSitter[Tree-sitter AST]
    Explore -->|similar| Semantic

    Semantic --> Arroy[arroy ANN vectors]
    Semantic --> Tantivy[Tantivy BM25]
    Arroy --> LMDB[(LMDB)]
    Tantivy --> TantivyIdx[(Tantivy Index)]

    GetChunk --> LMDB

    FindImpact -->|C# symbols| CSharpHelper[scip-csharp helper]
    CSharpHelper -->|SCIP index| ScipLMDB[(LMDB scip_symbols)]

    subgraph "Serve Mode (multi-repo)"
        ServeRouter[HTTP Router] -->|project/group routing| Repo1[Repo A]
        ServeRouter --> Repo2[Repo B]
        ServeRouter --> RepoN[Repo N]
    end

    Router -->|client mode| ServeRouter

Quick Start

Install

Download pre-built binaries from Releases:

Platform	Download
Windows x86_64	`codesearch-windows-x86_64.zip`
Windows x86_64 + C#	`codesearch-windows-x86_64-with-csharp.zip`
Linux x86_64	`codesearch-linux-x86_64.tar.gz`
Linux x86_64 + C#	`codesearch-linux-x86_64-with-csharp.tar.gz`
macOS ARM64	`codesearch-macos-arm64.tar.gz`
macOS ARM64 + C#	`codesearch-macos-arm64-with-csharp.tar.gz`

Or build from source:

git clone https://github.com/flupkede/codesearch.git
cd codesearch
cargo build --release

Index a repository

# Register and index the current repo (adds to ~/.codesearch/repos.json)
codesearch index add

# Register and index a repo from outside the repo folder
codesearch index add /path/to/my-project

# Incremental update (only changed files)
codesearch index /path/to/my-project

# Full rebuild
codesearch index /path/to/my-project --force

# Remove a repo
codesearch index rm /path/to/my-project

# List registered repos
codesearch index list

# Remove stale entries (relocates moved repos first, then drops the rest)
codesearch index prune

codesearch index add is intended to be run from inside the repo you want to register. If you're launching it from somewhere else, pass the repo path explicitly.

First-time indexing takes 2–5 minutes. Subsequent runs are incremental (10–30s). Branch switches trigger automatic re-indexing.

MCP Configuration

codesearch connects to AI agents via MCP. Two modes:

Mode	How	Best for
Local (stdio)	`codesearch mcp` — single repo, auto-index + file watching	Working on one project
Serve (HTTP)	`codesearch serve` — multi-repo, TUI dashboard, lazy FSW	Multiple repos, cross-repo search

Local / Single Repo

The agent spawns codesearch mcp as a subprocess. It auto-detects the nearest index and starts a file watcher.

OpenCode — ~/.config/opencode/config.json:

{
  "mcp": {
    "codesearch": {
      "type": "local",
      "command": ["codesearch", "mcp"],
      "enabled": true
    }
  }
}

Claude Code — ~/.config/claude-code/config.json:

{
  "mcpServers": {
    "codesearch": {
      "command": "codesearch",
      "args": ["mcp"]
    }
  }
}

Claude Desktop — claude_desktop_config.json:

{
  "mcpServers": {
    "codesearch": {
      "command": "codesearch",
      "args": ["mcp"]
    }
  }
}

Serve / Multi-Repo

Start the server first, then connect your agent. The server manages all registered repos with a TUI dashboard, lazy filesystem watchers, and idle eviction.

# Start the server (default port 39725)
codesearch serve

OpenCode — connect via HTTP:

{
  "mcp": {
    "codesearch": {
      "type": "remote",
      "url": "http://127.0.0.1:39725/mcp",
      "enabled": true
    }
  }
}

Claude Code / Claude Desktop — force serve connection via --mode client:

{
  "mcpServers": {
    "codesearch": {
      "command": "codesearch",
      "args": ["mcp", "--mode", "client"]
    }
  }
}

Note: In multi-repo mode, agents must specify project or group in tool calls. status always works without scope. get_chunk auto-routes when the chunk_id is unique across repos; if ambiguous, it returns candidates and requires project.
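The scoping rule in the note above can be checked on the client side before a request is sent. The following is a minimal sketch, not codesearch's own code: `build_tool_call` and `UNSCOPED_OK` are hypothetical names, and the request envelope follows the generic MCP `tools/call` JSON-RPC shape.

```python
import json

# Tools that work without an explicit scope in multi-repo mode, per the note above.
# get_chunk may still need "project" later if the chunk_id turns out to be ambiguous.
UNSCOPED_OK = {"status", "get_chunk"}


def build_tool_call(request_id, tool, arguments, multi_repo=True):
    """Build an MCP tools/call request, enforcing the multi-repo scope rule."""
    scoped = "project" in arguments or "group" in arguments
    if multi_repo and tool not in UNSCOPED_OK and not scoped:
        raise ValueError(f"{tool} needs 'project' or 'group' in multi-repo mode")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


scoped_call = build_tool_call(
    1, "search", {"query": "retry with backoff", "group": "my-group"}
)
status_call = build_tool_call(2, "status", {"kind": "projects"})

# A search without project/group is rejected before it reaches the server.
try:
    build_tool_call(3, "search", {"query": "retry with backoff"})
    rejected = False
except ValueError:
    rejected = True

print(json.dumps(scoped_call, indent=2))
```

Most MCP clients build this envelope for you; the sketch only makes the rule explicit for agents or scripts that assemble calls by hand.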

MCP Tools Reference

`search` — Code Search

Parameter	Type	Description
`query`	string	Natural language, code snippet, regex, or exact term
`mode`	`"semantic"` | `"literal"`	Search backend (default: semantic)
`filter_path`	string	Path prefix filter (semantic mode)
`file_glob`	string	Glob filter (literal mode), e.g. `"src/**/*.rs"`
`language`	string	Language filter (literal mode)
`regex`	bool	Treat query as regex (literal mode)
`phrase`	bool	Exact phrase match (literal mode)
`compact`	bool	Metadata only, no code (default: true)
`limit`	int	Max results (default: 10 semantic, 20 literal)
`project`	string	Target specific repo (multi-repo)
`group`	string	Search across repo group (multi-repo)

Semantic mode combines vector similarity (fastembed) + BM25 lexical scoring + exact identifier boosting, fused with RRF. Best for conceptual queries and mixed natural-language + symbol searches.
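To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists. It is an illustration, not codesearch's implementation: the constant `k=60` is the commonly used default and an assumption here, the chunk ids are made up, and the exact identifier boosting mentioned above is not modeled.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of chunk ids with Reciprocal Rank Fusion.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk id for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


vector_hits = [101, 205, 333]  # hypothetical ids from the vector (ANN) side
bm25_hits = [205, 404, 101]    # hypothetical ids from the BM25 side

fused = rrf_fuse([vector_hits, bm25_hits])
print(fused)  # [205, 101, 404, 333]
```

Chunks that appear in both lists (205 and 101) rise above chunks found by only one backend, which is why RRF suits queries that mix natural language with symbol names.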

Literal mode uses Tantivy FTS. Use regex=true for patterns with punctuation (foo::bar, Vec<T>). Use phrase=true for multi-word exact matches.

`find` — Symbol Navigation

Parameter	Type	Description
`symbol`	string	Symbol name or file path (for imports)
`kind`	`"definition"` | `"usages"` | `"imports"` | `"dependents"`	Navigation type
`definition_kind`	string	Filter: Function, Class, Method, Struct, Trait, Enum, Interface
`project` / `group`	string	Multi-repo routing

`explore` — File Exploration

Parameter	Type	Description
`target`	string	File path (outline) or chunk_id (similar)
`kind`	`"outline"` | `"similar"`	Exploration type
`limit`	int	Max results for similar mode
`project` / `group`	string	Multi-repo routing

Outline returns all top-level symbols in a file (kind, signature, line range). Similar finds semantically related chunks to a given chunk_id.

`get_chunk` — Read Code

Parameter	Type	Description
`chunk_id`	int	Chunk ID from search/explore results
`context_lines`	int	Extra lines before/after (0-20, default: 0)
`project`	string	Disambiguate if chunk_id exists in multiple repos

In multi-repo mode: auto-routes when chunk_id is unique; returns candidates list when ambiguous.
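The routing rule can be sketched as a small function. This is a sketch under stated assumptions: `route_get_chunk` and the `repo_chunks` mapping (alias to set of chunk ids) are hypothetical stand-ins for whatever lookup the server really performs, and the return shapes are illustrative.

```python
def route_get_chunk(chunk_id, repo_chunks, project=None):
    """Decide which repo should serve a get_chunk call.

    repo_chunks: hypothetical mapping of repo alias -> set of chunk ids.
    """
    if project is not None:
        # An explicit project always wins.
        if chunk_id in repo_chunks.get(project, set()):
            return {"project": project}
        return {"error": "not_found"}

    candidates = sorted(
        alias for alias, ids in repo_chunks.items() if chunk_id in ids
    )
    if len(candidates) == 1:
        return {"project": candidates[0]}   # unique: auto-route
    if not candidates:
        return {"error": "not_found"}
    return {"candidates": candidates}       # ambiguous: caller must pick


repo_chunks = {"api": {1, 2, 3}, "web": {3, 4}}

unique = route_get_chunk(1, repo_chunks)
ambiguous = route_get_chunk(3, repo_chunks)
resolved = route_get_chunk(3, repo_chunks, project="web")
print(unique, ambiguous, resolved)
```

An agent that receives a candidates list simply repeats the call with `project` set to one of the returned aliases.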

`find_impact` — Symbol Reference Impact

Find all call-sites and references to a symbol with file/line precision, powered by per-language semantic analysis. Currently supports C# (via the bundled scip-csharp helper).

Parameter	Type	Description
`symbol_name`	string	Symbol name (e.g. `"FieldDefinition.Validate"`)
`file`	string	File path for position-based lookup
`line`	int	Line number for position-based lookup
`language`	string	Language hint (auto-detected from file extension)
`project` / `group`	string	Multi-repo routing

Returns a list of references with file, start_line, end_line, and kind (e.g. "call", "definition"). Exposes index_age_seconds so agents can reason about staleness.
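An agent might post-process a `find_impact` result as in the sketch below. The per-reference fields (`file`, `start_line`, `end_line`, `kind`) and `index_age_seconds` come from the description above; the top-level `references` key, the `summarize_impact` helper, and the 300-second staleness threshold are assumptions made for illustration.

```python
def summarize_impact(response, max_age_seconds=300):
    """Group references by kind and flag results from an old index.

    The response layout (a "references" list next to "index_age_seconds")
    is an assumed shape, not a documented schema.
    """
    refs = response.get("references", [])
    by_kind = {}
    for ref in refs:
        by_kind[ref["kind"]] = by_kind.get(ref["kind"], 0) + 1
    files = sorted({ref["file"] for ref in refs})
    stale = response.get("index_age_seconds", 0) > max_age_seconds
    return {"files": files, "by_kind": by_kind, "stale": stale}


sample = {
    "index_age_seconds": 900,
    "references": [
        {"file": "Fields/FieldDefinition.cs", "start_line": 10, "end_line": 10, "kind": "definition"},
        {"file": "Forms/FormValidator.cs", "start_line": 42, "end_line": 42, "kind": "call"},
        {"file": "Forms/FormValidator.cs", "start_line": 77, "end_line": 77, "kind": "call"},
    ],
}

summary = summarize_impact(sample)
print(summary)
```

When `stale` is true, an agent could re-check the listed call sites with `get_chunk` before relying on the line numbers.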

Note: Requires the -with-csharp release variant or a separately installed scip-csharp helper. See C# Semantic Search.

`status` — Index Info

Parameter	Type	Description
`kind`	`"index"` | `"projects"`	What to query
`project` / `group`	string	Multi-repo routing

Serve Mode (Multi-Repo)

For working across multiple repositories simultaneously:

codesearch serve

This starts a background HTTP server with:

TUI dashboard (ratatui) showing repo status, CPU usage, active sessions
Lazy filesystem watchers — activated on first query per repo
Idle eviction (30 min by default, see CODESEARCH_REPO_IDLE_TIMEOUT_SECS): unused repos are unloaded from memory
Session tracking via MCP keep-alive
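The combination of lazy activation and idle eviction listed above can be sketched as a last-used table. This is an illustrative model only: `IdleEvictor` is a hypothetical name, the 1800-second default mirrors the documented 30-minute timeout, and the injectable clock exists purely so the example is deterministic.

```python
import time


class IdleEvictor:
    """Track when each repo was last queried and unload the idle ones."""

    def __init__(self, idle_timeout_secs=1800, clock=time.monotonic):
        self.idle_timeout_secs = idle_timeout_secs
        self.clock = clock
        self.last_used = {}  # alias -> timestamp of the most recent query

    def touch(self, alias):
        # Called on every query; the first call is where a lazy load would happen.
        self.last_used[alias] = self.clock()

    def evict_idle(self):
        now = self.clock()
        idle = [
            alias
            for alias, used_at in self.last_used.items()
            if now - used_at >= self.idle_timeout_secs
        ]
        for alias in idle:
            del self.last_used[alias]
        return sorted(idle)


# Simulated clock so the demo does not have to wait 30 minutes.
fake_now = [0.0]
evictor = IdleEvictor(clock=lambda: fake_now[0])

evictor.touch("repo-a")      # queried at t=0
fake_now[0] = 1000.0
evictor.touch("repo-b")      # queried at t=1000
fake_now[0] = 1800.0
evicted = evictor.evict_idle()
still_loaded = sorted(evictor.last_used)
print(evicted, still_loaded)  # ['repo-a'] ['repo-b']
```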

Repository Registration

Repos are registered via codesearch index add:

# Register a repo (creates index + adds to ~/.codesearch/repos.json)
codesearch index add /path/to/my-project

# Remove a repo
codesearch index rm /path/to/my-project

# List registered repos
codesearch index list

# Clean up stale entries (relocates moved repos, drops the rest)
codesearch index prune

The repository alias (the key in repos.json, used for groups and the MCP project argument) is always derived automatically from the directory name — there is no --alias flag.

Serve reads ~/.codesearch/repos.json on startup and manages all registered repos.

Moved or renamed repositories

If you rename or move a registered folder, serve does not crash. codesearch records each repo's git remote (remote.origin.url) at registration. On startup, for every registered path that no longer exists, it scans nearby folders (bounded depth, default 3, override with CODESEARCH_RELOCATE_MAX_DEPTH) for a git checkout with the same remote. A single unambiguous match is written back to repos.json; otherwise the entry is logged and skipped, so nothing is ever indexed against a dead path. Run codesearch index prune to relocate what can be relocated and drop the rest.
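The relocation scan can be approximated in a few lines. The sketch below is not the actual implementation: `origin_url` and `find_relocated` are hypothetical helpers, the search root is assumed to be a folder near the old path, and worktrees or submodules (where `.git` is a file rather than a directory) are not handled.

```python
import os
import tempfile
from pathlib import Path


def origin_url(repo_dir):
    """Read remote.origin.url from .git/config, or return None."""
    config = Path(repo_dir) / ".git" / "config"
    if not config.is_file():
        return None
    in_origin = False
    for line in config.read_text(encoding="utf-8", errors="replace").splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            in_origin = stripped == '[remote "origin"]'
        elif in_origin and "=" in stripped:
            key, _, value = stripped.partition("=")
            if key.strip() == "url":
                return value.strip()
    return None


def find_relocated(search_root, wanted_remote, max_depth=3):
    """Return the new path only when exactly one checkout has the wanted remote."""
    root = Path(search_root)
    matches = []
    for dirpath, dirnames, _ in os.walk(root):
        depth = len(Path(dirpath).relative_to(root).parts)
        if ".git" in dirnames:
            if origin_url(dirpath) == wanted_remote:
                matches.append(Path(dirpath))
            dirnames[:] = []   # never descend into a checkout
        elif depth >= max_depth:
            dirnames[:] = []   # bounded depth
    return matches[0] if len(matches) == 1 else None


REMOTE = "https://example.com/acme/app.git"  # made-up remote for the demo
max_depth = int(os.environ.get("CODESEARCH_RELOCATE_MAX_DEPTH", "3"))

with tempfile.TemporaryDirectory() as tmp:
    moved = Path(tmp) / "work" / "app-renamed"
    (moved / ".git").mkdir(parents=True)
    (moved / ".git" / "config").write_text(
        '[core]\n\tbare = false\n[remote "origin"]\n\turl = ' + REMOTE + "\n",
        encoding="utf-8",
    )
    found_name = find_relocated(tmp, REMOTE, max_depth=3).name
    too_shallow = find_relocated(tmp, REMOTE, max_depth=1)
    other_remote = find_relocated(tmp, "https://example.com/acme/other.git")

print(found_name, too_shallow, other_remote)
```

Returning nothing when two checkouts share the remote mirrors the "single unambiguous match" rule: guessing between clones could silently point an alias at the wrong working tree.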

A hand-edited repos.json is also tolerated: empty entries, orphaned metadata, and group references to unknown repos are cleaned up on load rather than crashing.

Groups

Groups let you search across related repositories:

codesearch groups add my-group repo1 repo2 repo3
codesearch groups list

Then in MCP tools: group="my-group" fans out the query to all repos in the group.
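A group query can be pictured as a concurrent fan-out followed by a rank-based merge. The sketch below is an assumption-laden illustration, not the server's code: `fan_out`, `search_fn`, and the fake index are hypothetical, and the merge applies the RRF score 1 / (k + rank) per repo with k=60 assumed. Because each chunk belongs to exactly one repo, this amounts to interleaving the per-repo lists by rank.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(query, repos, search_fn, limit=10, k=60):
    """Query every repo in a group concurrently and merge by rank.

    search_fn(alias, query) is a hypothetical callable returning a ranked
    list of chunk ids for one repo.
    """
    with ThreadPoolExecutor(max_workers=len(repos) or 1) as pool:
        per_repo = list(pool.map(lambda alias: search_fn(alias, query), repos))

    scored = []
    for alias, hits in zip(repos, per_repo):
        for rank, chunk_id in enumerate(hits, start=1):
            scored.append((1.0 / (k + rank), alias, chunk_id))
    # Best score first; alias and chunk id make tie-breaking deterministic.
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))
    return [(alias, chunk_id) for _, alias, chunk_id in scored[:limit]]


# Fake per-repo results standing in for real indexes.
fake_index = {"repo1": [11, 12], "repo2": [21], "repo3": []}


def fake_search(alias, query):
    return fake_index[alias]


merged = fan_out("parse config", ["repo1", "repo2", "repo3"], fake_search)
print(merged)  # [('repo1', 11), ('repo2', 21), ('repo1', 12)]
```

Tagging every hit with its repo alias matters downstream: it is what lets a follow-up `get_chunk` call pass `project` and avoid ambiguity.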

MCP Connection Modes

The codesearch mcp command supports three modes:

Mode	Behavior
`auto` (default)	Connects to serve if running, otherwise local stdio
`client`	Always connects to serve, fails if not running
`local`	Always uses local DB (classic single-repo stdio)

codesearch mcp --mode client  # force serve connection

The serve endpoint is available at /mcp (Streamable HTTP transport).
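The three modes reduce to a small decision function. How codesearch actually detects a running server is not documented here, so the sketch below assumes a plain TCP connect probe against the default port; `serve_is_running` and `resolve_mode` are hypothetical names.

```python
import socket


def serve_is_running(host="127.0.0.1", port=39725, timeout=0.25):
    """Assumed detection method: try to open a TCP connection to the serve port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def resolve_mode(requested="auto", probe=serve_is_running):
    """Map the requested mode to the transport that would be used."""
    if requested not in ("auto", "client", "local"):
        raise ValueError(f"unknown mode: {requested}")
    if requested == "local":
        return "local"              # never contacts serve
    running = probe()
    if requested == "client" and not running:
        raise ConnectionError("serve is not running")
    return "client" if running else "local"


# Stub probes keep the demo independent of what is listening on this machine.
mode_when_up = resolve_mode("auto", probe=lambda: True)
mode_when_down = resolve_mode("auto", probe=lambda: False)
mode_forced_local = resolve_mode("local", probe=lambda: True)
print(mode_when_up, mode_when_down, mode_forced_local)  # client local local
```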

CLI Reference

Command	Description
`codesearch index [PATH]`	Index a repo (incremental; `--force` for full rebuild)
`codesearch search <QUERY>`	CLI search (for testing)
`codesearch mcp`	Start MCP stdio server
`codesearch serve`	Start multi-repo HTTP server with TUI
`codesearch stats`	Show database statistics
`codesearch clear`	Delete index
`codesearch doctor`	Health check (model, index, config)
`codesearch setup`	Download embedding models
`codesearch cache stats|clear`	Manage embedding cache
`codesearch groups list|add|remove`	Manage repository groups

Configuration

Environment Variables

Variable	Description
`CODESEARCH_SERVE_PORT`	Serve mode port (default: 39725)
`CODESEARCH_MCP_MODE`	MCP mode: auto, client, local
`CODESEARCH_REPOS_CONFIG`	Path to repos.json
`CODESEARCH_REPO_IDLE_TIMEOUT_SECS`	Idle eviction timeout (default: 1800)
`CODESEARCH_CACHE_MAX_MEMORY`	Embedding cache MB (default: 500)
`CODESEARCH_BATCH_SIZE`	Embedding batch size
`CODESEARCH_SCIP_CSHARP`	Override path to `scip-csharp` helper
`RUST_LOG`	Log level (e.g. `codesearch=debug`)
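For scripts that wrap codesearch, the documented defaults can be mirrored in one place. This is a minimal sketch: the `Settings` dataclass and `load_settings` are hypothetical, only variables with a stated default or a fixed value set are covered, and `auto` is taken as the default mode from the MCP Connection Modes table.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Settings:
    serve_port: int
    mcp_mode: str
    idle_timeout_secs: int
    cache_max_memory_mb: int
    batch_size: Optional[int]  # no default is documented, so None means "unset"


def load_settings(env=os.environ):
    """Read the documented variables, falling back to the documented defaults."""
    mode = env.get("CODESEARCH_MCP_MODE", "auto")
    if mode not in ("auto", "client", "local"):
        raise ValueError(f"CODESEARCH_MCP_MODE must be auto, client, or local, got {mode!r}")
    batch = env.get("CODESEARCH_BATCH_SIZE")
    return Settings(
        serve_port=int(env.get("CODESEARCH_SERVE_PORT", "39725")),
        mcp_mode=mode,
        idle_timeout_secs=int(env.get("CODESEARCH_REPO_IDLE_TIMEOUT_SECS", "1800")),
        cache_max_memory_mb=int(env.get("CODESEARCH_CACHE_MAX_MEMORY", "500")),
        batch_size=int(batch) if batch is not None else None,
    )


defaults = load_settings({})
custom = load_settings({"CODESEARCH_SERVE_PORT": "40000", "CODESEARCH_MCP_MODE": "client"})
print(defaults)
print(custom.serve_port, custom.mcp_mode)
```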

`.codesearchignore`

Place in repo root. Gitignore syntax. Excludes paths from indexing:

# Vendored code
vendor/
node_modules/
# Generated files
*.generated.cs
**/migrations/**

`repos.json`

Located at ~/.codesearch/repos.json. Managed by codesearch index add/rm. Contains repo aliases → paths and group definitions. See Serve Mode.

C# Semantic Search

All C#-specific setup, operation, installation, and testing lives in README_CSharp.md.

If you do not work with C# repos, you can skip it entirely.

Supported Languages

Tree-sitter AST-aware chunking:

Language	Extensions
Rust	`.rs`
Python	`.py`, `.pyw`, `.pyi`
JavaScript	`.js`, `.mjs`, `.cjs`
TypeScript	`.ts`, `.tsx`, `.jsx`, `.mts`, `.cts`
C	`.c`, `.h`
C++	`.cpp`, `.cc`, `.cxx`, `.hpp`, `.hxx`
C#	`.cs`
Go	`.go`
Java	`.java`
Shell	`.sh`, `.bash`, `.zsh`
Ruby	`.rb`, `.rake`
PHP	`.php`
YAML	`.yaml`, `.yml`
JSON	`.json`
Markdown	`.md`, `.markdown`, `.txt`

Markdown uses the tree-sitter-md block grammar — chunks align to sections, headings, and code fences. All other text files use line-based chunking as fallback.
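To illustrate what section-aligned chunks look like, here is a plain-Python approximation. It deliberately does not use tree-sitter-md: it only splits on ATX headings (`#` to `######`) and keeps fenced code blocks intact, so it is a simplified stand-in for the real grammar. `chunk_markdown` is a hypothetical name.

```python
import re

FENCE = "`" * 3                      # a Markdown code-fence marker
HEADING = re.compile(r"#{1,6}(\s|$)")


def chunk_markdown(text):
    """Split Markdown into one chunk per heading-led section."""
    chunks, current, in_fence = [], [], False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # A '#' line inside a fence is code (e.g. a shell comment), not a heading.
        is_heading = not in_fence and HEADING.match(line) is not None
        if is_heading and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks


doc = "\n".join([
    "# Title",
    "Intro text.",
    "## Usage",
    FENCE + "sh",
    "# a shell comment, not a heading",
    FENCE,
    "## License",
    "Apache-2.0",
])

sections = chunk_markdown(doc)
for section in sections:
    print(repr(section.splitlines()[0]))
```

The shell comment stays inside the "Usage" chunk, which is the practical benefit over fixed line ranges: a code sample is never separated from the heading that explains it.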

Core Technology

Component	Technology
Embedding	fastembed + ONNX Runtime (CPU)
Vector store	arroy (Approximate Nearest Neighbors) + LMDB
Full-text search	Tantivy (BM25, AND mode)
Chunking	Tree-sitter AST parsing
Incremental sync	SHA-256 content hashing
Caching	3-layer: in-memory (Moka) → persistent disk → query cache
Schema	Versioned via `metadata.json`
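The incremental-sync row can be illustrated with a short sketch of hash-based change detection. It is a simplified model, not the project's code: `plan_sync` is a hypothetical helper, and files are passed in as in-memory bytes rather than read from disk.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """SHA-256 of a file's contents, as a hex string."""
    return hashlib.sha256(data).hexdigest()


def plan_sync(previous, current_files):
    """Compare stored hashes with current contents.

    previous:      path -> hash recorded at the last index run
    current_files: path -> file contents (bytes)
    Returns (changed_or_new, removed, new_hash_table).
    """
    current = {path: content_hash(data) for path, data in current_files.items()}
    changed = sorted(p for p, h in current.items() if previous.get(p) != h)
    removed = sorted(p for p in previous if p not in current)
    return changed, removed, current


previous = {
    "src/main.rs": content_hash(b"fn main() {}"),
    "src/old.rs": content_hash(b"// legacy"),
}
current_files = {
    "src/main.rs": b"fn main() {}",           # unchanged
    "src/lib.rs": b"pub fn answer() -> u8 { 42 }",  # new
}

changed, removed, table = plan_sync(previous, current_files)
print(changed, removed)  # ['src/lib.rs'] ['src/old.rs']
```

Hashing contents rather than comparing modification times means a branch switch that touches timestamps but not bytes does not force re-embedding of unchanged files.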

Development

# Build
cargo build

# Run tests
cargo test

# Check + lint
cargo clippy --all-targets -- -D warnings

# Format
cargo fmt --all

License

Apache-2.0

Acknowledgements

This project is a fork of demongrep by yxanul. Huge thanks for building such a solid foundation.

Built with: fastembed-rs, arroy, tantivy, tree-sitter, ratatui, LMDB.
