Local semantic search for AI agents and humans.
Quarry indexes documents in 20+ formats, embeds them with a local ONNX model (snowflake-arctic-embed-m-v1.5, 768-dim), stores vectors in LanceDB, and serves semantic search to Claude Code, Claude Desktop, and the CLI. Everything runs locally — no API keys, no cloud accounts. The embedding model (~120 MB int8) downloads once on first use. CUDA GPUs are auto-detected for faster inference.
Platforms: macOS, Linux
```sh
curl -fsSL https://raw.githubusercontent.com/punt-labs/quarry/1961675/install.sh | sh
```

Restart Claude Code, then:

```
> /ingest report.pdf                               # index a document (runs in background)
> /quarry status                                   # after a moment, confirm it's there
> /find "what does the report say about margins"   # search by meaning
```
Once installed, a plugin hook auto-indexes your current project directory on every session start — you don't need to /ingest your codebase manually.
Manual install (if you already have uv):

```sh
uv tool install punt-quarry
quarry install
quarry doctor
```

Verify before running:

```sh
curl -fsSL https://raw.githubusercontent.com/punt-labs/quarry/1961675/install.sh -o install.sh
shasum -a 256 install.sh
cat install.sh
sh install.sh
```

Run quarry on a GPU server and connect from any Mac or Linux client over TLS.
Server (GPU host, serves remote clients):

```sh
export QUARRY_API_KEY=$(openssl rand -hex 32)
curl -fsSL https://raw.githubusercontent.com/punt-labs/quarry/1961675/install.sh | sh -s -- --network
```

Generates TLS certificates, binds the daemon to 0.0.0.0, registers a systemd service, and prints a CA fingerprint. NVIDIA GPUs are auto-detected for CUDA inference.
Client (connects to remote server):

```sh
curl -fsSL https://raw.githubusercontent.com/punt-labs/quarry/1961675/install.sh | sh
quarry login <server-hostname> --api-key <token>
```

No special flag is needed: the default install runs a local daemon on localhost, and quarry login redirects queries to the remote server over wss:// with TOFU certificate pinning.
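The TOFU (trust-on-first-use) scheme behind quarry login can be sketched in a few lines of Python. This is an illustration of the general technique, not quarry's actual implementation; the function and pin-store shape here are made up:

```python
import hashlib

def check_pin(pins: dict, host: str, cert_der: bytes) -> bool:
    """Trust-on-first-use: remember the certificate fingerprint seen on
    first contact with a host, then reject any later connection whose
    certificate fingerprint differs."""
    fingerprint = hashlib.sha256(cert_der).hexdigest()
    if host not in pins:
        pins[host] = fingerprint  # first contact: pin this certificate
        return True
    return pins[host] == fingerprint  # later contacts must match the pin

pins = {}
assert check_pin(pins, "okinos.local", b"cert-A")   # first use: pinned
assert check_pin(pins, "okinos.local", b"cert-A")   # same cert: accepted
assert not check_pin(pins, "okinos.local", b"cert-B")  # changed cert: rejected
```

The trade-off is the usual one for TOFU: the first connection is taken on faith, which is why the server install prints a CA fingerprint you can verify out of band.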
Download punt-quarry.mcpb and double-click to install. Alternatively, quarry install configures Claude Desktop automatically.
Note: Uploaded files in Claude Desktop live in a sandbox that quarry cannot access. Use remember for uploaded content, or provide local file paths to ingest.
- 20+ formats — PDFs (with OCR for scanned pages), source code (AST-aware splitting), spreadsheets, presentations, HTML, Markdown, LaTeX, DOCX, images
- Semantic search — retrieval is by meaning, not keyword. A query about "margins" finds passages about profitability even if they never use that word
- Daemon architecture — one `quarry serve` process loads the embedding model once and serves all Claude Code sessions via mcp-proxy over WebSocket
- Passive knowledge capture — SessionStart hook auto-indexes the working directory, PostToolUse hook auto-ingests fetched URLs, PreCompact hook captures transcripts before context compaction
- Named databases — isolated LanceDB directories with independent sync registries. Switch with `use` for work/personal separation
- Research agent — `researcher` subagent combines quarry local search with web research, auto-ingests valuable findings
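"Search by meaning" ultimately reduces to comparing embedding vectors, typically by cosine similarity. A toy sketch, using made-up 3-dimensional vectors in place of the real 768-dimensional embeddings:

```python
import math

def cosine(a, b):
    # dot product divided by the product of the vector norms
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Illustrative vectors only: a query about "margins" lands closer to a
# profitability passage than to an unrelated one, despite sharing no words.
query         = [0.9, 0.1, 0.2]
profitability = [0.8, 0.2, 0.1]
unrelated     = [0.1, 0.9, 0.4]
assert cosine(query, profitability) > cosine(query, unrelated)
```

The similarity scores shown in /find output (e.g. 0.4521) are of this kind: higher means the query and chunk embeddings point in more similar directions.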
```
> /ingest report.pdf
▶ Ingesting report.pdf (background)

> /quarry
▶ Database: default
  Documents: 47
  Chunks: 1,203
  Size: 12.4 MB
  Model: snowflake-arctic-embed-m-v1.5 (768-dim)

> /find "what were the Q3 revenue figures"
▶ [report.pdf p.12 | text/.pdf] (similarity: 0.4521)
  Third quarter revenue reached $142M, up 18% year-over-year,
  driven primarily by expansion in the enterprise segment.
  Gross margins improved to 71% from 68% in Q2.
```
| Command | What it does |
|---|---|
| `/ingest <source>` | Ingest a URL, directory, or file |
| `/remember <name>` | Ingest inline text under a document name |
| `/find <query>` | Semantic search. Questions get synthesized answers; keywords get raw results |
| `/explain <topic>` | Search and synthesize an explanation |
| `/source <claim>` | Find which document a claim comes from |
| `/quarry [sub]` | Manage: status, sync, collections, databases, registrations |
| Tool | Purpose | Execution |
|---|---|---|
| `ingest` | Index a file or URL | Background |
| `remember` | Index inline text | Background |
| `register_directory` | Register directory for sync | Background |
| `sync_all_registrations` | Re-index all registered directories | Background |
| `find` | Semantic search with filters | Sync |
| `show` | Document metadata or page text | Sync |
| `list` | Documents, collections, databases, registrations | Sync |
| `status` | Database statistics | Sync |
| `delete` | Remove document or collection | Background |
| `deregister_directory` | Remove registration | Background |
| `use` | Switch active database | Sync |
```sh
quarry ingest report.pdf                         # index a file
quarry ingest https://example.com                # index a webpage
echo "notes" | quarry remember --name notes.md   # index inline text
quarry find "revenue trends"                     # hybrid search (vector + FTS)
quarry list documents                            # list indexed documents
quarry register ~/Documents/notes                # watch a directory
quarry sync                                      # re-index registered dirs
quarry use work                                  # switch database
quarry status                                    # database dashboard
quarry doctor                                    # health check
quarry serve                                     # start daemon on :8420
quarry install                                   # set up daemon, TLS certs, mcp-proxy

# Remote connections
quarry login okinos.local --api-key <token>      # TOFU login to remote server
quarry logout                                    # disconnect, revert to local daemon
quarry remote list --ping                        # show remote config and health

# Agent memory tagging
quarry ingest notes.md --agent-handle claude --memory-type fact
quarry find "deployment steps" --agent-handle claude
echo "key insight" | quarry remember --name insight.md --agent-handle claude \
  --memory-type observation --summary "Key insight from review"
```

Quarry works with zero configuration. These environment variables are available for customization:
| Variable | Default | Description |
|---|---|---|
| `QUARRY_PROVIDER` | (auto) | ONNX execution provider: cpu, cuda, or unset (auto-detect) |
| `QUARRY_API_KEY` | (none) | Bearer token for quarry serve |
| `QUARRY_ROOT` | `~/.punt-labs/quarry/data` | Base directory for all databases |
| `CHUNK_MAX_CHARS` | 1800 | Max characters per chunk (~450 tokens) |
| `CHUNK_OVERLAP_CHARS` | 200 | Overlap between consecutive chunks |
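How the two chunking variables interact can be sketched with a simple sliding-window splitter. This is an illustration of the max-chars/overlap relationship only; quarry's actual splitter is format-aware (AST-based for code, page-aware for PDFs):

```python
def chunk(text: str, max_chars: int = 1800, overlap: int = 200) -> list[str]:
    """Split text into windows of up to max_chars, with the last `overlap`
    characters of each chunk repeated at the start of the next, so that
    sentences spanning a boundary still appear whole in one chunk."""
    step = max_chars - overlap  # how far each window advances
    return [text[i:i + max_chars]
            for i in range(0, max(len(text) - overlap, 1), step)]

# Small numbers for readability: 50 chars, 20-char windows, 5-char overlap.
chunks = chunk("a" * 50, max_chars=20, overlap=5)
assert len(chunks) == 3 and all(len(c) == 20 for c in chunks)
assert chunk("short text") == ["short text"]  # fits in one chunk
```

Raising CHUNK_OVERLAP_CHARS trades index size for fewer sentences cut at chunk boundaries; raising CHUNK_MAX_CHARS packs more context per embedded chunk.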
For the full configuration reference, see Architecture section 7.
Beyond explicit /ingest and /find commands, quarry runs as a Claude Code plugin with hooks that capture knowledge automatically during your sessions:
| Hook | When it fires | What it does |
|---|---|---|
| Session start | On every session start | Auto-registers your project directory and syncs it in the background. Your codebase is searchable without manual ingestion. |
| Web fetch | After any WebFetch tool call | URLs Claude fetches during research are auto-ingested into a web-captures collection. Reuses already-retrieved content when available, falls back to URL ingest otherwise. |
| Pre-compact | Before context compaction | Captures the conversation transcript into a session-notes collection. Discoveries that would be lost when the context window shrinks are preserved as searchable chunks. |
All hooks are fail-open — failures are ignored and never block Claude Code. Each hook is individually toggleable via .punt-labs/quarry/config.md YAML frontmatter. See AGENTS.md for the full integration model.
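As an illustration only, a per-hook toggle in the `.punt-labs/quarry/config.md` frontmatter might look like the following. The key names here are assumptions, not the documented schema; AGENTS.md defines the actual keys:

```markdown
---
hooks:
  session_start: true   # auto-register and sync the project directory
  web_fetch: false      # disable auto-ingest of fetched URLs
  pre_compact: true     # keep transcript capture before compaction
---
```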
Quarry runs as a daemon. Claude Code sessions connect through mcp-proxy:
```
               stdio                           wss:// (TLS)
Claude Code <-----------------> mcp-proxy <---------------------> quarry serve
            MCP JSON-RPC        (~5 MB Go)    pinned CA cert      (one daemon)
```
Without the proxy, every session spawns a separate Python process, each loading the embedding model into ~200 MB of RAM. With it, startup is instant and state is shared across all sessions. All connections use TLS with a self-signed CA — even on localhost.
quarry install downloads mcp-proxy (SHA256-verified, correct platform) and configures MCP clients.
Architecture | Z Specification | Design | Agents | Changelog
```sh
uv sync       # install dependencies
make check    # run all quality gates (lint, type, test)
make test     # test suite only
make format   # auto-format code
make docs     # build LaTeX documents
```

MIT