Cercano

Cercano is a local-first AI development tool that runs open-source models on your own hardware — fast, private, and at zero cost. Currently powered by Ollama, with pluggable backend support planned.

Cercano works in two ways:

1. Local, in-agent Tool: Plug Cercano into cloud-based agents like Claude Code, Cursor, or Copilot via MCP. Instead of sending everything to the cloud, the agent can call a set of skills that run locally, such as research, summarization, extraction, classification, and code explanation. These skills reduce cloud context-window usage (and cost) and can hand the cloud model better-distilled context.

2. Standalone Agent: Use Cercano directly as your AI coding assistant. It routes tasks to local models first, falls back to cloud when needed, and runs an agentic loop that generates, validates, and self-corrects code automatically. Integrates with VS Code and other IDEs via gRPC. Note that the standalone agent is still at an early stage and under rapid development.

Key Features

Core

Local-First Architecture — Run powerful open-source models (qwen3-coder, GLM-4.7-Flash, etc.) locally via Ollama.
Cloud Fallback — Seamless integration with Google Gemini and Anthropic Claude for tasks that exceed local model capabilities.
Smart Router — Embedding-based classifier routes requests to local or cloud models. Ultra-fast, no LLM call needed for routing.
Agentic Self-Correction — Iterative loop that generates code, validates it (e.g., via compilation), and self-corrects automatically.
Remote Inference — Point Cercano at a remote Ollama instance (e.g., a Mac Studio on your LAN) for access to larger models. Runtime-configurable with automatic fallback if the remote goes down.
Pluggable Engine Architecture — Inference backends are abstracted behind InferenceEngine and EmbeddingService interfaces. Ollama is the built-in engine; adding new backends (ONNX, vLLM, etc.) requires only implementing the interface and registering it — no changes to the core agent, router, or MCP tools.
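
As a rough illustration of the embedding-based routing described above, the sketch below picks between "local" and "cloud" by comparing a request embedding with one reference embedding per route using cosine similarity. The function names `cosine` and `route` and the three-dimensional vectors are made up for this example; real embeddings would come from an embedding model, and Cercano's actual classifier may work differently.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors.
// It returns 0 if either vector has zero magnitude.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// route returns the label whose reference embedding is most similar to the
// query, along with the similarity score. Hypothetical helper, not Cercano's API.
func route(query []float64, refs map[string][]float64) (string, float64) {
	best, bestScore := "", math.Inf(-1)
	for label, ref := range refs {
		if s := cosine(query, ref); s > bestScore {
			best, bestScore = label, s
		}
	}
	return best, bestScore
}

func main() {
	// Toy reference embeddings; real ones would have hundreds of dimensions.
	refs := map[string][]float64{
		"local": {1, 0, 0},
		"cloud": {0, 1, 0},
	}
	label, score := route([]float64{0.9, 0.1, 0}, refs)
	fmt.Printf("route=%s score=%.2f\n", label, score)
}
```

Because the decision is a handful of vector comparisons, no LLM call is needed at routing time; only the request has to be embedded.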

Local Co-Processor Tools (via MCP)

When used as a co-processor inside cloud agents, Cercano provides specialized tools that keep work local:

Tool	What it does	Why local?
`cercano_summarize`	Condense files, logs, or text into concise summaries	Keep large content out of cloud context windows
`cercano_extract`	Pull specific info (errors, signatures, config) from large text	Filter noise locally, send only what matters
`cercano_classify`	Triage errors, logs, or code with category + confidence	Quick local triage without cloud round-trip
`cercano_explain`	Explain what code does, its components and data flow	Understand code locally before deciding what to send to cloud
`cercano_local`	General-purpose prompt execution against local models	Offload any simple task to local inference
`cercano_fetch`	Fetch a URL and extract readable text (HTML stripped to plain text)	Read web pages without stuffing raw HTML into cloud context
`cercano_research`	Research a question via DuckDuckGo search + local model analysis	Get distilled, sourced answers without browsing the web yourself
`cercano_document`	Generate doc comments for exported Go symbols and write them to the file	Entire read-think-write cycle stays local — host never sees file contents
`cercano_deep_research`	Multi-source research with ranked findings, reference chasing, and gap analysis	Dozens of page fetches and analyses stay local — host gets only the compiled report

Project Context

Run cercano_init once per project to make all Cercano tools project-aware. It scans the repo, feeds key files through a local model, and writes .cercano/context.md — a concise reference document that gets automatically prepended to all tool calls. The host AI can optionally provide domain knowledge it already has.

If you use a Cercano tool without initializing first, it will suggest running init.

Usage Telemetry & Token Savings

Cercano tracks how much work stays local and how many cloud tokens you save:

cercano_stats — MCP tool that returns usage summary, token savings, and breakdowns by tool, model, and day.
cercano_submit_usage — Opt-in tool for host agents to submit their cloud token usage data, enabling accurate local-vs-cloud comparison. (Usually handled automatically by the PostToolUse hook.)
cercano stats — CLI command for a quick terminal summary of cumulative usage.
Cloud token capture — A PostToolUse hook parses Claude Code's transcript to automatically record cloud token usage alongside local metrics. Run cercano setup to configure the hook.

Data is stored locally in ~/.config/cercano/telemetry.db (SQLite). No prompt content, file paths, or credentials are ever recorded.

Integration

MCP Server — Expose all tools to any MCP-compatible agent (Claude Code, Cursor, Copilot, etc.).
IDE Integration — VS Code extension with gRPC-based architecture. Zed extension in progress.
Model Discovery — Query available models on any Ollama instance via cercano_models.
Runtime Configuration — Switch models, Ollama endpoints, and cloud providers on the fly via cercano_config.

Architecture

Cercano can run as a standalone gRPC server (for IDE clients) or embedded inside an MCP host (for cloud agents like Claude Code). Both modes share the same core engine.

  Standalone Mode                    Co-Processor Mode
  (IDE clients)                      (Cloud agents)

  ┌───────────┐                      ┌──────────────┐
  │  VS Code  │                      │  Claude Code │
  │  Zed, etc │                      │  Cursor, etc │
  └─────┬─────┘                      └──────┬───────┘
        │ gRPC                              │ MCP (stdio)
        │                                   │
┌───────┴───────────────────┐  ┌────────────┴────────────────┐
│    CERCANO SERVER         │  │    CERCANO (embedded)       │
│                           │  │                             │
│  ┌──────────────────────┐ │  │  ┌───────────────────────┐  │
│  │       Agent          │ │  │  │  MCP Tool Handlers    │  │
│  │  ┌───────┐ ┌───────┐ │ │  │  │  summarize, extract,  │  │
│  │  │Router │ │ Loop  │ │ │  │  │  classify, explain    │  │
│  │  └───────┘ └───────┘ │ │  │  └───────────┬───────────┘  │
│  └──────────┬───────────┘ │  │              │              │
│             │             │  │        ┌─────┴──────┐       │
│             │             │  │        │   Agent    │       │
└─────────────┼─────────────┘  │        └─────┬──────┘       │
              │                └──────────────┼──────────────┘
              │                               │
     ┌────────┴────────┐             ┌────────┴────────┐
     │  Engine Layer   │             │  Engine Layer   │
     │  (Ollama, etc.) │             │  (Ollama, etc.) │
     └─────────────────┘             └─────────────────┘

Core Agent (Go) — Handles model routing, agentic loops, conversation history, and provides a gRPC interface.
Smart Router — Uses semantic classification (via embeddings) to route requests. Ultra-fast, no LLM call needed.
Coordinator (LoopAgent) — Google ADK-backed iterative loop that generates code, validates it, and self-corrects with cloud escalation.
Engine Layer — Pluggable inference backends behind InferenceEngine and EmbeddingService interfaces. Ollama is the default; new engines register via EngineRegistry.
MCP Tool Handlers — Specialized prompt templates for summarize, extract, classify, explain, fetch, and research. Each tool wraps the core agent with task-specific prompting.
Conversation Store — Server-side multi-turn history so the LLM can resolve references across requests.

Project Structure

source/server/: The core Go-based AI agent and gRPC server.
source/clients/: IDE-specific extensions.
- vscode/: VS Code extension (TypeScript).
- zed/: Zed extension (Rust).
source/proto/: Protocol Buffer definitions for gRPC.
test/: Integration and sandbox tests.
conductor/: Product definitions, tech stack, and project planning documents.

Tech Stack

Backend - Go (Golang)
Local LLM Runtime - Ollama (qwen3-coder, nomic-embed-text)
Cloud LLMs - Google Gemini, Anthropic Claude
Communication - gRPC
Frontend/Clients - TypeScript (VS Code), Rust (Zed)

Getting Started

Install via Homebrew (macOS)

brew tap bryancostanich/cercano
brew install cercano
cercano setup    # detects/installs Ollama, pulls models, creates config

Install from Source

Requires Go 1.21+.

git clone https://github.com/bryancostanich/Cercano.git
cd Cercano/source/server
make build
bin/cercano setup    # detects/installs Ollama, pulls models, creates config
bin/cercano          # starts the gRPC server

cercano setup handles everything: if no AI engine backend is detected, it offers to install Ollama automatically (via Homebrew on macOS or the official installer on Linux), starts it, and pulls the required models. Use --install-engine to skip the interactive prompt for scripted/CI use.

Use with Claude Code

claude mcp add --transport stdio cercano -- cercano --mcp

Or add to your project's .mcp.json. In --mcp mode, Cercano starts an embedded gRPC server — no separate server needed.

Use with VS Code

Install the VS Code extension dependencies:
```
cd source/clients/vscode && npm install
```
Open source/clients/vscode in VS Code and press F5 to launch.
In the Extension Development Host, open the Chat panel and type @cercano followed by your question.

Developer Workflow

cd source/server
make dev    # build + restart in one command

System config at ~/.config/cercano/config.yaml persists across restarts (Ollama URL, model, port, etc.).

Cloud Provider Setup (Optional)

Cercano is local-first — cloud providers are only used for escalation when local models can't handle a task.

In the Chat panel, type @cercano /config to open the configuration menu.
Set your API key (Google Gemini or Anthropic Claude).
Select your preferred cloud provider for escalation.

Configuration

The following settings are available under cercano.* in VS Code Settings:

Setting	Default	Description
`cercano.localModel`	`qwen3-coder`	Ollama model for local inference (changeable at runtime via `@cercano /config`)
`cercano.server.autoLaunch`	`true`	Automatically start the server on activation
`cercano.server.binaryPath`	(empty)	Override path to the server binary
`cercano.server.port`	`50052`	gRPC server port
`cercano.ollama.url`	`http://localhost:11434`	Ollama server URL
`cercano.provider`	`local`	Cloud provider for escalation (`google` or `anthropic`)
`cercano.model`	(empty)	Override cloud model name

MCP Server

Cercano can be used as an MCP (Model Context Protocol) server, allowing cloud-based agents like Claude Code and Cursor to delegate work to local models — faster, private, and at zero cost.

Setup

Build Cercano:
```
cd source/server
make build
```

Add to Claude Code (choose one):

Via CLI:

claude mcp add --transport stdio cercano -- /path/to/Cercano/source/server/bin/cercano --mcp

Via .mcp.json (project scope):

{
  "mcpServers": {
    "cercano": {
      "type": "stdio",
      "command": "/path/to/Cercano/source/server/bin/cercano",
      "args": ["--mcp"]
    }
  }
}

In --mcp mode, Cercano starts an embedded gRPC server automatically — no separate server process needed.

MCP Tools

See the tool table in Key Features above for the full list. Additional utility tools:

Tool	Description
`cercano_models`	List models available on the active Ollama instance. Useful for discovering models on a remote machine.
`cercano_config`	Switch models, Ollama endpoints, or cloud providers at runtime without restarting.

Usage Examples

Once the MCP server is connected, your agent can call Cercano tools directly:

Chat query (offload to local model):

cercano_local(prompt: "What is a goroutine in Go? Answer in one sentence.")
→ "A goroutine is a lightweight thread of execution managed by the Go runtime."
  [Model: qwen3-coder, Confidence: 1.00, Escalated: false]

Switch local model at runtime:

cercano_config(action: "set", local_model: "GLM-4.7-Flash")
→ Configuration update success: updated: [local_model=GLM-4.7-Flash]

Agentic code generation (with validation loop):

cercano_local(
  prompt: "Add a health check endpoint that returns JSON",
  file_path: "internal/server/health.go",
  work_dir: "/path/to/project/source/server"
)
→ Generated code with automatic build validation and self-correction.

Point at a remote Ollama instance:

cercano_config(action: "set", ollama_url: "http://mac-studio.local:11434")
→ Configuration update success: updated: [ollama_url=http://mac-studio.local:11434]

Discover available models:

cercano_models()
→ Available models (2):
  - qwen3-coder:latest (4.7 GB)
  - llama3:70b (39.1 GB)

Summarize a file locally (keep large content out of cloud context):

cercano_summarize(file_path: "internal/agent/router.go", max_length: "brief")
→ "This Go package implements a smart routing system that selects between local
   and cloud AI models based on semantic similarity of user requests."

Extract specific info from large text:

cercano_extract(text: "<500 lines of logs>", query: "error and warning messages")
→ WARN  Remote endpoint health check failed (attempt 1/3)
  ERROR Remote endpoint unreachable after 3 attempts, falling back to local

Classify/triage an error locally:

cercano_classify(
  text: "panic: runtime error: invalid memory address or nil pointer dereference",
  categories: "bug, config issue, infra problem"
)
→ Category: bug
  Confidence: high
  Reasoning: Nil pointer dereference is a programming bug in the code logic.

Explain unfamiliar code before deciding what to send to cloud:

cercano_explain(file_path: "internal/agent/router.go")
→ This code implements a smart routing system for an AI agent that selects
  between local and cloud models based on semantic similarity...

Research a question (search + fetch + local model analysis):

cercano_research(query: "How does the Ollama REST API list models?")
→ Ollama lists models via GET /api/tags, which returns a JSON array of
  installed models with name, size, and modification date...

  Sources:
  - https://docs.ollama.com/api/introduction
  - https://github.com/ollama/ollama/blob/main/docs/api.md

Multi-turn conversation:

cercano_local(prompt: "Explain the SmartRouter", conversation_id: "abc123")
cercano_local(prompt: "How does it handle escalation?", conversation_id: "abc123")
→ Second call has full context from the first.

Verified Agents

Agent	Status
Claude Code	Verified — tool discovery, chat queries, config updates, model switching
Cursor	Not yet tested

Flags

Flag	Default	Description
`--grpc-addr`	`localhost:50052`	Address of the Cercano gRPC server

Agent Skills

Cercano publishes its tools as Agent Skills — an open standard for packaging AI capabilities so they're discoverable by any compatible agent. Over 30 agents support this standard, including Claude Code, Cursor, Copilot, Gemini CLI, Codex, and more.

Published Skills

Skill	Description
`cercano-local`	General-purpose local inference — chat queries and agentic code generation
`cercano-summarize`	Summarize text or files locally (brief, medium, or detailed)
`cercano-extract`	Pull specific information from text (errors, signatures, config values)
`cercano-classify`	Categorize/triage text with confidence scores and reasoning
`cercano-explain`	Explain code — what it does, key interfaces, and data flow
`cercano-fetch`	Fetch a URL and extract readable text (HTML stripped to plain text)
`cercano-research`	Research a question via DuckDuckGo search + local model analysis
`cercano-config`	View/change Cercano's runtime configuration
`cercano-models`	List available models on the connected Ollama instance
`cercano-init`	Initialize project context for project-aware responses
`cercano-document`	Generate doc comments for exported Go symbols and write directly to the file
`cercano-deep-research`	Multi-source research with ranked findings, reference chasing, and gap analysis
`cercano-stats`	View usage statistics and cloud token savings

Each skill is a SKILL.md file that tells the agent what the tool does, its parameters, and how to invoke it via MCP.

How Agents Discover Skills

Agents scan well-known directories for SKILL.md files at startup:

Directory	Discovered by
`.agents/skills/<skill-name>/SKILL.md`	Any Agent Skills-compatible agent
`.claude/skills/<skill-name>/SKILL.md`	Claude Code (also appears as slash commands)

Cercano ships its skill definitions in both locations.

Installing Skills in Your Project

To make Cercano's skills available to your agent, copy the skill files into your project:

# For any Agent Skills-compatible agent
cp -r /path/to/Cercano/.agents/skills/* .agents/skills/

# For Claude Code specifically (enables /cercano-* slash commands)
cp -r /path/to/Cercano/.claude/skills/* .claude/skills/

The cercano_skills MCP tool also provides programmatic access to skill definitions:

cercano_skills(action: "list")           → catalog of all skills
cercano_skills(action: "get", name: "cercano-local")  → full SKILL.md content

For a detailed guide on writing custom SKILL.md files, see docs/agent-skills-guide.md.

Remote Inference

Cercano can delegate inference to a remote Ollama instance — for example, another machine on your LAN with more GPU memory and larger models. The remote endpoint is runtime-configurable with automatic fallback to local Ollama if the remote goes down.

Setup

Ensure Ollama is running on the remote machine and accessible over the network:

# On the remote machine (e.g., mac-studio.local)
OLLAMA_HOST=0.0.0.0 ollama serve

Point Cercano at the remote instance:

Via environment variable (at startup):

OLLAMA_URL=http://mac-studio.local:11434 bin/agent

Via MCP at runtime (no restart needed):

cercano_config(action: "set", ollama_url: "http://mac-studio.local:11434")

Discover available models on the remote machine:

cercano_models()
→ Available models (3):
- qwen3-coder:latest (4.7 GB, modified: 2026-03-15T10:30:00Z)
- llama3:70b (39.1 GB, modified: 2026-03-14T09:00:00Z)
- deepseek-coder-v2:latest (8.9 GB, modified: 2026-03-13T14:00:00Z)

Switch to a model that's only available on the remote:

cercano_config(action: "set", local_model: "llama3:70b")

Fallback Behavior

When a remote endpoint is configured, Cercano monitors it with periodic health checks:

Pings the remote every 30 seconds via GET /api/tags
After 3 consecutive failures, automatically switches to local Ollama
When the remote recovers, automatically switches back
Response metadata includes [Endpoint: url] or [Endpoint: url (fallback)] so you always know which instance served the request

No configuration is needed — fallback is automatic whenever a remote URL is set.
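
The fallback rules above amount to a small state machine, sketched below. The `endpointMonitor` type and its fields are hypothetical; a real monitor would drive `record` from a 30-second ticker that issues `GET /api/tags`, whereas this example feeds in simulated results so the transitions are easy to follow.

```go
package main

import "fmt"

// endpointMonitor tracks consecutive health-check failures for a remote endpoint.
type endpointMonitor struct {
	remote, local string
	threshold     int // consecutive failures before falling back
	failures      int
	usingFallback bool
}

// record feeds in one health-check result and returns the endpoint to use next.
func (m *endpointMonitor) record(healthy bool) string {
	if healthy {
		m.failures = 0
		m.usingFallback = false // remote recovered: switch back
	} else {
		m.failures++
		if m.failures >= m.threshold {
			m.usingFallback = true
		}
	}
	return m.active()
}

// active returns the endpoint currently in use.
func (m *endpointMonitor) active() string {
	if m.usingFallback {
		return m.local
	}
	return m.remote
}

func main() {
	m := &endpointMonitor{
		remote:    "http://mac-studio.local:11434",
		local:     "http://localhost:11434",
		threshold: 3,
	}
	// Simulated results of successive health checks.
	for i, healthy := range []bool{true, false, false, false, true} {
		fmt.Printf("check %d healthy=%v -> %s\n", i+1, healthy, m.record(healthy))
	}
}
```

In this run the monitor stays on the remote through the first two failures, switches to the local endpoint on the third, and returns to the remote on the next successful check.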

Development

Cercano is in active development. For detailed information on the project's goals and technical decisions, refer to the documents in the conductor/ directory.

Building

cd source/server
make all    # Build both agent and MCP server
make test   # Run all tests

Feature TODOs

New Features

Competitive Audit — Agent Features Landscape - Feature matrix across 12+ open-source and commercial agents (Codex, Aider, Continue, Cody, OpenHands, SWE-Agent, Claude Code, Cursor, Windsurf, GitHub Copilot, JetBrains AI, Amazon Q) to inform Cercano's tool design and roadmap.
Semantic Codebase Search - Embedding-based code search by intent ("find auth-related code"), not just string matching. Requires indexing pipeline, storage, and nearest-neighbor retrieval.
User-Friendly Distribution - Setup/launch scripts, Docker containerization, and CI/CD pipeline with GitHub Actions for automated cross-platform releases.
AI Engine Agnosticism - Abstract the local inference layer to support pluggable backends (ONNX Runtime, Enso, etc.) beyond Ollama.
Web Research Tool - Fetch URLs, search the web via DuckDuckGo, and use local models to analyze and distill results. Keeps raw web content out of the cloud context window.
Standalone CLI - Create a standalone command-line interface (CLI) for Cercano that doesn't rely on other CLI integrations.
PDF Parsing - Extract text from local and remote PDFs for use with summarize, extract, explain, and research tools.
Documentation Site Indexing - Crawl a documentation site once, index it persistently, and make it searchable across sessions (similar to Cursor's @Docs).

Existing Improvements

Better VS Code Agent Window Integration - Make Cercano available as a model dropdown in the VS Code agent window alongside Gemini, Claude, etc.
LLM-Based Conversation Compaction - Replace simple truncation-based compaction with LLM-powered summarization for better context retention in long conversations.
Per-Model Configuration - Configurable per-model settings (context window, classification thresholds, history depth, compaction limits) instead of hardcoded constants.
Simplify Provider Routing - Evaluate removing the SmartRouter's embedding-based local/cloud routing in favor of always-local with coordinator-driven cloud escalation.
Zed Extension - Build out the Rust-based Zed extension (source/clients/zed/) with feature parity to the VS Code extension.

This is not an officially supported Google product

Canonical Repo :: https://github.com/bryancostanich/Cercano

Google Mirror :: https://github.com/GoogleDevRelExplorations/cercano
