Local HTTP proxy + dashboard for AI agent developers.
One command intercepts every LLM API call, saves a full snapshot locally, and opens a "Control Room" dashboard — no cloud, no config, no data leaves your machine.
If this is useful, a star on GitHub goes a long way — it helps other agent developers find it.
When you're building AI agents, you need to answer questions like:
- Which LLM call caused the bad output?
- What was the exact context when the agent went off-track?
- Can I replay this run with a different response at step 3?
Kontex intercepts every call at the proxy layer, so you get full observability with zero changes to your agent code — just point your base URL at localhost:8080.
```
Your agent → localhost:8080 → OpenAI / Anthropic / Ollama / any LLM API
                   │
                   ├── Saves raw prompt + response to .kontex.db (SQLite)
                   ├── Optionally trims context (lossless, toggleable)
                   └── Serves dashboard at GET /
```
| Feature | Description |
|---|---|
| Proxy | Intercepts every POST /* call and forwards to your upstream LLM |
| Snapshots | Saves the full untrimmed prompt and response to SQLite — nothing is lost |
| Context trimmer | Structurally lossless trimming applied before the upstream call — toggleable from the dashboard |
| Session grouping | Groups related agent runs into sessions via a request header |
| Multi-agent graph | Swim-lane view showing every agent's trajectory and cross-agent links |
| Live pause | Pause a request mid-flight, inspect it, then resume with edited messages |
| Fork & replay | Branch from any snapshot with a human-edited response; downstream calls replay deterministically |
| Branch chain | Create a new agent task from any snapshot, staying in the same session |
- Node.js 18+
- npm 9+
```bash
npm install -g kontex-proxy
kontex start
```

Or run from source:

```bash
git clone https://github.com/pankaj-agrawalla/kontex-cli.git
cd kontex-cli
npm install
cd web && npm install && cd ..
npm run build
```

Copy .env.example and edit as needed:
```bash
cp .env.example .env
```

```bash
# .env
KONTEX_PORT=8080                      # Port for the proxy + dashboard (default: 8080)
UPSTREAM_URL=https://api.openai.com   # LLM API to forward requests to
```

To use with Ollama locally:

```bash
UPSTREAM_URL=http://localhost:11434
```

To use with Anthropic:

```bash
UPSTREAM_URL=https://api.anthropic.com
```

Then start the proxy:

```bash
kontex start
```

The browser opens automatically at http://localhost:8080.
Or with a custom port:
```bash
kontex start --port 9000
```

Change your agent's base URL from the LLM provider to the Kontex proxy:

```
http://localhost:8080
```
No other code changes are required. All requests are transparently proxied.
Example — OpenAI SDK:
```ts
import OpenAI from "openai"

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  baseURL: "http://localhost:8080/v1", // ← point at Kontex
})
```

Example — LangChain:
```ts
import { ChatOpenAI } from "@langchain/openai"

const llm = new ChatOpenAI({
  openAIApiKey: process.env.OPENAI_API_KEY,
  configuration: {
    baseURL: "http://localhost:8080/v1", // ← point at Kontex
  },
})
```

Example — raw fetch:
```ts
await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json", "Authorization": `Bearer ${apiKey}` },
  body: JSON.stringify({ model: "gpt-4o", messages }),
})
```

These headers unlock richer dashboard views. They are stripped before forwarding upstream — your LLM never sees them.
| Header | Purpose |
|---|---|
| `X-Kontex-Task-Id` | Groups snapshots into a named agent task (swim lane in the graph). Defaults to "default" if omitted. |
| `X-Kontex-Session-Id` | Groups all tasks from one run into a single session entry in the sidebar. |
| `X-Kontex-Parent-Task-Id` | Records a cross-agent link (draws an amber dashed edge). Send only on the first turn of a child agent. |
| `X-Kontex-Fork-Id` | Enables deterministic replay. Set to the task ID you forked from. |
Without any headers, everything still works — all snapshots land under the "default" task and appear in the dashboard.
With headers (recommended for multi-agent workflows):
```ts
const headers = {
  "X-Kontex-Task-Id": "planner-agent",
  "X-Kontex-Session-Id": "run-2024-001",
  // first turn of a child agent only:
  "X-Kontex-Parent-Task-Id": "planner-agent",
}
```
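`X-Kontex-Fork-Id` is sent the same way when you replay a run against a fork. A minimal sketch, assuming the fork was created from the `planner-agent` task (the other IDs are placeholders):

```ts
// Hedged sketch: replay headers for a run forked from "planner-agent".
// Requests with matching prompt hashes replay the stored (possibly
// human-edited) responses instead of calling the LLM.
const replayHeaders = {
  "X-Kontex-Task-Id": "planner-agent-replay", // placeholder task name
  "X-Kontex-Session-Id": "run-2024-002",      // placeholder session
  "X-Kontex-Fork-Id": "planner-agent",        // task ID you forked from
}
```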
Open http://localhost:8080 in your browser.

- The sessions sidebar lists all sessions, ordered newest-first
- Each entry shows the session ID, timestamp, agent count, and snapshot count
- Click a session to load its graph
- Context trimmer toggle at the bottom — turn trimming on or off in real time
- The graph view shows one swim-lane column per agent task
- Nodes = individual LLM calls (snapshots)
- Gray edges = within the same agent
- Amber dashed animated edges = cross-agent links (parent → child)
- Amber-bordered nodes = human-edited snapshots
- Click any node to open the snapshot drawer
The snapshot drawer opens when you click a node. It shows:
- The full conversation messages sent to the LLM
- Live Pause — pauses the next request from this task mid-flight so you can inspect and edit messages before they reach the LLM
- Fork & Edit — save a human-edited version of the messages; the next replay of this prompt hash will return your edited version instead of calling the LLM
- Branch chain here — create a new agent task (in the same session) branching from this point, with an editable LLM response
Found this useful in your stack? Share it with your team or post it in your AI/agent dev community — this project grows entirely through word of mouth.
The trimmer applies three structurally lossless passes before forwarding to the upstream LLM:
- Tool result truncation — long tool/function responses are sliced to prevent runaway context growth
- Middle-turn compression — older assistant turns in the middle of a long conversation are shortened
- System prompt deduplication — repeated system content across turns is reduced
The raw untrimmed payload is always saved to the database — trimming only affects what is forwarded upstream.
Toggle it on/off live from the sidebar without restarting the server.
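For illustration only (this is not Kontex's actual code), a tool-result truncation pass could look roughly like the sketch below; the character limit and message shape are assumptions:

```ts
// Hedged sketch of a tool-result truncation pass. The real trimmer's
// limits, marker text, and message shape may differ.
const MAX_TOOL_RESULT_CHARS = 2000 // assumed limit, for illustration only

type ChatMessage = { role: string; content: string }

function truncateToolResults(messages: ChatMessage[]): ChatMessage[] {
  return messages.map((m) =>
    m.role === "tool" && m.content.length > MAX_TOOL_RESULT_CHARS
      ? { ...m, content: m.content.slice(0, MAX_TOOL_RESULT_CHARS) + "\n…[truncated]" }
      : m
  )
}
```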
Example — multi-agent session:

```ts
const SESSION_ID = `run-${Date.now()}`

// Agent 1 — Planner
const plannerResponse = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${apiKey}`,
    "X-Kontex-Task-Id": "planner",
    "X-Kontex-Session-Id": SESSION_ID,
  },
  body: JSON.stringify({ model: "gpt-4o", messages: plannerMessages }),
})

// Agent 2 — Coder (links back to planner)
const coderResponse = await fetch("http://localhost:8080/v1/chat/completions", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${apiKey}`,
    "X-Kontex-Task-Id": "coder",
    "X-Kontex-Session-Id": SESSION_ID,
    "X-Kontex-Parent-Task-Id": "planner", // ← first turn only
  },
  body: JSON.stringify({ model: "gpt-4o", messages: coderMessages }),
})
```

This produces a dashboard with two swim lanes and an amber edge from Planner → Coder, grouped under one session.
All data is stored in .kontex.db (SQLite) in the project root. The file is created automatically on first run.
To start completely fresh:
```bash
rm .kontex.db
kontex start
```

```sql
CREATE TABLE Snapshots (
  id                  TEXT PRIMARY KEY,    -- cuid
  task_id             TEXT NOT NULL,       -- from X-Kontex-Task-Id header
  parent_id           TEXT,                -- previous snapshot in the same task
  parent_task_id      TEXT,                -- from X-Kontex-Parent-Task-Id header
  session_id          TEXT,                -- from X-Kontex-Session-Id header
  prompt_hash         TEXT NOT NULL,       -- MD5 of messages array (for replay lookup)
  raw_prompt_payload  TEXT NOT NULL,       -- original untrimmed JSON body
  llm_response        TEXT,                -- raw response from upstream
  is_human_edited     INTEGER DEFAULT 0,   -- 1 if created via fork
  created_at          INTEGER NOT NULL     -- Unix ms
);
```
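Because the store is plain SQLite, you can inspect snapshots with any SQLite client. A minimal sketch using better-sqlite3 (a separate install, not bundled with Kontex):

```ts
// Hedged sketch: read the latest snapshots straight from .kontex.db.
import Database from "better-sqlite3"

const db = new Database(".kontex.db", { readonly: true })

const rows = db
  .prepare(
    "SELECT id, task_id, session_id, is_human_edited, created_at FROM Snapshots ORDER BY created_at DESC LIMIT 10"
  )
  .all() as Array<{
    id: string
    task_id: string
    session_id: string | null
    is_human_edited: number
    created_at: number
  }>

for (const row of rows) {
  // created_at is Unix ms per the schema above
  console.log(row.task_id, row.is_human_edited ? "(edited)" : "", new Date(row.created_at).toISOString())
}
```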
These endpoints power the dashboard. You can also call them directly.

| Method | Path | Description |
|---|---|---|
| GET | `/health` | Health check |
| GET | `/api/sessions` | List all sessions |
| GET | `/api/tasks` | List all task IDs |
| GET | `/api/graph?session=<id>` | Combined graph (nodes + edges) for a session |
| GET | `/api/tasks/:id/graph` | Graph for a single task |
| GET | `/api/snapshots/:id` | Full snapshot detail |
| POST | `/api/snapshots/:id/pause` | Pause the next request on this snapshot |
| POST | `/api/snapshots/:id/resolve` | Resume a paused request with edited messages |
| POST | `/api/snapshots/:id/fork` | Create a human-edited snapshot (same task) |
| POST | `/api/snapshots/:id/fork-chain` | Create a new task branching from this snapshot |
| GET | `/api/trimmer` | Get trimmer state `{ enabled: boolean }` |
| POST | `/api/trimmer/toggle` | Toggle trimmer on/off |
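A minimal sketch of calling a few of these endpoints directly; apart from the documented `{ enabled: boolean }`, the response shapes are assumptions, so inspect them in your own setup:

```ts
// Hedged sketch: poke the dashboard API directly.
const BASE = "http://localhost:8080"

// List sessions (response shape not documented above, so just log it)
const sessions = await fetch(`${BASE}/api/sessions`).then((r) => r.json())
console.log(sessions)

// Read trimmer state ({ enabled: boolean } per the table), then toggle it
const { enabled } = await fetch(`${BASE}/api/trimmer`).then((r) => r.json())
console.log("trimmer enabled:", enabled)

await fetch(`${BASE}/api/trimmer/toggle`, { method: "POST" })
```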
Run the backend and frontend separately with hot reload:
```bash
# Terminal 1 — backend
npm run dev

# Terminal 2 — frontend
cd web && npm run dev
```

The Vite dev server runs on port 5173 and proxies /api to localhost:8080.
Requires Ollama running locally with llama3.2:1b:
```bash
ollama pull llama3.2:1b
npm run build
npm run e2e
```

The suite simulates a 3-agent pipeline (Planner → Coder → Reviewer), verifies snapshots, cross-agent edges, session grouping, fork/replay, and edge cases, and exits 0 on a full pass.
We're building something bigger around Kontex CLI — team dashboards, session sharing, and deeper agent observability are on the roadmap.
- Watch this repo (GitHub Watch) to get notified on releases
- Star it (GitHub Star) to show support and help others discover it
- Open an issue to share what you're building — it directly shapes what gets built next
Issues and PRs are welcome. Please open an issue first for significant changes.
MIT — see LICENSE.
