SecureContext — Secure Multi-Agent Harness for Claude Code

Persistent memory, verifiable telemetry, and work-stealing coordination for multi-agent Claude Code sessions. Built on the principle: cybersecurity into the architecture, not bolted on. HMAC-chained audit trail, per-agent cryptographic identity, Postgres Row-Level Security, atomic work distribution, closed learning loop. Zero cloud sync. MIT license.

What SecureContext Is Today

SecureContext started as a token-optimization memory plugin. Through 17 sprints of design + red-team verification it evolved into something larger:

A hardened harness for running multi-agent Claude Code sessions where multiple agents (Opus orchestrator + Sonnet worker pool) coordinate through a verifiable audit trail, share memory across sessions, distribute work atomically through a Postgres queue, and feed failures back into a learning corpus — all while staying within the Claude Code TUI (so Claude Pro auth keeps working — no API-key upgrade required).

Four pillars:

Pillar	What it means
Persistent memory	Working memory facts, session summaries, KB search — survives Claude Code restarts. MemGPT-style importance scoring; hybrid BM25 + vector retrieval.
Verifiable security	HMAC hash chain over every tool call + outcome. Per-agent HKDF subkeys — agent A cryptographically cannot forge a row claiming to be agent B. Postgres RLS with per-query `SET LOCAL ROLE`. Credential-isolated sandbox for `zc_execute`.
Multi-agent coordination	Broadcast channel for ASSIGN / STATUS / MERGE. Postgres work-stealing queue (`FOR UPDATE SKIP LOCKED`) for atomic task distribution across worker pools. Dynamic role spawn via LAUNCH_ROLE. Dispatcher nudge + Stop-hook enforcement to prevent worker drift.
Closed learning loop	Per-tool-call telemetry with real cost accounting. Three outcome resolvers (git_commit, user_prompt sentiment, follow-up pattern). Outcomes auto-feed `learnings/failures.jsonl` + `learnings/experiments.jsonl` — no agent discipline required.

Headline Numbers

Metric	Result
Token overhead per session (vs. native Claude re-paste)	~87% lower
Claude Opus cost per session (tool-call overhead only)	~$0.16 vs. ~$2–5 native
Recall cache hit saves	~800 tokens per call (~$0.06 on Opus)
Unit + integration tests	645 passing
Red-team attack IDs verified	60+ (RT-S0 through RT-S4)
Hash-chain forgery resistance	Cryptographic (per-agent HKDF subkey)
Agents per role (work-stealing pool)	1 to 20 (tested 50 × 100 no double-claim)

Key Capabilities

1. Persistent Memory That Survives Restarts

Claude's context window is lossy. When it compacts, architecture decisions, file locations, and task state vanish. SecureContext persists them.

Working memory: zc_remember("api_key_rotation_decided", "use KMS", importance=5). Bounded to 100–250 facts (auto-scales with project complexity). Lowest-importance facts evict to archival KB rather than disappearing.
Session summaries: zc_summarize_session() archives a structured summary for 365 days. Retrievable via zc_search(["prior session"]).
Shared broadcast channel: multi-agent A2A coordination (ASSIGN, STATUS, MERGE, DEPENDENCY, PROPOSED, REJECT, REVISE, LAUNCH_ROLE, RETIRE_ROLE).
Hybrid KB search: FTS5 BM25 + Ollama vector reranking. Falls back cleanly to BM25-only if Ollama is unavailable.
Cross-project search: zc_search_global federates across all local project KBs.

2. Security That Auditors Can Verify

HMAC-chained rows on tool_calls_pg + outcomes_pg. Tampering with any row breaks the chain; verifyChain() detects it deterministically.
Per-agent HKDF subkeys: each agent signs its rows with a key derived from the shared machine secret + agent_id. No other agent can produce a valid signature. RT-S2-01 proves it with a live forgery attempt.
Postgres RLS (T3.2): 4 policies on outcomes_pg for the classification tiers public/internal/confidential/restricted. restricted rows visible only to the writing agent via current_setting('zc.current_agent', true).
Per-query SET LOCAL ROLE (T3.1): every write transaction switches to a per-agent Postgres role so a compromised agent can't escalate its database identity.
Credential-isolated sandbox for zc_execute: PATH-only environment, 30s timeout, 512 KB output cap, no ANTHROPIC_API_KEY / AWS / GitHub tokens leak through.
Secret scanner on 11 patterns + high-entropy detection, runs before any external send.
Audit log — append-only HMAC-chained — survives even catastrophic context compaction.

3. Multi-Agent Coordination (Production-Grade)

Via the companion A2A_dispatcher:

Worker pools with -WorkerCount N: start-agents.ps1 -Roles developer -WorkerCount 3 spawns developer-1/2/3, all sharing role="developer" and one work-stealing queue.
Atomic work distribution: Postgres FOR UPDATE SKIP LOCKED guarantees each queued task is claimed exactly once. Unit-verified with 50 concurrent workers racing for 100 tasks (RT-S4-01 — zero double-claims).
File-ownership overlap guard: /api/v1/broadcast rejects ASSIGN with HTTP 409 Conflict if file_ownership_exclusive overlaps an in-flight task's set. Two workers can never be given the same file.
Dynamic role spawn: orchestrator broadcasts LAUNCH_ROLE state=qa → dispatcher spawns a QA agent mid-session. Matching RETIRE_ROLE cleans it up.
Dispatcher wake-nudge: polls the queue every 15s; if a role has queued tasks and alive workers aren't claiming, sends them a direct "call zc_claim_task now" message.
Stop-hook enforcement: blocks a worker from ending its session if the queue still has claimable tasks for its role (forces drain-before-summarize).
Role-tagged registration: agents.json._agent_roles sidecar maps agent_id → role so the dispatcher can route by pool.

4. Honest Cost Accounting

Every MCP tool call produces a row with input_tokens, output_tokens, model, latency, status, and cost_usd. v0.17.2 corrections:

Tier 1 — computeToolCallCost prices from the LLM's perspective: tool call args at output rate (LLM generated), tool response at input rate (LLM ingests next turn). Naive accounting over-reported Opus recall cost by 5×.
Tier 2 — DB-assembly tools (zc_recall_context, zc_file_summary, zc_project_card, zc_status) show $0 cost so the orchestrator's delegate-vs-DIY decision isn't polluted by infra noise.
Opus orchestrator makes real cost trade-offs: "should I read this file myself (Opus input rate) or delegate to a developer (Sonnet) via ASSIGN broadcast overhead?" With honest numbers, the trade is decidable.

5. Closed Learning Loop (v0.17.2 L4)

When recordOutcome({outcomeKind: "rejected" | "failed" | "insufficient" | "errored" | "reverted"}) lands, it also atomically appends a structured JSON line to <project>/learnings/failures.jsonl. High-confidence shipped / accepted outcomes append to experiments.jsonl. No agent discipline required.

Future sessions surface those learnings via zc_search(["past failures for X"]) — the loop is now structural, not behavioral.

6. Architectural Quality Gates

Three automated checks prevent whole classes of regression:

L1 — env-pinning linter (npm run check:env): scans src/ for process.env.ZC_* refs, asserts every CRITICAL var (like ZC_AGENT_ID) is explicitly pinned in the dispatcher's launcher templates. Would have caught the pre-v0.17.0 bug where every agent's MCP server inherited the last-written agent_id (breaking per-agent HKDF isolation). 14-case self-test.
L3 — no-floating-promises ESLint (npm run lint): @typescript-eslint/no-floating-promises caught 3 real violations on install. Equivalent bug silently dropped 9 months of outcome writes when outcomes.ts became async in v0.12.0. 5-case regression self-test.
L4 — outcome auto-feedback (see above): the learning loop itself is enforced in code, not by convention.

Architecture at a Glance

                     ┌──────────────────┐
                     │ Claude Code TUI  │
                     │ (Opus + Sonnet)  │
                     └────────┬─────────┘
                              │ MCP (stdio)
               ┌──────────────┴──────────────┐
               │    SecureContext server     │
               │    (src/server.ts)          │
               └──┬──┬──┬──┬──────────────┬──┘
                  │  │  │  │              │
                  │  │  │  │              ▼
                  │  │  │  │       zc_execute (sandbox)
                  │  │  │  │       zc_fetch   (SSRF-guarded)
                  │  │  │  ▼
                  │  │  │  zc_enqueue_task / zc_claim_task (SKIP LOCKED)
                  │  │  ▼
                  │  │  zc_broadcast (HMAC-chained, ownership-guarded)
                  │  ▼
                  │  zc_recall_context (60s TTL cache; per-agent scoped)
                  ▼
                  zc_remember / zc_search (working memory + KB)

         All persisted to one of:
         • ~/.claude/zc-ctx/sessions/{projectHash}.db  (SQLite, local)
         • sc-postgres                                 (Docker, PG + pgvector)

Complete architecture: ARCHITECTURE.md. Threat model: docs/THREAT_MODEL.md. Harness usage rules: AGENT_HARNESS.md.

MCP Tools (25)

Memory + Retrieval (7)

Tool	What it does
`zc_remember`	Store a fact with importance score (1–5) and optional agent namespace
`zc_forget`	Remove a fact
`zc_recall_context`	Restore working memory + broadcasts + session events (60s cache with change-detection)
`zc_summarize_session`	Archive session summary to 365-day KB
`zc_search`	Hybrid BM25 + vector search in current project
`zc_search_global`	Federated search across all local project KBs
`zc_status`	DB health + KB counts + working-memory fill + fetch budget

Indexing + Knowledge (5)

Tool	What it does
`zc_index`	Manually index text into the KB
`zc_fetch`	Fetch a URL (SSRF-checked) → Markdown → indexed as `[EXTERNAL]`
`zc_index_project`	Bulk-index project tree with semantic L0/L1 via local Ollama
`zc_file_summary`	L0/L1 summary accessor (replaces Read for check/review questions)
`zc_project_card`	Per-project orientation card (stack, state, gotchas) — read or update

Execution + Analysis (5)

Tool	What it does
`zc_execute`	Run Python/JS/Bash in credential-isolated sandbox
`zc_execute_file`	Analyse a specific file in sandbox (stdin-passed TARGET_FILE)
`zc_batch`	Run shell commands AND search KB in one parallel call
`zc_check`	Memory-first answer with confidence scoring (high/medium/low/none)
`zc_capture_output`	Archive long bash output to KB (auto-called by PostBash hook)

Multi-Agent Coordination (2)

Tool	What it does
`zc_broadcast`	Post to A2A shared channel: ASSIGN/STATUS/MERGE/PROPOSED/DEPENDENCY/REJECT/REVISE/LAUNCH_ROLE/RETIRE_ROLE. File-ownership overlap guard at API layer.
`zc_explain`	Trace how a specific broadcast was routed / acknowledged

Work-Stealing Queue (6) — v0.17.0

Tool	What it does
`zc_enqueue_task`	Orchestrator enqueues into `task_queue_pg` keyed by (project, role)
`zc_claim_task`	Worker atomically claims oldest queued task (FOR UPDATE SKIP LOCKED)
`zc_heartbeat_task`	Refresh claim (workers must call every 30s)
`zc_complete_task`	Mark claimed task done
`zc_fail_task`	Mark failed + bump retries counter
`zc_queue_stats`	Count by state {queued, claimed, done, failed}

Graph Analysis (3) — v0.13.0

Tool	What it does
`zc_graph_query` / `zc_graph_path` / `zc_graph_neighbors`	Proxy to graphify subprocess
`zc_kb_cluster`	Louvain community detection over KB graph
`zc_kb_community_for`	Look up community of a specific KB source + community-mates

Cost Routing + Replay (2)

Tool	What it does
`zc_choose_model`	Returns Haiku/Sonnet/Opus recommendation for a complexity (informational — orchestrator decides)
`zc_replay` / `zc_ack`	Replay un-acknowledged broadcasts / mark one as seen

RBAC (2)

Tool	What it does
`zc_issue_token`	Short-lived HMAC-signed session token bound to (agent_id, role)
`zc_revoke_token`	Revoke all tokens for an agent

Observability (1)

Tool	What it does
`zc_logs`	Query structured logs (per-component, agent-scoped, trace_id correlation)

Installation

Docker Stack (recommended)

Prerequisites: Docker Desktop 4.x+, Node.js 22+.

git clone https://github.com/iampantherr/SecureContext
cd SecureContext
cp docker/.env.example docker/.env
# Edit docker/.env: set POSTGRES_PASSWORD + ZC_API_KEY (generate via crypto.randomBytes)

Start the stack:

# Windows — auto-detects NVIDIA / AMD / CPU
.\docker\start.ps1

# Linux / macOS
./docker/start.sh

Three containers come up: securecontext-postgres (PG + pgvector), securecontext-api (HTTP API), securecontext-ollama (embeddings). All set to restart: unless-stopped — they come back on every boot.

Verify:

curl http://localhost:3099/health
# {"status":"ok","version":"0.17.2","store":"postgres","ollamaAvailable":true,"searchMode":"hybrid (BM25 + vector)"}

Register with Claude:

node install.mjs --remote http://localhost:3099 <your-ZC_API_KEY>

Restart Claude Code after running.

Local SQLite (single-developer, no Docker)

Prerequisites: Node.js 22+ only. Clone repo, npm install, npm run build, then:

node install.mjs --local

Writes ~/.claude/settings.json's MCP zc-ctx entry pointing at dist/server.js. Data lives in ~/.claude/zc-ctx/sessions/{projectHash}.db. No vector search (falls back to BM25) unless Ollama is installed locally at http://127.0.0.1:11434.

Multi-Agent Harness

To run multi-agent sessions, also clone the companion dispatcher:

git clone https://github.com/iampantherr/A2A_dispatcher
cd A2A_dispatcher
# Launch 1 orchestrator (Opus) + 3 developers (Sonnet) sharing one queue:
powershell -File start-agents.ps1 -Project C:\path\to\your\project -Roles developer -WorkerCount 3

See A2A_dispatcher's README for LAUNCH_ROLE / RETIRE_ROLE / file ownership details.

Quick Start

In a new Claude Code session on your project:

You: please restore context and tell me what we were working on

Claude:  (calls zc_recall_context)
         [restores ~50 working memory facts, recent session events, shared
          broadcast channel, and health banner — all in ~1500 tokens]
         
         Last session we were refactoring auth: decided to use HMAC signing
         via KDF-derived per-agent subkeys. The prototype is in src/security/
         hmac_chain.ts and 28/28 unit tests pass. Next step was wiring the
         per-query SET LOCAL ROLE for RLS...

Tell Claude to remember things as they happen:

You: we settled on using pgvector over pinecone because we need local-first

Claude: (calls zc_remember with importance=5)

End the session:

You: wrap up please

Claude: (calls zc_summarize_session)  
        [persists structured summary; next session's zc_recall_context will surface it]

How It Compares

vs. `claude-mem` (21k+ stars)

Feature	claude-mem	SecureContext
Memory	AI-compressed summaries (lossy)	MemGPT importance-scored facts (structured, bounded)
Security	None documented	HMAC chain + per-agent HKDF + RLS + sandbox
Multi-agent	None	Work-stealing queue + dispatcher + broadcast channel
Telemetry	None	Per-call cost/latency/token rows
Learning loop	None	Outcome → failures.jsonl auto-feedback
Audit trail	None	HMAC-chained, tamper-detectable

vs. `context-mode` (the one SecureContext originally replaced)

Concern	context-mode	SecureContext
Env leaks to sandbox	❌ Full env inherited	✅ PATH-only env
SSRF protection	❌	✅ Multi-layer (protocol/DNS/redirect)
Prompt injection via KB	❌	✅ Pre-filter + trust labels on external content
Self-modifiable hooks	❌	✅ Hook paths + manifests verified
Multi-agent	❌	✅ Full harness

vs. Claude's Native Context Management

Concern	Native	SecureContext
Survives session restart	❌ starts fresh	✅ `zc_recall_context` restores
Handles 150k+ token context	Auto-compacts (lossy)	Bounded working memory + archival KB
Cost per session (overhead)	~$2–5	~$0.16 (with recall cache)
Cross-session continuity	❌	✅ summaries + facts persist
Pool parallelism	❌ single-session	✅ N workers via `-WorkerCount`

Cost Model

Claude Sonnet 4.6 pricing: $3/Mtok input, $15/Mtok output.
Claude Opus 4.7 pricing: $15/Mtok input, $75/Mtok output.

Typical SecureContext-harness session with Opus orchestrator + Sonnet developers:

Operation	Count	Cost
`zc_recall_context` on Opus (first call)	1	~$0.012
`zc_recall_context` on Opus (cache hit, 2nd+)	2	$0.00
`zc_choose_model` on Opus	1	$0.004
`zc_enqueue_task` × 4 on Opus	4	$0.024
`zc_broadcast` ASSIGN × 4 on Opus	4	$0.016
Developer work (Sonnet)	6	$0.014
`zc_summarize_session` on Opus	1	$0.003
Total per user-task-cycle		~$0.16

Same flow without the harness (native re-paste + cold retries): $2–5 per cycle. That's where the 87% token-savings claim comes from.

Testing & Verification

npm test              # 645 unit + integration tests
npm run lint          # ESLint @typescript-eslint/no-floating-promises
npm run check:env     # L1 env-pinning linter — catches un-pinned critical vars
npm run check:env:test  # self-test of the env linter (14 cases)
npm run lint:test     # self-test of the floating-promises rule (5 cases)

node security-tests/run-all.mjs  # 60+ red-team attack IDs (RT-S0-* through RT-S4-*)

Red-team categories:

Sandbox escape + credential isolation (RT-S0-*)
SSRF + fetcher (RT-S1-*)
SQLite / KB injection (RT-S1-12 symlink escape)
Hook + prompt-injection-via-KB
Chain tamper-detection (RT-S1-15/16)
Per-agent HKDF forgery (RT-S2-01)
Reference Monitor + token binding (RT-S2-02/03/04/05/06)
Cross-agent RLS (RT-S3-05)
Work-stealing queue correctness (RT-S4-01 — 50 workers × 100 tasks no double-claim)
File-ownership overlap guard (RT-S4-05/06/07)

Recent Changes

See CHANGELOG.md for the full history. Highlights from the last quarter:

v0.17.2 (2026-04-20) — L1 env-pinning linter + L3 no-floating-promises ESLint + L4 outcome → failures.jsonl auto-feedback. Closes 3 architectural-bug classes pre-Sprint-2.
v0.17.1 — Agent-idle fixes (claim-drain, Stop-hook queue-drain, dispatcher wake-nudge) + 60s recall cache + Tier 1+2 pricing correctness.
v0.17.0 — Postgres work-stealing queue (FOR UPDATE SKIP LOCKED) + complexity-based model router + file-ownership overlap guard + -WorkerCount N multi-worker pools.
v0.16.0 — Postgres backend for telemetry + outcomes + learnings; Tier 3 security (per-query SET LOCAL ROLE + Row-Level Security).
v0.15.0 — Structured ASSIGN schema (file_ownership_exclusive, complexity_estimate, acceptance_criteria, etc.) + MAC classification on outcomes.
v0.14.0 — Provenance tagging (EXTRACTED / INFERRED / AMBIGUOUS / UNKNOWN) + AST code extractor + Louvain community detection.
v0.13.0 — graphify integration (zc_graph_query / path / neighbors) + auto-indexed graph reports.
v0.12.0 — ChainedTable abstraction + per-agent HKDF subkey (Tier 1 security fix).
v0.11.0 — Telemetry foundation (tool_calls hash chain + outcomes pipeline + learnings mirror).

For v0.6–v0.10 history see CHANGELOG.md.

Contributing

Issues and PRs welcome. Before opening a PR:

npm run build
npm test                  # must be 645/645 (or updated)
npm run lint              # 0 errors
npm run check:env         # 0 unclassified
node security-tests/run-all.mjs  # all red-team IDs pass

Architectural decisions are recorded in C:\Users\Amit\AI_projects\.harness-planning\ARCHITECTURAL_LESSONS.md (local-only — not in the repo to keep internal strategy out of public history). Consult before proposing changes that touch the security foundation, telemetry pipeline, or work-stealing queue.

License

MIT — see LICENSE. Built for self-hostable, auditable agent infrastructure. No telemetry-back-to-vendor. No cloud dependencies beyond what you configure yourself.

Companion project: A2A_dispatcher — multi-agent orchestration layer that spawns / routes / retires worker pools against this harness.

Name		Name	Last commit message	Last commit date
Latest commit History 71 Commits
.claude-plugin		.claude-plugin
.github/workflows		.github/workflows
docker		docker
docs		docs
hooks		hooks
scripts		scripts
security-tests		security-tests
src		src
.gitignore		.gitignore
AGENT_HARNESS.md		AGENT_HARNESS.md
ARCHITECTURE.md		ARCHITECTURE.md
CHANGELOG.md		CHANGELOG.md
CLAUDE.md		CLAUDE.md
README.md		README.md
SECURITY_REPORT.md		SECURITY_REPORT.md
UsersAmit.claudezc-ctxsessionsaafb4b029db36884.db		UsersAmit.claudezc-ctxsessionsaafb4b029db36884.db
eslint.config.mjs		eslint.config.mjs
install.mjs		install.mjs
llms.txt		llms.txt
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json
vitest.config.ts		vitest.config.ts

Folders and files

Latest commit

History

Repository files navigation

SecureContext — Secure Multi-Agent Harness for Claude Code

What SecureContext Is Today

Headline Numbers

Key Capabilities

1. Persistent Memory That Survives Restarts

2. Security That Auditors Can Verify

3. Multi-Agent Coordination (Production-Grade)

4. Honest Cost Accounting

5. Closed Learning Loop (v0.17.2 L4)

6. Architectural Quality Gates

Architecture at a Glance

MCP Tools (25)

Memory + Retrieval (7)

Indexing + Knowledge (5)

Execution + Analysis (5)

Multi-Agent Coordination (2)

Work-Stealing Queue (6) — v0.17.0

Graph Analysis (3) — v0.13.0

Cost Routing + Replay (2)

RBAC (2)

Observability (1)

Installation

Docker Stack (recommended)

Local SQLite (single-developer, no Docker)

Multi-Agent Harness

Quick Start

How It Compares

vs. claude-mem (21k+ stars)

vs. context-mode (the one SecureContext originally replaced)

vs. Claude's Native Context Management

Cost Model

Testing & Verification

Recent Changes

Contributing

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases 22

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

vs. `claude-mem` (21k+ stars)

vs. `context-mode` (the one SecureContext originally replaced)

Packages