-
Notifications
You must be signed in to change notification settings - Fork 0
Exhaust Inbox
The Exhaust Inbox captures chat exhaust from hundreds of users interacting with AI and automatically refines it into three canonical buckets: TRUTH, REASONING, and MEMORY.
Every AI interaction produces exhaust — prompts, responses, tool calls, metrics, and errors. The Exhaust Inbox system ingests this raw stream, groups it into Decision Episodes, and runs extraction/refinement to produce structured, confidence-gated knowledge items.
Raw Events (JSONL)
|
v
┌─────────────────┐
│ Ingest & Group │ POST /api/exhaust/events
│ → Episodes │ POST /api/exhaust/episodes/assemble
└─────────────────┘
|
v
┌─────────────────┐
│ Refine │ POST /api/exhaust/episodes/{id}/refine
│ → 3 Buckets │ (rule-based or LLM-backed)
│ → Drift Signals │
│ → Coherence │
└─────────────────┘
|
v
┌─────────────────┐
│ Review & Commit │ Human-in-the-loop or auto-commit
│ → Memory Graph │ POST /api/exhaust/episodes/{id}/commit
└─────────────────┘
| Bucket | Contains | Example |
|---|---|---|
| TRUTH | Atomic claims, evidence, metrics | "service-alpha is running v2.4.1 with 3 replicas" |
| REASONING | Decisions, assumptions, alternatives, rationale | "Recommend monitoring 30 more minutes before rollback" |
| MEMORY | Entities, relations, artifacts, context pointers | "service-alpha → deployed_by → ml-ops team" |
Each item has a confidence score (0–1) that determines its gating tier:
| Score | Tier | Action |
|---|---|---|
| ≥ 0.85 | auto_commit |
Automatically committed (green) |
| 0.65 – 0.84 | review_required |
Human review needed (amber) |
| < 0.65 | hold |
Held for investigation (red) |
By default the refiner uses rule-based extraction (keyword and pattern matching). Enable LLM-backed extraction using claude-haiku-4-5-20251001 for higher-quality buckets:
pip install -e ".[exhaust-llm]"
export ANTHROPIC_API_KEY="sk-ant-..."
export EXHAUST_USE_LLM="1"When enabled, the LLMExtractor in engine/exhaust_llm_extractor.py builds a transcript of episode events (truncated to 6 000 chars), calls the Anthropic Messages API with a JSON-only system prompt, and parses the structured response into TRUTH / REASONING / MEMORY items. All confidence values are clamped to [0.0, 1.0].
Fallback behaviour: if the API call fails or returns unparseable output, the rule-based extractor runs transparently — no episode is lost and no error is surfaced.
Grade expectation: LLM extraction typically produces B or A grades for well-structured episodes; rule-based typically yields C or D.
Set EXHAUST_USE_LLM=0 at any time to revert to rule-based without redeploying.
See also: engine/exhaust_llm_extractor.py and the LLM extraction cookbook.
The ExhaustCallbackHandler extends the DeepSigma callback handler to emit EpisodeEvent payloads during chain execution. Events are buffered and flushed to the ingestion endpoint. Each LLM call also emits a metric event carrying latency_ms and the model name.
from adapters.langchain_exhaust import ExhaustCallbackHandler
handler = ExhaustCallbackHandler(
endpoint="http://localhost:8000/api/exhaust/events",
project="my-project",
team="ml-team",
)
chain.invoke(input, config={"callbacks": [handler]})See: adapters/langchain_exhaust.py
The Anthropic batch adapter reads JSONL exports of Anthropic Messages API
responses, normalises them to EpisodeEvent format (prompt / completion /
tool / metric), groups by session_id + time_window (30 min default), and
POSTs to the ingestion endpoint.
python -m adapters.anthropic_exhaust \
--file /path/to/anthropic_logs.jsonl \
--project my-project \
--dry-run # preview without POSTingSee: adapters/anthropic_exhaust.py
The batch adapter reads JSONL log exports from OpenAI or Azure OpenAI, normalises them to EpisodeEvent format, groups by user_hash + conversation_id + time_window (30 min default), and POSTs to the ingestion endpoint.
python -m adapters.azure_openai_exhaust \
--input /path/to/logs.jsonl \
--project my-projectSee: adapters/azure_openai_exhaust.py
- Ingest — Raw events arrive via POST or file import
-
Assemble — Events are grouped into Decision Episodes by
session_id + user_hash + time_window - Refine — The refiner extracts truth/reasoning/memory items, detects drift, computes coherence
- Review — Human reviews items in the three-lane UI (or auto-commit for high-confidence items)
- Commit — Accepted items flow into the Memory Graph; drift signals feed the Drift→Patch loop
The refiner performs MVP drift detection on each episode:
| Signal Type | Description |
|---|---|
contradiction |
Claim contradicts existing canon |
missing_policy |
Required policy flag not present |
low_coverage |
Insufficient claim coverage for the episode |
stale_reference |
Memory item references an episode ID no longer in the memory graph |
Each signal has a severity (Green / Yellow / Red) and a fingerprint (stable hash for dedup) with an optional recommended_patch.
Episodes receive a coherence score (0–100) computed from weighted dimensions:
| Dimension | Weight | Description |
|---|---|---|
claim_coverage |
25% | How well claims cover the episode content |
evidence_quality |
25% | Strength and specificity of evidence |
reasoning_completeness |
20% | Whether decisions have rationale and alternatives |
memory_linkage |
15% | Entity/relation extraction quality |
policy_adherence |
15% | Compliance with active policy packs |
Grade mapping: A ≥ 85, B ≥ 75, C ≥ 65, D < 65
The Exhaust Inbox renders as a three-lane layout at #/inbox:
| Lane | Component | Content |
|---|---|---|
| Left | EpisodeStream |
Scrollable list of episode cards with filters |
| Center | EpisodeDetail |
Timeline view of events (prompt / tool / response chips) |
| Right | BucketPanel |
Tabbed TRUTH / REASONING / MEMORY with accept/reject/edit |
- Project, Team, Source (text)
- Coherence score range (0–100)
- Drift-only toggle
- Low-confidence-only toggle
- Accept / Reject / Edit individual items
- Accept All — bulk accept pending items
- Escalate — flag for senior review
- Commit — persist refined episode to storage
| Method | Path | Description |
|---|---|---|
GET |
/api/exhaust/health |
Health check — returns event/episode/refined counts |
POST |
/api/exhaust/events |
Ingest a raw EpisodeEvent
|
POST |
/api/exhaust/episodes/assemble |
Group buffered events into episodes |
GET |
/api/exhaust/episodes |
List all assembled episodes |
GET |
/api/exhaust/episodes/{id} |
Episode detail |
POST |
/api/exhaust/episodes/{id}/refine |
Extract buckets + drift + coherence score |
POST |
/api/exhaust/episodes/{id}/commit |
Commit refined episode to Memory Graph |
PATCH |
/api/exhaust/episodes/{id}/items/{item_id} |
Accept / reject / edit a single bucket item |
GET |
/api/exhaust/mg |
Read the Memory Graph |
GET |
/api/exhaust/drift |
List all drift signals |
File-based under /app/data (single-writer, uvicorn --workers 1):
| Path | Format | Description |
|---|---|---|
events.jsonl |
Append-only JSONL | Raw ingested events |
episodes/{id}.json |
JSON per episode | Assembled episodes |
refined/{id}.json |
JSON per episode | Refined episodes with buckets |
mg/memory_graph.jsonl |
Append-only JSONL | Committed memory graph entries |
drift/drift.jsonl |
Append-only JSONL | Drift signals |
python tools/exhaust_cli.py import --file specs/sample_episode_events.jsonl
python tools/exhaust_cli.py assemble
python tools/exhaust_cli.py list
python tools/exhaust_cli.py refine --episode <id>
python tools/exhaust_cli.py commit --episode <id>
python tools/exhaust_cli.py healthFour Mermaid diagrams cover the full Exhaust Inbox architecture:
| Diagram | Contents |
|---|---|
archive/mermaid/29-exhaust-inbox-pipeline.md |
Full pipeline (graph TB), bucket extraction detail (flowchart), episode state machine (stateDiagram) |
archive/mermaid/29-exhaust-connector-map.md |
Adapter ecosystem (graph LR), Source enum mapping, LangChain metric enrichment |
archive/mermaid/29-exhaust-api-surface.md |
All 10 endpoints (graph TD), refine→commit sequence diagram, status codes |
archive/mermaid/29-llm-extraction-flow.md |
LLMExtractor internals (flowchart), API call sequence, confidence clamping, system prompt |
- Architecture — Where OVERWATCH sits in the stack
- Drift → Patch — Drift detection and remediation lifecycle
- Sealing & Episodes — Episode sealing and immutability
- Coherence Ops Mapping — DLR/RS/DS/MG pipeline
- LangChain — LangChain integration details
- Unified Atomic Claims — Claim schema and primitives
- Canon — Blessed claim memory
Σ OVERWATCH — Coherence Ops Platform • Current release: v2.1.0 • DeepSigma
- Start
- Core
- Schemas
- FEEDS + Exhaust
- Integrations
- Reference Layer
- Ops
- Excel-First
- EDGE + ABP
- Domain Modes
- Governance
- Meta