Failure diagnosis and visual postmortems for failed coding-agent runs.
TracePawl is a postmortem engine for autonomous coding agents. It takes a normalized JSON trace of a failed run, identifies the likely failure category, pinpoints where execution started to drift, surfaces the supporting evidence, and suggests a recovery action. It is the first focused product in the CodePawl stack.
- Not a generic observability dashboard
- Not a LangSmith / Langfuse clone
- Not a multi-agent runtime
- Not a hosted or adapter-based trace collector
- Not an LLM-as-judge service
generic JSONL → strict validation → normalized TraceRun → facts → evidence graph → deterministic diagnosis → optional LLM review → eval case → trace diff → local policy gate
The analyzer is deterministic: no LLM, no network. Rules live in src/analyzer/rules/. The event protocol is documented in docs/protocol/event-protocol-v0.md, with schema and profile registry in schemas/ and profiles/.
Install the package, then run the demo:
npm install -g @codepawl/tracepawl
tracepawl demo --open
When working from this repository, build first and use the local CLI:
bun install
bun run build
node dist/cli.js demo --open
The demo writes:
.tracepawl/runs/failing-run.trace.json
.tracepawl/reports/failing-run.md
.tracepawl/visual/failing-run.visual.json
.tracepawl/visual/failing-run.html
To run the Developer Preview path manually:
tracepawl import examples/generic-jsonl/failing-run.tracepawl.jsonl --out .tracepawl/runs/failing-run.trace.json
tracepawl analyze .tracepawl/runs/failing-run.trace.json --out .tracepawl/reports/failing-run.md
tracepawl visualize .tracepawl/runs/failing-run.trace.json --format json --out .tracepawl/visual/failing-run.visual.json
tracepawl visualize .tracepawl/runs/failing-run.trace.json --format html --out .tracepawl/visual/failing-run.html
tracepawl eval .tracepawl/runs/opencode-real.trace.json --out .tracepawl/evals/opencode-real.eval.json
tracepawl diff .tracepawl/runs/opencode-demo.trace.json .tracepawl/runs/opencode-real.trace.json --format markdown --out .tracepawl/diffs/opencode-demo-to-real.md
tracepawl gate .tracepawl/runs/opencode-real.trace.json --policy .tracepawl/policy.json --eval .tracepawl/evals/opencode-real.eval.json --out .tracepawl/gates/opencode-real.gate.json
tracepawl demo --open
From this repository without installing the package globally, use the built CLI:
bun run build
node dist/cli.js import examples/generic-jsonl/failing-run.tracepawl.jsonl --out .tracepawl/runs/failing-run.trace.json
node dist/cli.js analyze .tracepawl/runs/failing-run.trace.json --out .tracepawl/reports/failing-run.md
node dist/cli.js visualize .tracepawl/runs/failing-run.trace.json --format json --out .tracepawl/visual/failing-run.visual.json
node dist/cli.js visualize .tracepawl/runs/failing-run.trace.json --format html --out .tracepawl/visual/failing-run.html
The public Developer Preview includes:
- strict generic JSONL import for TracePawl event protocol v0
- normalized TraceRun JSON output
- deterministic Markdown postmortems
- visual postmortem JSON
- static local HTML viewer
- local eval case promotion
- deterministic trace diff
- deterministic local policy gate
- local command recorder and existing analyzer fixtures
Screenshots are not committed yet. Use .tracepawl/visual/failing-run.html from tracepawl demo --open as the screenshot/video capture target; future assets should live under docs/assets/.
TracePawl includes a local, sanitized OpenCode-style demo export. It is not an official OpenCode partnership or hosted integration; it is a deterministic import path for compatible JSON session/export data.
tracepawl import --source opencode examples/importers/opencode-demo-session.json --out .tracepawl/runs/opencode-demo.trace.json
tracepawl analyze .tracepawl/runs/opencode-demo.trace.json --out .tracepawl/reports/opencode-demo.md
tracepawl visualize .tracepawl/runs/opencode-demo.trace.json --format html --out .tracepawl/visual/opencode-demo.html
The demo shows a coding-agent session with a user goal, file reads, a formatter edit, shell test commands, repeated validation failure, and a handoff. The resulting postmortem classifies the run as test_failure_misdiagnosis, links evidence to command/test/file events, marks the failure onset, and suggests a local recovery action.
From a built checkout, generate all three artifacts at once:
node dist/cli.js demo --open
For screenshots or video, TracePawl also includes a local OpenCode-compatible sandbox harness. It prepares a disposable JavaScript repo with a deterministic failing test, captures git/test/session artifacts, builds an OpenCode-style JSON session artifact, and renders TracePawl outputs locally.
This is separate from the fixture demo above. It is still local-first and is not an official OpenCode integration or partnership. The real scenario requires your own local OpenCode-compatible CLI setup:
bun run build
scripts/demo/opencode-real-scenario.sh prepare
# run the printed OpenCode command, then:
scripts/demo/opencode-real-scenario.sh resume
When the final captured npm test passes, TracePawl reports initial_test_failure_resolved: initial failing test, focused src/eventValidator.js edit, passing verification, and a review-and-commit next step. If the final test still fails, the report shows the captured failure diagnosis and recovery action instead.
For a deterministic smoke test that does not require OpenCode:
scripts/demo/opencode-real-scenario.sh dry-run
See docs/demo/opencode-real-scenario.md for prerequisites, resume flow, artifact paths, and the screenshot/video checklist.
TracePawl Failure Report
========================
Failure: stale_context_edit
Summary:
Edit to `src/paginate.ts` failed because the agent's snippet did not match current file content (2 attempts).
Root cause:
The agent read `src/paginate.ts` earlier in the run, then attempted to edit it using that cached snippet. Something changed the file (or the agent's snippet was inaccurate to begin with), so the `old_string` anchor no longer appears verbatim. Retrying with the same stale snippet cannot succeed.
Failure onset: evt_006
Evidence:
- [evt_006, evt_008] Failed `file_edit` event(s) whose `old_string` did not match current file content — a stale-context signal.
- [evt_002] Prior `file_read` event(s) for the same path — the agent's edit context likely went stale between the read and the failed edit.
Contradicting evidence:
None
Suggested recovery:
Action: re_read_file
Re-read the file from disk, locate the intended target by current line content, and retry the edit with a narrower, freshly-anchored patch.
Parameters: {"path":"src/paginate.ts"}
Confidence: 0.85
Related events: evt_006, evt_008, evt_002
Trace ID: run_stale_context_edit_001
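The diagnosis in the report above can be approximated with a small deterministic rule. The following is a minimal sketch, not the rule shipped in src/analyzer/rules/: the `SketchEvent` shape, the `findStaleContextEdit` name, and the check for `old_string` in the error text are all assumptions made for illustration. The real event union is described in docs/TRACE_SCHEMA.md.

```typescript
// Hypothetical, simplified event shapes (assumption; not the real TraceEvent union).
type SketchEvent =
  | { id: string; type: "file_read"; path: string }
  | { id: string; type: "file_edit"; path: string; applied: boolean; error?: string };

interface StaleEditFinding {
  path: string;
  onset: string; // ID of the first failed edit for this path
  failedEdits: string[];
  priorReads: string[];
}

// Returns the earliest path whose edit failed with an old_string mismatch
// after the same path had already been read, or null if none is found.
function findStaleContextEdit(events: SketchEvent[]): StaleEditFinding | null {
  const readsByPath = new Map<string, string[]>();
  const findings: StaleEditFinding[] = [];

  for (const ev of events) {
    if (ev.type === "file_read") {
      const reads = readsByPath.get(ev.path) ?? [];
      reads.push(ev.id);
      readsByPath.set(ev.path, reads);
      continue;
    }
    // Assumption: a mismatch is signalled by "old_string" in the error text.
    const mismatch = !ev.applied && (ev.error ?? "").includes("old_string");
    const priorReads = readsByPath.get(ev.path) ?? [];
    if (!mismatch || priorReads.length === 0) continue;

    const existing = findings.find((f) => f.path === ev.path);
    if (existing) {
      existing.failedEdits.push(ev.id);
    } else {
      findings.push({
        path: ev.path,
        onset: ev.id,
        failedEdits: [ev.id],
        priorReads: [...priorReads],
      });
    }
  }
  return findings.length > 0 ? findings[0] : null;
}
```

Fed a read at evt_002 followed by failed edits at evt_006 and evt_008 on the same path, this sketch reports evt_006 as the onset, matching the evidence layout in the report.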
| Category | What it catches |
|---|---|
| `stale_context_edit` | Agent edited a file using outdated context; `old_string` doesn't match current content. |
| `tool_misuse` | Tool called with invalid arguments, missing required fields, or violated preconditions. |
| `loop_or_stall` | Same tool call (or failing command) repeated ≥3 times with identical arguments. |
| `test_failure_misdiagnosis` | Failed test points at one file; agent edits unrelated files or silences the assertion. |
| `initial_test_failure_resolved` | Initial test failure is followed by a focused edit and passing verification rerun. |
| `unsafe_or_broad_edit` | Narrow user request produced edits spanning many files, directories, or lines. |
See docs/FAILURE_CATEGORIES.md for the long form.
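To make the `loop_or_stall` row concrete, here is a minimal sketch of how "same tool call repeated ≥3 times with identical arguments" could be detected. The `ToolCall` shape and the helper names are hypothetical, and the sketch counts total occurrences rather than consecutive ones, which is an assumption; the shipped rule may differ.

```typescript
// Hypothetical tool-call shape (assumption; not the real TraceEvent union).
interface ToolCall {
  id: string;
  tool: string;
  args: Record<string, unknown>;
}

// Serialize a value with sorted object keys so that argument order
// does not affect whether two calls count as identical.
function canonical(value: unknown): string {
  if (Array.isArray(value)) {
    return "[" + value.map(canonical).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .sort()
      .map((key) => JSON.stringify(key) + ":" + canonical(obj[key]));
    return "{" + parts.join(",") + "}";
  }
  const text = JSON.stringify(value);
  return text === undefined ? "null" : text;
}

// Group calls by tool name plus canonical arguments and keep the groups
// that reach the threshold (3 by default, per the table above).
function findRepeatedCalls(calls: ToolCall[], threshold = 3): string[][] {
  const groups = new Map<string, string[]>();
  for (const call of calls) {
    const key = call.tool + " " + canonical(call.args);
    const ids = groups.get(key) ?? [];
    ids.push(call.id);
    groups.set(key, ids);
  }
  return Array.from(groups.values()).filter((ids) => ids.length >= threshold);
}
```

Sorting keys before comparison is a deliberate choice here: two calls whose arguments differ only in key order are treated as the same call.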
node dist/cli.js analyze examples/tool-misuse.json
node dist/cli.js analyze examples/loop-or-stall.json
node dist/cli.js analyze examples/test-failure-misdiagnosis.json
node dist/cli.js analyze examples/unsafe-broad-edit.json
examples/realistic/ contains small coding-agent failure demos with richer event timelines. They are useful for seeing both terminal reports and Markdown postmortems with timeline context around the failure onset:
node dist/cli.js analyze examples/realistic/stale-edit-after-file-change.json
node dist/cli.js analyze examples/realistic/stale-edit-after-file-change.json --format markdown --output /tmp/stale-edit-postmortem.md
TracePawl can wrap a local command and write a trace that the analyzer can read:
tracepawl record --output tracepawl-runs/latest.json -- bun run test
tracepawl analyze tracepawl-runs/latest.json
Failed commands still produce valid traces. For example, with the installed package:
tracepawl record --output trace.json -- node -e "console.error('boom'); process.exit(1)"
tracepawl analyze trace.json
The recorder returns the child command's exit code after writing the trace, so the first command above exits non-zero. The analyzer can still inspect the trace and, when no deterministic rule matches, surfaces the failed command as an unknown report with command, cwd, exit, stderr/stdout snippets when present, and recovery guidance:
Failure: unknown
Summary:
Recorded command `node -e console.error('boom'); process.exit(1)` failed with exit code 1.
Evidence:
- [evt_001] Failed command
Command: node -e console.error('boom'); process.exit(1)
Cwd: /path/to/project
Exit: exit code 1
Stderr: "boom"
Suggested recovery:
Action: request_human_input
Inspect the failed command's stderr/stdout and any changed files.
When running from this repository before publishing or installing the package, use the built CLI directly:
bun run build
node dist/cli.js record --output tracepawl-runs/latest.json -- bun run test
node dist/cli.js analyze tracepawl-runs/latest.json
See docs/RECORDER.md for output-file behavior, latest.json, best-effort git_diff capture, exit codes, signal handling, and current recorder limits.
- No dashboard or hosted service.
- No external integrations (LangSmith, Langfuse, OpenTelemetry, Claude Code adapter, OpenCode-compatible adapter).
- No runtime adapters yet (Claude Code, OpenCode-compatible, OpenTelemetry, LangSmith, Langfuse). Build traces via tracepawl record, the TraceWriter SDK (see docs/SDK.md), or externally produced JSON conforming to docs/TRACE_SCHEMA.md.
- Importers are local converters for sanitized external logs; they are not hosted collectors.
- Deterministic rules are the source of truth. Optional LLM review is opt-in, evidence-bounded, and separate from the rule diagnosis.
- No replay engine. Replay-lite and full replay are post-v0.
- No CloudPawl backend, authentication, billing, tenant UI, BYOK, advanced dashboards, or private deployment in this MVP. See docs/product/cloudpawl-roadmap.md for the hosted roadmap.
TracePawl does not call an LLM by default. tracepawl analyze and tracepawl demo --open remain deterministic and offline unless --llm-review is provided.
When enabled, LLM review is advisory only. The deterministic diagnosis remains the source of truth, and reports keep deterministic facts, rule diagnosis, and LLM review in separate sections.
Supported providers are mock, noop, openai-compatible, and ollama. The OpenAI-compatible provider works with compatible cloud or local runtimes such as llama.cpp server, LM Studio, LocalAI, vLLM, DeepInfra/OpenRouter-style endpoints when they expose compatible chat completions. Local OpenAI-compatible base URLs do not require an API key; non-local base URLs use TRACEPAWL_LLM_API_KEY.
Providers receive a bounded prompt: redacted run metadata, extracted facts, an evidence graph summary, the baseline deterministic diagnosis, and selected snippets keyed by event ID. TracePawl does not send the full raw trace by default, redacts .env-style secrets/API keys/bearer tokens from prompt fields, clips command output snippets, and bounds returned LLM event IDs to events present in the trace. If provider config is missing or unavailable, analysis still succeeds and the LLM review is reported as unavailable.
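The three safeguards described above (secret redaction, snippet clipping, and bounding returned event IDs) can be sketched as follows. This is an illustrative sketch only: the regular expressions, the `[REDACTED]` marker, and the function names are assumptions, not TracePawl's actual patterns or API.

```typescript
// Illustrative patterns only (assumption): the real redaction rules may be broader.
const REDACTIONS: Array<[RegExp, string]> = [
  // .env-style assignments such as API_KEY=... or DB_PASSWORD=...
  [/\b([A-Z][A-Z0-9_]*(?:KEY|TOKEN|SECRET|PASSWORD))=\S+/g, "$1=[REDACTED]"],
  // Bearer tokens in headers or logs
  [/\bBearer\s+[A-Za-z0-9._~+\/-]+=*/g, "Bearer [REDACTED]"],
];

// Replace anything that looks like a secret before it reaches a prompt field.
function redact(text: string): string {
  let result = text;
  for (const [pattern, replacement] of REDACTIONS) {
    result = result.replace(pattern, replacement);
  }
  return result;
}

// Clip long command output so the prompt stays bounded.
function clip(text: string, maxChars: number): string {
  return text.length <= maxChars ? text : text.slice(0, maxChars) + "...[clipped]";
}

// Keep only event IDs that actually exist in the trace, so a model
// cannot cite evidence that is not there.
function boundEventIds(returned: string[], traceIds: string[]): string[] {
  const known = new Set(traceIds);
  return returned.filter((id) => known.has(id));
}
```

Redaction runs before clipping in a pipeline like this so that a secret cannot survive by being cut in half at the clip boundary.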
Local models can still return malformed JSON even when JSON output is requested. TracePawl asks providers for JSON-only responses, strips common markdown/code-fence wrappers, and safely extracts a bounded JSON object when possible. If parsing still fails, the LLM review is marked unavailable and the deterministic diagnosis remains the source of truth.
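One way to tolerate wrapped or chatty model output is to scan for the first balanced JSON object and parse only that span. The sketch below shows the idea under stated assumptions: `extractJsonObject` and its `maxChars` bound are hypothetical names and values, and the shipped parser may work differently. Returning null corresponds to the "LLM review unavailable" outcome described above.

```typescript
// Scan for the first balanced {...} span and parse it. Any wrapper text
// around the object (prose, code-fence markers) is ignored by the scan.
function extractJsonObject(
  raw: string,
  maxChars = 20000, // hypothetical bound on how much output is inspected
): Record<string, unknown> | null {
  const text = raw.slice(0, maxChars);
  const start = text.indexOf("{");
  if (start === -1) return null;

  let depth = 0;
  let inString = false;
  let escaped = false;

  for (let i = start; i < text.length; i++) {
    const ch = text.charAt(i);
    if (inString) {
      // Braces inside string literals must not affect the depth count.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      depth += 1;
    } else if (ch === "}") {
      depth -= 1;
      if (depth === 0) {
        try {
          const parsed: unknown = JSON.parse(text.slice(start, i + 1));
          const isObject =
            parsed !== null && typeof parsed === "object" && !Array.isArray(parsed);
          return isObject ? (parsed as Record<string, unknown>) : null;
        } catch {
          return null; // malformed JSON: caller treats the review as unavailable
        }
      }
    }
  }
  return null; // object never closed (for example, truncated output)
}
```

Tracking string state matters because model output often contains braces inside quoted text, which a naive brace counter would miscount.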
OpenAI-compatible local llama.cpp:
tracepawl analyze .tracepawl/runs/opencode-real.trace.json \
--llm-review \
--llm-provider openai-compatible \
--llm-base-url http://localhost:8080/v1 \
--llm-model Qwen2.5-Coder-7B-Instruct
Ollama:
tracepawl analyze .tracepawl/runs/opencode-real.trace.json \
--llm-review \
--llm-provider ollama \
--llm-base-url http://localhost:11434 \
--llm-model gemma4:e2b \
--out .tracepawl/reports/opencode-real-llm.md
Environment configuration:
TRACEPAWL_LLM_PROVIDER=ollama
TRACEPAWL_LLM_MODEL=gemma4:e2b
TRACEPAWL_LLM_BASE_URL=http://localhost:11434
TracePawl can turn one useful run into a repeatable local reliability check:
trace -> facts -> evidence graph -> deterministic diagnosis -> optional LLM review -> eval case -> trace diff -> local policy gate
This loop is experimental and local-only in the Developer Preview. It does not ship hosted CloudPawl workflows or full CI integration; the gate command returns process exit codes that can be wired into local scripts or CI by users.
Promote a trace into an eval case:
tracepawl eval .tracepawl/runs/opencode-real.trace.json --out .tracepawl/evals/opencode-real.eval.json
Compare a future run against a baseline:
tracepawl diff .tracepawl/runs/opencode-demo.trace.json .tracepawl/runs/opencode-real.trace.json --format markdown --out .tracepawl/diffs/opencode-demo-to-real.md
Gate a trace or eval case with a local JSON policy:
tracepawl gate .tracepawl/runs/opencode-real.trace.json --policy .tracepawl/policy.json --eval .tracepawl/evals/opencode-real.eval.json --out .tracepawl/gates/opencode-real.gate.json
Example policy:
{
"schemaVersion": "tracepawl.policy.v0",
"requireValidation": true,
"requirePassingFinalTest": true,
"requireEvidence": true,
"failOnUnresolvedSeriousFailure": true,
"failOnRiskyEditWithoutTests": true,
"minimumConfidence": 0.8
}
The gate is deterministic and does not run LLM review. A passing gate exits 0; a failing gate exits non-zero and reports explicit violations.
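To illustrate how a policy like the example could map to violations and an exit code, here is a minimal sketch. The `Policy` fields mirror the example policy; the `GateInput` summary shape, the `evaluateGate` name, and the specific exit code 1 are assumptions (the text only states that a failing gate exits non-zero).

```typescript
// Field names mirror the example policy above.
interface Policy {
  schemaVersion: string;
  requireValidation: boolean;
  requirePassingFinalTest: boolean;
  requireEvidence: boolean;
  failOnUnresolvedSeriousFailure: boolean;
  failOnRiskyEditWithoutTests: boolean;
  minimumConfidence: number;
}

// Hypothetical summary of a trace or eval case (assumption; not a real schema).
interface GateInput {
  validated: boolean;
  finalTestPassed: boolean;
  evidenceCount: number;
  unresolvedSeriousFailure: boolean;
  riskyEditWithoutTests: boolean;
  confidence: number;
}

interface GateResult {
  passed: boolean;
  exitCode: number;
  violations: string[];
}

// Each enabled policy field contributes at most one named violation.
function evaluateGate(policy: Policy, input: GateInput): GateResult {
  const violations: string[] = [];
  if (policy.requireValidation && !input.validated) violations.push("requireValidation");
  if (policy.requirePassingFinalTest && !input.finalTestPassed) violations.push("requirePassingFinalTest");
  if (policy.requireEvidence && input.evidenceCount === 0) violations.push("requireEvidence");
  if (policy.failOnUnresolvedSeriousFailure && input.unresolvedSeriousFailure) {
    violations.push("failOnUnresolvedSeriousFailure");
  }
  if (policy.failOnRiskyEditWithoutTests && input.riskyEditWithoutTests) {
    violations.push("failOnRiskyEditWithoutTests");
  }
  if (input.confidence < policy.minimumConfidence) violations.push("minimumConfidence");

  const passed = violations.length === 0;
  // Assumption: 1 stands in for "non-zero"; the real CLI may use other codes.
  return { passed, exitCode: passed ? 0 : 1, violations };
}
```

Because every check is a pure comparison against the policy, the same inputs always produce the same violations, which is what makes a gate like this safe to wire into scripts.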
TracePawl is the local-first open-core product. CloudPawl is the future hosted layer for run history, visual postmortem sharing, search, review workflows, retention, team analytics, and governance. Developer Preview users who want hosted sharing or team workflows should follow the CloudPawl roadmap in docs/product/cloudpawl-roadmap.md; a public waitlist link is not committed yet.
The TraceWriter SDK lets you record events from an agent runtime — no hand-authored JSON required. It owns event IDs and ISO timestamps, validates constructor inputs, and writes JSON that round-trips cleanly through the parser.
import { TraceWriter, analyzeTrace, formatTerminalReport } from "@codepawl/tracepawl";
const writer = new TraceWriter({ agent: "my-agent", userGoal: "Fix paginate()" });
writer.recordFileRead({ path: "src/paginate.ts" });
writer.recordFileEdit({
path: "src/paginate.ts",
oldString: "items.slice(start, end - 1)",
newString: "items.slice(start, end)",
applied: false,
error: "old_string not found in file",
});
writer.finalize();
console.log(formatTerminalReport(analyzeTrace(writer.toJSON())));
See docs/SDK.md for the full API reference, ID/timestamp contracts, and common patterns. A runnable demo lives at examples/sdk/record-failed-run.ts:
bun run tsx examples/sdk/record-failed-run.ts
import { parseTraceFile, analyzeTrace, formatTerminalReport } from "@codepawl/tracepawl";
const trace = await parseTraceFile("examples/stale-context-edit.json");
const report = analyzeTrace(trace);
console.log(formatTerminalReport(report));
analyzeTrace(trace) returns a FailureReport; see src/schema/failure.ts for the full shape.
bun run typecheck
bun run test
bun run lint
bun run build
bun run check runs the same typecheck + lint + format-check gate that CI enforces.
v0 CLI analyzer and local recorder are functional. Five example traces are included — one per category — and all five resolve to their real failure category at confidence ≥ 0.80. Diagnosis is rule-based and deterministic.
- docs/RELEASE.md: release prep, execution, and recovery checklist.
- docs/RECORDER.md: local command recorder guide.
- docs/SDK.md: TraceWriter producer-side reference.
- docs/FAILURE_CATEGORIES.md: the five v0 categories in depth.
- docs/TRACE_SCHEMA.md: TraceEvent union and FailureReport shape, for adapter authors.
- docs/importers/generic-jsonl.md: v0 contract for a future generic JSONL importer.
- docs/product/developer-preview.md: Developer Preview scope and demo requirements.
- docs/protocol/event-protocol-v0.md: event protocol for hand-written and custom-agent JSONL.
MIT © An Nguyen