TracePawl

Failure diagnosis and visual postmortems for failed coding-agent runs.

TracePawl is a postmortem engine for autonomous coding agents. It takes a normalized JSON trace of a failed run, identifies the likely failure category, pinpoints where execution started to drift, surfaces the supporting evidence, and suggests a recovery action. It is the first focused product in the CodePawl stack.

What TracePawl is not

Not a generic observability dashboard
Not a LangSmith / Langfuse clone
Not a multi-agent runtime
Not a hosted or adapter-based trace collector
Not an LLM-as-judge service

Developer Preview flow

generic JSONL → strict validation → normalized TraceRun → facts → evidence graph → deterministic diagnosis → optional LLM review → eval case → trace diff → local policy gate

The analyzer is deterministic: no LLM, no network. Rules live in src/analyzer/rules/. The event protocol is documented in docs/protocol/event-protocol-v0.md, with schema and profile registry in schemas/ and profiles/.
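As a rough illustration of what "strict" import can mean for JSONL, the sketch below rejects the whole file if any line fails to parse or lacks required fields. The `RawEvent` shape (string `id` and `type`) and the `importJsonl` helper are assumptions made for this example; the actual required fields are defined in docs/protocol/event-protocol-v0.md and schemas/.

```typescript
// Hypothetical minimal event shape; the real protocol defines its own fields.
interface RawEvent {
  id: string;
  type: string;
  [key: string]: unknown;
}

type ImportResult =
  | { ok: true; events: RawEvent[] }
  | { ok: false; errors: string[] };

// Strict import: one bad line fails the whole file, with line numbers reported.
function importJsonl(text: string): ImportResult {
  const events: RawEvent[] = [];
  const errors: string[] = [];
  text.split("\n").forEach((line, index) => {
    if (line.trim() === "") return; // tolerate blank lines
    let value: unknown;
    try {
      value = JSON.parse(line);
    } catch {
      errors.push(`line ${index + 1}: invalid JSON`);
      return;
    }
    if (typeof value !== "object" || value === null || Array.isArray(value)) {
      errors.push(`line ${index + 1}: expected a JSON object`);
      return;
    }
    const record = value as Record<string, unknown>;
    if (typeof record["id"] !== "string" || typeof record["type"] !== "string") {
      errors.push(`line ${index + 1}: missing string "id" or "type"`);
      return;
    }
    events.push(record as RawEvent);
  });
  return errors.length === 0 ? { ok: true, events } : { ok: false, errors };
}

const sample = '{"id":"evt_001","type":"file_read"}\nnot json\n';
const imported = importJsonl(sample);
console.log(imported.ok ? `${imported.events.length} events` : imported.errors.join("\n"));
```

Collecting every error before failing, rather than stopping at the first, lets a single run report all offending lines.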

Quickstart

After installing the package, run:

npm install -g @codepawl/tracepawl
tracepawl demo --open

When working from this repository, build first and use the local CLI:

bun install
bun run build
node dist/cli.js demo --open

The demo writes:

.tracepawl/runs/failing-run.trace.json
.tracepawl/reports/failing-run.md
.tracepawl/visual/failing-run.visual.json
.tracepawl/visual/failing-run.html

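To run the Developer Preview path manually (the eval, diff, and gate commands expect the opencode-demo and opencode-real traces and a .tracepawl/policy.json to exist already; see the OpenCode-compatible and Local reliability loop sections):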

tracepawl import examples/generic-jsonl/failing-run.tracepawl.jsonl --out .tracepawl/runs/failing-run.trace.json
tracepawl analyze .tracepawl/runs/failing-run.trace.json --out .tracepawl/reports/failing-run.md
tracepawl visualize .tracepawl/runs/failing-run.trace.json --format json --out .tracepawl/visual/failing-run.visual.json
tracepawl visualize .tracepawl/runs/failing-run.trace.json --format html --out .tracepawl/visual/failing-run.html
tracepawl eval .tracepawl/runs/opencode-real.trace.json --out .tracepawl/evals/opencode-real.eval.json
tracepawl diff .tracepawl/runs/opencode-demo.trace.json .tracepawl/runs/opencode-real.trace.json --format markdown --out .tracepawl/diffs/opencode-demo-to-real.md
tracepawl gate .tracepawl/runs/opencode-real.trace.json --policy .tracepawl/policy.json --eval .tracepawl/evals/opencode-real.eval.json --out .tracepawl/gates/opencode-real.gate.json
tracepawl demo --open

From this repository without installing the package globally, use the built CLI:

bun run build
node dist/cli.js import examples/generic-jsonl/failing-run.tracepawl.jsonl --out .tracepawl/runs/failing-run.trace.json
node dist/cli.js analyze .tracepawl/runs/failing-run.trace.json --out .tracepawl/reports/failing-run.md
node dist/cli.js visualize .tracepawl/runs/failing-run.trace.json --format json --out .tracepawl/visual/failing-run.visual.json
node dist/cli.js visualize .tracepawl/runs/failing-run.trace.json --format html --out .tracepawl/visual/failing-run.html

Developer Preview scope

The public Developer Preview includes:

strict generic JSONL import for TracePawl event protocol v0
normalized TraceRun JSON output
deterministic Markdown postmortems
visual postmortem JSON
static local HTML viewer
local eval case promotion
deterministic trace diff
deterministic local policy gate
local command recorder and existing analyzer fixtures

Screenshots are not committed yet. Use .tracepawl/visual/failing-run.html from tracepawl demo --open as the screenshot/video capture target; future assets should live under docs/assets/.

OpenCode-compatible preview

TracePawl includes a local, sanitized OpenCode-style demo export. It is not an official OpenCode partnership or hosted integration; it is a deterministic import path for compatible JSON session/export data.

tracepawl import --source opencode examples/importers/opencode-demo-session.json --out .tracepawl/runs/opencode-demo.trace.json
tracepawl analyze .tracepawl/runs/opencode-demo.trace.json --out .tracepawl/reports/opencode-demo.md
tracepawl visualize .tracepawl/runs/opencode-demo.trace.json --format html --out .tracepawl/visual/opencode-demo.html

The demo shows a coding-agent session with a user goal, file reads, a formatter edit, shell test commands, repeated validation failure, and a handoff. The resulting postmortem classifies the run as test_failure_misdiagnosis, links evidence to command/test/file events, marks the failure onset, and suggests a local recovery action.

From a built checkout, generate all three artifacts at once:

node dist/cli.js demo --open

OpenCode-compatible real demo

For screenshots or video, TracePawl also includes a local OpenCode-compatible sandbox harness. It prepares a disposable JavaScript repo with a deterministic failing test, captures git/test/session artifacts, builds an OpenCode-style JSON session artifact, and renders TracePawl outputs locally.

This is separate from the fixture demo above. It is still local-first and is not an official OpenCode integration or partnership. The real scenario requires your own local OpenCode-compatible CLI setup:

bun run build
scripts/demo/opencode-real-scenario.sh prepare
# run the printed OpenCode command, then:
scripts/demo/opencode-real-scenario.sh resume

When the final captured npm test passes, TracePawl reports initial_test_failure_resolved: initial failing test, focused src/eventValidator.js edit, passing verification, and a review-and-commit next step. If the final test still fails, the report shows the captured failure diagnosis and recovery action instead.

For a deterministic smoke test that does not require OpenCode:

scripts/demo/opencode-real-scenario.sh dry-run

See docs/demo/opencode-real-scenario.md for prerequisites, resume flow, artifact paths, and screenshot/video checklist.

Sample output

TracePawl Failure Report
========================

Failure: stale_context_edit

Summary:
  Edit to `src/paginate.ts` failed because the agent's snippet did not match current file content (2 attempts).

Root cause:
  The agent read `src/paginate.ts` earlier in the run, then attempted to edit it using that cached snippet. Something changed the file (or the agent's snippet was inaccurate to begin with), so the `old_string` anchor no longer appears verbatim. Retrying with the same stale snippet cannot succeed.

Failure onset: evt_006

Evidence:
  - [evt_006, evt_008] Failed `file_edit` event(s) whose `old_string` did not match current file content — a stale-context signal.
  - [evt_002] Prior `file_read` event(s) for the same path — the agent's edit context likely went stale between the read and the failed edit.

Contradicting evidence:
  None

Suggested recovery:
  Action: re_read_file
  Re-read the file from disk, locate the intended target by current line content, and retry the edit with a narrower, freshly-anchored patch.
  Parameters: {"path":"src/paginate.ts"}

Confidence: 0.85

Related events: evt_006, evt_008, evt_002

Trace ID: run_stale_context_edit_001
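The stale-context signal in the report above comes down to a verbatim substring check. Here is a minimal TypeScript sketch, assuming a hypothetical `FileEdit` shape and an `isStaleEdit` helper; neither is TracePawl's actual rule code, which lives in src/analyzer/rules/.

```typescript
// Hypothetical edit shape for illustration; not TracePawl's real event schema.
interface FileEdit {
  path: string;
  oldString: string;
  newString: string;
}

// An edit is stale when its anchor no longer appears verbatim in the file.
function isStaleEdit(edit: FileEdit, currentContent: string): boolean {
  return !currentContent.includes(edit.oldString);
}

const staleEdit: FileEdit = {
  path: "src/paginate.ts",
  oldString: "items.slice(start, end - 1)",
  newString: "items.slice(start, end)",
};

// The file changed after the agent read it, so the anchor is gone.
const onDisk = "return items.slice(offset, offset + limit);";
console.log(isStaleEdit(staleEdit, onDisk)); // true
```

Because the check is exact, retrying with the same snippet gives the same answer, which is why the suggested recovery is to re-read the file first.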

v0 failure categories

| Category | What it catches |
| --- | --- |
| `stale_context_edit` | Agent edited a file using outdated context; `old_string` doesn't match current content. |
| `tool_misuse` | Tool called with invalid arguments, missing required fields, or violated preconditions. |
| `loop_or_stall` | Same tool call (or failing command) repeated ≥3 times with identical arguments. |
| `test_failure_misdiagnosis` | Failed test points at one file; agent edits unrelated files or silences the assertion. |
| `initial_test_failure_resolved` | Initial test failure is followed by a focused edit and passing verification rerun. |
| `unsafe_or_broad_edit` | Narrow user request produced edits spanning many files, directories, or lines. |

See docs/FAILURE_CATEGORIES.md for the long form.
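To make the `loop_or_stall` description concrete, the sketch below looks for a run of three or more consecutive calls with the same tool and arguments. The `ToolCall` shape and `findLoop` helper are hypothetical, and treating "repeated" as "consecutive" is an assumption of this example, not a statement about the shipped rule.

```typescript
// Hypothetical tool-call shape; the real TraceEvent union is in docs/TRACE_SCHEMA.md.
interface ToolCall {
  id: string;
  tool: string;
  args: Record<string, unknown>;
}

// Serialize with sorted top-level keys so {a, b} and {b, a} compare as identical.
function callKey(call: ToolCall): string {
  const sorted = Object.keys(call.args)
    .sort()
    .map((key) => [key, call.args[key]]);
  return `${call.tool}:${JSON.stringify(sorted)}`;
}

// Returns the IDs of the first run of `threshold` or more consecutive identical calls.
function findLoop(calls: ToolCall[], threshold = 3): string[] {
  let run: ToolCall[] = [];
  for (const call of calls) {
    const last = run[run.length - 1];
    if (last !== undefined && callKey(last) === callKey(call)) {
      run.push(call);
    } else {
      if (run.length >= threshold) return run.map((c) => c.id);
      run = [call];
    }
  }
  return run.length >= threshold ? run.map((c) => c.id) : [];
}

const calls: ToolCall[] = [
  { id: "evt_001", tool: "file_read", args: { path: "a.ts" } },
  { id: "evt_002", tool: "shell", args: { command: "npm test" } },
  { id: "evt_003", tool: "shell", args: { command: "npm test" } },
  { id: "evt_004", tool: "shell", args: { command: "npm test" } },
];
console.log(JSON.stringify(findLoop(calls))); // ["evt_002","evt_003","evt_004"]
```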

Try the other fixtures

node dist/cli.js analyze examples/tool-misuse.json
node dist/cli.js analyze examples/loop-or-stall.json
node dist/cli.js analyze examples/test-failure-misdiagnosis.json
node dist/cli.js analyze examples/unsafe-broad-edit.json

Realistic demo traces

examples/realistic/ contains small coding-agent failure demos with richer event timelines. They are useful for seeing both terminal reports and Markdown postmortems with timeline context around the failure onset:

node dist/cli.js analyze examples/realistic/stale-edit-after-file-change.json
node dist/cli.js analyze examples/realistic/stale-edit-after-file-change.json --format markdown --output /tmp/stale-edit-postmortem.md

Recording a real run

TracePawl can wrap a local command and write a trace that the analyzer can read:

tracepawl record --output tracepawl-runs/latest.json -- bun run test
tracepawl analyze tracepawl-runs/latest.json

Failed commands still produce valid traces. For example, with the installed package:

tracepawl record --output trace.json -- node -e "console.error('boom'); process.exit(1)"
tracepawl analyze trace.json

The recorder returns the child command's exit code after writing the trace, so the first command above exits non-zero. The analyzer can still inspect the trace and, when no deterministic rule matches, surfaces the failed command as an unknown report with command, cwd, exit, stderr/stdout snippets when present, and recovery guidance:

Failure: unknown

Summary:
  Recorded command `node -e console.error('boom'); process.exit(1)` failed with exit code 1.

Evidence:
  - [evt_001] Failed command
      Command: node -e console.error('boom'); process.exit(1)
      Cwd: /path/to/project
      Exit: exit code 1
      Stderr: "boom"

Suggested recovery:
  Action: request_human_input
  Inspect the failed command's stderr/stdout and any changed files.

When running from this repository before publishing or installing the package, use the built CLI directly:

bun run build
node dist/cli.js record --output tracepawl-runs/latest.json -- bun run test
node dist/cli.js analyze tracepawl-runs/latest.json

See docs/RECORDER.md for output-file behavior, latest.json, best-effort git_diff capture, exit codes, signal handling, and current recorder limits.

Current limitations

No dashboard or hosted service.
No external integrations or runtime adapters yet (LangSmith, Langfuse, OpenTelemetry, Claude Code, OpenCode-compatible). Build traces via tracepawl record, the TraceWriter SDK (see docs/SDK.md), or externally produced JSON conforming to docs/TRACE_SCHEMA.md.
Importers are local converters for sanitized external logs; they are not hosted collectors.
Deterministic rules are the source of truth. Optional LLM review is opt-in, evidence-bounded, and separate from the rule diagnosis.
No replay engine. Replay-lite and full replay are post-v0.
No CloudPawl backend, authentication, billing, tenant UI, BYOK, advanced dashboards, or private deployment in this MVP. See docs/product/cloudpawl-roadmap.md for the hosted roadmap.

Optional LLM review

TracePawl does not call an LLM by default. tracepawl analyze and tracepawl demo --open remain deterministic and offline unless --llm-review is provided.

When enabled, LLM review is advisory only. The deterministic diagnosis remains the source of truth, and reports keep deterministic facts, rule diagnosis, and LLM review in separate sections.

Supported providers are mock, noop, openai-compatible, and ollama. The OpenAI-compatible provider works with compatible cloud or local runtimes such as llama.cpp server, LM Studio, LocalAI, vLLM, and DeepInfra/OpenRouter-style endpoints, provided they expose compatible chat completions. Local OpenAI-compatible base URLs do not require an API key; non-local base URLs use TRACEPAWL_LLM_API_KEY.
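One way to picture the local versus non-local key rule is shown below. This is a sketch under assumptions: "local" is taken to mean a loopback host, and `isLocalBaseUrl`, `decideApiKey`, and the `api.example.com` URL are hypothetical names for illustration only. Only the TRACEPAWL_LLM_API_KEY variable comes from this README.

```typescript
// Assumption: "local" means a loopback host. TracePawl's actual rule may be broader.
const LOOPBACK_HOSTS = new Set(["localhost", "127.0.0.1", "[::1]"]);

function isLocalBaseUrl(baseUrl: string): boolean {
  return LOOPBACK_HOSTS.has(new URL(baseUrl).hostname);
}

interface KeyDecision {
  required: boolean;
  apiKey?: string | undefined;
}

// Local endpoints skip the key; everything else reads it from the environment.
function decideApiKey(
  baseUrl: string,
  env: Record<string, string | undefined>,
): KeyDecision {
  if (isLocalBaseUrl(baseUrl)) return { required: false };
  return { required: true, apiKey: env["TRACEPAWL_LLM_API_KEY"] };
}

console.log(decideApiKey("http://localhost:8080/v1", {}).required); // false
console.log(
  decideApiKey("https://api.example.com/v1", { TRACEPAWL_LLM_API_KEY: "sk-test" }).required,
); // true
```

Returning a decision instead of throwing keeps a missing key from failing the analysis, in line with the README's statement that unavailable provider config only marks the LLM review as unavailable.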

Providers receive a bounded prompt: redacted run metadata, extracted facts, an evidence graph summary, the baseline deterministic diagnosis, and selected snippets keyed by event ID. TracePawl does not send the full raw trace by default, redacts .env-style secrets/API keys/bearer tokens from prompt fields, clips command output snippets, and bounds returned LLM event IDs to events present in the trace. If provider config is missing or unavailable, analysis still succeeds and the LLM review is reported as unavailable.
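The three guards described above (redaction, clipping, and bounding returned event IDs) can be sketched as small pure functions. The regular expressions, the `[REDACTED]` marker, and the helper names are illustrative assumptions; the README does not show TracePawl's real patterns.

```typescript
// Illustrative patterns only; not TracePawl's actual redaction rules.
const BEARER = /Bearer\s+[A-Za-z0-9._~+\/-]+=*/g;
const ENV_ASSIGNMENT = /\b([A-Z][A-Z0-9_]*(?:KEY|TOKEN|SECRET|PASSWORD))=\S+/g;

// Replace bearer tokens and .env-style secret assignments with a marker.
function redact(text: string): string {
  return text
    .replace(BEARER, "Bearer [REDACTED]")
    .replace(ENV_ASSIGNMENT, "$1=[REDACTED]");
}

// Keep at most maxChars characters of a command-output snippet.
function clip(text: string, maxChars: number): string {
  return text.length <= maxChars ? text : `${text.slice(0, maxChars)}...[clipped]`;
}

// Drop any event ID the model returned that is not present in the trace.
function boundEventIds(returned: string[], traceEventIds: Set<string>): string[] {
  return returned.filter((id) => traceEventIds.has(id));
}

console.log(redact("OPENAI_API_KEY=sk-123 Authorization: Bearer abc.def-123"));
// OPENAI_API_KEY=[REDACTED] Authorization: Bearer [REDACTED]
console.log(JSON.stringify(boundEventIds(["evt_002", "evt_999"], new Set(["evt_001", "evt_002"]))));
// ["evt_002"]
```

Filtering returned IDs against the trace means a hallucinated `evt_999` cannot be presented as evidence.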

Local models can still return malformed JSON even when JSON output is requested. TracePawl asks providers for JSON-only responses, strips common markdown/code-fence wrappers, and safely extracts a bounded JSON object when possible. If parsing still fails, the LLM review is marked unavailable and the deterministic diagnosis remains the source of truth.
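A tolerant parser along these lines might look like the sketch below. It strips an outer code fence, scans for the first balanced object while respecting string literals, and returns `undefined` when nothing parses. The helper names and the 20000-character bound are assumptions for illustration, not TracePawl's implementation.

```typescript
// Built with repeat() so this example does not contain a literal fence.
const FENCE = "`".repeat(3);

// Remove a wrapping markdown code fence if the response starts with one.
function stripCodeFence(text: string): string {
  let body = text.trim();
  if (body.startsWith(FENCE)) {
    const firstNewline = body.indexOf("\n");
    body = firstNewline === -1 ? "" : body.slice(firstNewline + 1);
    const closing = body.lastIndexOf(FENCE);
    if (closing !== -1) body = body.slice(0, closing);
  }
  return body.trim();
}

// Find the first balanced {...} span (ignoring braces inside strings) and parse it.
function extractJsonObject(text: string, maxChars = 20000): unknown {
  const body = stripCodeFence(text).slice(0, maxChars);
  const start = body.indexOf("{");
  if (start === -1) return undefined;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < body.length; i++) {
    const ch = body[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) {
        try {
          return JSON.parse(body.slice(start, i + 1));
        } catch {
          return undefined; // malformed: caller marks the review unavailable
        }
      }
    }
  }
  return undefined; // no balanced object within the bound
}

const reply = `${FENCE}json\n{"category": "loop_or_stall", "note": "a } brace"}\n${FENCE}`;
console.log(JSON.stringify(extractJsonObject(reply)));
// {"category":"loop_or_stall","note":"a } brace"}
```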

OpenAI-compatible local llama.cpp:

tracepawl analyze .tracepawl/runs/opencode-real.trace.json \
  --llm-review \
  --llm-provider openai-compatible \
  --llm-base-url http://localhost:8080/v1 \
  --llm-model Qwen2.5-Coder-7B-Instruct

Ollama:

tracepawl analyze .tracepawl/runs/opencode-real.trace.json \
  --llm-review \
  --llm-provider ollama \
  --llm-base-url http://localhost:11434 \
  --llm-model gemma4:e2b \
  --out .tracepawl/reports/opencode-real-llm.md

Environment configuration:

TRACEPAWL_LLM_PROVIDER=ollama
TRACEPAWL_LLM_MODEL=gemma4:e2b
TRACEPAWL_LLM_BASE_URL=http://localhost:11434

Local reliability loop

TracePawl can turn one useful run into a repeatable local reliability check:

trace -> facts -> evidence graph -> deterministic diagnosis -> optional LLM review -> eval case -> trace diff -> local policy gate

This loop is experimental and local-only in the Developer Preview. It does not ship hosted CloudPawl workflows or full CI integration; the gate command returns process exit codes that can be wired into local scripts or CI by users.

Promote a trace into an eval case:

tracepawl eval .tracepawl/runs/opencode-real.trace.json --out .tracepawl/evals/opencode-real.eval.json

Compare a future run against a baseline:

tracepawl diff .tracepawl/runs/opencode-demo.trace.json .tracepawl/runs/opencode-real.trace.json --format markdown --out .tracepawl/diffs/opencode-demo-to-real.md

Gate a trace or eval case with a local JSON policy:

tracepawl gate .tracepawl/runs/opencode-real.trace.json --policy .tracepawl/policy.json --eval .tracepawl/evals/opencode-real.eval.json --out .tracepawl/gates/opencode-real.gate.json

Example policy:

{
  "schemaVersion": "tracepawl.policy.v0",
  "requireValidation": true,
  "requirePassingFinalTest": true,
  "requireEvidence": true,
  "failOnUnresolvedSeriousFailure": true,
  "failOnRiskyEditWithoutTests": true,
  "minimumConfidence": 0.8
}

The gate is deterministic and does not run LLM review. A passing gate exits 0; a failing gate exits non-zero and reports explicit violations.
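The gate's behavior can be pictured as a pure function from a policy and a run summary to a list of violations and an exit code. In this sketch the `Policy` fields mirror the example policy above, while the `RunSummary` shape, the violation messages, and the choice of exit code 1 are assumptions; the README only states that a failing gate exits non-zero.

```typescript
interface Policy {
  schemaVersion: string;
  requireValidation: boolean;
  requirePassingFinalTest: boolean;
  requireEvidence: boolean;
  failOnUnresolvedSeriousFailure: boolean;
  failOnRiskyEditWithoutTests: boolean;
  minimumConfidence: number;
}

// Hypothetical summary of a diagnosed run; not TracePawl's actual gate input.
interface RunSummary {
  validated: boolean;
  finalTestPassed: boolean;
  evidenceCount: number;
  unresolvedSeriousFailure: boolean;
  riskyEditWithoutTests: boolean;
  confidence: number;
}

interface GateResult {
  passed: boolean;
  violations: string[];
  exitCode: number;
}

function evaluateGate(policy: Policy, run: RunSummary): GateResult {
  const violations: string[] = [];
  if (policy.requireValidation && !run.validated) violations.push("trace failed validation");
  if (policy.requirePassingFinalTest && !run.finalTestPassed) violations.push("final test did not pass");
  if (policy.requireEvidence && run.evidenceCount === 0) violations.push("diagnosis has no evidence");
  if (policy.failOnUnresolvedSeriousFailure && run.unresolvedSeriousFailure) violations.push("unresolved serious failure");
  if (policy.failOnRiskyEditWithoutTests && run.riskyEditWithoutTests) violations.push("risky edit without tests");
  if (run.confidence < policy.minimumConfidence) violations.push("confidence below minimum");
  const passed = violations.length === 0;
  return { passed, violations, exitCode: passed ? 0 : 1 }; // exit code 1 is an assumption
}

const policy: Policy = {
  schemaVersion: "tracepawl.policy.v0",
  requireValidation: true,
  requirePassingFinalTest: true,
  requireEvidence: true,
  failOnUnresolvedSeriousFailure: true,
  failOnRiskyEditWithoutTests: true,
  minimumConfidence: 0.8,
};

const failingRun: RunSummary = {
  validated: true,
  finalTestPassed: false,
  evidenceCount: 2,
  unresolvedSeriousFailure: false,
  riskyEditWithoutTests: false,
  confidence: 0.85,
};

console.log(JSON.stringify(evaluateGate(policy, failingRun)));
// {"passed":false,"violations":["final test did not pass"],"exitCode":1}
```

Because the function has no I/O and no randomness, the same trace and policy always produce the same verdict, which is what makes the exit code safe to wire into scripts.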

CloudPawl roadmap

TracePawl is the local-first open-core product. CloudPawl is the future hosted layer for run history, visual postmortem sharing, search, review workflows, retention, team analytics, and governance. Developer Preview users who want hosted sharing or team workflows should follow the CloudPawl roadmap in docs/product/cloudpawl-roadmap.md; a public waitlist link is not committed yet.

Recording a trace with the SDK

The TraceWriter SDK lets you record events from an agent runtime — no hand-authored JSON required. It owns event IDs and ISO timestamps, validates constructor inputs, and writes JSON that round-trips cleanly through the parser.

import { TraceWriter, analyzeTrace, formatTerminalReport } from "@codepawl/tracepawl";

const writer = new TraceWriter({ agent: "my-agent", userGoal: "Fix paginate()" });
writer.recordFileRead({ path: "src/paginate.ts" });
writer.recordFileEdit({
  path: "src/paginate.ts",
  oldString: "items.slice(start, end - 1)",
  newString: "items.slice(start, end)",
  applied: false,
  error: "old_string not found in file",
});
writer.finalize();

console.log(formatTerminalReport(analyzeTrace(writer.toJSON())));

See docs/SDK.md for the full API reference, ID/timestamp contracts, and common patterns. A runnable demo lives at examples/sdk/record-failed-run.ts:

bun run tsx examples/sdk/record-failed-run.ts

Library usage

import { parseTraceFile, analyzeTrace, formatTerminalReport } from "@codepawl/tracepawl";

const trace = await parseTraceFile("examples/stale-context-edit.json");
const report = analyzeTrace(trace);
console.log(formatTerminalReport(report));

analyzeTrace(trace) returns a FailureReport — see src/schema/failure.ts for the full shape.

Development

bun run typecheck
bun run test
bun run lint
bun run build

bun run check runs the same typecheck + lint + format-check gate that CI enforces.

Project status

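v0 CLI analyzer and local recorder are functional. Five example traces are included, one each for the stale_context_edit, tool_misuse, loop_or_stall, test_failure_misdiagnosis, and unsafe_or_broad_edit categories, and all five resolve to their expected failure category at confidence ≥ 0.80. Diagnosis is rule-based and deterministic.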

Docs

docs/RELEASE.md — release prep, execution, and recovery checklist.
docs/RECORDER.md — local command recorder guide.
docs/SDK.md — TraceWriter producer-side reference.
docs/FAILURE_CATEGORIES.md — the five v0 categories in depth.
docs/TRACE_SCHEMA.md — TraceEvent union and FailureReport shape, for adapter authors.
docs/importers/generic-jsonl.md — v0 contract for a future generic JSONL importer.
docs/product/developer-preview.md — Developer Preview scope and demo requirements.
docs/protocol/event-protocol-v0.md — event protocol for hand-written and custom-agent JSONL.
