feat: Wire openclaw:prompt-error events into alerts stream by vivekchand · Pull Request #797 · vivekchand/clawmetry

vivekchand · 2026-04-23T21:26:14Z

Closes #601

What

Adds visibility for provider-level errors (rate limits, auth issues, context overflow, model-not-found) that were previously ignored in the ClawMetry dashboard.

How

New /api/prompt-errors endpoint in routes/overview.py that scans session JSONL files for openclaw:prompt-error custom events
Red alert banner on Overview tab showing recent prompt errors with: provider, model, error type, timestamp
Auto-refreshes every 30s when on Overview tab
Deduplicates errors using timestamp tracking
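The scan described above can be sketched as follows. This is a minimal, hypothetical Python illustration, not the code in routes/overview.py. The function name scan_prompt_errors and its parameters are made up for this sketch, and the assumption that data.timestamp holds epoch milliseconds is taken from the review notes on this PR rather than from the diff.

```python
import json
import os
from typing import Any, Dict, List

PROMPT_ERROR_TYPE = "openclaw:prompt-error"


def scan_prompt_errors(sessions_dir: str, since_ms: int = 0, limit: int = 50) -> List[Dict[str, Any]]:
    """Hypothetical sketch of the /api/prompt-errors scan (not the PR's code)."""
    errors: List[Dict[str, Any]] = []
    for name in os.listdir(sessions_dir):
        if not name.endswith(".jsonl"):
            continue
        path = os.path.join(sessions_dir, name)
        with open(path, encoding="utf-8", errors="replace") as fh:
            for line in fh:
                try:
                    obj = json.loads(line)
                except json.JSONDecodeError:
                    continue  # blank or malformed line
                if not isinstance(obj, dict):
                    continue
                if obj.get("type") != "custom" or obj.get("customType") != PROMPT_ERROR_TYPE:
                    continue
                data = obj.get("data") or {}
                ts = data.get("timestamp", 0)  # assumption: epoch milliseconds
                if since_ms and ts <= since_ms:
                    continue  # the client has already seen this one
                errors.append({
                    "timestamp": obj.get("timestamp", ""),
                    "provider": data.get("provider", ""),
                    "model": data.get("model", ""),
                    "api": data.get("api", ""),
                    "error": data.get("error", ""),
                    "runId": data.get("runId", ""),
                    # Assumption: fall back to the file stem when data has no sessionId.
                    "sessionId": data.get("sessionId", name[: -len(".jsonl")]),
                })
    # Newest first; ISO strings sort correctly only if they all share one format.
    errors.sort(key=lambda e: e["timestamp"], reverse=True)
    return errors[:limit]
```

The returned dicts use the same keys listed for the raw JSON response (timestamp, provider, model, api, error, runId, sessionId); a route handler would wrap them as {"errors": [...]}.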

Changes

routes/overview.py: New api_prompt_errors() endpoint
dashboard.py: Added loadPromptErrors() JS function + polling timer
clawmetry/static/js/app.js: Added loadPromptErrors() function + timer
clawmetry/templates/tabs/overview.html: Added prompt error banner UI

Acceptance

/api/prompt-errors endpoint filters for openclaw:prompt-error events
Banner/alert section on Overview tab showing recent prompt errors
Displays: provider, model, error type, timestamp

vivekchand

Test plan & review notes

What changed

Adds a /api/prompt-errors endpoint that scans session JSONL files for openclaw:prompt-error custom events, plus a red alert banner on the Overview tab that polls the endpoint every 30 s and renders provider/model/error-type/timestamp rows.

Smoke commands

make test or make test-api
python3 dashboard.py --port 8900
Trigger a prompt-error event (or mock one) and verify an alert fires: inject a JSONL line like {"type":"custom","customType":"openclaw:prompt-error","timestamp":"2026-05-05T12:00:00.000Z","data":{"provider":"anthropic","model":"claude-3-opus","error":"rate_limit","timestamp":1777982400000}} into ~/.openclaw/agents/main/sessions/test.jsonl, then hit /api/prompt-errors and reload the Overview tab to confirm the banner appears.
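A small helper for the injection step keeps the outer ISO timestamp and the inner data.timestamp (epoch ms) derived from the same instant, so the since filter sees consistent values. The function names are hypothetical and the event shape is copied from the sample line; the real gateway may write additional fields.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def make_prompt_error_line(when: datetime, provider: str, model: str, error: str) -> str:
    """Build one synthetic openclaw:prompt-error JSONL line (test fixture only)."""
    if when.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    when = when.astimezone(timezone.utc)
    event = {
        "type": "custom",
        "customType": "openclaw:prompt-error",
        # Outer field: ISO-8601 with millisecond precision and a 'Z' suffix.
        "timestamp": when.strftime("%Y-%m-%dT%H:%M:%S.") + f"{when.microsecond // 1000:03d}Z",
        "data": {
            "provider": provider,
            "model": model,
            "error": error,
            # Inner field: epoch milliseconds for the same instant (exact integer math).
            "timestamp": (when - EPOCH) // timedelta(milliseconds=1),
        },
    }
    return json.dumps(event)


def inject(path: Path, line: str) -> None:
    """Append one line to a session JSONL file, creating parent directories if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")
```

Usage: inject(Path.home() / ".openclaw/agents/main/sessions/test.jsonl", make_prompt_error_line(datetime(2026, 5, 5, 12, tzinfo=timezone.utc), "anthropic", "claude-3-opus", "rate_limit")).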

Likely failure modes from the diff

Alert storm / no server-side deduplication: The since filter relies on the client tracking _promptErrorLastTs, but that state resets on every page load. If a session file accumulates many errors they will all re-appear on each fresh load. Consider storing dismissed/seen IDs server-side (or in localStorage) and/or enforcing a default since window (e.g. last 1 h) in the endpoint.
Timestamp comparison mismatch: The since filter compares pdata.get("timestamp", 0) (an integer in ms from the inner data dict), but the payload returned to the client carries the outer obj.get("timestamp", "") ISO string. If callers derive since from that outer field, the filter only works when both fields describe the same instant in the same unit. Worse, an event with no inner timestamp defaults to 0, so for any positive since the check ts <= since_ms is always true and the event is silently filtered out. The two timestamp fields need to be kept consistent.
Full-dir scan on every poll: Every 30 s the endpoint calls os.listdir and then opens every .jsonl file, reading up to the last 512 KB of each. On a workspace with hundreds of long-running sessions this adds up. A lightweight index (mtime cache, or scanning only files modified in the last N minutes) would help.
Gateway WS drop: The feature polls the filesystem rather than listening on the gateway WebSocket, so it will still work if the WS drops — but it also means errors that are only emitted live (not written to JSONL) would be missed. Worth confirming that OpenClaw always persists openclaw:prompt-error events to the session file.
XSS via unsanitised fields: provider, model, and error values from JSONL are injected directly into innerHTML strings in loadPromptErrors(). A crafted session file could execute arbitrary JS. Use textContent assignment or escape the values before rendering.
app.js / dashboard.py duplication: loadPromptErrors() and _promptErrorLastTs are defined in both clawmetry/static/js/app.js and the embedded JS in dashboard.py. If both are plain top-level declarations and a page loads both, the second copy overrides the first (including the initial value of _promptErrorLastTs), so the behaviour you get depends on load order. Confirm only one definition ends up in the rendered page, or de-duplicate.
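One way to act on the full-dir-scan concern is to skip session files that have not changed recently. A hypothetical helper, assuming session files sit directly in one directory and that a file's mtime advances whenever OpenClaw appends an event:

```python
import os
import time
from typing import List, Optional


def recent_session_files(sessions_dir: str, max_age_s: float = 3600.0, now: Optional[float] = None) -> List[str]:
    """Return .jsonl paths whose mtime is within max_age_s seconds of `now` (hypothetical helper)."""
    cutoff = (time.time() if now is None else now) - max_age_s
    recent = []
    with os.scandir(sessions_dir) as it:
        for entry in it:
            # One stat per entry; files untouched since the cutoff are never opened.
            if entry.is_file() and entry.name.endswith(".jsonl") and entry.stat().st_mtime >= cutoff:
                recent.append(entry.path)
    return sorted(recent)
```

The same cutoff can double as the default since window suggested in the first note: provided the gateway clock and the filesystem clock agree, a file last modified before the cutoff cannot contain an event newer than it.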

Issue link

Closes #601 — confirmed in PR body.

Generated by Claude Code

vivekchand

Test plan & review notes

What changed

Adds a red alert banner to the Overview tab that surfaces openclaw:prompt-error custom events (rate limits, auth failures, context overflow, model-not-found) previously invisible in the dashboard; new GET /api/prompt-errors endpoint in routes/overview.py scans session JSONL files, with a 30s polling loop and incremental since deduplication.

Smoke commands

make test-api
python3 dashboard.py --port 8900

What to look at visually

http://localhost:8900 → Overview tab → top of page: confirm the red prompt-error banner is hidden when there are no errors and appears automatically (no reload needed) within 30s when an openclaw:prompt-error custom event is present in any session JSONL; verify each row shows time, provider badge, model name, and error-type pill
http://localhost:8900/api/prompt-errors?limit=10 (raw JSON) — confirm the errors array contains objects with timestamp, provider, model, api, error, runId, and sessionId keys; test with ?since=<unix_ms> to verify incremental filtering works
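For the raw-JSON check, a small validator saves eyeballing each row. The key set is taken from the list above; the function itself is a hypothetical test helper.

```python
from typing import Any, List

EXPECTED_KEYS = {"timestamp", "provider", "model", "api", "error", "runId", "sessionId"}


def check_prompt_errors_payload(payload: Any) -> List[str]:
    """Return shape problems in a /api/prompt-errors response; an empty list means OK."""
    if not isinstance(payload, dict) or not isinstance(payload.get("errors"), list):
        return ['top level must be {"errors": [...]}']
    problems = []
    for i, err in enumerate(payload["errors"]):
        if not isinstance(err, dict):
            problems.append(f"errors[{i}] is not an object")
            continue
        missing = EXPECTED_KEYS - set(err)
        if missing:
            problems.append(f"errors[{i}] missing {sorted(missing)}")
    return problems
```

Against a running dashboard, check_prompt_errors_payload(json.load(urllib.request.urlopen("http://localhost:8900/api/prompt-errors?limit=10"))) should return an empty list.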

Likely failure modes

The since filter compares pdata.get("timestamp", 0) (a value from the data sub-object, which may be a ms integer or absent) against since_ms, but the top-level obj["timestamp"] used for sorting is an ISO string — if the data.timestamp field is missing or typed differently than expected, since filtering silently passes all events or drops all of them.
The JS _promptErrorLastTs tracker is updated from new Date(err.timestamp).getTime() (ms), but the backend since param is compared against pdata.get("timestamp", 0) which could be a different unit or format — mismatched units would cause the incremental poll to re-show the same errors on every 30s tick without ever clearing them.
The loadPromptErrors function is duplicated verbatim between clawmetry/static/js/app.js and the embedded JS in dashboard.py; the dashboard.py copy includes a var api = err.api || '' line that the app.js copy omits — this divergence means the api field is silently dropped in the static-file serving path.
No unit or integration tests are added for the new endpoint; a tests/test_prompt_errors.py would be the natural companion (compare the tests/test_heartbeat.py pattern in PR #812).
WebSocket event flow to verify: OpenClaw gateway emits openclaw:prompt-error → OpenClaw writes the event as a {"type":"custom","customType":"openclaw:prompt-error","data":{...}} line into the session JSONL → /api/prompt-errors picks it up on its next 512 KB tail read → the loadPromptErrors() poll surfaces it in the banner within 30s.
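Several of the notes above come down to the endpoint comparing two timestamp representations, and to sorting ISO strings lexicographically. A sketch of normalizing both forms to epoch milliseconds before filtering and sorting; to_epoch_ms is a hypothetical helper, and treating naive ISO strings as UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(value: Any) -> Optional[int]:
    """Normalize an ISO-8601 string or an epoch-ms number to epoch ms; None if unusable."""
    if isinstance(value, bool):
        return None  # bool is an int subclass, never a timestamp
    if isinstance(value, (int, float)):
        return int(value)
    if isinstance(value, str) and value:
        try:
            # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
            dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
        except ValueError:
            return None
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
        return (dt - EPOCH) // timedelta(milliseconds=1)
    return None
```

In the endpoint, since filtering would then compare to_epoch_ms(data.get("timestamp")) (falling back to to_epoch_ms(obj.get("timestamp")) when that is None) against since, and sort on the same value, so the server and the client's new Date(err.timestamp).getTime() agree on one unit.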

Issue link

Closes #601 (confirmed in PR body and branch name fix/gh-clawmetry-601-prompt-error-alerts)

Generated by Claude Code

vivekchand

Test plan & review notes

Repo: vivekchand/clawmetry

What changed

New GET /api/prompt-errors endpoint in routes/overview.py scanning session JSONL files for openclaw:prompt-error custom events
Red alert banner on the Overview tab, auto-refreshing every 30s; deduplicates errors by timestamp

Smoke commands

python3 -c 'import ast; ast.parse(open("routes/overview.py").read())' — syntax clean
curl -sS http://localhost:8900/api/prompt-errors — expect {"errors": [...]} (empty array is fine if no errors in logs)
Inject a synthetic openclaw:prompt-error event into a test JSONL and re-check — should surface in the response
curl -sS http://localhost:8900/api/overview — confirm existing endpoint is unaffected

What to look at visually

http://localhost:8900 → Overview tab — red alert banner should appear if any prompt errors exist in the 30-day scan window

Likely failure modes from the diff

Synchronous JSONL scan: large session directories could make this slow on each 30s poll — check for a per-request limit on sessions scanned or a capped scan window
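Timestamp deduplication: if a second error carries exactly the same timestamp as the last one the client has seen (rare but possible) and arrives in a later poll, the since filter treats it as already seen and silently drops it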
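To keep distinct errors that share a timestamp, deduplication can key on more than the timestamp. The tracker in this PR is client-side JavaScript, so this Python sketch only illustrates the idea; dedup_key, take_new, and the choice of fields in the key are assumptions.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple

Key = Tuple[Any, ...]


def dedup_key(err: Dict[str, Any]) -> Key:
    # Hypothetical key: timestamp plus the fields that distinguish concurrent errors.
    return (err.get("timestamp"), err.get("sessionId"), err.get("runId"),
            err.get("provider"), err.get("model"), err.get("error"))


def take_new(errors: Iterable[Dict[str, Any]], seen: Set[Key]) -> List[Dict[str, Any]]:
    """Return errors not seen before and record them in `seen` (mutated in place)."""
    fresh = []
    for err in errors:
        key = dedup_key(err)
        if key not in seen:
            seen.add(key)
            fresh.append(err)
    return fresh
```

Pairing this with a server-side filter that skips only ts < since (rather than ts <= since) would re-send the boundary rows on each poll and let the key set absorb them, so same-millisecond errors are no longer lost.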

Issue link

Closes #601

Generated by Claude Code

vivekchand

Test plan & review notes

Repo: vivekchand/clawmetry

What changed

Adds /api/prompt-errors endpoint in routes/overview.py that scans session JSONL files for openclaw:prompt-error custom events, plus a red alert banner on the Overview tab that polls every 30s and deduplicates by timestamp.

Smoke commands

make test or make test-api
python3 dashboard.py --port 8900
curl -sS http://localhost:8900/api/prompt-errors — verify response shape is {"errors": [...]}
curl -sS "http://localhost:8900/api/prompt-errors?limit=5&since=0" — confirm since and limit params are respected
Drop a synthetic JSONL line ({"type":"custom","customType":"openclaw:prompt-error","timestamp":"2026-05-07T10:00:00Z","data":{"provider":"anthropic","model":"claude-3-opus","error":"rate_limit","runId":"r1","sessionId":"s1","api":"anthropic"}}) into a .jsonl file under ~/.openclaw/agents/main/sessions/ and confirm the banner appears on the Overview tab.
With a live gateway: trigger a real rate-limit or auth error and confirm it surfaces within the next 30s poll.

Likely failure modes from the diff

Timestamp field mismatch: the since filter compares against pdata.get("timestamp", 0) (a value pulled from inside data), but the JS deduplication tracks new Date(err.timestamp).getTime() using the outer obj["timestamp"] field. If those two fields differ (outer ISO string vs. inner ms integer), the deduplication window will drift and either never suppress repeats or suppress valid new errors.
Sort key uses outer ISO string lexicographically: errors.sort(key=lambda x: x.get("timestamp", ""), ...) works only if all timestamps are zero-padded ISO-8601. Mixed formats (or missing values defaulting to "") will mis-sort.
loadPromptErrors declared twice: the function appears verbatim in both clawmetry/static/js/app.js and in the embedded JS block inside dashboard.py, with a subtle difference — app.js omits the var api = err.api || '' line that dashboard.py's copy has. Whichever is loaded last wins; the discrepancy should be unified to avoid future confusion.
_promptErrorPollTimer not cleared on tab leave: the switchTab handler starts the poll when entering overview but there is no else branch that clears it when navigating away (unlike the cron timer pattern). This means the 30s interval keeps firing regardless of which tab is active.
Banner may not self-dismiss after errors resolve: once _promptErrorLastTs advances, the since filter excludes all older events on the next poll. If the banner is hidden whenever that poll returns errors.length === 0, stale errors clear correctly. If instead the banner is only re-rendered when new errors arrive, the initial errors shown at since=0 stay visible for as long as the user remains on the tab, until a page refresh. Consider resetting _promptErrorLastTs = 0 on full loadAll() calls.
512KB tail read may split a JSONL line: if a very long entry straddles the size - 512000 seek point, line.strip() will yield a partial JSON string that json.loads silently skips. For correctness, discard only the first (potentially partial) line after the seek.
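A sketch of the suggested tail read, assuming the endpoint reads at most the last 512 KB of each file. It checks the byte just before the seek point, so the first line is discarded only when the seek actually landed mid-line; tail_jsonl is a hypothetical name.

```python
import json
import os
from typing import Any, Dict, Iterator

TAIL_BYTES = 512_000  # same window as the size - 512000 seek in the review


def tail_jsonl(path: str, tail_bytes: int = TAIL_BYTES) -> Iterator[Dict[str, Any]]:
    """Yield JSON objects from the last `tail_bytes` of a JSONL file, skipping a partial first line."""
    size = os.path.getsize(path)
    start = max(0, size - tail_bytes)
    with open(path, "rb") as fh:
        if start > 0:
            fh.seek(start - 1)
            # If the previous byte is a newline, `start` already begins a complete line;
            # otherwise we are mid-line and the remainder of that line is discarded.
            if fh.read(1) != b"\n":
                fh.readline()
        for raw in fh:
            try:
                obj = json.loads(raw)
            except ValueError:  # JSONDecodeError and UnicodeDecodeError both subclass it
                continue
            if isinstance(obj, dict):
                yield obj
```

Reading in binary mode also avoids a UnicodeDecodeError when the seek lands inside a multi-byte UTF-8 character.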

Issue link

Closes #601 (confirmed from PR body)

Generated by Claude Code

Adds visibility for provider-level errors (rate limits, auth issues, context overflow, model-not-found) that were previously hidden.

- New /api/prompt-errors endpoint that scans session JSONL files for openclaw:prompt-error custom events
- Red banner on Overview tab showing recent prompt errors with: provider, model, error type, timestamp
- Auto-refreshes every 30s when on Overview tab
- Deduplicates errors using timestamp tracking

- [x] /api/prompt-errors endpoint filters for openclaw:prompt-error events
- [x] Banner/alert section on Overview tab showing recent prompt errors
- [x] Displays: provider, model, error type, timestamp

vivekchand · 2026-05-07T21:49:19Z

Test plan & review notes

Repo: vivekchand/clawmetry

What changed

New /api/prompt-errors endpoint in routes/overview.py that scans session JSONL for openclaw:prompt-error custom events; adds a polling red-alert banner on the Overview tab (auto-refreshes every 30s)

Smoke commands

make test or make test-api
python3 dashboard.py --port 8900
curl -sS http://localhost:8900/api/prompt-errors → expect a JSON object of the form {"errors": [...]} whose entries carry provider, model, error, and timestamp fields

What to look at visually

http://localhost:8900 → Overview tab — red alert banner should appear when any session JSONL contains openclaw:prompt-error events; absent when there are none (no empty-state flash)
Confirm the 30s auto-poll doesn't fire when the browser tab is backgrounded (check loadPromptErrors timer logic)

Likely failure modes from the diff

JSONL scanner on a workspace with 50+ sessions may be slow — worth a response-time sanity check
Timestamp-based deduplication: verify that a second error with the same timestamp but a different error type isn't silently dropped
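For the response-time sanity check, a throwaway harness can build a synthetic workspace with more than 50 sessions. timed_scan is a naive stand-in for the endpoint (a substring count with no JSON parsing), so its timing is only a rough proxy; build_workspace, timed_scan, and the filler sizes are all assumptions made for this sketch.

```python
import json
import os
import time
from typing import Tuple

EVENT = {"type": "custom", "customType": "openclaw:prompt-error",
         "timestamp": "2026-05-05T12:00:00.000Z",
         "data": {"provider": "anthropic", "model": "claude-3-opus", "error": "rate_limit"}}


def build_workspace(root: str, sessions: int = 60, filler_lines: int = 2000) -> None:
    """Write `sessions` synthetic .jsonl files: filler messages plus one prompt-error event each."""
    filler = json.dumps({"type": "message", "text": "x" * 200}) + "\n"  # 232 bytes
    for i in range(sessions):
        with open(os.path.join(root, f"s{i}.jsonl"), "w", encoding="utf-8") as fh:
            fh.write(filler * filler_lines)  # 464,000 bytes at the defaults, under the 512 KB window
            fh.write(json.dumps(EVENT) + "\n")


def timed_scan(root: str) -> Tuple[int, float]:
    """Naive full scan standing in for the endpoint; returns (events found, seconds taken)."""
    start = time.perf_counter()
    found = 0
    for name in os.listdir(root):
        if name.endswith(".jsonl"):
            with open(os.path.join(root, name), encoding="utf-8") as fh:
                found += sum('"openclaw:prompt-error"' in line for line in fh)
    return found, time.perf_counter() - start
```

For a real number, place the generated files under ~/.openclaw/agents/main/sessions/ on a scratch machine and time curl -sS http://localhost:8900/api/prompt-errors instead.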

Issue link

Closes [P0] Wire openclaw:prompt-error events into alerts stream #601 ✓

Generated by Claude Code

vivekchand mentioned this pull request on Apr 26, 2026 in feat: Wire openclaw:prompt-error events into alerts stream #788 (Closed)

vivekchand force-pushed the fix/gh-clawmetry-601-prompt-error-alerts branch from 60d6b93 to e2b2ea0 on May 1, 2026 07:04

vivekchand commented May 5, 2026

vivekchand commented May 6, 2026

vivekchand commented May 7, 2026

vivekchand force-pushed the fix/gh-clawmetry-601-prompt-error-alerts branch from e2b2ea0 to 973bb0a on May 7, 2026 07:06


vivekchand wants to merge 1 commit into main from fix/gh-clawmetry-601-prompt-error-alerts
