feat: Per-message cost attribution with cache-hit breakdown #779
vivekchand wants to merge 1 commit into main
Conversation
## What
Adds granular cache token tracking to the usage analytics:
- Extended `_extract_usage_metrics()` to return a 4-token-type + 4-cost-type breakdown
- Updated `_compute_transcript_analytics()` to aggregate daily cache tokens
- Enhanced the `/api/usage` response with input/output/cache-read/cache-write token counts
- New endpoint `/api/token-attribution` for per-message granular cost breakdown

## How
- Reads `message.usage.cost` {input, output, cacheRead, cacheWrite, total}
- Tracks a daily aggregate of each token type separately
- Returns the structured breakdown in API responses
- Calculates cache hit ratio for visibility into cache efficiency

Closes vivekchand/clawmetry-cloud#318
bdd999e to 2f7b6ef
vivekchand left a comment
Test plan & review notes
What changed
- Extended `_extract_usage_metrics()` and `_compute_transcript_analytics()` to track a 4-way token/cost breakdown (input, output, cache-read, cache-write)
- Added a new `/api/token-attribution` endpoint returning per-message granular cost with cache-hit ratio
Smoke commands
- `make test` or `make test-api`
- `python3 dashboard.py --port 8900` → transcript view — the per-message cost column should appear
- `curl -sS "http://localhost:8900/api/token-attribution?session_id=<session_id>"` — verify `tokens`, `cost`, and `cache_hit_ratio` fields in the JSON
- `curl -sS http://localhost:8900/api/usage` — each day object should now include `inputTokens`, `outputTokens`, `cacheReadTokens`, `cacheWriteTokens`
What to look at visually
- Message with cache hit vs. without — `cache_hit_ratio` should differ, and `cache_read` tokens should be non-zero for cache-hit messages
- Messages from sessions with no cost data — should degrade gracefully (zero-filled, not NaN or an erroneous $0.00 where data is simply absent)
- `/api/token-attribution` with no `session_id` param — scans up to 50 most-recent files; confirm it returns quickly on large session dirs
Likely failure modes from the diff
- `_extract_usage_metrics` is duplicated in `dashboard.py` (two definitions, lines ~9723 and ~10090); both are patched identically — if they diverge later, this will silently regress one code path. Worth consolidating.
- The `cache_hit_ratio` formula uses `cache_read / (input_tok + cache_read)`: if a session has only `cache_write` tokens and zero `input_tok` + zero `cache_read`, the ratio is correctly 0.0 — but if upstream ever sends `input_tokens=0` legitimately (e.g. tool-result-only messages), the ratio will be misleadingly 0.0 rather than absent.
- Cost fields rely entirely on `message.usage.cost.{input,output,cacheRead,cacheWrite}` being present in the JSONL. If OpenClaw omits the `cost` sub-object (older sessions or non-Anthropic models), all four cost fields silently stay 0.0 with no indicator that cost data was unavailable — consider adding a `cost_available: bool` flag.
- Floating-point accumulation in `totals`: summing hundreds of `float` values before the final `round(..., 6)` can drift; `decimal.Decimal` or per-message rounding would be more precise for large sessions.
- `/api/token-attribution` sorts all messages in memory before slicing to `limit`; for large session dirs (50 files × many messages) this could be slow — and the sort key falls back to `''` for missing timestamps, which clusters all no-timestamp messages at the end after `reverse=True`.
Issue link
- The branch is named `feat/cache-token-breakdown-gh604`, suggesting this addresses issue #604, but the PR body says `Closes vivekchand/clawmetry-cloud#318` (a different repo). Please confirm whether this also closes #604 in this repo and update the body accordingly if so.
Overlap note
- PR #863 ("prompt cache hit rate analytics") appears to cover adjacent ground. If that PR is open or recently merged, check for conflicts or duplication in how cache hit rate is computed and surfaced.
Generated by Claude Code
vivekchand left a comment
Test plan & review notes
What changed
- Extends `_extract_usage_metrics()` and `_compute_transcript_analytics()` in `dashboard.py` to return a per-type token/cost breakdown (input, output, cache-read, cache-write)
- Enriches `/api/usage` day objects with `inputTokens`, `outputTokens`, `cacheReadTokens`, `cacheWriteTokens`
- Adds a new `/api/token-attribution` endpoint in `routes/usage.py` for per-message granular cost + cache-hit ratio
Still relevant?
- No merged PR found that supersedes this feature. PR #632 (merged 2026-04-15) was a mechanical refactor that moved `bp_usage` into `routes/usage.py` — this PR correctly targets `routes/usage.py`, so the branch base is consistent with current main. The feature is still absent from main; relevant to merge.
- Note: `_extract_usage_metrics` is defined twice in `dashboard.py` (lines ~9723 and ~10051 based on the diff offsets). The diff patches both copies identically, which is correct, but the duplicate function itself is a pre-existing issue worth a follow-up cleanup.
Smoke commands
- `python3 dashboard.py --port 8900`
- `curl -s http://localhost:8900/api/usage | python3 -m json.tool | grep -A4 '"inputTokens"'`
- `curl -s "http://localhost:8900/api/token-attribution?limit=5" | python3 -m json.tool`
- `curl -s "http://localhost:8900/api/token-attribution?session_id=<a_real_sid>&limit=10" | python3 -m json.tool`
Likely failure modes
- Sessions where `message.usage.cost` is a flat number (not a dict) will return zero for all granular cost fields — the `elif isinstance(cost_data, (int, float))` branch only sets `cost`, leaving `cost_input` etc. at 0.0. Correct but silent; worth a comment.
- `/api/token-attribution` scans up to 50 `.jsonl` files sorted by `mtime` with no caching — could be slow on large workspaces; consider a future TTL cache consistent with the rest of the analytics pipeline.
- The `cache_hit_ratio` formula excludes `cache_write` tokens from the denominator, which is intentional but differs from Anthropic's own definition — fine to leave as-is, just worth noting in the docs.
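A sketch of how the flat-number branch could be made explicit rather than silent — `parse_cost` is a hypothetical helper, and `cost_available` is a suggested flag, not something the PR currently emits:

```python
def parse_cost(cost_data) -> dict:
    """Normalize message.usage.cost, which may be a dict or a bare number."""
    out = {"cost": 0.0, "cost_input": 0.0, "cost_output": 0.0,
           "cost_cache_read": 0.0, "cost_cache_write": 0.0,
           "cost_available": False}
    if isinstance(cost_data, dict):
        out["cost_input"] = float(cost_data.get("input", 0.0))
        out["cost_output"] = float(cost_data.get("output", 0.0))
        out["cost_cache_read"] = float(cost_data.get("cacheRead", 0.0))
        out["cost_cache_write"] = float(cost_data.get("cacheWrite", 0.0))
        out["cost"] = float(cost_data.get(
            "total",
            out["cost_input"] + out["cost_output"]
            + out["cost_cache_read"] + out["cost_cache_write"]))
        out["cost_available"] = True
    elif isinstance(cost_data, (int, float)):
        # Legacy flat-number form: the total is known, the breakdown is not.
        out["cost"] = float(cost_data)
        out["cost_available"] = True
    return out
```

Consumers can then render "cost data unavailable" instead of a misleading $0.000 breakdown when `cost_available` is false.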
Issue link
- Closes vivekchand/clawmetry-cloud#318
Generated by Claude Code
vivekchand left a comment
Test plan & review notes
Repo: vivekchand/clawmetry
What changed
- `_extract_usage_metrics()` extended to return 4 token types (input/output/cache-read/cache-write) + matching cost types
- `_compute_transcript_analytics()` aggregates daily cache tokens
- `/api/usage` response extended with cache token breakdown fields
- New `GET /api/token-attribution` endpoint for per-message granular cost
Smoke commands
- `python3 -c 'import ast; ast.parse(open("dashboard.py").read())'` — syntax clean
- `curl -sS http://localhost:8900/api/usage | python3 -c "import sys,json; d=json.load(sys.stdin); print([k for k in d if 'cache' in k.lower()])"` — cache_read/cache_write fields should appear
- `curl -sS http://localhost:8900/api/token-attribution` — expect a per-message breakdown array
- Test against a session JSONL without `message.usage.cost` — must not crash; missing keys should fall back to 0
Likely failure modes from the diff
- Older JSONL format without `message.usage.cost` must be handled with `.get()` — a bare key access would `KeyError` on historical sessions
- Overlap watch: PR #863 (`/api/cache-analytics`) also computes cache hit rate from the same JSONL data — if both merge, keep an eye on the two endpoints returning diverging `cache_hit_ratio` values due to different scanning windows
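The `.get()` concern can be illustrated with a defensive extraction sketch. Function and key names here are illustrative and should be confirmed against real session JSONL output:

```python
import json


def extract_usage_fields(line: str) -> dict:
    """Parse one JSONL record; tolerate missing usage/cost sub-objects."""
    rec = json.loads(line)
    usage = (rec.get("message") or {}).get("usage") or {}
    cost = usage.get("cost")
    if not isinstance(cost, dict):  # absent, null, or legacy flat number
        cost = {}
    return {
        "input_tokens": int(usage.get("input", 0)),
        "output_tokens": int(usage.get("output", 0)),
        "cache_read_tokens": int(usage.get("cacheRead", 0)),
        "cache_write_tokens": int(usage.get("cacheWrite", 0)),
        "cost_input": float(cost.get("input", 0.0)),
        "cost_output": float(cost.get("output", 0.0)),
    }
```

Every access goes through `.get()` with an `or {}` guard, so a historical record with no `usage` or `cost` sub-object yields zeros instead of a `KeyError`.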
Issue link
- Closes vivekchand/clawmetry-cloud#318 (cross-repo `OWNER/REPO#NUM` syntax — correct)
Generated by Claude Code
vivekchand left a comment
Test plan & review notes
Repo: vivekchand/clawmetry
What changed
- `_extract_usage_metrics()` now returns a full 8-field breakdown (input/output/cache-read/cache-write tokens + costs)
- `_compute_transcript_analytics()` aggregates daily per-token-type totals
- `/api/usage` exposes `inputTokens`/`outputTokens`/`cacheReadTokens`/`cacheWriteTokens` per day
- New `/api/token-attribution` endpoint returns per-message granular cost + cache-hit ratio
Smoke commands
- `make test`
- `python3 dashboard.py --port 8900`, then open a session transcript and inspect the per-message cost column
- `curl -sS http://localhost:8900/api/transcript/<id>` — verify cost breakdown fields in the JSON
- `curl -sS "http://localhost:8900/api/token-attribution?session_id=<id>&limit=10"` — confirm `tokens`, `cost`, and `cache_hit_ratio` fields are present and non-negative
- `curl -sS http://localhost:8900/api/usage` — confirm each day object now contains `inputTokens`, `outputTokens`, `cacheReadTokens`, `cacheWriteTokens`
What to look at visually
- Transcript detail view: each message should show cost + cache-hit tokens
- `/api/usage` day objects: all four new token-type fields should appear alongside the existing `tokens`/`cost`
Likely failure modes from the diff
- Duplicate function definition: `_extract_usage_metrics` is defined twice in `dashboard.py` (lines ~9723 and ~10090 in the diff); the second definition silently shadows the first. Both copies were patched identically, but this is a latent bug — the duplicate should be removed rather than kept in sync manually.
- `cache_hit_ratio` divide-by-zero: `cache_read / (input_tok + cache_read)` is guarded, but only when `(input_tok + cache_read) > 0`; messages where both are zero are already skipped via `total_tok == 0`, so this is safe — but worth verifying with a zero-token edge case in tests.
- camelCase vs snake_case field-name mismatch: the `_extract_usage_metrics` path reads `usage.get("cacheRead", ...)` / `usage.get("cacheWrite", ...)` for token counts, and the new `/api/token-attribution` endpoint reads the same keys directly from `msg['usage']`. If OpenClaw ever emits `cache_read_tokens` (snake) instead of `cacheRead` (camel) in the top-level usage dict, the token route picks it up via the fallback, but `_extract_usage_metrics` in `dashboard.py` reads `cache_read`/`cache_write` (from the `in_toks`/`out_toks` context) — confirm these key names are consistent with actual session JSONL output.
- Granular cost fields not always populated: when `cost_data` is a bare float (the `elif` branch), `cost_input`/`output`/`cache_*` stay 0.0 but the total `cost` is captured. The per-message breakdown in `/api/token-attribution` also falls to all-zeros for `input_cost`/`output_cost` in this case. Sessions without structured `cost` dicts will silently show a $0.000 breakdown — a note or fallback estimate would help.
- Multi-model sessions: `msg.get('model', 'unknown')` is surfaced per message, but the daily aggregates in `_compute_transcript_analytics` don't key by model, so a day with mixed models (e.g. Sonnet + Haiku) will sum cache tokens without indicating which model they came from. Not a bug for now, but could distort cache efficiency metrics if models are mixed.
- `limit` applied after sorting the full in-memory list: with 50 files × many messages, the list can be large before slicing; consider breaking early once `len(messages) >= limit` during file iteration for large workspaces.
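The early-break idea in the last bullet could look like this — a hypothetical helper that assumes the files are already ordered newest-first by mtime, so the result approximates (but does not exactly match) the sort-everything behavior:

```python
def collect_messages(files_newest_first, limit: int) -> list[dict]:
    """Stop reading files once `limit` messages are collected.

    Approximate: a message in a later (older-mtime) file could in theory
    be newer than one already collected, so results can differ slightly
    from sorting the full corpus.
    """
    collected: list[dict] = []
    for messages in files_newest_first:  # each item: messages from one .jsonl
        collected.extend(messages)
        if len(collected) >= limit:
            break
    # Sort key falls back to '' for missing timestamps; with reverse=True
    # those sort last rather than polluting the top of the result.
    collected.sort(key=lambda m: m.get("timestamp") or "", reverse=True)
    return collected[:limit]
```

This bounds both the I/O (files past the limit are never read) and the sort cost (only the collected prefix is sorted), at the price of the approximation noted in the docstring.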
Issue link
- The branch name suggests Closes #604 (local repo), but the PR body says `Closes vivekchand/clawmetry-cloud#318` — please confirm which issue(s) this closes and update the body if #604 is also intended.
Generated by Claude Code