feat: Per-message cost attribution with cache-hit breakdown#779

Open
vivekchand wants to merge 1 commit into main from feat/cache-token-breakdown-gh604

Conversation

@vivekchand
Owner

Closes vivekchand/clawmetry-cloud#318

What

Adds granular cache token tracking to the usage analytics:

  • Extended _extract_usage_metrics() to return 4-token-type + 4-cost-type breakdown
  • Updated _compute_transcript_analytics() to aggregate daily cache tokens
  • Enhanced /api/usage response with input/output/cache-read/cache-write token counts
  • New endpoint /api/token-attribution for per-message granular cost breakdown

How

  • Reads message.usage.cost {input, output, cacheRead, cacheWrite, total}
  • Tracks daily aggregate of each token type separately
  • Returns structured breakdown in API responses
  • Calculates cache hit ratio for visibility into cache efficiency
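The cost-reading step above can be sketched as follows. This is a minimal sketch, not the PR's actual code: the function shape and the snake_case output keys are assumptions; only the `cacheRead`/`cacheWrite` key names come from the PR description.

```python
def extract_usage_metrics(message: dict) -> dict:
    """Return a 4-token-type + 4-cost-type breakdown for one message.

    Sketch based on the PR description; key names and defaults are
    assumptions to be confirmed against real session JSONL output.
    """
    usage = message.get("usage", {}) or {}
    cost = usage.get("cost", {})
    if not isinstance(cost, dict):
        # Older sessions may store a bare float: keep the total, leave
        # the granular cost fields at 0.0 (a failure mode noted below).
        cost = {"total": cost if isinstance(cost, (int, float)) else 0.0}
    return {
        "input_tokens": usage.get("input", 0),
        "output_tokens": usage.get("output", 0),
        "cache_read_tokens": usage.get("cacheRead", 0),
        "cache_write_tokens": usage.get("cacheWrite", 0),
        "cost_input": cost.get("input", 0.0),
        "cost_output": cost.get("output", 0.0),
        "cost_cache_read": cost.get("cacheRead", 0.0),
        "cost_cache_write": cost.get("cacheWrite", 0.0),
    }
```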

API Changes

  • /api/usage: Each day object now includes inputTokens, outputTokens, cacheReadTokens, cacheWriteTokens
  • /api/token-attribution: New endpoint returning per-message token/cost breakdown with cache stats
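The daily aggregation feeding the new `/api/usage` fields might look like this sketch. The per-message input shape (a `date` key plus snake_case token counts) is an assumption; only the four camelCase output field names come from the API change list above.

```python
from collections import defaultdict

def aggregate_daily(messages):
    """Roll per-message token counts up into per-day totals, producing
    the four fields each /api/usage day object gains in this PR."""
    days = defaultdict(lambda: {"inputTokens": 0, "outputTokens": 0,
                                "cacheReadTokens": 0, "cacheWriteTokens": 0})
    field_map = {"input_tokens": "inputTokens",
                 "output_tokens": "outputTokens",
                 "cache_read_tokens": "cacheReadTokens",
                 "cache_write_tokens": "cacheWriteTokens"}
    for msg in messages:
        day = days[msg["date"]]
        for src, dst in field_map.items():
            # .get() with a 0 default keeps old sessions from raising KeyError
            day[dst] += msg.get(src, 0)
    return dict(days)
```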

@vivekchand force-pushed the feat/cache-token-breakdown-gh604 branch from bdd999e to 2f7b6ef on April 27, 2026 at 07:09
Owner Author

@vivekchand vivekchand left a comment

Test plan & review notes

What changed

  • Extended _extract_usage_metrics() and _compute_transcript_analytics() to track 4-way token/cost breakdown (input, output, cache-read, cache-write); added new /api/token-attribution endpoint returning per-message granular cost with cache-hit ratio.

Smoke commands

  • make test or make test-api
  • python3 dashboard.py --port 8900 → transcript view — per-message cost column should appear
  • curl -sS http://localhost:8900/api/token-attribution?session_id=<session_id> — verify tokens, cost, and cache_hit_ratio fields in JSON
  • curl -sS http://localhost:8900/api/usage — each day object should now include inputTokens, outputTokens, cacheReadTokens, cacheWriteTokens

What to look at visually

  • Message with cache hit vs. without — cache_hit_ratio should differ and cache_read tokens should be non-zero for cache-hit messages
  • Messages from sessions with no cost data — should degrade gracefully (zero-filled, not NaN or erroneous $0.00 where data is simply absent)
  • /api/token-attribution with no session_id param — scans up to 50 most-recent files; confirm it returns quickly on large session dirs

Likely failure modes from the diff

  • _extract_usage_metrics is duplicated in dashboard.py (two definitions, lines ~9723 and ~10090); both are patched identically — if they diverge later this will silently regress one code path. Worth consolidating.
  • cache_hit_ratio formula uses cache_read / (input_tok + cache_read): if a session only has cache_write tokens and zero input_tok + zero cache_read, the ratio is correctly 0.0 — but if upstream ever sends input_tokens=0 legitimately (e.g. tool-result-only messages), the ratio will be misleadingly 0.0 rather than absent.
  • Cost fields rely entirely on message.usage.cost.{input,output,cacheRead,cacheWrite} being present in the JSONL. If OpenClaw omits the cost sub-object (older sessions or non-Anthropic models), all four cost fields silently stay 0.0 with no indicator that cost data was unavailable — consider adding a cost_available: bool flag.
  • Floating-point accumulation in totals: summing hundreds of float values before the final round(..., 6) at the end can drift; decimal.Decimal or per-message rounding would be more precise for large sessions.
  • /api/token-attribution sorts all messages in memory before slicing to limit; for large session dirs (50 files × many messages) this could be slow — the sort key falls back to '' for missing timestamps, so with reverse=True those no-timestamp messages sort last and are silently dropped once the list is sliced to limit.
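The ratio guard discussed above can be illustrated with a minimal sketch (not the PR's actual code); returning 0.0 for the all-zero case is exactly the behaviour the second bullet flags as potentially misleading.

```python
def cache_hit_ratio(input_tok: int, cache_read: int) -> float:
    """cache_read / (input_tok + cache_read), guarded against a zero
    denominator. A zero result cannot distinguish "no cache hits" from
    "no data", which is the concern raised in the review note."""
    denom = input_tok + cache_read
    return cache_read / denom if denom > 0 else 0.0
```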

Issue link

  • Branch is named feat/cache-token-breakdown-gh604 suggesting this addresses issue #604, but the PR body says Closes vivekchand/clawmetry-cloud#318 (a different repo). Please confirm whether this also closes #604 in this repo and update the body accordingly if so.

Overlap note

  • PR #863 ("prompt cache hit rate analytics") appears to cover adjacent ground. If that PR is open or recently merged, check for conflicts or duplication in how cache hit rate is computed and surfaced.

Generated by Claude Code

Owner Author

@vivekchand vivekchand left a comment

Test plan & review notes

What changed

  • Extends _extract_usage_metrics() and _compute_transcript_analytics() in dashboard.py to return per-type token/cost breakdown (input, output, cache-read, cache-write); enriches /api/usage day objects with inputTokens, outputTokens, cacheReadTokens, cacheWriteTokens; adds new /api/token-attribution endpoint in routes/usage.py for per-message granular cost + cache-hit ratio.

Still relevant?

  • No merged PR found that supersedes this feature. PR #632 (merged 2026-04-15) was a mechanical refactor that moved bp_usage into routes/usage.py — this PR correctly targets routes/usage.py, so the branch base is consistent with current main. Feature is still absent from main; relevant to merge.
  • Note: _extract_usage_metrics is defined twice in dashboard.py (lines ~9723 and ~10051 based on the diff offsets). The diff patches both copies identically, which is correct, but the duplicate function itself is a pre-existing issue worth a follow-up cleanup.

Smoke commands

  • python3 dashboard.py --port 8900
  • curl -s http://localhost:8900/api/usage | python3 -m json.tool | grep -A4 '"inputTokens"'
  • curl -s "http://localhost:8900/api/token-attribution?limit=5" | python3 -m json.tool
  • curl -s "http://localhost:8900/api/token-attribution?session_id=<a_real_sid>&limit=10" | python3 -m json.tool

Likely failure modes

  • Sessions where message.usage.cost is a flat number (not a dict) will return zero for all granular cost fields — the elif isinstance(cost_data, (int, float)) branch only sets cost, leaving cost_input etc. at 0.0. Correct-but-silent; worth a comment.
  • /api/token-attribution scans up to 50 .jsonl files sorted by mtime with no caching — could be slow on large workspaces; consider a future TTL cache consistent with the rest of the analytics pipeline.
  • cache_hit_ratio formula excludes cache_write tokens from the denominator, which is intentional but differs from Anthropic's own definition — fine to leave as-is, just worth noting in docs.
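The dict-vs-flat-number handling described in the first bullet can be sketched as below, including the `cost_available` flag suggested in an earlier comment. This is hypothetical, not the PR's code; the field names are taken from the PR description.

```python
def parse_cost(cost_data):
    """Handle both shapes message.usage.cost can take: a dict with
    granular fields, or a flat number carrying only the total."""
    granular = {"input": 0.0, "output": 0.0,
                "cacheRead": 0.0, "cacheWrite": 0.0}
    if isinstance(cost_data, dict):
        total = cost_data.get("total", 0.0)
        for key in granular:
            granular[key] = cost_data.get(key, 0.0)
        cost_available = True
    elif isinstance(cost_data, (int, float)):
        # Flat-number branch: granular fields stay 0.0, silently.
        total = float(cost_data)
        cost_available = True
    else:
        total = 0.0
        cost_available = False  # lets callers distinguish "absent" from "$0"
    return total, granular, cost_available
```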

Issue link

  • Closes vivekchand/clawmetry-cloud#318

Generated by Claude Code

Owner Author

@vivekchand vivekchand left a comment

Test plan & review notes

Repo: vivekchand/clawmetry

What changed

  • _extract_usage_metrics() extended to return 4 token types (input/output/cache-read/cache-write) + matching cost types
  • _compute_transcript_analytics() aggregates daily cache tokens
  • /api/usage response extended with cache token breakdown fields
  • New GET /api/token-attribution endpoint for per-message granular cost

Smoke commands

  • python3 -c 'import ast; ast.parse(open("dashboard.py").read())' — syntax clean
  • curl -sS http://localhost:8900/api/usage | python3 -c "import sys,json; d=json.load(sys.stdin); print([k for k in d if 'cache' in k.lower()])" — cache_read/cache_write fields should appear
  • curl -sS http://localhost:8900/api/token-attribution — expect per-message breakdown array
  • Test against a session JSONL without message.usage.cost — must not crash; missing keys should fall back to 0

Likely failure modes from the diff

  • Older JSONL format without message.usage.cost must be handled with .get() — a bare key access would KeyError on historical sessions
  • Overlap watch: PR #863 (/api/cache-analytics) also computes cache hit rate from the same JSONL data — if both merge, keep an eye on the two endpoints returning diverging cache_hit_ratio values due to different scanning windows

Issue link

  • Closes vivekchand/clawmetry-cloud#318 (cross-repo OWNER/REPO#NUM syntax — correct)

Generated by Claude Code

Owner Author

@vivekchand vivekchand left a comment

Test plan & review notes

Repo: vivekchand/clawmetry

What changed

  • _extract_usage_metrics() now returns a full 8-field breakdown (input/output/cache-read/cache-write tokens + costs); _compute_transcript_analytics() aggregates daily per-token-type totals; /api/usage exposes inputTokens/outputTokens/cacheReadTokens/cacheWriteTokens per day; new /api/token-attribution endpoint returns per-message granular cost + cache-hit ratio.

Smoke commands

  • make test
  • python3 dashboard.py --port 8900 then open a session transcript and inspect per-message cost column
  • curl -sS http://localhost:8900/api/transcript/<id> — verify cost breakdown fields in JSON
  • curl -sS "http://localhost:8900/api/token-attribution?session_id=<id>&limit=10" — confirm tokens, cost, and cache_hit_ratio fields are present and non-negative
  • curl -sS "http://localhost:8900/api/usage" — confirm each day object now contains inputTokens, outputTokens, cacheReadTokens, cacheWriteTokens

What to look at visually

  • Transcript detail view: each message should show cost + cache-hit tokens
  • /api/usage day objects: all four new token-type fields should appear alongside existing tokens/cost

Likely failure modes from the diff

  • Duplicate function definition: _extract_usage_metrics is defined twice in dashboard.py (lines ~9723 and ~10090 in the diff); the second definition silently shadows the first. Both copies were patched identically, but this is a latent bug — the duplicate should be removed rather than kept in sync manually.
  • cache_hit_ratio divide-by-zero: cache_read / (input_tok + cache_read) is guarded, but only when (input_tok + cache_read) > 0; messages where both are zero are already skipped via total_tok == 0, so this is safe — but worth verifying with a zero-token edge case in tests.
  • camelCase vs snake_case key mismatch: _extract_usage_metrics reads token counts via usage.get("cacheRead", ...) / usage.get("cacheWrite", ...), and the new /api/token-attribution endpoint reads the same camelCase keys directly from msg['usage']. If OpenClaw ever emits snake_case keys (e.g. cache_read_tokens) in the top-level usage dict instead, the token route would pick them up via its fallback while _extract_usage_metrics would silently read 0 — confirm these key names against actual session JSONL output.
  • Granular cost fields not always populated: When cost_data is a bare float (the elif branch), cost_input/output/cache_* stay 0.0 but the total cost is captured. The per-message breakdown in /api/token-attribution also falls to all-zeros for input_cost/output_cost in this case. Sessions without structured cost dicts will silently show $0.000 breakdown — a note or fallback estimate would help.
  • Multi-model sessions: msg.get('model', 'unknown') is surfaced per message, but the daily aggregates in _compute_transcript_analytics don't key by model, so a day with mixed models (e.g. Sonnet + Haiku) will sum cache tokens without indicating which model they came from. Not a bug for now, but could distort cache efficiency metrics if models are mixed.
  • limit applied after sorting full in-memory list: with 50 files × many messages the list can be large before slicing; consider breaking early once len(messages) >= limit during file iteration for large workspaces.
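The early-exit optimisation suggested in the last bullet could look like this hypothetical helper; the file layout and message shape are assumptions based on the PR description.

```python
import json
from pathlib import Path

def collect_messages(session_dir, limit=100, max_files=50):
    """Stop scanning once `limit` messages are collected, instead of
    building and sorting the full in-memory list before slicing."""
    files = sorted(Path(session_dir).glob("*.jsonl"),
                   key=lambda p: p.stat().st_mtime, reverse=True)[:max_files]
    messages = []
    for path in files:
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                try:
                    messages.append(json.loads(line))
                except json.JSONDecodeError:
                    continue  # skip malformed lines rather than abort
                if len(messages) >= limit:
                    return messages  # early exit: no full-list sort needed
    return messages
```

Note the trade-off: breaking early gives up exact global timestamp ordering across files in exchange for bounded work on large workspaces.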

Issue link

  • Branch name suggests Closes #604 (local repo), but PR body says Closes vivekchand/clawmetry-cloud#318 — please confirm which issue(s) this closes and update the body if #604 is also intended.

Generated by Claude Code

Owner Author

Test plan & review notes

Repo: vivekchand/clawmetry

What changed

  • _extract_usage_metrics() and _compute_transcript_analytics() extended with 4-token-type / 4-cost-type breakdown; /api/usage response gains inputTokens, outputTokens, cacheReadTokens, cacheWriteTokens per day; new /api/token-attribution endpoint for per-message cost breakdown with cache hit ratio

Smoke commands

  • make test
  • python3 dashboard.py --port 8900
  • curl -sS http://localhost:8900/api/usage → each day object should include the 4 new token-type fields (not just the old aggregate)
  • curl -sS http://localhost:8900/api/token-attribution → per-message rows with cache_hit_ratio

Likely failure modes from the diff

  • Sessions predating cache-token tracking will have missing cacheRead/cacheWrite keys in their JSONL — confirm aggregation handles absent keys with 0 rather than a KeyError
  • Key name alignment: verify cacheRead / cacheWrite exactly matches what OpenClaw writes under message.usage.cost (case-sensitive)
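The 0-default access pattern the first bullet asks for is a one-liner; a minimal sketch (key names per the PR description, to be confirmed against real JSONL):

```python
def cache_tokens(usage: dict):
    """Read cacheRead/cacheWrite with .get() and a 0 default, so sessions
    predating cache-token tracking aggregate as 0 instead of raising
    KeyError."""
    return usage.get("cacheRead", 0), usage.get("cacheWrite", 0)
```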

Issue link

  • Closes vivekchand/clawmetry-cloud#318 ✓ (cross-repo issue — already uses full OWNER/REPO#NUM syntax in PR body, so it will auto-close on merge)

Generated by Claude Code
