feat: Per-message cost attribution with cache-hit breakdown #779
vivekchand wants to merge 1 commit into main
Conversation
## What
Adds granular cache token tracking to the usage analytics:
- Extended `_extract_usage_metrics()` to return a 4-token-type + 4-cost-type breakdown
- Updated `_compute_transcript_analytics()` to aggregate daily cache tokens
- Enhanced the `/api/usage` response with input/output/cache-read/cache-write token counts
- New endpoint `/api/token-attribution` for per-message granular cost breakdown

## How
- Reads `message.usage.cost` {input, output, cacheRead, cacheWrite, total}
- Tracks a daily aggregate of each token type separately
- Returns the structured breakdown in API responses
- Calculates cache hit ratio for visibility into cache efficiency

Closes vivekchand/clawmetry-cloud#318
bdd999e to 2f7b6ef
vivekchand left a comment
Test plan & review notes
What changed
- Extended `_extract_usage_metrics()` and `_compute_transcript_analytics()` to track a 4-way token/cost breakdown (input, output, cache-read, cache-write)
- Added a new `/api/token-attribution` endpoint returning per-message granular cost with cache-hit ratio
Smoke commands
- `make test` or `make test-api`
- `python3 dashboard.py --port 8900` → transcript view — the per-message cost column should appear
- `curl -sS "http://localhost:8900/api/token-attribution?session_id=<session_id>"` — verify `tokens`, `cost`, and `cache_hit_ratio` fields in the JSON
- `curl -sS http://localhost:8900/api/usage` — each day object should now include `inputTokens`, `outputTokens`, `cacheReadTokens`, `cacheWriteTokens`
What to look at visually
- Message with cache hit vs. without — `cache_hit_ratio` should differ, and `cache_read` tokens should be non-zero for cache-hit messages
- Messages from sessions with no cost data — should degrade gracefully (zero-filled, not NaN or an erroneous $0.00 where data is simply absent)
- `/api/token-attribution` with no `session_id` param — scans up to 50 most-recent files; confirm it returns quickly on large session dirs
Likely failure modes from the diff
- `_extract_usage_metrics` is duplicated in `dashboard.py` (two definitions, lines ~9723 and ~10090); both are patched identically — if they diverge later, this will silently regress one code path. Worth consolidating.
- The `cache_hit_ratio` formula uses `cache_read / (input_tok + cache_read)`: if a session has only `cache_write` tokens and zero `input_tok` + zero `cache_read`, the ratio is correctly 0.0 — but if upstream ever sends `input_tokens=0` legitimately (e.g. tool-result-only messages), the ratio will be misleadingly 0.0 rather than absent.
- Cost fields rely entirely on `message.usage.cost.{input,output,cacheRead,cacheWrite}` being present in the JSONL. If OpenClaw omits the `cost` sub-object (older sessions or non-Anthropic models), all four cost fields silently stay 0.0 with no indicator that cost data was unavailable — consider adding a `cost_available: bool` flag.
- Floating-point accumulation in `totals`: summing hundreds of `float` values before the final `round(..., 6)` can drift; `decimal.Decimal` or per-message rounding would be more precise for large sessions.
- `/api/token-attribution` sorts all messages in memory before slicing to `limit`; for large session dirs (50 files × many messages) this could be slow — and the sort key falls back to `''` for missing timestamps, which clusters all no-timestamp messages at the end after `reverse=True`.
Issue link
- The branch is named `feat/cache-token-breakdown-gh604`, suggesting this addresses issue #604, but the PR body says `Closes vivekchand/clawmetry-cloud#318` (a different repo). Please confirm whether this also closes #604 in this repo and update the body accordingly if so.
Overlap note
- PR #863 ("prompt cache hit rate analytics") appears to cover adjacent ground. If that PR is open or recently merged, check for conflicts or duplication in how cache hit rate is computed and surfaced.
Generated by Claude Code
vivekchand left a comment
Test plan & review notes
What changed
- Extends `_extract_usage_metrics()` and `_compute_transcript_analytics()` in `dashboard.py` to return a per-type token/cost breakdown (input, output, cache-read, cache-write)
- Enriches `/api/usage` day objects with `inputTokens`, `outputTokens`, `cacheReadTokens`, `cacheWriteTokens`
- Adds a new `/api/token-attribution` endpoint in `routes/usage.py` for per-message granular cost + cache-hit ratio
Still relevant?
- No merged PR found that supersedes this feature. PR #632 (merged 2026-04-15) was a mechanical refactor that moved `bp_usage` into `routes/usage.py` — this PR correctly targets `routes/usage.py`, so the branch base is consistent with current main. The feature is still absent from main; relevant to merge.
- Note: `_extract_usage_metrics` is defined twice in `dashboard.py` (lines ~9723 and ~10051 based on the diff offsets). The diff patches both copies identically, which is correct, but the duplicate function itself is a pre-existing issue worth a follow-up cleanup.
Smoke commands
- `python3 dashboard.py --port 8900`
- `curl -s http://localhost:8900/api/usage | python3 -m json.tool | grep -A4 '"inputTokens"'`
- `curl -s "http://localhost:8900/api/token-attribution?limit=5" | python3 -m json.tool`
- `curl -s "http://localhost:8900/api/token-attribution?session_id=<a_real_sid>&limit=10" | python3 -m json.tool`
Likely failure modes
- Sessions where `message.usage.cost` is a flat number (not a dict) will return zero for all granular cost fields — the `elif isinstance(cost_data, (int, float))` branch only sets `cost`, leaving `cost_input` etc. at 0.0. Correct but silent; worth a comment.
- `/api/token-attribution` scans up to 50 `.jsonl` files sorted by `mtime` with no caching — could be slow on large workspaces; consider a future TTL cache consistent with the rest of the analytics pipeline.
- The `cache_hit_ratio` formula excludes `cache_write` tokens from the denominator, which is intentional but differs from Anthropic's own definition — fine to leave as-is, just worth noting in the docs.
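A sketch of how the flat-number branch could be made explicit rather than silent — `parse_cost` is a hypothetical helper, and `cost_available` is a suggested flag, not something the PR currently emits:

```python
def parse_cost(cost_data) -> dict:
    """Normalize message.usage.cost, which may be a dict or a bare number."""
    out = {"cost": 0.0, "cost_input": 0.0, "cost_output": 0.0,
           "cost_cache_read": 0.0, "cost_cache_write": 0.0,
           "cost_available": False}
    if isinstance(cost_data, dict):
        out["cost_input"] = float(cost_data.get("input", 0.0))
        out["cost_output"] = float(cost_data.get("output", 0.0))
        out["cost_cache_read"] = float(cost_data.get("cacheRead", 0.0))
        out["cost_cache_write"] = float(cost_data.get("cacheWrite", 0.0))
        out["cost"] = float(cost_data.get(
            "total",
            out["cost_input"] + out["cost_output"]
            + out["cost_cache_read"] + out["cost_cache_write"]))
        out["cost_available"] = True
    elif isinstance(cost_data, (int, float)):
        # Legacy flat-number form: the total is known, the breakdown is not.
        out["cost"] = float(cost_data)
        out["cost_available"] = True
    return out
```

Consumers can then render "cost data unavailable" instead of a misleading $0.000 breakdown when `cost_available` is false.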
Issue link
- Closes vivekchand/clawmetry-cloud#318
Generated by Claude Code
vivekchand left a comment
Test plan & review notes
Repo: vivekchand/clawmetry
What changed
- `_extract_usage_metrics()` extended to return 4 token types (input/output/cache-read/cache-write) + matching cost types
- `_compute_transcript_analytics()` aggregates daily cache tokens
- `/api/usage` response extended with cache token breakdown fields
- New `GET /api/token-attribution` endpoint for per-message granular cost
Smoke commands
- `python3 -c 'import ast; ast.parse(open("dashboard.py").read())'` — syntax clean
- `curl -sS http://localhost:8900/api/usage | python3 -c "import sys,json; d=json.load(sys.stdin); print([k for k in d if 'cache' in k.lower()])"` — cache_read/cache_write fields should appear
- `curl -sS http://localhost:8900/api/token-attribution` — expect a per-message breakdown array
- Test against a session JSONL without `message.usage.cost` — must not crash; missing keys should fall back to 0
Likely failure modes from the diff
- Older JSONL format without `message.usage.cost` must be handled with `.get()` — a bare key access would `KeyError` on historical sessions
- Overlap watch: PR #863 (`/api/cache-analytics`) also computes cache hit rate from the same JSONL data — if both merge, keep an eye on the two endpoints returning diverging `cache_hit_ratio` values due to different scanning windows
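The `.get()` concern can be illustrated with a defensive extraction sketch. Function and key names here are illustrative and should be confirmed against real session JSONL output:

```python
import json


def extract_usage_fields(line: str) -> dict:
    """Parse one JSONL record; tolerate missing usage/cost sub-objects."""
    rec = json.loads(line)
    usage = (rec.get("message") or {}).get("usage") or {}
    cost = usage.get("cost")
    if not isinstance(cost, dict):  # absent, null, or legacy flat number
        cost = {}
    return {
        "input_tokens": int(usage.get("input", 0)),
        "output_tokens": int(usage.get("output", 0)),
        "cache_read_tokens": int(usage.get("cacheRead", 0)),
        "cache_write_tokens": int(usage.get("cacheWrite", 0)),
        "cost_input": float(cost.get("input", 0.0)),
        "cost_output": float(cost.get("output", 0.0)),
    }
```

Every access goes through `.get()` with an `or {}` guard, so a historical record with no `usage` or `cost` sub-object yields zeros instead of a `KeyError`.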
Issue link
- Closes vivekchand/clawmetry-cloud#318 (cross-repo `OWNER/REPO#NUM` syntax — correct)
Generated by Claude Code
vivekchand left a comment
Test plan & review notes
Repo: vivekchand/clawmetry
What changed
- `_extract_usage_metrics()` now returns a full 8-field breakdown (input/output/cache-read/cache-write tokens + costs)
- `_compute_transcript_analytics()` aggregates daily per-token-type totals
- `/api/usage` exposes `inputTokens`/`outputTokens`/`cacheReadTokens`/`cacheWriteTokens` per day
- New `/api/token-attribution` endpoint returns per-message granular cost + cache-hit ratio
Smoke commands
- `make test`
- `python3 dashboard.py --port 8900`, then open a session transcript and inspect the per-message cost column
- `curl -sS http://localhost:8900/api/transcript/<id>` — verify cost breakdown fields in the JSON
- `curl -sS "http://localhost:8900/api/token-attribution?session_id=<id>&limit=10"` — confirm `tokens`, `cost`, and `cache_hit_ratio` fields are present and non-negative
- `curl -sS http://localhost:8900/api/usage` — confirm each day object now contains `inputTokens`, `outputTokens`, `cacheReadTokens`, `cacheWriteTokens`
What to look at visually
- Transcript detail view: each message should show cost + cache-hit tokens
- `/api/usage` day objects: all four new token-type fields should appear alongside the existing `tokens`/`cost`
Likely failure modes from the diff
- Duplicate function definition: `_extract_usage_metrics` is defined twice in `dashboard.py` (lines ~9723 and ~10090 in the diff); the second definition silently shadows the first. Both copies were patched identically, but this is a latent bug — the duplicate should be removed rather than kept in sync manually.
- `cache_hit_ratio` divide-by-zero: `cache_read / (input_tok + cache_read)` is guarded, but only when `(input_tok + cache_read) > 0`; messages where both are zero are already skipped via `total_tok == 0`, so this is safe — but worth verifying with a zero-token edge case in tests.
- camelCase vs snake_case field-name mismatch: the `_extract_usage_metrics` path reads `usage.get("cacheRead", ...)` / `usage.get("cacheWrite", ...)` for token counts, and the new `/api/token-attribution` endpoint reads the same keys directly from `msg['usage']`. If OpenClaw ever emits `cache_read_tokens` (snake) instead of `cacheRead` (camel) in the top-level usage dict, the token route picks it up via the fallback, but `_extract_usage_metrics` in `dashboard.py` reads `cache_read`/`cache_write` (from the `in_toks`/`out_toks` context) — confirm these key names are consistent with actual session JSONL output.
- Granular cost fields not always populated: when `cost_data` is a bare float (the `elif` branch), `cost_input`/`output`/`cache_*` stay 0.0 but the total `cost` is captured. The per-message breakdown in `/api/token-attribution` also falls to all-zeros for `input_cost`/`output_cost` in this case. Sessions without structured `cost` dicts will silently show a $0.000 breakdown — a note or fallback estimate would help.
- Multi-model sessions: `msg.get('model', 'unknown')` is surfaced per message, but the daily aggregates in `_compute_transcript_analytics` don't key by model, so a day with mixed models (e.g. Sonnet + Haiku) will sum cache tokens without indicating which model they came from. Not a bug for now, but could distort cache efficiency metrics if models are mixed.
- `limit` applied after sorting the full in-memory list: with 50 files × many messages, the list can be large before slicing; consider breaking early once `len(messages) >= limit` during file iteration for large workspaces.
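The early-break idea in the last bullet could look like this — a hypothetical helper that assumes the files are already ordered newest-first by mtime, so the result approximates (but does not exactly match) the sort-everything behavior:

```python
def collect_messages(files_newest_first, limit: int) -> list[dict]:
    """Stop reading files once `limit` messages are collected.

    Approximate: a message in a later (older-mtime) file could in theory
    be newer than one already collected, so results can differ slightly
    from sorting the full corpus.
    """
    collected: list[dict] = []
    for messages in files_newest_first:  # each item: messages from one .jsonl
        collected.extend(messages)
        if len(collected) >= limit:
            break
    # Sort key falls back to '' for missing timestamps; with reverse=True
    # those sort last rather than polluting the top of the result.
    collected.sort(key=lambda m: m.get("timestamp") or "", reverse=True)
    return collected[:limit]
```

This bounds both the I/O (files past the limit are never read) and the sort cost (only the collected prefix is sorted), at the price of the approximation noted in the docstring.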
Issue link
- The branch name suggests Closes #604 (local repo), but the PR body says `Closes vivekchand/clawmetry-cloud#318` — please confirm which issue(s) this closes and update the body if #604 is also intended.
Generated by Claude Code