v2.4.4 — token-throughput display + metric-correctness fixes#5

Merged: soumyadebroy3 merged 2 commits into main from release/v2.4.4 on Jun 1, 2026
@soumyadebroy3

Owner

Summary

The menubar, HTML report, and CLI reported token usage as input + output only, while cost was billed across all four token buckets. On cache-heavy agentic runs, the headline "tok" figure was therefore only ~2–5% of real throughput (a ~1.1M-token workflow read as ~57K), which made cost-per-"token" look absurd. This release puts the token figure on the same basis that cost already uses: total throughput = input + output + cache read + cache write. It also fixes a sweep of related metric-correctness issues and bumps the version to 2.4.4.

Token display (core fix)

  • Hero, trend, and HTML headline now show total throughput, matching the Anthropic usage object and the Claude Code / Warp counters. Cache volume is forwarded in the menubar payload as two fields (current.cacheReadTokens and current.cacheWriteTokens).
  • Applied across macOS (Swift), Windows (React), and GNOME (JS).
  • Hero ↑/↓ split now follows the conventional in↑ / out↓ direction; effectiveTokens proxy weights cache-write at 1.25× (was 1×).
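
For reviewers who want the before/after arithmetic in one place, here is a minimal sketch of the two display bases. The `TokenUsage` interface, its field names, and both function names are assumptions made for illustration; they are not taken from the codebase. The sample split is invented to land near the ~57K vs ~1.1M example in the summary.

```typescript
// Hypothetical usage shape: the field names are assumptions that mirror the
// four token buckets named in this PR, not the project's actual types.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

// Old headline basis: input + output only (cache traffic ignored).
function legacyDisplayTokens(u: TokenUsage): number {
  return u.inputTokens + u.outputTokens;
}

// New headline basis: all four buckets, the same basis cost is billed on.
function totalThroughput(u: TokenUsage): number {
  return u.inputTokens + u.outputTokens + u.cacheReadTokens + u.cacheWriteTokens;
}

// Illustrative split only, chosen to land near the ~57K vs ~1.1M example.
const sample: TokenUsage = {
  inputTokens: 12000,
  outputTokens: 45000,
  cacheReadTokens: 1000000,
  cacheWriteTokens: 43000,
};

console.log(legacyDisplayTokens(sample)); // 57000
console.log(totalThroughput(sample));     // 1100000
```

With a split like this, the legacy figure is roughly 5% of total throughput, which is the kind of gap the summary describes for cache-heavy runs.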

Metric-correctness

  • Cache-hit rate uses one denominator everywhere (input + cacheRead + cacheWrite); menubar/HTML previously dropped cache writes and disagreed with the TUI.
  • Trend "% vs prior" tracks the token delta when tokens are shown (was always cost-based).
  • "1-shot" rate exposes its sample size (e.g. 0/1) instead of a bare "100%" over a hidden denominator of 1.
  • "Yesterday" shows "—" for a no-data day instead of a false "0".
  • Last 7 Days / Last 30 Days span 7 / 30 calendar days, not 8 / 31.
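
Two of the fixes above are small enough to sketch. The block below is illustrative only and rests on several assumptions: cache reads are the numerator of the hit rate (the PR only states the denominator), the "Last N Days" window ends on and includes the given day, and all names (`cacheHitRate`, `legacyCacheHitRate`, `lastNDays`) are hypothetical.

```typescript
// Hypothetical input shape; field names are assumptions for illustration.
interface CacheUsage {
  inputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

// Unified denominator from this PR: input + cacheRead + cacheWrite.
// Using cacheRead as the numerator is an assumption.
function cacheHitRate(u: CacheUsage): number | null {
  const denom = u.inputTokens + u.cacheReadTokens + u.cacheWriteTokens;
  return denom === 0 ? null : u.cacheReadTokens / denom;
}

// Previous menubar/HTML behaviour as described: cache writes dropped.
function legacyCacheHitRate(u: CacheUsage): number | null {
  const denom = u.inputTokens + u.cacheReadTokens;
  return denom === 0 ? null : u.cacheReadTokens / denom;
}

// Inclusive N-day window ending on endIso (YYYY-MM-DD, treated as UTC).
// Stepping back n - 1 days yields exactly n calendar days, not n + 1.
function lastNDays(endIso: string, n: number): string[] {
  const end = new Date(endIso + "T00:00:00Z");
  const days: string[] = [];
  for (let i = n - 1; i >= 0; i--) {
    const d = new Date(end.getTime() - i * 86400000);
    days.push(d.toISOString().slice(0, 10));
  }
  return days;
}

const usage: CacheUsage = { inputTokens: 100, cacheReadTokens: 800, cacheWriteTokens: 100 };
console.log(cacheHitRate(usage));       // 0.8
console.log(legacyCacheHitRate(usage)); // higher, because writes are ignored
console.log(lastNDays("2026-06-01", 7).length); // 7
```

Returning `null` for an empty denominator keeps "no data" distinct from a true 0, in the same spirit as the "Yesterday" fix.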

Forecast / leverage / retry

  • Forecast "On pace for" projects from completed days only (today's partial day no longer drags it low); "Avg/day" + week-over-week use completed-day windows; day 1 no longer false-fires the overspend tip.
  • Leverage is period-normalized (single-period value scaled to a 30-day run-rate before comparing to the monthly price); zero-paid 999× sentinel removed; HTML banner names the actual window.
  • Retry detection counts a re-edit after any intervening check (not only Bash) while not flagging sequential multi-file edits; retry-tax labeled a worst-case estimate.
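
As a rough sketch of the forecast and leverage changes, under stated assumptions: the projection is taken to be the completed-day average multiplied by the days in the month, and leverage is taken to be the 30-day run-rate of usage value divided by the monthly price. Both formulas and both function names (`projectMonth`, `leverage`) are illustrative guesses at the shape of the logic, not the shipped implementation.

```typescript
// Project a monthly total from completed days only. The caller is assumed to
// have already excluded today's partial day. With no completed days (day 1)
// there is nothing to project from, so return null instead of a low guess.
function projectMonth(completedDailyCosts: number[], daysInMonth: number): number | null {
  if (completedDailyCosts.length === 0) return null;
  const sum = completedDailyCosts.reduce((a, b) => a + b, 0);
  return (sum / completedDailyCosts.length) * daysInMonth;
}

// Period-normalized leverage: scale the period's value to a 30-day run-rate
// before comparing it to the monthly price. A zero or negative price returns
// null rather than a sentinel such as 999.
function leverage(periodValueUsd: number, periodDays: number, monthlyPriceUsd: number): number | null {
  if (monthlyPriceUsd <= 0 || periodDays <= 0) return null;
  const runRate30 = (periodValueUsd * 30) / periodDays;
  return runRate30 / monthlyPriceUsd;
}

console.log(projectMonth([10, 20], 30)); // 450
console.log(projectMonth([], 30));       // null
console.log(leverage(70, 7, 100));       // 3
console.log(leverage(70, 7, 0));         // null
```

Without the normalization, a 7-day value of $70 against a $100 monthly price would read as 0.7× rather than 3×, which is the kind of mismatch between window and price that the leverage fix addresses.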

Pricing

  • claude-opus-4-8 added to MANUAL_ENTRIES ($5/$25 per M) so offline builds don't fall back to the legacy claude-opus-4 rate ($15/$75) and overstate Opus spend ~2.6×.

Not included

  • Long-context (>200K) pricing tier — no published Opus-1M premium rate to apply without inventing numbers; intentionally deferred.

Verification

  • 788 tests pass (incl. 6 new retry-detection tests; updated cli-date + snapshot-contract tests).
  • TS typecheck clean · macOS Swift build ✅ · Windows tsc ✅ · GNOME syntax ✅.
  • CLI smoke: today's hero went from a misleading 336K tok to a truthful 30M tok, reconciling with the billed cost (99.9% cache hit).

…nubar/HTML/CLI

Token counts now reflect real throughput (input + output + cache read +
cache write) instead of input+output-only -- on cache-heavy agentic runs the
old figure was ~2-5% of actual usage while cost billed all four buckets, so
cost-per-"token" looked absurd and a 1.1M-token workflow read as ~57K. Applied
across the macOS menubar, Windows app, GNOME extension, and HTML report; cache
volume is forwarded in the menubar payload (current.cacheReadTokens/cacheWriteTokens).

Also a sweep of metric-correctness fixes:
- cache-hit rate uses one denominator everywhere (input+cacheRead+cacheWrite)
- trend "% vs prior" tracks the token metric when tokens are shown
- "1-shot" rate exposes its sample size; "Yesterday" distinguishes 0 from no-data
- Last 7/30 Days ranges span 7/30 calendar days, not 8/31
- forecast projects from completed days only; week-over-week uses equal windows
- leverage is period-normalized; zero-paid 999x sentinel removed
- retry detection counts re-edits after any intervening check (not just Bash)
- claude-opus-4-8 pricing added to the bundled snapshot (offline fallback)
- hero token up/down split follows the in-up / out-down convention

788 tests pass (incl. 6 new retry-detection tests); macOS Swift + Windows
React + GNOME all build/typecheck clean.

soumyadebroy3 merged commit 9514982 into main on Jun 1, 2026
12 checks passed