fix(analytics): Count subagent sessions in analytics totals #873

Open
jesserobbins wants to merge 8 commits into kenn-io:main from jesserobbins:feat/include-subagent-tokens-in-analytics

Conversation

@jesserobbins jesserobbins commented Jun 26, 2026

Contributor

The Insights dashboard undercounts tokens and sessions because the analytics aggregates exclude subagent sessions. Workflow runs spawn many subagent transcripts; agentsview parses and parent-links them, but their tokens never reach the summary. The cost/usage view already counts them, so the two surfaces disagree. Closes #871.

Two filters hid them, and both had to change, but only for the totals:

  • The analytics WHERE clause excluded relationship_type IN ('subagent', 'fork'), the same predicate the session list uses to nest subagents under their parent. Right for navigation, wrong for token and session sums.
  • The summary defaults ExcludeOneShot on, and workflow subagents are all one-shot (one orchestrator prompt, one result), so they were re-excluded even once the relationship filter let them through.

This adds an opt-in IncludeSubagents flag on AnalyticsFilter, set only on GetAnalyticsSummary and GetAnalyticsProjects. Those surfaces count subagents and exempt them from the one-shot filter. Forks and continuations stay excluded everywhere, since their tokens overlap their root and would double-count. The flag defaults off, so distribution surfaces (session-shape, velocity, timing) keep the root-only filter unchanged; subagents average ~8 minutes against ~103 for a root session and would flatten any duration or shape histogram. Mirrored across SQLite, PostgreSQL, and DuckDB.
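
As a rough illustration of how the two filters interact, here is a minimal in-memory sketch in Go. It is not the PR's SQL: the `Session` and `Filter` types, the `counted`, `totals`, and `sampleRows` helpers, and the sample rows are all hypothetical. Only the flag names `ExcludeOneShot` and `IncludeSubagents` and the relationship values `subagent` and `fork` come from the description above.

```go
package main

import "fmt"

// Session is a hypothetical, simplified row. The real schema is not shown here.
type Session struct {
	Relationship string // "" for a root session, or "subagent" / "fork"
	OneShot      bool   // one prompt, one result
	OutputTokens int
}

// Filter carries the two flags named in the description.
type Filter struct {
	ExcludeOneShot   bool
	IncludeSubagents bool
}

// counted reports whether a session contributes to the totals under f.
func counted(s Session, f Filter) bool {
	switch s.Relationship {
	case "fork":
		// Fork tokens overlap their root, so forks never count.
		return false
	case "subagent":
		// Subagents count only when opted in, and are then exempt
		// from the one-shot filter (they are all one-shot).
		return f.IncludeSubagents
	}
	return !(f.ExcludeOneShot && s.OneShot)
}

// totals sums sessions and output tokens over the rows that pass the filter.
func totals(rows []Session, f Filter) (sessions, tokens int) {
	for _, s := range rows {
		if counted(s, f) {
			sessions++
			tokens += s.OutputTokens
		}
	}
	return sessions, tokens
}

// sampleRows returns made-up data for illustration only.
func sampleRows() []Session {
	return []Session{
		{Relationship: "", OneShot: false, OutputTokens: 100},
		{Relationship: "", OneShot: true, OutputTokens: 10},
		{Relationship: "subagent", OneShot: true, OutputTokens: 50},
		{Relationship: "fork", OneShot: false, OutputTokens: 100},
	}
}

func main() {
	rows := sampleRows()
	fmt.Println(totals(rows, Filter{ExcludeOneShot: true}))                         // 1 100
	fmt.Println(totals(rows, Filter{ExcludeOneShot: true, IncludeSubagents: true})) // 2 150
}
```

In this toy model, relaxing only the relationship filter would not be enough: without the one-shot exemption the subagent row would still drop out whenever `ExcludeOneShot` is on.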

The stats command (v1 window analytics) had the same gap, and it's fixed here too. That pipeline loads sessions once and feeds both totals and distributions from them, so it loads a second subagent-inclusive set and overrides only the additive totals. The shape, velocity, timing, and human-vs-automation breakdowns stay root-only. Counting subagents in the session total while keeping them out of the human and automation buckets would have broken the sessions_all == human + automation invariant, so a third bucket, sessions_subagent, keeps the partition whole: the stats line now reads 14,205 (human 8,269, automation 1,117, subagent 4,819).

The navigation count and the dashboard count now differ on purpose. The sidebar shows the sessions you started; Insights counts the work the subagents did under them.

One gap left: cache_economics dollar figures in the stats output are still root-only. They come from a separate per-message query rather than a row sum, so folding subagents in cleanly is its own change. The token and session totals are the headline numbers and they're correct now; the cache dollars are a smaller follow-up.

Where to look: internal/db/analytics.go (RelationshipExclusionSQL, OneShotExclusionSQL), internal/db/session_stats.go (the two-row-set load and applySubagentInclusiveTotals), mirrored in internal/postgres/analytics.go and internal/duckdb/analytics_usage.go. The cross-surface split is asserted in internal/db/session_stats_test.go, internal/db/store_contract_test.go, and frontend/e2e/session-count-consistency.spec.ts.

Workflow subagents are synced and parent-linked but were excluded from
analytics token/session aggregates by the relationship filter, and
re-hidden by the one-shot exclusion (every workflow subagent is
one-shot: one orchestrator prompt yields one result). The Insights
summary therefore undercounted output tokens by ~24%.

Add AnalyticsFilter.IncludeSubagents (opt-in, default off) with
RelationshipExclusionSQL and OneShotExclusionSQL helpers, and set it on
the sum/count surfaces only: GetAnalyticsSummary and GetAnalyticsProjects
across SQLite, PostgreSQL, and DuckDB. Fork and continuation rows stay
excluded to avoid double-counting tokens that overlap their root.

Distribution surfaces (session-shape, velocity, hour-of-week, heatmap)
and all unlisted surfaces keep the original root-only filter, so short
one-shot subagents do not skew them. Session list, db-stats, and
navigation counts are unchanged.

The one-shot exemption is the path that surfaces workflow subagents
(all are one-shot). Add a store-contract summary assertion with
ExcludeOneShot set so the exemption runs against PostgreSQL in CI, and
a DuckDB SQL builder test guarding the hand-written relationship and
one-shot predicates and their column qualification.

The session-count-consistency E2E asserted all three views (list,
status bar, analytics summary) equal the root-only count. Analytics now
counts the fixture's 3 subagent sessions, so navigation stays at 9 while
the analytics summary shows 12. Assert the split rather than equality.

Assert exact one-shot-excluded summary totals (4 sessions / 450 output
tokens) so a regression that re-excludes the subagent child is caught;
the prior bounds were satisfied by the surviving root sessions. Correct
the IncludeSubagents doc comment to name the actual opt-in surfaces
(GetAnalyticsSummary, GetAnalyticsProjects) instead of a nonexistent
window-stats surface.

The v1 window stats loaded one root-only row set and fed it to every
consumer, so the stats-command token/session totals excluded subagents
the same way the analytics summary did. The pipeline can't flip the
filter at the query because the same rows drive both additive totals and
the distribution/shape and human-vs-automation breakdowns.

Load a second subagent-inclusive row set and override only the additive
totals (SessionsAll, MessagesTotal, UserMessagesTotal) from it. Every
other consumer keeps the root-only rows, so distributions, velocity,
timing, archetypes, and the human/automation split are unchanged. A
subagent is not a human or automation session, so SessionsHuman and
SessionsAutomation stay root-only too. Forks stay excluded everywhere.

Counting subagents in SessionsAll while keeping them out of
SessionsHuman and SessionsAutomation broke the documented invariant
sessions_all == human + automation. The CLI line showed a total that
no longer matched its parenthetical breakdown.

Add a third bucket, sessions_subagent, so that the three buckets
partition SessionsAll: sessions_all == human + automation + subagent.
It's an additive v1 field
(no schema bump), surfaced in the stats command's Sessions line and the
golden output.
…icate

Extract a package-level RelationshipExclusionSQL(includeSubagents,
colPrefix) that the analytics builders and the stats pipeline both call,
so the subagent/fork predicate is defined once. The colPrefix argument
qualifies the column for the DuckDB builder, replacing the string-surgery
that depended on the predicate starting with a literal column name. The
stats loadSessionsInWindow now uses the same helper instead of its own
copy of the predicate strings.
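
A minimal sketch of what such a shared helper could look like. The function name and argument order come from the commit message above; the body, the `NOT IN` form, and the exact SQL strings are assumptions. It also assumes `relationship_type` is never NULL, and it leaves out continuations because their stored value is not given here.

```go
package main

import "fmt"

// RelationshipExclusionSQL returns a WHERE-clause fragment that hides
// fork rows, and subagent rows too unless includeSubagents is set.
// colPrefix qualifies the column, e.g. "s." for an aliased table.
// Sketch only: the real helper's output may differ.
func RelationshipExclusionSQL(includeSubagents bool, colPrefix string) string {
	col := colPrefix + "relationship_type"
	if includeSubagents {
		// Forks stay excluded: their tokens overlap the root session.
		return col + " NOT IN ('fork')"
	}
	return col + " NOT IN ('subagent', 'fork')"
}

func main() {
	fmt.Println(RelationshipExclusionSQL(false, ""))
	fmt.Println(RelationshipExclusionSQL(true, "s."))
}
```

Passing the prefix as an argument means a caller never has to rewrite the returned string to qualify the column, which is the string surgery the commit removes.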

roborev-ci Bot commented Jun 26, 2026


roborev: Combined Review (c249ee8)

No issues found.


Panel: ci_default_security | Synthesis: codex | Members: codex_default (codex/default, done, 5m28s), codex_security (codex/security, done, 58s) | Total: 6m26s

…-tokens-in-analytics

# Conflicts:
#	frontend/e2e/session-count-consistency.spec.ts
#	internal/duckdb/analytics_usage.go
#	internal/duckdb/analytics_usage_test.go
@jesserobbins jesserobbins marked this pull request as ready for review June 26, 2026 05:31
@jesserobbins jesserobbins changed the title from "Count subagent sessions in analytics totals" to "fix(analytics): Count subagent sessions in analytics totals" Jun 26, 2026

Development

Successfully merging this pull request may close these issues.

Analytics dashboard undercounts tokens and sessions by excluding subagents
