Skip to content

fix(analytics): define top-session active duration as clamped idle gaps (alt to #867)#869

Open
mariusvniekerk wants to merge 8 commits into
mainfrom
feat/top-sessions-active-duration-clamp
Open

fix(analytics): define top-session active duration as clamped idle gaps (alt to #867)#869
mariusvniekerk wants to merge 8 commits into
mainfrom
feat/top-sessions-active-duration-clamp

Conversation

@mariusvniekerk

Copy link
Copy Markdown
Collaborator

Alternative to #867. This branch is built on @rodboev's commits (kept in
the history, so the original work and authorship are preserved); it changes
only how "active duration" is computed.

What changed

#867 computes a session's active duration by summing only the gaps that
follow a tool-use message. That captures tool-execution latency but
discards all model generation and thinking time — including the time spent
before the first tool call — so a session that reasons for minutes and then
fires one quick tool is scored as nearly idle.

This branch instead defines active duration the same way the existing
velocity "active minutes" metric already does: the sum of consecutive
inter-message gaps, with each gap capped at 5 minutes. Every gap counts —
model generation, tool execution, and quick human turnarounds — and only
stretches longer than the cap are bounded as idle.

Why

Message timestamps alone can't tell a 20-minute subagent run from a
20-minute coffee break; both are one long gap. Capping each gap fails
gracefully in both directions (a long idle gap and a long active gap each
contribute at most 5 minutes) instead of guessing, and it keeps the metric
robust to messy data: resumed sessions, machine sleep, and API stalls
produce the largest gaps, and those would otherwise dominate the sum.

It also unifies the definition. The 5-minute cap is hoisted to a single
shared constant (db.ActiveGapCapSec / ActiveGapCapMs) used by both the
velocity metric and Top Sessions, so the two "active" numbers on the
dashboard can't drift; a test asserts the seconds and milliseconds forms
agree.

Tradeoff

A genuinely long active operation — a 15-minute test run, a 20-minute
subagent — is also capped at 5 minutes, so it is undercounted. That is the
deliberate cost of not guessing whether a long gap was work or idle; the
error is bounded and symmetric.

Where to look

  • internal/db/analytics.go — shared constant, SQLite active-duration SQL,
    and the velocity metric now sourcing the same cap.
  • internal/db/timing.go — SQLite Go fallback (timezone-aware ranking path)
    running the same clamp.
  • internal/postgres/analytics.go, internal/duckdb/analytics_usage.go
    PostgreSQL and DuckDB twins.

All three SQL backends drop the has_tool_use filter and the trailing gap
to ended_at so the value matches the velocity computation.

generated by a clanker

rodboev and others added 8 commits June 25, 2026 18:27
Top Sessions "By Duration" introduced an active-duration metric that
summed only the gaps following a tool-use message. That captured
tool-execution latency but discarded all model generation and thinking
time, including the stretch before the first tool call, so it
systematically undercounted how long a session was actually worked on.

Redefine active duration as the sum of consecutive inter-message gaps
with each gap capped at five minutes, which is exactly the definition the
velocity "active minutes" metric already uses. A gap below the cap counts
in full (generation, tool execution, or a quick human turnaround); only
stretches longer than the cap are bounded as idle. Timestamps alone
cannot distinguish a long tool run from a human stepping away, so capping
fails gracefully in both directions instead of guessing.

Hoist the cap to a single shared constant (ActiveGapCapSec /
ActiveGapCapMs) so the two "active" definitions cannot drift, guarded by a
test that the seconds and milliseconds forms agree. The active-duration
SQL drops the has_tool_use filter and the trailing gap to ended_at to
match the velocity computation across SQLite, PostgreSQL, and DuckDB, and
the SQLite Go fallback runs the same clamp.

Alternative implementation to #83, built on its commits.
@roborev-ci

roborev-ci Bot commented Jun 26, 2026

Copy link
Copy Markdown

roborev: Combined Review (0915ffc)

Medium issue found.

Medium

  • internal/db/analytics.go:4535 - SQLite timezone fallback calls activeDurationMinForSession while the outer rows iterator remains open. This can hold one reader connection and acquire another per row; with a capped reader pool, concurrent duration fallback requests can exhaust the pool and block indefinitely. It also introduces an unbounded N+1 query pattern.
    • Fix: compute active duration in the outer query, or collect candidate row IDs first, close rows, then batch-load active durations before sorting.

Panel: ci_default_security | Synthesis: codex, 7s | Members: codex_default (codex/default, done, 6m12s), codex_security (codex/security, done, 1m7s) | Total: 7m26s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants