fix(dashboard): raise p95 sample threshold to 10 turns with p50 fallback #96
Open
nicolotognoni wants to merge 1 commit into
PR #82's 5 → 2 turn lowering reunited the per-call detail pane with the call-list column, but on a live n=5 turn call the headline became "p95=1977ms" while p50 was only 309ms. At low n the 95th percentile collapses to the slowest single sample and stops being a tail estimate.

The threshold is now 10 turns everywhere (LatencyPanel, MetricsPanel, CallTable, Metric tooltip, App "Avg latency p95" card). Below the threshold every surface falls back to p50, which is robust to a single outlier, and labels the cell so the user knows why. The App-level "Avg latency p95" card additionally requires ≥3 qualifying calls before showing a number; otherwise it renders "—" instead of a polluted average. The four exported MIN_TURNS_* constants are kept in lockstep so the threshold cannot drift between surfaces.

The bundle was re-synced to both SDKs via dashboard-app/scripts/sync.mjs. New tests: src/App.test.ts (8 cases) covers avgP95 gating and the bucketHeadline fallback. dashboard-app vitest: 16/16 pass. libraries/typescript vitest: 1516/1516 pass. tsc --noEmit is clean on both packages.
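To make the low-n collapse concrete, here is a minimal sketch using a nearest-rank percentile. The `nearestRank` helper and the sample latencies are hypothetical (only the 309 ms and 1977 ms figures come from the call described above), and the dashboard's real percentile routine may use a different method.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of the
// samples are at or below it. Assumption: this is NOT necessarily the method
// the dashboard uses; it is only here to illustrate the low-n behaviour.
function nearestRank(samples: number[], p: number): number {
  if (samples.length === 0) {
    throw new Error("nearestRank needs at least one sample");
  }
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Hypothetical per-turn latencies (ms) for a 5-turn call with one outlier.
const turnLatenciesMs = [280, 295, 309, 340, 1977];

const p50 = nearestRank(turnLatenciesMs, 50);
const p95 = nearestRank(turnLatenciesMs, 95);
const slowest = Math.max(...turnLatenciesMs);

// With five samples the p95 rank is ceil(0.95 * 5) = 5, i.e. the slowest turn.
console.log({ p50, p95, slowest });
```

With these inputs p95 is simply the outlier turn, while p50 stays at the typical latency, which is why the fallback reports p50 for short calls.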
Summary
Implementation
- `MIN_TURNS_FOR_PERCENTILES = 10` exported from `LatencyPanel.tsx` and `MetricsPanel.tsx`; `MIN_TURNS_FOR_P95_COLUMN = 10` from `CallTable.tsx`; `MIN_TURNS_FOR_AVG_P95 = 10` + `MIN_CALLS_FOR_AVG_P95 = 3` from `App.tsx`. Single source per surface so the threshold can't drift.
- `LatencyPanel` and both `latency` view branches of `MetricsPanel`: below the threshold the p95 box renders `p50 (n<10)` instead, with a tooltip and footer line spelling out the rule. Realtime calls (single-bucket waterfall) and pipeline calls (stt/llm/tts waterfall) are both covered.
- `CallTable.tsx`: column renamed "p95 latency" → "Latency" (since it now reports either statistic); rows with `turnCount < 10` show `<ms> (p50)`; the column header has a tooltip explaining the fallback.
- `Metric.tsx` `bucketHeadline`: when no call in the bucket has ≥10 turns, the headline reads `AVG LATENCY n/a (n<10 turns)` rather than a fake "0 ms".
- `App.tsx`: `avgP95()` filters out calls with <10 turns, requires ≥3 qualifying calls, and returns 0 otherwise. The card then renders "—" instead of "0 ms" so the empty state is unambiguous.
- `dashboard-app/src/App.test.ts` (8 cases) covers `avgP95` gating and `bucketHeadline` fallback.
- Rebuilt (`vite build`) and re-synced the inlined bundle into `libraries/{typescript,python}/.../dashboard/ui.html` via `dashboard-app/scripts/sync.mjs` so both SDKs ship the updated UI.

Breaking change?
No. The thresholds are internal to the dashboard SPA; SDK API surface is untouched. Users with long calls (≥10 turns) see no change; users with short calls now see honest p50 numbers instead of a noisy p95.
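For reviewers who want the gating rules from the Implementation section in one place, the following is a rough sketch rather than the code in this PR. The constant names and `avgP95` come from the description; the `CallSummary` shape, its field names, `formatAvgP95Card`, `latencyCell`, and the exact `ms` formatting are assumptions made for illustration.

```typescript
// Thresholds as named in the PR description.
const MIN_TURNS_FOR_AVG_P95 = 10;
const MIN_CALLS_FOR_AVG_P95 = 3;
const MIN_TURNS_FOR_P95_COLUMN = 10;

// Hypothetical per-call summary; the real dashboard type may differ.
interface CallSummary {
  turnCount: number;
  p50Ms: number;
  p95Ms: number;
}

// Average p95 over calls with enough turns; 0 signals "not enough data".
function avgP95(calls: CallSummary[]): number {
  const qualifying = calls.filter((c) => c.turnCount >= MIN_TURNS_FOR_AVG_P95);
  if (qualifying.length < MIN_CALLS_FOR_AVG_P95) return 0;
  const total = qualifying.reduce((sum, c) => sum + c.p95Ms, 0);
  return total / qualifying.length;
}

// Hypothetical card formatter: render "—" for the empty state, never "0 ms".
function formatAvgP95Card(calls: CallSummary[]): string {
  const avg = avgP95(calls);
  return avg > 0 ? `${Math.round(avg)} ms` : "—";
}

// Hypothetical call-list cell: p95 for long calls, labelled p50 for short ones.
function latencyCell(call: CallSummary): string {
  return call.turnCount >= MIN_TURNS_FOR_P95_COLUMN
    ? `${call.p95Ms} ms`
    : `${call.p50Ms} ms (p50)`;
}

const shortCall: CallSummary = { turnCount: 5, p50Ms: 309, p95Ms: 1977 };
const twoLongCalls: CallSummary[] = [
  { turnCount: 12, p50Ms: 300, p95Ms: 900 },
  { turnCount: 15, p50Ms: 310, p95Ms: 1100 },
];
const threeLongCalls: CallSummary[] = [
  ...twoLongCalls,
  { turnCount: 20, p50Ms: 280, p95Ms: 1000 },
];

const cardWithTwo = formatAvgP95Card([...twoLongCalls, shortCall]);
const cardWithThree = formatAvgP95Card([...threeLongCalls, shortCall]);
const shortCell = latencyCell(shortCall);
console.log({ cardWithTwo, cardWithThree, shortCell });
```

In this sketch the short call never contributes to the average, and two qualifying calls are not enough to show a number, so the card only switches from "—" to a value once a third qualifying call arrives.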
Test plan
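- `dashboard-app`: `npm test -- --run` → 16/16 pass (8 new + 8 existing)
- `dashboard-app`: `npm run lint` (`tsc --noEmit`) clean
- `dashboard-app`: `npm run build` succeeds, 208 kB bundle synced
- `libraries/typescript`: `npm test -- --run` → 1516/1516 pass
- `libraries/typescript`: `npm run lint` clean
- Below the threshold, the p95 box renders `p50 (n<10)` with a tooltip

Docs updates
`CHANGELOG.md` entry added under `## 0.6.1 (2026-05-12)`.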