fix(dashboard): raise p95 sample threshold to 10 turns with p50 fallback by nicolotognoni · Pull Request #96 · PatterAI/Patter

nicolotognoni · 2026-05-12T16:14:25Z

Summary

PR #82 (feat(observability): emit patter.cost.* and patter.latency.* OTel span attributes) lowered the threshold from 5 to 2 turns. That gave parity with the call-list column but produced a misleading headline on short calls. Live test: with n=5 turns, p95=1977ms while p50=309ms, because at n<10 the 95th percentile collapses to "slowest single turn" and stops being a tail estimate.
Raised the threshold to 10 turns across every dashboard surface that exposes a percentile, and added a p50 fallback so short calls still show a useful number (just labelled honestly).
The app-level "Avg latency p95" card now also requires ≥3 qualifying calls (calls with ≥10 turns); otherwise it renders "—" instead of an average dominated by whichever short call happens to be in the bucket.
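The collapse described above can be checked with a small nearest-rank percentile helper. This is a sketch under stated assumptions: the `percentile` function and the nearest-rank definition are illustrative, since the PR does not say how the dashboard computes percentiles, and the latencies below are made-up values, not the live test data. With five samples the p95 rank is ceil(0.95 × 5) = 5, so p95 is simply the maximum; with twenty samples the rank is 19 and the single slowest turn no longer decides the result.

```typescript
// Nearest-rank percentile (assumed method, not necessarily the dashboard's):
// the smallest sample such that at least p percent of samples are <= it.
function percentile(samplesMs: readonly number[], p: number): number {
  if (samplesMs.length === 0) {
    throw new Error("percentile() needs at least one sample");
  }
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // 1-based rank; multiplying p by n before dividing keeps integer inputs exact.
  const rank = Math.min(
    sorted.length,
    Math.max(1, Math.ceil((p * sorted.length) / 100)),
  );
  return sorted[rank - 1];
}

// Hypothetical per-turn latencies (ms) for a 5-turn call.
const fiveTurns = [200, 250, 300, 400, 2000];
console.log(percentile(fiveTurns, 50)); // 300
console.log(percentile(fiveTurns, 95)); // 2000: rank 5 of 5, the slowest turn

// Hypothetical 20-turn call: 100, 200, ..., 2000 ms.
const twentyTurns = Array.from({ length: 20 }, (_, i) => (i + 1) * 100);
console.log(percentile(twentyTurns, 95)); // 1900: rank 19 of 20
```

An interpolating definition would land between the fourth and fifth samples instead of exactly on the fifth, but at n=5 it is still dominated by the one slow turn, which is the behaviour the 10-turn threshold guards against.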

Implementation

MIN_TURNS_FOR_PERCENTILES = 10 is exported from LatencyPanel.tsx and MetricsPanel.tsx; MIN_TURNS_FOR_P95_COLUMN = 10 from CallTable.tsx; MIN_TURNS_FOR_AVG_P95 = 10 and MIN_CALLS_FOR_AVG_P95 = 3 from App.tsx. Each surface reads its threshold from a single constant, so it can't drift.
Detail pane (LatencyPanel and both latency-view branches of MetricsPanel): below the threshold the p95 box renders p50 (n<10) instead, with a tooltip and footer line spelling out the rule. Both realtime calls (single-bucket waterfall) and pipeline calls (stt/llm/tts waterfall) are covered.
Call list (CallTable.tsx): column renamed "p95 latency" → "Latency" (since it now reports either statistic); rows with turnCount < 10 show <ms> (p50); column header has a tooltip explaining the fallback.
Sparkline tooltip (Metric.tsx bucketHeadline): when no call in the bucket has ≥10 turns, the headline reads AVG LATENCY n/a (n<10 turns) rather than a fake "0 ms".
App headline card (App.tsx): avgP95() filters out calls with <10 turns, requires ≥3 qualifying calls, and returns 0 otherwise. The card then renders "—" instead of "0 ms", so the empty state is unambiguous.
New test file dashboard-app/src/App.test.ts (8 cases) covers avgP95 gating and bucketHeadline fallback.
Rebuilt the SPA (vite build) and re-synced the inlined bundle into libraries/{typescript,python}/.../dashboard/ui.html via dashboard-app/scripts/sync.mjs, so both SDKs ship the updated UI.
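The fallback rules listed above can be sketched as a few pure functions. This is a hypothetical sketch, not the dashboard's code: the `CallSummary` shape, the helpers `detailHeadline`, `latencyCell`, `mean` and `avgP95Card`, the millisecond formatting, and what `bucketHeadline` shows when at least one call qualifies (assumed here to be the mean p95 of qualifying calls) are all assumptions. Only the constant names, `avgP95`, `bucketHeadline`, the thresholds and the fallback labels come from the PR.

```typescript
// Hypothetical sketch of the dashboard fallback rules; not the real SPA code.
// Constant names and thresholds come from the PR; everything else is assumed.
const MIN_TURNS_FOR_PERCENTILES = 10; // per-call surfaces (sparkline use assumed)
const MIN_TURNS_FOR_AVG_P95 = 10; // app headline card
const MIN_CALLS_FOR_AVG_P95 = 3; // app headline card

// Assumed per-call summary: percentiles precomputed from per-turn latencies.
interface CallSummary {
  turnCount: number;
  p50Ms: number;
  p95Ms: number;
}

// Detail pane: p95 only with enough turns, otherwise a labelled p50.
function detailHeadline(call: CallSummary): { label: string; valueMs: number } {
  if (call.turnCount >= MIN_TURNS_FOR_PERCENTILES) {
    return { label: "p95", valueMs: call.p95Ms };
  }
  return { label: `p50 (n<${MIN_TURNS_FOR_PERCENTILES})`, valueMs: call.p50Ms };
}

// Call list "Latency" column: p95 for long calls, "<ms> (p50)" for short ones.
function latencyCell(call: CallSummary): string {
  return call.turnCount >= MIN_TURNS_FOR_PERCENTILES
    ? `${Math.round(call.p95Ms)} ms`
    : `${Math.round(call.p50Ms)} ms (p50)`;
}

// Arithmetic mean; callers only pass non-empty arrays.
function mean(values: readonly number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Average p95 over calls with enough turns; 0 when too few calls qualify.
function avgP95(calls: readonly CallSummary[]): number {
  const qualifying = calls.filter((c) => c.turnCount >= MIN_TURNS_FOR_AVG_P95);
  if (qualifying.length < MIN_CALLS_FOR_AVG_P95) return 0;
  return mean(qualifying.map((c) => c.p95Ms));
}

// App card: an em dash placeholder for the empty state instead of "0 ms".
function avgP95Card(calls: readonly CallSummary[]): string {
  const avg = avgP95(calls);
  return avg === 0 ? "\u2014" : `${Math.round(avg)} ms`;
}

// Sparkline tooltip: n/a when no call in the bucket has enough turns.
// Assumption: otherwise it shows the mean p95 of the qualifying calls.
function bucketHeadline(bucket: readonly CallSummary[]): string {
  const qualifying = bucket.filter(
    (c) => c.turnCount >= MIN_TURNS_FOR_PERCENTILES,
  );
  if (qualifying.length === 0) {
    return `AVG LATENCY n/a (n<${MIN_TURNS_FOR_PERCENTILES} turns)`;
  }
  return `AVG LATENCY ${Math.round(mean(qualifying.map((c) => c.p95Ms)))} ms`;
}

// Usage with hypothetical numbers.
const shortCall: CallSummary = { turnCount: 5, p50Ms: 300, p95Ms: 2000 };
console.log(latencyCell(shortCall)); // 300 ms (p50)
console.log(bucketHeadline([shortCall])); // AVG LATENCY n/a (n<10 turns)
```

Returning 0 from avgP95 and mapping it to the placeholder in the card follows the PR's description; a nullable return (number | null) would make the empty state explicit in the type, at the cost of changing the function's contract.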

Breaking change?

No. The thresholds are internal to the dashboard SPA; SDK API surface is untouched. Users with long calls (≥10 turns) see no change; users with short calls now see honest p50 numbers instead of a noisy p95.

Test plan

dashboard-app: npm test -- --run → 16/16 pass (8 new + 8 existing)
dashboard-app: npm run lint (tsc --noEmit) clean
dashboard-app: npm run build succeeds, 208 kB bundle synced
libraries/typescript: npm test -- --run → 1516/1516 pass
libraries/typescript: npm run lint clean
Manual: open dashboard on a short (n<10) call and verify the detail pane shows p50 (n<10) with a tooltip
Manual: open dashboard on a long (n≥10) call and verify p95 surfaces as before

Docs updates

N/A — internal dashboard rendering change. CHANGELOG.md entry added under ## 0.6.1 (2026-05-12).

PR #82's lowering of the threshold from 5 to 2 turns realigned the per-call detail pane with the call-list column, but on a live n=5 turn call the headline became "p95=1977ms" while p50 was only 309ms: at low n the 95th percentile collapses to the slowest single sample and stops being a tail estimate.

The threshold is now 10 turns everywhere (LatencyPanel, MetricsPanel, CallTable, Metric tooltip, App "Avg latency p95" card). Below the threshold the per-call surfaces fall back to p50, which is robust to a single outlier, and label the cell so the user knows why; the Metric tooltip shows n/a instead. The app-level "Avg latency p95" card additionally requires ≥3 qualifying calls before showing a number; otherwise it renders "—" instead of a polluted average. The four exported MIN_TURNS_* constants are kept in lockstep, so each surface reads its threshold from one place.

Bundle re-synced to both SDKs via dashboard-app/scripts/sync.mjs. New tests: src/App.test.ts (8 cases) covers avgP95 gating and bucketHeadline fallback. dashboard-app vitest: 16/16 pass. libraries/typescript vitest: 1516/1516 pass. tsc --noEmit clean on both packages.

nicolotognoni mentioned this pull request May 12, 2026 in fix(0.6.1): dashboard p95 fallback hint -> compact tooltip #98 (open)

nicolotognoni wants to merge 1 commit into feat/observability-otel-attrs-0.6.1 from fix/0.6.1-dashboard-p95-sample-threshold
