
feat(observe): local dashboard — view real answers + run eval packs#2201

Open
Mikecranesync wants to merge 1 commit into feat/observability-eval-layer from feat/observe-dashboard


@Mikecranesync

Owner

Why

The original request that started this whole work stream was "make the observe layer a dashboard (viewing + control)." This PR delivers it, and there is now real data to show: the 3,725 answers exported in #2172 plus the eval packs.

What

simlab/observe/dashboard.py — single file, stdlib http.server, loopback-only, no auth, no engine needed.

  • Evals tab: button per pack → run_eval.run() → live scorecard; past reports list, click to reopen.
  • History tab: reads the newest traces-*.csv from the Langfuse export — browse the 3,725 real production answers, grounded% bar, filter grounded/ungrounded, search.
  • Reads reports/*.json + the export CSV; writes nothing beyond what run_eval already does. Binds 127.0.0.1 only.
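For reviewers unfamiliar with the pattern, here is a minimal sketch of a loopback-only stdlib `http.server` that serves one JSON endpoint. It is not the code in `dashboard.py`: the `/api/ping` route, `DashboardHandler`, and `make_server` are hypothetical names chosen for illustration. Only the `127.0.0.1` bind and port 8770 come from this PR.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

HOST = "127.0.0.1"  # loopback only: not reachable from other machines
PORT = 8770         # port mentioned in this PR's run instructions


class DashboardHandler(BaseHTTPRequestHandler):
    """Hypothetical handler; the real dashboard's routes may differ."""

    def _send_json(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # "/api/ping" is an illustrative route, not one defined by the PR.
        if self.path == "/api/ping":
            self._send_json(200, {"ok": True})
        else:
            self._send_json(404, {"error": "not found"})

    def log_message(self, format, *args):
        # Silence per-request logging to stderr for this sketch.
        pass


def make_server(port=PORT):
    """Build a server bound to loopback; pass port=0 for an ephemeral port."""
    return ThreadingHTTPServer((HOST, port), DashboardHandler)
```

Calling `make_server().serve_forever()` would start it. Binding to `127.0.0.1` rather than `0.0.0.0` is what makes running without auth tolerable for a local tool.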

Verification (live)

  • all endpoints 200; /api/history → 3725 answers / 2734 grounded (73%); POST /api/run conveyor_demo → 7 pass / 2 fail; headless screenshot renders; ruff clean.
  • Run: python -m simlab.observe.dashboard → http://127.0.0.1:8770
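As a rough illustration of how the `/api/history` figures above (3725 answers, 2734 grounded, 73%) could be derived, the sketch below picks the newest `traces-*.csv` by modification time and counts grounded rows. The column name `grounded`, the truthy values it accepts, and both function names are assumptions; the actual export schema is not shown in this PR.

```python
import csv
from pathlib import Path


def newest_export(export_dir):
    """Return the most recently modified traces-*.csv, or None if there is none."""
    candidates = sorted(
        Path(export_dir).glob("traces-*.csv"),
        key=lambda p: p.stat().st_mtime,
    )
    return candidates[-1] if candidates else None


def grounded_summary(csv_path, column="grounded"):
    """Count rows and grounded rows; `column` is an assumed header name."""
    total = 0
    grounded = 0
    with open(csv_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            total += 1
            value = (row.get(column) or "").strip().lower()
            if value in {"1", "true", "yes"}:  # assumed truthy encodings
                grounded += 1
    pct = round(100 * grounded / total) if total else 0
    return {"answers": total, "grounded": grounded, "grounded_pct": pct}
```

With the numbers reported above, `round(100 * 2734 / 3725)` gives 73, which matches the percentage in the verification line.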

Notes

🤖 Generated with Claude Code

…viewing & control)

The original ask that kicked off this work stream: one local screen to see what
MIRA does and run the test packs. Single file, stdlib http.server, loopback-only,
no auth, no engine needed.

- Evals tab: a button per eval pack → POST /api/run → run_eval.run() → live
  scorecard (status/asset/retr/cite/pts/warnings); lists past reports, click to
  reopen.
- History tab: reads the newest traces-*.csv from the Langfuse export
  (tools/langfuse_export.py output) — browse the 3,725 real production answers,
  grounded% bar, filter grounded/ungrounded, search questions.
- Reads simlab/observe/reports/*.json + the export CSV; writes nothing beyond
  what run_eval already writes. Binds 127.0.0.1 only.

Verified live: all endpoints 200; /api/history reports 3725 answers / 2734
grounded; POST /api/run conveyor_demo → 7 pass / 2 fail; headless screenshot
renders. ruff clean.

Run: python -m simlab.observe.dashboard  (→ http://127.0.0.1:8770)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CS9fxC3gdSUJDJqHw1uMiu