feat(observe): local dashboard — view real answers + run eval packs by Mikecranesync · Pull Request #2201 · Mikecranesync/MIRA

Mikecranesync · 2026-06-21T21:43:26Z

Why

The original request that started this whole stream was "make the observe layer a dashboard (viewing + control)." This delivers it — and now there's real data to show (the 3,725 answers exported in #2172 + the eval packs).

What

simlab/observe/dashboard.py — single file, stdlib http.server, loopback-only, no auth, no engine needed.

Evals tab: button per pack → run_eval.run() → live scorecard; past reports list, click to reopen.
History tab: reads the newest traces-*.csv from the Langfuse export — browse the 3,725 real production answers, grounded% bar, filter grounded/ungrounded, search.
Reads reports/*.json + the export CSV; writes nothing beyond what run_eval already does. Binds 127.0.0.1 only.
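For readers unfamiliar with the pattern, here is a minimal sketch of a loopback-only, stdlib-only JSON server of the kind described above. It is not the code in simlab/observe/dashboard.py; the `/api/ping` route, `DashboardHandler`, and `make_server` are hypothetical names chosen for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class DashboardHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: serves one JSON route and 404s everything else."""

    def _send_json(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path == "/api/ping":  # illustrative route, not from the PR
            self._send_json(200, {"ok": True})
        else:
            self._send_json(404, {"error": "not found"})

    def log_message(self, format, *args):
        # Silence per-request logging to keep the sketch quiet.
        pass


def make_server(port=8770):
    # Binding to 127.0.0.1 (not 0.0.0.0) keeps the server unreachable
    # from other machines, which is why running without auth is tolerable.
    return ThreadingHTTPServer(("127.0.0.1", port), DashboardHandler)
```

Calling `make_server().serve_forever()` would serve on http://127.0.0.1:8770; passing `port=0` lets the OS pick a free port, which is convenient in tests.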

Verification (live)

all endpoints 200; /api/history → 3725 answers / 2734 grounded (73%); POST /api/run conveyor_demo → 7 pass / 2 fail; headless screenshot renders; ruff clean.
Run: python -m simlab.observe.dashboard → http://127.0.0.1:8770
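The grounded figure above is plain arithmetic: 2734 / 3725 ≈ 73.4%, shown as 73%. The sketch below shows one way such a summary could be computed from CSV rows. The `grounded` column name and the accepted truthy values are assumptions; the actual export schema is not shown in this PR.

```python
import csv
import io


def percent(part, whole):
    """Whole-number percentage, guarding against an empty export."""
    return round(100 * part / whole) if whole else 0


def grounded_summary(rows, column="grounded"):
    # `column` is a hypothetical field name; adjust to the real CSV header.
    total = 0
    grounded = 0
    for row in rows:
        total += 1
        if str(row.get(column, "")).strip().lower() in {"1", "true", "yes"}:
            grounded += 1
    return {
        "answers": total,
        "grounded": grounded,
        "grounded_pct": percent(grounded, total),
    }


def summarize_csv_text(text, column="grounded"):
    # DictReader maps each row to {header: value}, so rows can be filtered by name.
    return grounded_summary(csv.DictReader(io.StringIO(text)), column)
```

With the numbers reported above, `percent(2734, 3725)` returns 73.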

Notes

Stacked on #2154 (feat(observe): production-grade observability + evaluation layer (phases 0-3)), which provides simlab/observe + run_eval; the base branch is feat/observability-eval-layer. Retarget to main after #2154 merges.
History dir resolves MIRA_EXPORT_DIR → ~/langfuse-export → tools/langfuse-export.
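The resolution order above, combined with the "newest traces-*.csv" rule from the History tab, might look like the sketch below. Several details are assumptions not stated in the PR: that a candidate is skipped when it is not an existing directory, that tools/langfuse-export is relative to the repo root, and that "newest" means latest modification time. The function names are hypothetical.

```python
import os
from pathlib import Path


def resolve_export_dir(env=None, home=None, repo_root=None):
    """Return the first existing candidate directory, or None."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    repo_root = Path.cwd() if repo_root is None else Path(repo_root)

    candidates = []
    override = env.get("MIRA_EXPORT_DIR")
    if override:
        candidates.append(Path(override).expanduser())
    candidates.append(home / "langfuse-export")
    candidates.append(repo_root / "tools" / "langfuse-export")

    # Assumption: fall through to the next candidate when one is missing.
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return None


def newest_traces_csv(export_dir):
    """Pick the traces-*.csv with the latest mtime (assumed meaning of 'newest')."""
    files = sorted(
        Path(export_dir).glob("traces-*.csv"),
        key=lambda p: p.stat().st_mtime,
    )
    return files[-1] if files else None
```

Sorting by filename instead would also work if the export embeds a sortable timestamp in the name; the PR does not say which rule the dashboard uses.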

🤖 Generated with Claude Code

…viewing & control)

The original ask that kicked off this work stream: one local screen to see what MIRA does and run the test packs. Single file, stdlib http.server, loopback-only, no auth, no engine needed.

- Evals tab: a button per eval pack → POST /api/run → run_eval.run() → live scorecard (status/asset/retr/cite/pts/warnings); lists past reports, click to reopen.
- History tab: reads the newest traces-*.csv from the Langfuse export (tools/langfuse_export.py output): browse the 3,725 real production answers, grounded% bar, filter grounded/ungrounded, search questions.
- Reads simlab/observe/reports/*.json + the export CSV; writes nothing beyond what run_eval already writes. Binds 127.0.0.1 only.

Verified live: all endpoints 200; /api/history reports 3725 answers / 2734 grounded; POST /api/run conveyor_demo → 7 pass / 2 fail; headless screenshot renders. ruff clean.

Run: python -m simlab.observe.dashboard (→ http://127.0.0.1:8770)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01CS9fxC3gdSUJDJqHw1uMiu

Mikecranesync mentioned this pull request Jun 21, 2026

Run garage_conveyor_field eval --live on Charlie (staging Neon) + report grounding numbers #2202

Open

5 tasks

Mikecranesync wants to merge 1 commit into feat/observability-eval-layer from feat/observe-dashboard
