
feat: add token debug view for rollout rows in ep logs#430

Closed
benjibc wants to merge 1 commit into main from benjibc/ep-logs-token-debug-frozenlake-main
Conversation


@benjibc benjibc commented Mar 7, 2026

Summary

  • add a generic token debug panel to expanded ep logs rows for full_episode and token_turn_traces
  • make the chat transcript more prompt-faithful by merging tool declarations into the first system message
  • hide raw assistant tool-call payload text in the message bubbles when tool_calls are present
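The second bullet (merging tool declarations into the first system message) can be sketched as follows. This is a minimal illustration only; the helper name, message shapes, and formatting are assumptions, not the PR's actual implementation:

```typescript
// Hypothetical sketch: fold tool declarations into the first system message
// so the rendered transcript mirrors the prompt the model actually saw.
type ChatMessage = { role: string; content: string };
type ToolDecl = { name: string; description?: string };

function mergeToolsIntoSystem(messages: ChatMessage[], tools: ToolDecl[]): ChatMessage[] {
  if (tools.length === 0) return messages;
  const toolText =
    "Available tools:\n" +
    tools.map((t) => `- ${t.name}${t.description ? `: ${t.description}` : ""}`).join("\n");
  const idx = messages.findIndex((m) => m.role === "system");
  if (idx === -1) {
    // No system message present: prepend one carrying only the declarations.
    return [{ role: "system", content: toolText }, ...messages];
  }
  // Append the declarations to the existing first system message.
  return messages.map((m, i) => (i === idx ? { ...m, content: `${m.content}\n\n${toolText}` } : m));
}
```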

Validation

  • npm run build
  • verified in Chromium against real FrozenLake visual rollouts from Kimi K2.5 VL and Qwen3 VL

Screenshots

Kimi K2.5 VL

Kimi ep logs token debug

Qwen3 VL

Qwen3 VL ep logs token debug


Note

Low Risk
UI-only changes to log rendering and debug visualization with no authentication, persistence, or backend behavior changes; main risk is regressions in message formatting or expanded-row layout.

Overview
Adds a new TokenDebugView panel to expanded evaluation/rollout rows (when extra.full_episode or extra.token_turn_traces are present) to visualize token IDs, masking, and logprobs across turns.
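The presence check described above can be sketched roughly as below; the field names come from this summary, but the helper name and exact shapes are illustrative assumptions:

```typescript
// Hypothetical sketch: show the token debug panel only when the row's
// `extra` payload carries episode-level token data.
type RowExtra = { full_episode?: unknown; token_turn_traces?: unknown[] };

function hasTokenDebugData(extra?: RowExtra): boolean {
  if (!extra) return false;
  return (
    Boolean(extra.full_episode) ||
    (Array.isArray(extra.token_turn_traces) && extra.token_turn_traces.length > 0)
  );
}
```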

Updates the expanded row chat transcript to be more prompt-faithful by injecting tool declarations into the first system message, and tweaks MessageBubble rendering to hide raw assistant content when tool_calls exist while still showing/copying tool-call arguments.

Also removes the committed built CSS asset (vite-app/dist/assets/index-*.css), indicating build artifacts are no longer tracked/updated in this PR.

Written by Cursor Bugbot for commit 086a977.


benjibc commented Mar 7, 2026

Closing in favor of #431, which is based on current main and contains the same intended UI changes without the outdated branch history.

@benjibc benjibc closed this Mar 7, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 086a9776ff

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

key={i}
token={tok}
tokenId={allIds[i]}
turnIdx={isPrompt ? 0 : trace.step_index}


P2: Offset turn indices before rendering completion tokens

TurnSection uses trace.step_index directly as turnIdx for completion tokens, but this breaks when traces are zero-based (a valid RL convention where the first turn is 0). EpisodeToken then treats first-turn completions as prompt tokens because turnIdx > 0 is false, so those tokens render as masked and lose their logprob coloring and tooltip semantics. Normalize the displayed turn index to a positive completion index before passing it to token rendering.
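One way to apply this suggestion is to shift the step index before it reaches the token renderer. The sketch below is a hedged illustration: it assumes zero-based traces and uses a made-up helper name, not the repo's actual code:

```typescript
// Hypothetical sketch of the suggested fix: map a possibly zero-based
// step_index to a 1-based completion turn index, so a `turnIdx > 0` check
// still distinguishes completion tokens from prompt tokens even when the
// first trace has step_index === 0.
function completionTurnIdx(stepIndex: number, isPrompt: boolean): number {
  if (isPrompt) return 0; // prompt tokens keep turn 0 by convention
  return stepIndex + 1;   // shift zero-based trace indices into 1..N
}
```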

Useful? React with 👍 / 👎.

const isTool = message.role === "tool";
const hasToolCalls = message.tool_calls && message.tool_calls.length > 0;
const hasFunctionCall = message.function_call;
const hideMessageContent = message.role === "assistant" && hasToolCalls;


P2: Preserve assistant text when tool calls carry real content

The new hideMessageContent condition suppresses all assistant message text whenever tool_calls is present, which drops legitimate assistant narration in responses that contain both natural-language content and tool calls. In those cases the transcript becomes incomplete (only tool-call cards remain); gate the check on payload-like or empty content rather than on every assistant message with tool_calls.
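A gated version of the check could look roughly like this. The payload heuristics and helper name are assumptions for illustration, not the reviewer's exact proposal:

```typescript
// Hypothetical sketch of the suggested gating: hide assistant content only
// when it is empty or looks like a raw tool-call payload, so genuine
// narration that accompanies tool calls still renders in the transcript.
type AssistantMsg = { role: string; content?: string | null; tool_calls?: unknown[] };

function shouldHideContent(message: AssistantMsg): boolean {
  const hasToolCalls = Array.isArray(message.tool_calls) && message.tool_calls.length > 0;
  if (message.role !== "assistant" || !hasToolCalls) return false;
  const content = (message.content ?? "").trim();
  if (content === "") return true; // nothing meaningful to show anyway
  // Assumed heuristic: raw payload text tends to be JSON or a tool-call tag.
  return content.startsWith("{") || content.startsWith("<tool_call>");
}
```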

Useful? React with 👍 / 👎.
