
feat: add token debug view for rollout rows in ep logs#430

Closed
benjibc wants to merge 1 commit into main from benjibc/ep-logs-token-debug-frozenlake-main
Conversation


@benjibc benjibc commented Mar 7, 2026

Summary

  • add a generic token debug panel to expanded ep logs rows for full_episode and token_turn_traces
  • make the chat transcript more prompt-faithful by merging tool declarations into the first system message
  • hide raw assistant tool-call payload text in the message bubbles when tool_calls are present
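The second bullet (merging tool declarations into the first system message) can be sketched as follows. This is a minimal illustration only; the helper name, message shapes, and formatting are assumptions, not the PR's actual implementation:

```typescript
// Hypothetical sketch: fold tool declarations into the first system message
// so the rendered transcript mirrors the prompt the model actually saw.
type ChatMessage = { role: string; content: string };
type ToolDecl = { name: string; description?: string };

function mergeToolsIntoSystem(messages: ChatMessage[], tools: ToolDecl[]): ChatMessage[] {
  if (tools.length === 0) return messages;
  const toolText =
    "Available tools:\n" +
    tools.map((t) => `- ${t.name}${t.description ? `: ${t.description}` : ""}`).join("\n");
  const idx = messages.findIndex((m) => m.role === "system");
  if (idx === -1) {
    // No system message present: prepend one carrying only the declarations.
    return [{ role: "system", content: toolText }, ...messages];
  }
  // Append the declarations to the existing first system message.
  return messages.map((m, i) => (i === idx ? { ...m, content: `${m.content}\n\n${toolText}` } : m));
}
```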

Validation

  • npm run build
  • verified in Chromium against real FrozenLake visual rollouts from Kimi K2.5 VL and Qwen3 VL

Screenshots

Kimi K2.5 VL

Kimi ep logs token debug

Qwen3 VL

Qwen3 VL ep logs token debug


Note

Low Risk
UI-only changes to log rendering and debug visualization with no authentication, persistence, or backend behavior changes; main risk is regressions in message formatting or expanded-row layout.

Overview
Adds a new TokenDebugView panel to expanded evaluation/rollout rows (when extra.full_episode or extra.token_turn_traces are present) to visualize token IDs, masking, and logprobs across turns.
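The presence check described above can be sketched roughly as below; the field names come from this summary, but the helper name and exact shapes are illustrative assumptions:

```typescript
// Hypothetical sketch: show the token debug panel only when the row's
// `extra` payload carries episode-level token data.
type RowExtra = { full_episode?: unknown; token_turn_traces?: unknown[] };

function hasTokenDebugData(extra?: RowExtra): boolean {
  if (!extra) return false;
  return (
    Boolean(extra.full_episode) ||
    (Array.isArray(extra.token_turn_traces) && extra.token_turn_traces.length > 0)
  );
}
```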

Updates the expanded row chat transcript to be more prompt-faithful by injecting tool declarations into the first system message, and tweaks MessageBubble rendering to hide raw assistant content when tool_calls exist while still showing/copying tool-call arguments.

Also removes the committed built CSS asset (vite-app/dist/assets/index-*.css), indicating build artifacts are no longer tracked/updated in this PR.

Written by Cursor Bugbot for commit 086a977.


benjibc commented Mar 7, 2026

Closing in favor of #431, which is based on current main and contains the same intended UI changes without the outdated branch history.

@benjibc benjibc closed this Mar 7, 2026

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 086a9776ff

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

key={i}
token={tok}
tokenId={allIds[i]}
turnIdx={isPrompt ? 0 : trace.step_index}


P2: Offset turn indices before rendering completion tokens

TurnSection uses trace.step_index directly as turnIdx for completion tokens, but this breaks when traces are zero-based (a valid RL convention where the first turn is 0). EpisodeToken then treats first-turn completions as prompt tokens because turnIdx > 0 is false, so those tokens render as masked and lose their logprob coloring and tooltip semantics. Normalize the displayed turn index to a positive completion index before passing it to token rendering.
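One way to apply this suggestion is to shift the step index before it reaches the token renderer. The sketch below is a hedged illustration: it assumes zero-based traces and uses a made-up helper name, not the repo's actual code:

```typescript
// Hypothetical sketch of the suggested fix: map a possibly zero-based
// step_index to a 1-based completion turn index, so a `turnIdx > 0` check
// still distinguishes completion tokens from prompt tokens even when the
// first trace has step_index === 0.
function completionTurnIdx(stepIndex: number, isPrompt: boolean): number {
  if (isPrompt) return 0; // prompt tokens keep turn 0 by convention
  return stepIndex + 1;   // shift zero-based trace indices into 1..N
}
```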

Useful? React with 👍 / 👎.

const isTool = message.role === "tool";
const hasToolCalls = message.tool_calls && message.tool_calls.length > 0;
const hasFunctionCall = message.function_call;
const hideMessageContent = message.role === "assistant" && hasToolCalls;


P2: Preserve assistant text when tool calls carry real content

The new hideMessageContent condition suppresses all assistant message text whenever tool_calls is present, which drops legitimate assistant narration in responses that contain both natural-language content and tool calls. In those cases the transcript becomes incomplete (only tool-call cards remain); gate the check on payload-like or empty content rather than on every assistant message with tool_calls.
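A gated version of the check could look roughly like this. The payload heuristics and helper name are assumptions for illustration, not the reviewer's exact proposal:

```typescript
// Hypothetical sketch of the suggested gating: hide assistant content only
// when it is empty or looks like a raw tool-call payload, so genuine
// narration that accompanies tool calls still renders in the transcript.
type AssistantMsg = { role: string; content?: string | null; tool_calls?: unknown[] };

function shouldHideContent(message: AssistantMsg): boolean {
  const hasToolCalls = Array.isArray(message.tool_calls) && message.tool_calls.length > 0;
  if (message.role !== "assistant" || !hasToolCalls) return false;
  const content = (message.content ?? "").trim();
  if (content === "") return true; // nothing meaningful to show anyway
  // Assumed heuristic: raw payload text tends to be JSON or a tool-call tag.
  return content.startsWith("{") || content.startsWith("<tool_call>");
}
```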

Useful? React with 👍 / 👎.
