feat: add generic /v1/completions client and token debug visualization by benjibc · Pull Request #428 · eval-protocol/python-sdk

benjibc · 2026-03-05T05:38:09Z

Summary

- Add FireworksV1CompletionsClient, a generic, domain-agnostic client for the /v1/completions endpoint with local tokenization via HuggingFace transformers. Tool-call parsing is pluggable via a tool_call_parser callback.
- Add TokenDebugView React component for the evaluation dashboard with three view modes:
  - Text: readable colored text with mask overlay (prompt=gray, completion=colored by turn)
  - Episode: per-token visualization with mask or logprob gradient coloring
  - Turns: per-turn breakdown of prompt/completion tokens
- Token ID chips with hover tooltips showing detokenized text and logprob
- Smooth HSL gradient for logprob coloring (green → red)
- Exports ParsedToolCall and to_openai_tool_calls as generic types
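
For reference, the green-to-red logprob coloring mentioned in the summary could be computed roughly as below. This is a minimal sketch in Python, not the TokenDebugView code: the function name `logprob_to_hsl`, the `floor` cutoff of -5.0, and the fixed saturation and lightness values are all assumptions made for illustration.

```python
def logprob_to_hsl(logprob: float, floor: float = -5.0) -> str:
    """Map a token logprob to an HSL color string (green = likely, red = unlikely).

    `floor` is an assumed cutoff: any logprob at or below it is drawn fully red.
    """
    # Logprobs are <= 0, so clamp into the range [floor, 0].
    clamped = max(floor, min(0.0, logprob))
    # confidence is 1.0 at logprob 0 and 0.0 at the floor.
    confidence = 1.0 - clamped / floor
    # Hue 120 is green and hue 0 is red; interpolate linearly between them.
    hue = round(120 * confidence)
    return f"hsl({hue}, 70%, 45%)"
```

Interpolating only the hue keeps the transition smooth, since intermediate confidences pass through yellow and orange instead of jumping between two fixed colors.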

Test plan

- test_fireworks_v1_completions_client.py unit tests
- End-to-end verified with FrozenLake rollout example (cookbook)
- UI verified in Vite dev mode: text/episode/turns views, IDs toggle, mask/logprob color modes

Made with Cursor

Note

Medium Risk
Introduces new model-calling and retry/tokenization logic plus a sizable new UI debug surface; risk is mainly around API/response shape assumptions and tokenizer/template edge cases rather than core auth/data integrity.

Overview
Adds FireworksV1CompletionsClient, a generic Fireworks /v1/completions wrapper that locally tokenizes chat messages (with fallbacks when chat templates/tools fail), retries transient API errors, extracts token IDs and logprobs, and optionally injects OpenAI-style tool_calls via a pluggable tool_call_parser (exporting ParsedToolCall/to_openai_tool_calls).
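
To illustrate the pluggable parser idea, here is a hedged sketch of how a parser callback's output might be converted into OpenAI-style tool_calls entries. The names `ToolCallSketch`, `to_openai_tool_calls_sketch`, and `example_parser` are hypothetical stand-ins; the actual fields of ParsedToolCall and the signature of to_openai_tool_calls are not shown in this PR description, and the `<tool_call>` tag format is only an assumed example.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCallSketch:
    """Hypothetical stand-in for a parsed tool call (fields are assumptions)."""
    name: str
    arguments: dict[str, Any] = field(default_factory=dict)


def to_openai_tool_calls_sketch(calls: list[ToolCallSketch]) -> list[dict[str, Any]]:
    """Convert parsed calls into OpenAI chat-style tool_calls entries."""
    return [
        {
            "id": f"call_{index}",
            "type": "function",
            # In the OpenAI format, arguments is a JSON-encoded string.
            "function": {"name": call.name, "arguments": json.dumps(call.arguments)},
        }
        for index, call in enumerate(calls)
    ]


def example_parser(completion_text: str) -> list[ToolCallSketch]:
    """Toy domain-specific parser: reads one <tool_call>{...}</tool_call> block."""
    match = re.search(r"<tool_call>(.*?)</tool_call>", completion_text, re.DOTALL)
    if match is None:
        return []
    payload = json.loads(match.group(1))
    return [ToolCallSketch(name=payload["name"], arguments=payload.get("arguments", {}))]
```

Keeping the parser as a callback means the client itself needs no knowledge of any particular tool-call syntax; a different domain would only swap out `example_parser`.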

Updates the Vite evaluation dashboard to show a new TokenDebugView when execution_metadata.extra includes token_turn_traces or full_episode, providing per-token visualization (mask-by-turn or logprob coloring) and per-turn breakdowns; adds unit tests covering token normalization, template retry behavior, thinking flag passthrough, and parser integration.

Written by Cursor Bugbot for commit c23e31d. This will update automatically on new commits.

Add a generic FireworksV1CompletionsClient that handles local tokenization via HuggingFace transformers and calls the /v1/completions endpoint with token-in/token-out. Tool-call parsing is pluggable via a callback rather than hardcoded to any specific domain.

Also add TokenDebugView component for the evaluation dashboard with:
- Text view: readable colored text with mask overlay (prompt vs completion)
- Episode view: per-token visualization with mask or logprob coloring
- Turns view: per-turn breakdown of prompt/completion tokens
- Token ID chips with hover tooltips showing detokenized text and logprob
- Smooth gradient logprob coloring (green=high confidence, red=low)

Made-with: Cursor

vite-app/src/components/EvaluationRow.tsx

vite-app/src/components/TokenDebugView.tsx

tests/test_fireworks_v1_completions_client.py

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cd7e0937e4

eval_protocol/integrations/fireworks_v1_completions_client.py

vite-app/src/components/EvaluationRow.tsx

tests/test_fireworks_v1_completions_client.py

- Rewrite tests to match the generic client API (remove references to old domain-specific methods like _parse_tool_call_with_optional_fallback)
- Fix TokenDebugSection guard to also check extra?.full_episode
- Fix zero reward styled as red/negative; it now uses neutral gray
- Fix tools=[] vs None: explicit empty list no longer falls back to default_tools

Made-with: Cursor
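
The tools=[] vs None fix in the commit message above comes down to a common Python pitfall: `tools or default_tools` treats an empty list as missing. A minimal sketch of the distinction follows, where `resolve_tools` and `DEFAULT_TOOLS` are hypothetical names and not the client's actual code.

```python
from typing import Any, Optional

# Hypothetical default tool list, used only for this illustration.
DEFAULT_TOOLS: list[dict[str, Any]] = [
    {"type": "function", "function": {"name": "step"}},
]


def resolve_tools(
    tools: Optional[list[dict[str, Any]]],
    default_tools: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Return the tools to send, treating only None as 'not specified'."""
    # `tools or default_tools` would also replace an explicit [] with the
    # defaults, because an empty list is falsy. Compare against None instead.
    if tools is None:
        return default_tools
    return tools
```

With this check, a caller who passes `tools=[]` gets a request with no tools, while omitting the argument still picks up the defaults.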

eval_protocol/integrations/fireworks_v1_completions_client.py

cursor · 2026-03-05T05:55:06Z

vite-app/src/components/TokenDebugView.tsx

+              showIds={showIds}
+            />
+          ))
+        )}


View selector falls through silently without fullEpisode

Medium Severity

The ternary chain for view selection uses viewLevel === "text" && fullEpisode and viewLevel === "episode" && fullEpisode, so when fullEpisode is null but the user has selected "text" or "episode" mode, the condition silently falls through to rendering tokenTurnTraces (the "turns" view). The UI shows the "text" or "episode" button as active/selected while displaying turns content, which is misleading.
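
The fall-through described here can be modeled outside React. The sketch below uses Python rather than TSX and is only an illustration of the control flow: the function names and the "placeholder" outcome are assumptions, not the component's actual code.

```python
def render_target_buggy(view_level: str, has_full_episode: bool) -> str:
    """Mirror of the reported ternary chain: returns which view gets rendered."""
    if view_level == "text" and has_full_episode:
        return "text"
    if view_level == "episode" and has_full_episode:
        return "episode"
    # Reached for "text" or "episode" when the full episode is missing,
    # so the selected button and the rendered content disagree.
    return "turns"


def render_target_explicit(
    view_level: str, has_full_episode: bool, has_turn_traces: bool
) -> str:
    """One possible fix: decide on the selected view first, then on the data."""
    if view_level in ("text", "episode"):
        # Show a placeholder instead of silently switching to another view.
        return view_level if has_full_episode else "placeholder"
    return "turns" if has_turn_traces else "placeholder"
```

Branching on the selected view before checking data availability keeps the active button and the rendered content consistent, whichever fallback the UI ultimately chooses.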

…issing fullEpisode

- build_assistant_turn_token_ids now passes _thinking_kwargs() to apply_chat_template, consistent with _build_prompt_token_ids and build_tool_response_suffix_token_ids
- View selector no longer silently renders nothing when fullEpisode is null and the user selected text/episode view; it falls back to the turns view or shows a placeholder message

Made-with: Cursor

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

eval_protocol/integrations/fireworks_v1_completions_client.py

…ams overriding core fields

- Add second None check in _normalize_token_id_sequence after extracting from a Mapping, since .get() can return None even when the key exists
- Move request_params spread before explicit keys (model, prompt, temperature, max_tokens, logprobs) so they cannot be silently overridden by stray entries in request_params

Made-with: Cursor
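
Both fixes in this commit message rest on small Python behaviors that are easy to miss. The following sketch shows them in isolation; `normalize_token_ids`, `build_payload`, and the `"token_ids"` key are hypothetical names for illustration and are not taken from the client's source.

```python
from collections.abc import Mapping
from typing import Any


def normalize_token_ids(value: Any, key: str = "token_ids") -> list[int]:
    """Coerce a token-id container into a flat list of ints."""
    if value is None:
        return []
    if isinstance(value, Mapping):
        value = value.get(key)
        # Second check: the key can be present with an explicit None value,
        # in which case .get() returns None even though the key exists.
        if value is None:
            return []
    return [int(token_id) for token_id in value]


def build_payload(
    model: str,
    prompt_ids: list[int],
    request_params: dict[str, Any],
    temperature: float,
    max_tokens: int,
    logprobs: bool = True,
) -> dict[str, Any]:
    """Build a request body in which the explicit keys always win."""
    return {
        # Spread first: in a dict literal, later keys override earlier ones,
        # so stray entries in request_params cannot replace the core fields.
        **request_params,
        "model": model,
        "prompt": prompt_ids,
        "temperature": temperature,
        "max_tokens": max_tokens,
        "logprobs": logprobs,
    }
```

For example, `build_payload("m", [1], {"model": "other", "top_p": 0.9}, 0.0, 16)` keeps `"model": "m"` while still passing `top_p` through.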

benjibc mentioned this pull request Mar 5, 2026

feat: add FrozenLake multi-turn tool-call GRPO training example fw-ai/cookbook#168

Merged

3 tasks

cursor bot reviewed Mar 5, 2026

- vite-app/src/components/EvaluationRow.tsx (outdated, resolved)
- vite-app/src/components/TokenDebugView.tsx (resolved)
- tests/test_fireworks_v1_completions_client.py (resolved)

chatgpt-codex-connector bot reviewed Mar 5, 2026

- eval_protocol/integrations/fireworks_v1_completions_client.py (outdated, resolved)
- vite-app/src/components/EvaluationRow.tsx (outdated, resolved)
- tests/test_fireworks_v1_completions_client.py (outdated, resolved)

cursor bot reviewed Mar 5, 2026

- eval_protocol/integrations/fireworks_v1_completions_client.py (resolved)
- eval_protocol/integrations/fireworks_v1_completions_client.py (resolved)

benjibc merged commit e80c7ae into main Mar 5, 2026
16 of 17 checks passed

benjibc deleted the bchen/generic-v1-completions-client-and-token-debug-view branch March 5, 2026 19:51

benjibc merged 4 commits into main from bchen/generic-v1-completions-client-and-token-debug-view