verifier: side-by-side renderer↔gateway prompt comparison #409
Open
Hecate0821 wants to merge 1 commit into main
When the renderer's prompt diverges from the live gateway's
prompt_token_ids — the load-bearing condition for SFT correctness on
Fireworks — the GUI now shows the two prompts in adjacent columns,
with the divergent suffix highlighted in rose. This makes the
training-inference distribution shift visible without leaving the
verifier.
probe.py: write the gateway's prompt tokens (+ decoded form) into
artifact.api.prompt. Used to be computed and discarded; now persisted
under api.prompt.{tokens,decoded} so the viewer (and offline tooling)
can show it.
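
A minimal sketch of the persistence step described above, assuming the artifact is a plain dict and the decoder is passed in as a callable. The helper name `attach_api_prompt` and the toy decoder are hypothetical and only illustrate the `api.prompt.{tokens,decoded}` shape; they are not the actual probe.py code.

```python
from typing import Callable, Sequence


def attach_api_prompt(
    artifact: dict,
    prompt_token_ids: Sequence[int],
    decode: Callable[[Sequence[int]], str],
) -> dict:
    """Store the gateway's prompt tokens and their decoded text
    under artifact["api"]["prompt"] (hypothetical helper)."""
    api = artifact.setdefault("api", {})
    api["prompt"] = {
        "tokens": list(prompt_token_ids),
        "decoded": decode(prompt_token_ids),
    }
    return artifact


# Toy decoder standing in for a real tokenizer (assumption for illustration).
TOY_VOCAB = {1: "<think>", 2: "\n", 3: "hi"}


def toy_decode(ids: Sequence[int]) -> str:
    return "".join(TOY_VOCAB[i] for i in ids)


artifact = attach_api_prompt({"sanity": {}}, [3, 1, 2], toy_decode)
```

Keeping both the raw ids and the decoded string means offline tooling can diff at the token level without loading a tokenizer just to display the text.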
viewer/index.html:
- New PromptComparison component, slotted at the top of each case
view above the token stream.
- Only renders when sanity.renderer_prompt_matches_api_prompt is
false AND the artifact has the new api.prompt block (older
artifacts skip cleanly).
- Computes the first divergent position by linear scan; tokens at or
beyond that position are shown with a fail-color background.
- Two-column layout collapses to one column under 900px viewport.
- Amber container border so the panel reads as "warning, this is
the train-vs-inference shift you should care about" without
shouting.
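
The gating condition and the linear scan above can be sketched as follows. This is a Python illustration for offline tooling, not the viewer's actual JavaScript; the function names are hypothetical, and treating a missing `renderer_prompt_matches_api_prompt` flag as "do not render" is an assumption.

```python
from typing import Optional, Sequence


def first_divergence(renderer: Sequence[int], gateway: Sequence[int]) -> Optional[int]:
    """Return the first index where the two token sequences differ.

    If one sequence is a strict prefix of the other, the divergence starts
    at the end of the shorter one. Returns None when they are identical.
    """
    for i, (r, g) in enumerate(zip(renderer, gateway)):
        if r != g:
            return i
    if len(renderer) != len(gateway):
        return min(len(renderer), len(gateway))
    return None


def should_render_comparison(artifact: dict) -> bool:
    """Render only when the sanity flag is explicitly false AND the
    artifact carries the api.prompt block (older artifacts lack it)."""
    flag = artifact.get("sanity", {}).get("renderer_prompt_matches_api_prompt")
    has_block = "prompt" in artifact.get("api", {})
    return flag is False and has_block


def highlight_mask(tokens: Sequence[int], start: Optional[int]) -> list:
    """True for tokens at or beyond the divergent position."""
    if start is None:
        return [False] * len(tokens)
    return [i >= start for i in range(len(tokens))]
```

Every token from the first mismatch onward is marked, rather than only the individually differing positions, because once the sequences diverge the remaining positions no longer line up.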
Try it: pick `kimi_k25_disable_thinking` + tokenizer
`moonshotai/Kimi-K2.5` + model `accounts/fireworks/models/kimi-k2p5`,
type any prompt, hit Run probe. The new panel surfaces the gateway's
"open <think>" vs the renderer's "<think></think>" structural
divergence directly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary

When the renderer's prompt diverges from the live Fireworks gateway's tokenisation (the load-bearing condition for SFT correctness on Fireworks), the GUI now shows the two prompts in adjacent columns with the divergent suffix highlighted. This makes the training-inference distribution shift visible inside the verifier instead of buried in JSON.

- `probe.py`: persists the gateway's prompt tokens (+ decoded form) under `artifact.api.prompt.{tokens,decoded}`. Previously computed and discarded.
- `viewer/index.html`: new `PromptComparison` component, slotted above the token stream in each case. Only renders when `sanity.renderer_prompt_matches_api_prompt` is false AND the artifact has the new block. Older artifacts skip cleanly.

End-to-end demo (kimi_k25_disable_thinking)

    export FIREWORKS_API_KEY=fw_...
    ./run.sh

In the GUI:

- `kimi_k25_disable_thinking` (tokenizer auto-fills to `moonshotai/Kimi-K2.5`)
- `accounts/fireworks/models/kimi-k2p5`
- Run probe

The amber Prompt comparison panel appears above the token stream, showing the renderer's `<think></think>\n\n` block (matching HF) vs the gateway's open `<think>\n` (the gateway doesn't honour `enable_thinking=false` for this serverless model). Divergent positions are red.

Test plan

- `pytest -q training/tests/unit/test_verifier_probe.py` (3 passed; artifact now has `api.prompt`)
- `./run.sh`, confirm the comparison panel appears with the structural divergence highlighted

🤖 Generated with Claude Code