feat(chat): show model interaction turns and cache-hit volume on run cards / thinking steps #115
Merged
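In CLI (agentic) mode the token count is accumulated over multiple turns (and each turn's cache_read is counted again), so it far exceeds the model's single-request window and is easily misread as exceeding the limit or as a metering error. This change adds the real turn count and cache_read at the collection layer, and the run card shows the actual scale of interaction with the model as "↑total ⛁cache hits ↓output · N turns":
- shim: claude parses the top-level num_turns, codex counts turns by turn.completed, and cache_read is reported upward as a separate field; the litellm/API path also collects cache_read (Anthropic cache_read / OpenAI cached_tokens), so the two paths stay consistent
- TokenUsage gains cacheReadTokens / turns; when turns is missing it falls back to the number of LLM calls
- The UI automatically hides the corresponding information for a single turn (≤1) or when there are no cache hits
- Adds a light-blue database-cylinder icon, DatabaseIcon, for cache hits, spaced apart from the input stat

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>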
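The TokenUsage change described above can be illustrated with a minimal sketch. Only cacheReadTokens and turns are named in this PR; the inputTokens / outputTokens field names, the accumulate helper, and the sample numbers are hypothetical. The per-record fallback of one turn is one possible reading of "falls back to the number of LLM calls", not necessarily the actual implementation.

```typescript
// Hypothetical shape: only `cacheReadTokens` and `turns` come from the PR text;
// `inputTokens` / `outputTokens` are assumed names for the existing fields.
interface TokenUsage {
  inputTokens: number; // assumed to already include cache reads
  outputTokens: number;
  cacheReadTokens?: number; // optional, so historical runs still parse
  turns?: number; // optional, so historical runs still parse
}

// Sum per-call usage records into one run total (illustrative helper).
function accumulate(calls: TokenUsage[]): Required<TokenUsage> {
  const total: Required<TokenUsage> = {
    inputTokens: 0,
    outputTokens: 0,
    cacheReadTokens: 0,
    turns: 0,
  };
  for (const call of calls) {
    total.inputTokens += call.inputTokens;
    total.outputTokens += call.outputTokens;
    total.cacheReadTokens += call.cacheReadTokens ?? 0;
    // A record without `turns` counts as a single LLM call.
    total.turns += call.turns ?? 1;
  }
  return total;
}

// Made-up numbers: one agentic CLI call plus one legacy API call.
const runTotal = accumulate([
  { inputTokens: 4690000, outputTokens: 12000, cacheReadTokens: 4500000, turns: 30 },
  { inputTokens: 2000, outputTokens: 500 },
]);
```

With these made-up numbers most of the ↑ total is cache reads repeated across 30 turns, which is why the accumulated figure can exceed a single request window without any request being over the limit.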
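Polishes the display on top of "show model interaction turns and cache-hit volume":
- The "N turns" text becomes a loop-arrow icon plus a count (RepeatIcon), which saves space and avoids plural i18n
- Input and output each get their own hover tooltip (tokensInTitle / tokensOutTitle), replacing the combined tooltip; hovering the cache hit additionally shows a "cache N" note
- The cache count uses the same color as its neighbors, while the cylinder icon is light blue ($color-token-cache); the spacing moves to before the cache stat, so no extra gap remains when there are no hits
- Extracts a shared TokenStat component reused by the run card (RunMeta) and thinking steps (AgentStep), so thinking steps now show the cache-hit volume as well
- Adds a CHANGELOG entry

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>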
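The hide-when-empty rules for the shared stat can be sketched as a pure function, without any UI framework. This is an assumption-laden illustration: tokenStatParts, compact, and the field names other than cacheReadTokens / turns are hypothetical, the number formatting is invented, and the ⟳ glyph merely stands in for RepeatIcon.

```typescript
// Hypothetical input shape; only `cacheReadTokens` and `turns` are named in the PR.
interface TokenStatInput {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number;
  turns?: number;
}

// Assumed compact formatting (e.g. 4690000 -> "4.69M"); the real UI may differ.
function compact(n: number): string {
  if (n >= 1e6) return (n / 1e6).toFixed(2) + "M";
  if (n >= 1e3) return (n / 1e3).toFixed(1) + "k";
  return String(n);
}

// Build the visible segments: input, optional cache hits, output, optional turns.
function tokenStatParts(u: TokenStatInput): string[] {
  const parts: string[] = ["↑" + compact(u.inputTokens)];
  const cacheRead = u.cacheReadTokens ?? 0;
  if (cacheRead > 0) {
    // Cache hits are part of the input; omit the segment when there are none.
    parts.push("⛁" + compact(cacheRead));
  }
  parts.push("↓" + compact(u.outputTokens));
  const turns = u.turns ?? 0;
  if (turns > 1) {
    // A single turn (≤1) carries no extra information, so it is hidden.
    parts.push("⟳" + String(turns));
  }
  return parts;
}
```

Keeping the rules in one function mirrors the motivation for a shared component: the run card and the thinking steps cannot drift apart on when a segment is hidden.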
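Background

When a CLI (claude / codex) takes over the LLM, the token usage on the run card is an agentic multi-turn accumulated value, and each turn's cache_read is counted again. The accumulated value far exceeds the model's single-request window (for example, /ask ↑4.69M) and is easily misread as "exceeds the model limit" or "the metering implementation is broken". It is not a bug; it is the natural result of multiple turns plus repeated cache reads. This PR makes the UI present the actual scale of interaction with the model clearly.

Changes

Collection layer (embedded pr-agent shim)
- claude parses the top-level num_turns and codex counts turns by turn.completed; both are merged into num_turns in usage
- cache_read is both counted in the ↑ input total and reported upward separately, so the UI can show the split
- The litellm/API path also collects cache_read (Anthropic cache_read_input_tokens / OpenAI prompt_tokens_details.cached_tokens), consistent with the CLI path

Contract / accumulation
- TokenUsage adds cacheReadTokens / turns (optional, backward compatible with historical runs)

Display (run card + thinking steps)
- TokenStat component: ↑input (⛁cache hits) / ↓output, reused by the run card and thinking steps
- A light-blue database-cylinder icon (DatabaseIcon) marks the cache-hit volume; it is part of the input, and the whole segment is hidden when there are no hits
- A loop-arrow icon (RepeatIcon) marks the number of model interaction turns; it is hidden for a single turn

Verification

nx typecheck desktop, nx lint desktop, and nx typecheck shared all pass; the 4 locale files validate as legal JSON and keep recursive dictionary order; the Python shim byte-compiles and the parser's functional self-test passes.

🤖 Generated with Claude Code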
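For the cache_read collection on the litellm/API path, the two field names quoted in the collection-layer notes suggest a lookup along the following lines. The shim itself is Python; this TypeScript rendition is only an illustration, and the interface names, the cacheReadOf helper, and the way the two shapes are told apart are assumptions rather than the PR's code.

```typescript
// Usage payload carrying the Anthropic-style field named in the PR.
interface AnthropicStyleUsage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number | null;
}

// Usage payload carrying the OpenAI-style field named in the PR.
interface OpenAIStyleUsage {
  prompt_tokens: number;
  completion_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number | null } | null;
}

type ProviderUsage = AnthropicStyleUsage | OpenAIStyleUsage;

// Return the cache-read token count, or 0 when the provider did not report one.
function cacheReadOf(usage: ProviderUsage): number {
  if ("prompt_tokens" in usage) {
    // OpenAI-style: cached tokens are nested under prompt_tokens_details.
    return usage.prompt_tokens_details?.cached_tokens ?? 0;
  }
  // Anthropic-style: cached tokens are a top-level field.
  return usage.cache_read_input_tokens ?? 0;
}
```

Defaulting to 0 keeps the value compatible with the display rule that hides the cache segment when there are no hits.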