fix: multi-protocol visual/websearch orchestration, tool_result role, and stability fixes #82

Merged
vfyjxf merged 9 commits into main from bugfix-2026-06-12 on Jun 12, 2026
Conversation

@ZhiYi-R ZhiYi-R commented Jun 12, 2026

Owner

Summary

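Fixes several orchestration defects in the Visual and WebSearch plugins on the multi-protocol (Chat / Google GenAI) paths, including an incorrect cross-protocol tool_result role, leaked mixed tool calls, and a blocked streaming path. Also fixes a set of stability issues in ProviderManager, the Anthropic adapter, and the stream layer.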

Key changes

Visual plugin

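  • tool_result role corrected from "user" to "tool": Core-format tool_result messages must use the "tool" role. With "user", the Chat adapter could not generate a ToolCallID, and the DeepSeek Chat API rejected the request (insufficient tool messages following tool_calls message). The Anthropic adapter's mapRole already converts "tool" → "user", so it is unaffected. (core_orchestrator.go)
  • Chat streaming path wired into the Visual orchestrator: previously handleAdapterStream did not go through the Visual wrapper, so Chat-protocol models could not trigger visual_brief/visual_qa. (adapter_dispatch.go)
  • Google GenAI path wired into the Visual orchestrator: same as above; adds a googleProviderClient adapter for the Google protocol. (adapter_dispatch.go)
  • Mixed visual/non-visual tool leak fix: when the assistant returns both visual and non-visual tools in one turn, the non-visual tools are now kept in the message and passed on to the Bridge instead of being lost. (orchestrator.go, core_orchestrator.go)
  • Several detail fixes: base64 format fallthrough, system prompt injection, request clone isolation, and assistant message selection.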
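The role fix in the first bullet can be illustrated with a minimal sketch. The types `CoreMessage` and `ChatMessage` and the function `toChatMessage` are hypothetical stand-ins, not the project's definitions; only the names `mapRole` and `ToolCallID` come from the description above, and their bodies here are assumptions.

```go
package main

import "fmt"

// CoreMessage is a hypothetical stand-in for a Core-format message.
type CoreMessage struct {
	Role      string // "user", "assistant", "tool", ...
	ToolUseID string // set on tool_result messages
	Content   string
}

// ChatMessage is a hypothetical stand-in for a Chat-protocol message.
type ChatMessage struct {
	Role       string
	ToolCallID string
	Content    string
}

// mapRole sketches the Anthropic-side conversion: tool results travel in
// user turns there, so "tool" becomes "user" and other roles pass through.
func mapRole(role string) string {
	if role == "tool" {
		return "user"
	}
	return role
}

// toChatMessage sketches the Chat-side conversion: ToolCallID is emitted
// only when the Core role is "tool".
func toChatMessage(m CoreMessage) ChatMessage {
	out := ChatMessage{Role: m.Role, Content: m.Content}
	if m.Role == "tool" {
		out.ToolCallID = m.ToolUseID
	}
	return out
}

func main() {
	fixed := CoreMessage{Role: "tool", ToolUseID: "call_1", Content: "ok"}
	buggy := CoreMessage{Role: "user", ToolUseID: "call_1", Content: "ok"}
	fmt.Printf("%+v\n", toChatMessage(fixed)) // carries ToolCallID
	fmt.Printf("%+v\n", toChatMessage(buggy)) // ToolCallID is empty
	fmt.Println(mapRole("tool"))
}
```

Under these assumptions, a tool_result tagged "user" reaches the Chat upstream without a ToolCallID, which is consistent with the rejection quoted above, while the Anthropic path produces the same output for either role.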

WebSearch plugin

  • Mixed search/non-search tool leak fix: same pattern as above; non-search tools are no longer swallowed by the websearch orchestrator. (orchestrator.go)
  • escapeForJSON hardening, Tavily/Firecrawl retry logic, Google multi-candidate handling, Message.Role defcheck.
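The mixed-tool fix in both plugins follows one pattern: split the assistant's tool calls into those the orchestrator handles itself and those that must be forwarded. A minimal sketch, assuming a simplified `ToolCall` type and a hypothetical helper `partitionToolCalls` (neither is taken from the repository; `read_file` is an invented example tool name):

```go
package main

import "fmt"

// ToolCall is a hypothetical, simplified tool call returned by the assistant.
type ToolCall struct {
	ID   string
	Name string
}

// partitionToolCalls splits calls into those the orchestrator executes itself
// (names present in handled) and those it must pass through untouched.
// Relative order is preserved within each group.
func partitionToolCalls(calls []ToolCall, handled map[string]bool) (own, passthrough []ToolCall) {
	for _, c := range calls {
		if handled[c.Name] {
			own = append(own, c)
		} else {
			passthrough = append(passthrough, c)
		}
	}
	return own, passthrough
}

func main() {
	handled := map[string]bool{"visual_brief": true, "visual_qa": true}
	calls := []ToolCall{
		{ID: "1", Name: "visual_qa"},
		{ID: "2", Name: "read_file"}, // hypothetical non-visual tool
	}
	own, rest := partitionToolCalls(calls, handled)
	fmt.Println(len(own), len(rest))
}
```

The point of the sketch is that the passthrough slice is returned rather than dropped, so the caller can keep those calls in the message it forwards.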

ProviderManager & Adapter fixes

  • RWMutex read-write locking, per-request stream buffer isolation, cache key forwarding, goroutine leak, ctx cancellation propagation, HTTP client configuration. (manager.go, adapter_dispatch.go)
  • Anthropic: tool_result content default value, first-message role guard, Gemini ToolUseID matching, toolUseIDMap race. (anthropic/adapter.go)
  • Chat: empty content filtering to avoid upstream rejection; integrity of the assistant tool_use → user tool_result message sequence. (chat/adapter.go)
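For readers unfamiliar with the locking pattern mentioned above, here is a minimal sketch of a map guarded by `sync.RWMutex`. The type `idMap` is hypothetical and only illustrates how a structure such as toolUseIDMap can be made safe for concurrent use; it is not the code from manager.go or anthropic/adapter.go.

```go
package main

import (
	"fmt"
	"sync"
)

// idMap is a hypothetical concurrency-safe string map.
type idMap struct {
	mu sync.RWMutex
	m  map[string]string
}

func newIDMap() *idMap { return &idMap{m: make(map[string]string)} }

// Get takes the read lock, so many readers can proceed in parallel.
func (s *idMap) Get(k string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.m[k]
	return v, ok
}

// Set takes the write lock, excluding all readers and other writers.
func (s *idMap) Set(k, v string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[k] = v
}

func main() {
	ids := newIDMap()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			key := fmt.Sprintf("call_%d", i)
			ids.Set(key, fmt.Sprintf("toolu_%d", i))
			ids.Get(key)
		}(i)
	}
	wg.Wait()
	v, ok := ids.Get("call_3")
	fmt.Println(v, ok)
}
```

Running a program like this with `go run -race` is one way to check that concurrent access no longer trips the race detector.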

Misc

  • namespace tool flattening: support for the NestedOneOf/NestedAnyOf/Flat strategies. (codextool/namespace_schema.go)
  • Docs: ASCII diagrams → Mermaid; docs synced.

Verification

  • All internal/extension/* package tests pass (12 packages)
  • internal/protocol/{chat,anthropic,google} adapter tests pass
  • Visual + Chat protocol end-to-end scenario verified via trace replay

Files changed

81 files, +2258 −801

ZhiYi-R added 9 commits June 12, 2026 00:25
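- Disable the codex_tool_proxy extension by default (DefaultEnabled: true → false)
- README.md: remove the nonexistent /health endpoint
- docs/architecture.md: correct the ClientAdapter interface signature ([]byte → any); remove /health and the bridge directory
- docs/api.md: sync the management API endpoint table with the actual code (26 endpoints in router.go)
- docs/extensions-overview.md: add codex_tool_proxy extension documentation
- docs/development-conventions.md: add the codex_tool_proxy/codextool directory entries
- config.example.yml: add explanatory comments for codex_tool_proxy
- docs/architecture.md: four-layer architecture diagram → flowchart TB + 4 subgraphs
- docs/architecture.md: request lifecycle data flow → flowchart TD
- docs/development-conventions.md: directory tree → flowchart TD + nested subgraphs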
…arding, goroutine leak, ctx cancellation, HTTP client config
… Gemini ToolUseID matching, toolUseIDMap race, empty content filtering
…ategies

Adds three core components for flexible namespace tool handling:

- NamespaceSchemaBuilder (namespace_schema.go): BuildNamespaceTools
  converts namespace tools via NestedOneOf (oneOf Schema with action as
  single-value enum), NestedAnyOf (action enum + anyOf params), or Flat
  (current behavior).

- NestedCallDecoder: DecodeNestedCall extracts action and params from
  both oneOf and anyOf response formats. TryExtractAction provides
  lightweight partial-JSON scanning for streaming extraction.

- StreamBufferStrategy: two-level streaming buffer for nested tool calls
  — defers output_item.added until action is extracted from JSON stream,
  then replays buffered params deltas.

Configuration via OpenAIAdapter.nsStrategy field (variadic constructor).
Deduplication in flattenToolsWithNamespace prefers codex_namespace-
annotated tools over flat functions with same name.
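
The two-level buffering described above can be sketched as follows. This is an illustration under stated assumptions, not the commit's implementation: `tryExtractAction` and `deferredBuffer` are hypothetical stand-ins for TryExtractAction and StreamBufferStrategy, the regular-expression scan is a deliberately naive choice, the event strings are invented, and the buffered text is replayed as one merged delta rather than as the original individual deltas.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// actionRe matches a complete "action" string value. It is a naive scan:
// it does not track nesting, so an "action" key inside params would also match.
var actionRe = regexp.MustCompile(`"action"\s*:\s*"((?:[^"\\]|\\.)*)"`)

// tryExtractAction returns the raw (still escaped) action value once its
// closing quote has arrived, and false while the value is incomplete.
func tryExtractAction(partial string) (string, bool) {
	m := actionRe.FindStringSubmatch(partial)
	if m == nil {
		return "", false
	}
	return m[1], true
}

// deferredBuffer holds deltas until the action is known, then replays them.
type deferredBuffer struct {
	pending strings.Builder
	started bool
	emit    func(event string)
}

// Write buffers delta until the action can be extracted; after that point
// every delta is forwarded immediately.
func (b *deferredBuffer) Write(delta string) {
	if b.started {
		b.emit("delta:" + delta)
		return
	}
	b.pending.WriteString(delta)
	action, ok := tryExtractAction(b.pending.String())
	if !ok {
		return
	}
	b.started = true
	b.emit("output_item.added:" + action)
	b.emit("delta:" + b.pending.String()) // replayed as one merged delta
	b.pending.Reset()
}

func main() {
	b := &deferredBuffer{emit: func(e string) { fmt.Println(e) }}
	for _, d := range []string{`{"act`, `ion":"se`, `arch","par`, `ams":{}}`} {
		b.Write(d)
	}
}
```

In this sketch nothing is emitted for the first two deltas, because the action value is still incomplete; the announcement and the replay happen together on the third.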
…dening, Tavily/Firecrawl retry, formatting dedup, Google multi-candidate, Message.Role defcheck
…system prompt, style dedup, clone isolation, assistant message selection, string Content handling
…e paths

Chat streaming: inject visCoreProvider before StreamChat. When visual is
enabled, calls CoreOrchestrator.CreateCore non-streaming, then produces
synthetic SSE events via coreResponseToCoreStream.
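
The synthetic-stream idea can be sketched as below: a complete, non-streaming response is turned into a short sequence of events that a streaming consumer can read. The types `CoreResponse` and `StreamEvent`, the function `responseToStream`, and the event names are all assumptions made for illustration; the real coreResponseToCoreStream may differ in every detail.

```go
package main

import "fmt"

// CoreResponse is a hypothetical, simplified non-streaming response.
type CoreResponse struct {
	Text string
}

// StreamEvent is a hypothetical stream event; the Type values are invented.
type StreamEvent struct {
	Type string
	Data string
}

// responseToStream replays a finished response as a closed, buffered channel
// of events, so callers can range over it like a live stream.
func responseToStream(resp CoreResponse) <-chan StreamEvent {
	events := []StreamEvent{
		{Type: "message_start"},
		{Type: "content_delta", Data: resp.Text},
		{Type: "message_stop"},
	}
	ch := make(chan StreamEvent, len(events))
	for _, e := range events {
		ch <- e
	}
	close(ch)
	return ch
}

func main() {
	for ev := range responseToStream(CoreResponse{Text: "a red square"}) {
		// Written in the SSE wire shape: an event line, a data line, a blank line.
		fmt.Printf("event: %s\ndata: %s\n\n", ev.Type, ev.Data)
	}
}
```

The trade-off of this pattern is that the client sees no output until the whole upstream call has finished, in exchange for reusing the non-streaming orchestration path.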

Google non-streaming: inject wrapWithVisual with googleProviderClient
adapter (wraps *google.Client into provider.ProviderClient). Visual
orchestrator runs before wsInjected/googleClient.GenerateContent.

Google streaming: inject visCoreProvider with googleProviderClient,
same synthetic-stream pattern as Chat streaming.

wrapWithVisual extended with case ProtocolGoogleGenAI for visual
provider side (uses activeGoogleClient + FirstUpstreamModelForKey).

googleProviderClient adapts *google.Client to provider.ProviderClient
by capturing model name at construction time.
Core format tool_result messages must use role tool — Anthropic adapter
maps tool→user, Chat adapter relies on role tool to emit ToolCallID.
@vfyjxf vfyjxf merged commit db34447 into main Jun 12, 2026
16 checks passed
@ZhiYi-R ZhiYi-R deleted the bugfix-2026-06-12 branch June 12, 2026 16:23