feat(providers): explicit prompt cache for DashScope/Qwen #1127
nguyennguyenit wants to merge 2 commits into `dev`
Conversation
Extend the OpenAI-compat path with Anthropic-style `cache_control: ephemeral` inline blocks for Alibaba DashScope endpoints (`provider_type=bailian` or URL contains "dashscope"). Verified live: `qwen3.6-plus` / `qwen3.5-plus` / `qwen3-coder-plus` all return a 99.7% cache hit rate on a 6K-token prefix, yielding ~89% token cost reduction on the cached prefix (90% Alibaba discount).

- `isDashScope()`: 3-source detection (URL + providerType + name) handles reverse-proxied endpoints; covers both "dashscope" and "bailian"
- `buildRequestBody` wraps system content with `SplitSystemPromptForCache` (exported from `anthropic_request.go`) using `<!-- GOCLAW_CACHE_BOUNDARY -->`
- Tool prefix cache: `cache_control` on the last tool definition, with a 4-marker budget guard
- Parse `cache_creation_input_tokens` from `prompt_tokens_details` into `Usage.CacheCreationTokens`; propagates via existing span metadata writers
- Runtime escape hatch: `GOCLAW_DISABLE_DASHSCOPE_CACHE=true`
- Live integration smoke test (build tag `integration`, env-gated)
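The 3-source detection described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code; the function signature and field sources are assumptions:

```go
package main

import (
	"fmt"
	"strings"
)

// isDashScope sketches the PR's 3-source detection: the base URL, the
// configured provider_type, or the provider name can each mark an endpoint
// as DashScope/Bailian, which covers reverse-proxied endpoints whose URL
// gives no hint. Parameter names here are illustrative assumptions.
func isDashScope(baseURL, providerType, name string) bool {
	for _, s := range []string{baseURL, providerType, name} {
		s = strings.ToLower(s)
		if strings.Contains(s, "dashscope") || strings.Contains(s, "bailian") {
			return true
		}
	}
	return false
}

func main() {
	// Reverse-proxied endpoint: URL gives no hint, but provider_type does.
	fmt.Println(isDashScope("https://proxy.internal/v1", "bailian", "qwen"))   // true
	fmt.Println(isDashScope("https://dashscope.aliyuncs.com/v1", "", ""))      // true
	fmt.Println(isDashScope("https://api.openai.com/v1", "openai", "gpt"))     // false
}
```

Checking all three sources makes the guard robust to deployments where only one of URL, provider type, or name carries the DashScope hint.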
🔍 Code Review: feat(providers): explicit prompt cache for DashScope/Qwen

🎯 Overview
Extends the OpenAI-compat provider with Anthropic-style `cache_control` blocks. Scope: 13 files, +367/-13, 1 commit.

🤖 CI Status + Merge Conflicts
CI: ✅

✅ Strengths
🟡 MEDIUM
🟢 LOW
📊 Summary
💡 Recommendation: 🟢 APPROVED

Solid feature, live-verified, ~75% token cost reduction for Qwen models. Clean backward compatibility, good test coverage, runtime escape hatch. Wait for CI `go:complete` before merging! Great work @nguyennguyenit; this PR genuinely saves money! 💰🚀
- Use `strings.Repeat` instead of a custom `repeat()` helper in the smoke test
- Clarify that `BuildRequestBodyForTest` is test-only, not public API
Summary
Extend the OpenAI-compat provider path with Anthropic-style `cache_control: ephemeral` inline blocks for Alibaba DashScope endpoints. Live verified: 99.7% cache hit rate on a 6K-token prefix for `qwen3.6-plus` (the model production agents `tra` and `van-anh` use), yielding ~89% input-token cost reduction on the cached prefix (90% Alibaba discount × 99.7% hit).

Changes
- `isDashScope()`: 3-source detection (URL + providerType + name) on `OpenAIProvider`; mirrors the existing `dashScopePassthroughKeys()` pattern; covers both `dashscope` and `bailian` `provider_type`
- `buildRequestBody` wraps system content using `SplitSystemPromptForCache` (exported from `anthropic_request.go`) with the shared `<!-- GOCLAW_CACHE_BOUNDARY -->` marker, a single source of truth across the Anthropic + DashScope paths
- `cache_control` injected on the last tool definition with a 4-marker budget guard
- `cache_creation_input_tokens` parsed from `prompt_tokens_details` into `Usage.CacheCreationTokens`; propagates via existing span metadata writers (`loop_run.go:228`, `loop_tracing.go`, `tools/subagent_tracing.go`); no pipeline changes needed
- Runtime escape hatch: `GOCLAW_DISABLE_DASHSCOPE_CACHE=true`
- Live integration smoke test (build tag `integration`, env-gated by `DASHSCOPE_API_KEY`)

Live Validation
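The boundary-marker split described above could look roughly like this. A minimal sketch, assuming the helper splits on the shared marker and attaches `cache_control` to the stable prefix block; the real `SplitSystemPromptForCache` in `anthropic_request.go` may differ:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

const cacheBoundary = "<!-- GOCLAW_CACHE_BOUNDARY -->"

// splitSystemPromptForCache sketches the shared helper: text before the
// boundary marker is the stable prefix and gets an Anthropic-style
// cache_control:ephemeral block; text after it stays uncached.
func splitSystemPromptForCache(system string) []map[string]any {
	prefix, rest, found := strings.Cut(system, cacheBoundary)
	blocks := []map[string]any{{
		"type":          "text",
		"text":          prefix,
		"cache_control": map[string]string{"type": "ephemeral"},
	}}
	if found && rest != "" {
		// Dynamic tail (dates, per-session state) after the marker: no cache_control.
		blocks = append(blocks, map[string]any{"type": "text", "text": rest})
	}
	return blocks
}

func main() {
	sys := "You are GoClaw." + cacheBoundary + "Today is Friday."
	out, _ := json.MarshalIndent(splitSystemPromptForCache(sys), "", "  ")
	fmt.Println(string(out))
}
```

Keeping one marker constant across the Anthropic and DashScope paths means a single split point controls caching on both.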
- `qwen3-coder-plus`
- `qwen3.5-plus`
- `qwen3.6-plus` (agents `tra`, `van-anh`)
- `qwen3-max-2026-01-23`
- `qwen3-coder-next`

Wire format silently accepted by all models (no 400). For unsupported variants the wrap is a no-op (cost: 0 tokens, microseconds of CPU). Observability via span
`cached_tokens` gives a self-correcting signal; no per-model capability flag needed.

Backward Compatibility
- Non-DashScope OpenAI-compat providers unchanged (existing `prompt_cache_key` middleware)
- New behavior gated behind the `isDashScope()` guard

Test plan
- `go build ./...` (PG) + `go build -tags sqliteonly ./...` (SQLite) clean
- `go vet ./...` clean
- `qwen3.6-plus` 99.7% hit verified
- `avg_cached_tokens > 0` for qwen models
- `GOCLAW_DISABLE_DASHSCOPE_CACHE=true` reverts to string content
Based on 5 baseline traces (research §"Target Saving"), aggregate input drops from ~6,023K to ~1,500K tokens (-75%) for mid-conversation Qwen calls. Combined with the companion plan `pancake-pos-mcp/plans/260508-compact-response/`, the total reduction is -85%.