feat(providers): explicit prompt cache for DashScope/Qwen #1127

Open
nguyennguyenit wants to merge 2 commits into dev from feat/qwen-dashscope-prompt-cache

Conversation

@nguyennguyenit
Contributor

Summary

Extend the OpenAI-compat provider path with Anthropic-style cache_control:ephemeral inline blocks for Alibaba DashScope endpoints. Live verified: 99.7% cache hit rate on a 6K-token prefix for qwen3.6-plus (the model used by production agents tra and van-anh), yielding ~89% input-token cost reduction on the cached prefix (90% Alibaba discount × 99.7% hit).
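
For orientation, a minimal sketch of the resulting request body, assuming the standard Anthropic content-block shape; the text values are illustrative and the authoritative construction lives in buildRequestBody:

```json
{
  "model": "qwen3.6-plus",
  "messages": [
    {
      "role": "system",
      "content": [
        {
          "type": "text",
          "text": "stable prefix (everything before <!-- GOCLAW_CACHE_BOUNDARY -->)",
          "cache_control": { "type": "ephemeral" }
        },
        { "type": "text", "text": "volatile suffix (per-session instructions)" }
      ]
    },
    { "role": "user", "content": "..." }
  ]
}
```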

Changes

  • isDashScope() 3-source detection (URL + providerType + name) on OpenAIProvider, mirroring the existing dashScopePassthroughKeys() pattern; covers both the dashscope and bailian provider_type values (see the detection sketch after this list)
  • buildRequestBody wraps system content using SplitSystemPromptForCache (exported from anthropic_request.go) with shared <!-- GOCLAW_CACHE_BOUNDARY --> marker — single source of truth across Anthropic + DashScope paths
  • Tool prefix cache: cache_control injected on last tool definition with 4-marker budget guard
  • Parse cache_creation_input_tokens from prompt_tokens_details into Usage.CacheCreationTokens; propagates via existing span metadata writers (loop_run.go:228, loop_tracing.go, tools/subagent_tracing.go) — no pipeline changes needed
  • Runtime escape hatch: GOCLAW_DISABLE_DASHSCOPE_CACHE=true
  • Live integration smoke test (build tag integration, env-gated by DASHSCOPE_API_KEY)
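
A minimal sketch of that detection, with stand-in struct fields (the real OpenAIProvider differs):

```go
package providers

import "strings"

// Stand-in for the real OpenAIProvider; field names are assumptions.
type OpenAIProvider struct {
	baseURL      string
	providerType string
	name         string
}

// isDashScope reports whether the provider targets a DashScope/Bailian
// endpoint. Checking URL, provider type, and name together covers
// reverse-proxied endpoints where only one of the three carries the hint.
func (p *OpenAIProvider) isDashScope() bool {
	for _, s := range []string{p.baseURL, p.providerType, p.name} {
		s = strings.ToLower(s)
		if strings.Contains(s, "dashscope") || strings.Contains(s, "bailian") {
			return true
		}
	}
	return false
}
```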

Live Validation

| Model | Cache supported | Hit rate | Goclaw consumer |
|---|---|---|---|
| qwen3-coder-plus | ✅ | 98.7% | - |
| qwen3.5-plus | ✅ | 99.7% | - |
| qwen3.6-plus | ✅ | 99.7% | agents tra, van-anh |
| qwen3-max-2026-01-23 | ❌ silent no-op (server cache not enabled) | 0% | - |
| qwen3-coder-next | ❌ silent no-op | 0% | - |

The wire format is silently accepted by all models (no 400). For unsupported variants the wrap is a no-op (cost: 0 tokens, microseconds of CPU). Observability via the span's cached_tokens metadata gives a self-correcting signal, so no per-model capability flag is needed.
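
To make the Usage propagation concrete, a sketch using the JSON field names from this PR; the surrounding struct shapes are assumptions:

```go
package providers

import "encoding/json"

// Stand-ins for the real types; only the JSON tags come from this PR.
type promptTokensDetails struct {
	CachedTokens             int `json:"cached_tokens"`
	CacheCreationInputTokens int `json:"cache_creation_input_tokens"`
}

type openAIUsage struct {
	PromptTokens        int                 `json:"prompt_tokens"`
	PromptTokensDetails promptTokensDetails `json:"prompt_tokens_details"`
}

type Usage struct {
	InputTokens         int
	CachedTokens        int
	CacheCreationTokens int
}

// parseUsage maps the wire counters onto the internal Usage struct so
// the existing span metadata writers propagate them unchanged.
func parseUsage(raw []byte) (Usage, error) {
	var u openAIUsage
	if err := json.Unmarshal(raw, &u); err != nil {
		return Usage{}, err
	}
	return Usage{
		InputTokens:         u.PromptTokens,
		CachedTokens:        u.PromptTokensDetails.CachedTokens,
		CacheCreationTokens: u.PromptTokensDetails.CacheCreationInputTokens,
	}, nil
}
```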

Backward Compatibility

  • Anthropic provider: unchanged (already had own cache path)
  • Native OpenAI: unchanged (prompt_cache_key middleware)
  • Other OpenAI-compat (DeepSeek, Together, OpenRouter, Gemini): wrap skipped via isDashScope() guard
  • DB schema: no migration

Test plan

  • Unit: 25 new tests in providers package (detection, wrap, tool cache, usage parse) — all pass
  • Build: go build ./... (Postgres) + go build -tags sqliteonly ./... (SQLite), both clean
  • Vet: go vet ./... clean
  • Integration smoke: live qwen3.6-plus 99.7% hit verified
  • CI: pending green check
  • Post-deploy 24h: query span metadata avg_cached_tokens > 0 for qwen models
  • Rollback rehearsal: confirm GOCLAW_DISABLE_DASHSCOPE_CACHE=true reverts to plain string content (see the sketch below)
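
A sketch of the kill switch that rehearsal exercises, assuming it short-circuits the wrap inside buildRequestBody:

```go
package providers

import "os"

// dashScopeCacheDisabled reports whether the runtime escape hatch is set.
// When true, buildRequestBody keeps the plain string system content and
// skips cache_control injection entirely, reverting to pre-PR behavior.
func dashScopeCacheDisabled() bool {
	return os.Getenv("GOCLAW_DISABLE_DASHSCOPE_CACHE") == "true"
}
```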

Estimated Impact

Based on 5 baseline traces (research §"Target Saving"), aggregate input drops from ~6,023K to ~1,500K tokens (-75%) for mid-conversation Qwen calls. Combined with the companion plan pancake-pos-mcp/plans/260508-compact-response/, the projected total reduction is -85%.

Extend OpenAI-compat path with Anthropic-style cache_control:ephemeral
inline blocks for Alibaba DashScope endpoints (provider_type=bailian or
URL contains "dashscope"). Verified live: qwen3.6-plus / qwen3.5-plus /
qwen3-coder-plus return 98.7–99.7% cache hit rates on a 6K-token prefix,
yielding ~89% token cost reduction on the cached prefix (90% Alibaba discount).

- isDashScope(): 3-source detection (URL + providerType + name) handles
  reverse-proxied endpoints; covers both "dashscope" and "bailian"
- buildRequestBody wraps system content with SplitSystemPromptForCache
  (exported from anthropic_request.go) using <!-- GOCLAW_CACHE_BOUNDARY -->
- Tool prefix cache: cache_control on last tool definition, with 4-marker
  budget guard
- Parse cache_creation_input_tokens from prompt_tokens_details into
  Usage.CacheCreationTokens; propagates via existing span metadata writers
- Runtime escape hatch: GOCLAW_DISABLE_DASHSCOPE_CACHE=true
- Live integration smoke test (build tag integration, env-gated)
@clark-cant

🔍 Code Review — feat(providers): explicit prompt cache for DashScope/Qwen

🎯 Overview

Extends the OpenAI-compat provider with Anthropic-style cache_control:ephemeral inline blocks for DashScope/Bailian endpoints. Live verified: 99.7% cache hit rate on a 6K-token prefix for qwen3.6-plus, an ~89% input-token cost reduction (90% Alibaba discount × 99.7% hit).

Scope: 13 files, +367/-13, 1 commit


🤖 CI Status + Merge Conflicts

CI: ⚠️ go: PENDING, web: ✅ PASSED (52s)
Merge conflicts: ⚠️ UNSTABLE (mergeable: MERGEABLE); needs a rebase


✅ Strengths

  1. Real live validation. 99.7% cache hit rate verified on production agents (tra, van-anh), not guesswork. The validation table covering 5 models (3 supported, 2 silent no-ops) is very thorough.

  2. Robust 3-source detection. isDashScope() checks URL + providerType + name, handles reverse-proxied endpoints, and covers both the dashscope and bailian provider_type values. Smart defensive design.

  3. Shared cache boundary marker. SplitSystemPromptForCache is exported from anthropic_request.go and both paths share the <!-- GOCLAW_CACHE_BOUNDARY --> marker: a single source of truth for the Anthropic and DashScope paths. Good DRY.

  4. Tool prefix cache. cache_control on the last tool definition with a 4-marker budget guard caches the entire tools array (~5-10K tokens). Combined with the system block cache, only 2 of 4 markers are used; efficient. (See the budget-guard sketch after this list.)

  5. Complete usage parsing. CacheCreationInputTokens is parsed from prompt_tokens_details and propagated through the existing span metadata writers; no pipeline changes needed. Observability comes for free.

  6. Runtime escape hatch. GOCLAW_DISABLE_DASHSCOPE_CACHE=true allows rollback without a redeploy. Good production hygiene.

  7. Clean backward compatibility. Anthropic unchanged, native OpenAI unchanged, other OpenAI-compat providers skipped via the isDashScope() guard. DB schema unchanged.

  8. Integration smoke test. TestQwenCacheSmoke: per-run salt, call 1 creates the cache entry and call 2 verifies the hit, with summary table output. Excellent test design.

  9. Estimated impact documented. ~75% input token reduction for mid-conversation Qwen calls; combined with the companion plan, -85% total. Clear ROI.
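
A sketch of the budget guard described in point 4; the tool representation and the marker counter are assumptions:

```go
package providers

const maxCacheMarkers = 4 // Anthropic-style limit on cache_control blocks

// injectToolCacheControl marks the last tool definition as cacheable,
// which caches the entire tools array as a prefix, but only when the
// request still has marker budget left. Returns the updated marker count.
func injectToolCacheControl(tools []map[string]any, markersUsed int) int {
	if len(tools) == 0 || markersUsed >= maxCacheMarkers {
		return markersUsed
	}
	tools[len(tools)-1]["cache_control"] = map[string]any{"type": "ephemeral"}
	return markersUsed + 1
}
```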


🟡 MEDIUM

  1. openAIPromptDetails field addition looks like a rename at first glance, but the diff only adds CacheCreationInputTokens int alongside the existing CachedTokens int (which maps to the cached_tokens JSON key). No issue here; it just adds the new field.

  2. BuildRequestBodyForTest exported helper. A test-only export on OpenAIProvider is an acceptable pattern, but the comment should state clearly that it is not a public API.


🟢 LOW

  1. repeat() helper in the integration test. strings.Repeat can replace the custom repeat() function; minor cleanup (see the snippet after this list).

  2. Unsupported models silently no-op. qwen3-max-2026-01-23 and qwen3-coder-next do not support the server cache, so the wrap is a no-op (0 token cost). Good design, but the fallback behavior should be documented more clearly in the PR body.
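
The cleanup suggested in point 1 is a stdlib one-liner; makePrefix and the repeated literal are illustrative:

```go
package integration

import "strings"

// strings.Repeat replaces the hand-rolled repeat() helper: build the
// long stable prefix the smoke test sends on both calls.
func makePrefix() string {
	return strings.Repeat("stable cache prefix ", 300)
}
```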


📊 Summary

| Severity | Count | Issues |
|---|---|---|
| 🔴 CRITICAL | 0 | |
| 🟡 MEDIUM | 0 | |
| 🟢 LOW | 2 | repeat() helper, silent no-op documentation |

💡 Recommendation

🟢 APPROVED

Solid feature, live verified, ~75% token cost reduction for Qwen models. Clean backward compatibility, good test coverage, runtime escape hatch.

Wait for the go CI check to complete before merging!

Great work @nguyennguyenit, this is a PR that saves real money! 💰🚀

- Use strings.Repeat instead of custom repeat() helper in smoke test
- Clarify BuildRequestBodyForTest is test-only, not public API
