refactor(proxy): audit thinking-mode protocol and refactor test suite by yxlao · Pull Request #33 · yxlao/deepseek-cursor-proxy

yxlao · 2026-05-01T11:03:31Z

TL;DR

This PR audits the proxy against DeepSeek's official thinking-mode tool-call protocol and confirms that the multi-round reasoning_content chain is reconstructed and forwarded correctly across tool-call turns. Along the way it tightens a few protocol edges, fixes minor bugs, and consolidates the test suite.

Summary

Tightens the thinking-mode protocol behavior based on a code audit, and refactors the test suite around a single end-to-end protocol harness so each scenario lives in exactly one place.

Changes Made

Protocol fixes

src/deepseek_cursor_proxy/transform.py:
- Added strip_recovery_notice_for_upstream() so the proxy-generated recovery notice is no longer echoed back to DeepSeek. The notice still serves as a boundary marker in the with-prefix history (preserved for cache-scope alignment), but is stripped before the request leaves the proxy.
- Removed LEGACY_RECOVERY_NOTICE_TEXT constant and its has_recovery_notice() branch — only the current recovery notice is recognized now.
- Added WARNING logs for unsupported request fields (parallel_tool_calls, service_tier, etc.) that the proxy silently drops, and for non-DeepSeek model names that get rewritten to the configured upstream model.
- Reordered rewrite_response_body(): prefix → record reasoning → fold reasoning, so cache scope signatures still match what Cursor will echo back.
src/deepseek_cursor_proxy/streaming.py: Added fold_reasoning_into_content() for non-streaming responses, mirroring reasoning_content into a <details><summary>Thinking</summary>...</details> block (or <think> if collapsible_reasoning=False) — the streaming adapter already did this; non-streaming responses now match.
src/deepseek_cursor_proxy/config.py + server.py: Removed the pass-through thinking mode. The proxy was the only thing that knew how to rebuild missing reasoning_content; pass-through bypassed it and produced 400s upstream.

Test suite refactor

Consolidated overlapping tests into one canonical end-to-end file, tests/test_protocol.py, which uses a StrictFakeDeepSeek HTTP handler that 400s on missing reasoning_content (the same behavior the real DeepSeek API has). Test classes cover: canonical loop, strict-reject mode, thinking-disabled, recovery, streaming-then-non-streaming, streaming/non-streaming display, concurrent threads, and streaming cache timing.
Removed tests/test_proxy_end_to_end.py (1414 lines) — its scenarios now live in test_protocol.py.
Trimmed tests/test_transform.py from 1489 → 321 lines: kept only pure-helper unit tests (content extraction, request prep, recovery-notice stripping, response rewrite). Behavior that needs an upstream now lives in test_protocol.py.
Expanded tests/test_server.py with HTTP-boundary tests: bearer token forwarding, oversized body rejection, healthz, streaming close-after-[DONE], normal vs verbose logging.
Expanded tests/test_trace.py with three integration tests that exercise the trace writer through a running proxy (non-streaming replay, streaming chunks, recovery diagnostics).
Added tests/test_streaming.py::FoldReasoningTests for the new non-streaming fold helper.

Final layout: 80 tests across 9 files (~3500 lines), down from 98 tests across 10 files (~5000 lines).

Repo cleanup

Removed docs/audit_report.md and docs/thinking-tools.md — internal audit notes, not user-facing docs.
Removed scripts/audit_deepseek_protocol.py — the mock DeepSeek server is now part of tests/test_protocol.py.
Removed .claude/settings.local.json and added .claude/ to .gitignore.
Dropped the now-obsolete scripts/audit_deepseek_protocol.py ruff per-file ignore from pyproject.toml.

Breaking Changes

--thinking pass-through is no longer supported. Configs that specified pass-through will fall back to the default (enabled). The mode bypassed the proxy's reasoning-cache patching, which is the proxy's reason for existing.
The legacy recovery-notice prefix ("Note: recovered this DeepSeek chat...") is no longer recognized as a boundary marker. Live conversations that still carry the old prefix in their history will have it treated as ordinary assistant content. New recoveries always emit the current [deepseek-cursor-proxy] Refreshed reasoning_content history. prefix.

Test plan

uv run python -m unittest discover -s tests — 80 tests pass (1 skipped: live DeepSeek API test, requires real API key)
Manual: run the proxy against Cursor with a multi-turn tool-call conversation, verify no 400s
Manual: kill and restart the proxy mid-conversation, verify the recovery notice appears once and subsequent turns succeed

Allowlists `user`, `seed`, `n`, and `logit_bias` so the proxy stops log-spamming when Cursor (and any OpenAI-SDK client) sends them on every request. DeepSeek either honors or safely ignores these; the warning was telling users we were silently dropping a stable end-user identifier they probably want forwarded.

PR #33's test refactor trimmed test_transform.py from 1489 to 321 lines and accidentally dropped five regression tests that locked in PR #28's cross-mode/model context-preservation mechanisms (Pro/Flash family normalization, portable turn-scoped keys, recovery-boundary continuation). The production code is intact, but coverage was gone. Restore the originals verbatim from commit 5f14da3 as a new CrossModeAndModelTests class: - test_deepseek_pro_and_flash_share_reasoning_namespace - test_strict_hit_backfills_portable_cache_for_mode_switch - test_portable_turn_cache_restores_final_assistant_after_tool_result - test_portable_turn_cache_isolated_for_reused_tool_call_id - test_recovered_response_is_recorded_under_pre_recovery_scope Adjusted only the imports and the namespace/scope helpers to fit the post-PR-#33 layout. All 87 tests pass.

Merges 6 upstream commits from yxlao/deepseek-cursor-proxy: - feat(streaming): add collapsible reasoning display (yxlao#32) - refactor(proxy): audit thinking-mode protocol and refactor test suite (yxlao#33) - fix(server): honor missing-reasoning reject mode (yxlao#34) - fix: prevent recovery cascade and improve Stop-scenario reasoning lookup (yxlao#25) - feat(config): default reasoning effort to max (yxlao#36) - refactor(logging): simplified prints and add spinner (yxlao#37) Combines both branches' changes: - Concurrent request deduplication (ours) + dual-scope recording (upstream) - Broad namespace cache keys (ours) + portable tool_name keys (upstream) - Spinner, build_arg_parser, main() function (upstream) - Metrics tracking, schema migration, graceful shutdown (ours) Co-authored-by: Cursor <cursoragent@cursor.com>

yxlao added 17 commits May 1, 2026 17:29

docs(thinking-mode): add thinking mode tool-call flow doc

3ce420c

docs(thinking-mode): expand tool call flow docs

e033fb4

docs(audit): add protocol audit reports and script

1c44f13

test(audit): add protocol compliance harness and report

f1176d9

docs(audit): consolidate audit reports into one

cecf1f8

docs(docs): rename thinking doc and update refs

2b2d02d

Merge remote flow review docs

91f84cb

docs(readme): trim core fix overview

2a0a19c

docs(readme): expand how-it-works section

3bf6cf5

docs(audit): fix markdown formatting

47ff6d5

docs(audit_report): condense to summary

c5f98ed

docs(audit): document review issues and fixes

338691e

refactor(thinking): drop pass-through, add display

3602b6c

test(tests): rename protocol compliance test

bed2d44

refactor(tests): merge proxy e2e tests into protocol

0566202

refactor(repo): clean up audit and claude files

e940e01

fix(gitignore): move claude entry to ai section

f6a89ff

yxlao changed the title ~~wip: workflow review~~ refactor(proxy): audit thinking-mode protocol and refactor test suite May 1, 2026

fix(tests): poll for stats log before assert

0215269

yxlao force-pushed the yixing/flow-review branch from 77619cf to 0215269 Compare May 1, 2026 11:36

yxlao merged commit be03107 into main May 1, 2026
6 checks passed

yxlao deleted the yixing/flow-review branch May 1, 2026 11:48

yxlao mentioned this pull request May 1, 2026

fix(server): honor missing-reasoning reject mode #34

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(proxy): audit thinking-mode protocol and refactor test suite#33

refactor(proxy): audit thinking-mode protocol and refactor test suite#33
yxlao merged 19 commits into
mainfrom
yixing/flow-review

yxlao commented May 1, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

yxlao commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

TL;DR

Summary

Changes Made

Protocol fixes

Test suite refactor

Repo cleanup

Breaking Changes

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

yxlao commented May 1, 2026 •

edited

Loading