Skip to content

refactor(proxy): audit thinking-mode protocol and refactor test suite#33

Merged
yxlao merged 19 commits into
mainfrom
yixing/flow-review
May 1, 2026
Merged

refactor(proxy): audit thinking-mode protocol and refactor test suite#33
yxlao merged 19 commits into
mainfrom
yixing/flow-review

Conversation

@yxlao
Copy link
Copy Markdown
Owner

@yxlao yxlao commented May 1, 2026

TL;DR

This PR audits the proxy against DeepSeek's official thinking-mode tool-call protocol and confirms that the multi-round reasoning_content chain is reconstructed and forwarded correctly across tool-call turns. Along the way it tightens a few protocol edges, fixes minor bugs, and consolidates the test suite.

DeepSeek thinking with tools

Summary

Tightens the thinking-mode protocol behavior based on a code audit, and refactors the test suite around a single end-to-end protocol harness so each scenario lives in exactly one place.

Changes Made

Protocol fixes

  • src/deepseek_cursor_proxy/transform.py:
    • Added strip_recovery_notice_for_upstream() so the proxy-generated recovery notice is no longer echoed back to DeepSeek. The notice still serves as a boundary marker in the with-prefix history (preserved for cache-scope alignment), but is stripped before the request leaves the proxy.
    • Removed LEGACY_RECOVERY_NOTICE_TEXT constant and its has_recovery_notice() branch — only the current recovery notice is recognized now.
    • Added WARNING logs for unsupported request fields (parallel_tool_calls, service_tier, etc.) that the proxy silently drops, and for non-DeepSeek model names that get rewritten to the configured upstream model.
    • Reordered rewrite_response_body(): prefix → record reasoning → fold reasoning, so cache scope signatures still match what Cursor will echo back.
  • src/deepseek_cursor_proxy/streaming.py: Added fold_reasoning_into_content() for non-streaming responses, mirroring reasoning_content into a <details><summary>Thinking</summary>...</details> block (or <think> if collapsible_reasoning=False) — the streaming adapter already did this; non-streaming responses now match.
  • src/deepseek_cursor_proxy/config.py + server.py: Removed the pass-through thinking mode. The proxy was the only thing that knew how to rebuild missing reasoning_content; pass-through bypassed it and produced 400s upstream.

Test suite refactor

  • Consolidated overlapping tests into one canonical end-to-end file, tests/test_protocol.py, which uses a StrictFakeDeepSeek HTTP handler that 400s on missing reasoning_content (the same behavior the real DeepSeek API has). Test classes cover: canonical loop, strict-reject mode, thinking-disabled, recovery, streaming-then-non-streaming, streaming/non-streaming display, concurrent threads, and streaming cache timing.
  • Removed tests/test_proxy_end_to_end.py (1414 lines) — its scenarios now live in test_protocol.py.
  • Trimmed tests/test_transform.py from 1489 → 321 lines: kept only pure-helper unit tests (content extraction, request prep, recovery-notice stripping, response rewrite). Behavior that needs an upstream now lives in test_protocol.py.
  • Expanded tests/test_server.py with HTTP-boundary tests: bearer token forwarding, oversized body rejection, healthz, streaming close-after-[DONE], normal vs verbose logging.
  • Expanded tests/test_trace.py with three integration tests that exercise the trace writer through a running proxy (non-streaming replay, streaming chunks, recovery diagnostics).
  • Added tests/test_streaming.py::FoldReasoningTests for the new non-streaming fold helper.

Final layout: 80 tests across 9 files (~3500 lines), down from 98 tests across 10 files (~5000 lines).

Repo cleanup

  • Removed docs/audit_report.md and docs/thinking-tools.md — internal audit notes, not user-facing docs.
  • Removed scripts/audit_deepseek_protocol.py — the mock DeepSeek server is now part of tests/test_protocol.py.
  • Removed .claude/settings.local.json and added .claude/ to .gitignore.
  • Dropped the now-obsolete scripts/audit_deepseek_protocol.py ruff per-file ignore from pyproject.toml.

Breaking Changes

  • --thinking pass-through is no longer supported. Configs that specified pass-through will fall back to the default (enabled). The mode bypassed the proxy's reasoning-cache patching, which is the proxy's reason for existing.
  • The legacy recovery-notice prefix ("Note: recovered this DeepSeek chat...") is no longer recognized as a boundary marker. Live conversations that still carry the old prefix in their history will have it treated as ordinary assistant content. New recoveries always emit the current [deepseek-cursor-proxy] Refreshed reasoning_content history. prefix.

Test plan

  • uv run python -m unittest discover -s tests — 80 tests pass (1 skipped: live DeepSeek API test, requires real API key)
  • Manual: run the proxy against Cursor with a multi-turn tool-call conversation, verify no 400s
  • Manual: kill and restart the proxy mid-conversation, verify the recovery notice appears once and subsequent turns succeed

@yxlao yxlao changed the title wip: workflow review refactor(proxy): audit thinking-mode protocol and refactor test suite May 1, 2026
@yxlao yxlao force-pushed the yixing/flow-review branch from 77619cf to 0215269 Compare May 1, 2026 11:36
Allowlists `user`, `seed`, `n`, and `logit_bias` so the proxy stops
log-spamming when Cursor (and any OpenAI-SDK client) sends them on
every request. DeepSeek either honors or safely ignores these; the
warning was telling users we were silently dropping a stable end-user
identifier they probably want forwarded.
@yxlao yxlao merged commit be03107 into main May 1, 2026
6 checks passed
@yxlao yxlao deleted the yixing/flow-review branch May 1, 2026 11:48
yxlao added a commit that referenced this pull request May 1, 2026
PR #33's test refactor trimmed test_transform.py from 1489 to 321 lines
and accidentally dropped five regression tests that locked in PR #28's
cross-mode/model context-preservation mechanisms (Pro/Flash family
normalization, portable turn-scoped keys, recovery-boundary
continuation). The production code is intact, but coverage was gone.

Restore the originals verbatim from commit 5f14da3 as a new
CrossModeAndModelTests class:

- test_deepseek_pro_and_flash_share_reasoning_namespace
- test_strict_hit_backfills_portable_cache_for_mode_switch
- test_portable_turn_cache_restores_final_assistant_after_tool_result
- test_portable_turn_cache_isolated_for_reused_tool_call_id
- test_recovered_response_is_recorded_under_pre_recovery_scope

Adjusted only the imports and the namespace/scope helpers to fit the
post-PR-#33 layout. All 87 tests pass.
maThiaslI152 added a commit to maThiaslI152/deepseek-cursor-proxy that referenced this pull request May 6, 2026
Merges 6 upstream commits from yxlao/deepseek-cursor-proxy:
- feat(streaming): add collapsible reasoning display (yxlao#32)
- refactor(proxy): audit thinking-mode protocol and refactor test suite (yxlao#33)
- fix(server): honor missing-reasoning reject mode (yxlao#34)
- fix: prevent recovery cascade and improve Stop-scenario reasoning lookup (yxlao#25)
- feat(config): default reasoning effort to max (yxlao#36)
- refactor(logging): simplified prints and add spinner (yxlao#37)

Combines both branches' changes:
- Concurrent request deduplication (ours) + dual-scope recording (upstream)
- Broad namespace cache keys (ours) + portable tool_name keys (upstream)
- Spinner, build_arg_parser, main() function (upstream)
- Metrics tracking, schema migration, graceful shutdown (ours)

Co-authored-by: Cursor <cursoragent@cursor.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant