Surface native reasoning from openai-compatible streaming responses by simpolism · Pull Request #129 · anima-research/animachat

simpolism · 2026-06-03T00:43:15Z

The openai-compatible streaming parser only read delta.content and post-hoc <think> tags from the final text. OpenAI-compatible servers that run a reasoning parser (e.g. vLLM) stream chain-of-thought in a separate delta.reasoning / delta.reasoning_content field, with delta.content null until the reasoning phase ends. Those reasoning deltas were silently dropped, so extended thinking never reached the client even though the model emitted it.

Fix

Accumulate native reasoning deltas into a thinking content block (kept at index 0) and stream it via onChunk, then merge with any inline <think> blocks parsed at completion. This mirrors the reasoning-passthrough already implemented in the OpenRouter service (reasoning_content preferred, reasoning string as fallback).

Models that inline <think>...</think> in content are unaffected — that path still runs at [DONE] via parseThinkingTags, and the two sources are merged (native reasoning at index 0, inline blocks appended).
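A minimal sketch of the accumulation described above, written as a synchronous reducer so it can be read in isolation. The type names, `createState`, and `applyDelta` are hypothetical and are not taken from the service; the real parser calls `onChunk` directly and may differ in detail.

```typescript
// Hypothetical shapes for illustration; the real service's types may differ.
interface ThinkingBlock {
  type: 'thinking';
  thinking: string;
}

interface StreamDelta {
  content?: string | null;
  reasoning_content?: string | null;
  reasoning?: string | null;
}

interface StreamState {
  fullContent: string;
  reasoningContent: string;
  reasoningBlocks: ThinkingBlock[];
}

interface ChunkEvent {
  text: string;
  blocks: ThinkingBlock[] | undefined;
}

function createState(): StreamState {
  return { fullContent: '', reasoningContent: '', reasoningBlocks: [] };
}

// Applies one parsed SSE delta and returns the events to forward to the client.
function applyDelta(state: StreamState, delta: StreamDelta): ChunkEvent[] {
  const events: ChunkEvent[] = [];

  // reasoning_content is preferred; a string-valued reasoning field is the fallback.
  let reasoningText = '';
  if (typeof delta.reasoning_content === 'string' && delta.reasoning_content.length > 0) {
    reasoningText = delta.reasoning_content;
  } else if (typeof delta.reasoning === 'string') {
    reasoningText = delta.reasoning;
  }

  if (reasoningText.length > 0) {
    state.reasoningContent += reasoningText;
    // The native thinking block always lives at index 0.
    state.reasoningBlocks[0] = { type: 'thinking', thinking: state.reasoningContent };
    events.push({ text: '', blocks: state.reasoningBlocks });
  }

  if (typeof delta.content === 'string' && delta.content.length > 0) {
    state.fullContent += delta.content;
    events.push({
      text: delta.content,
      blocks: state.reasoningBlocks.length > 0 ? state.reasoningBlocks : undefined,
    });
  }

  return events;
}
```

Each returned event corresponds to one `onChunk(text, false, blocks)` call in the streaming loop; the completion call at `[DONE]` is not modeled here.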

Files

backend/src/services/openai-compatible.ts


greptile-apps · 2026-06-03T00:48:20Z

Greptile Summary

This PR surfaces native chain-of-thought from OpenAI-compatible streaming responses by reading delta.reasoning_content / delta.reasoning fields that servers like vLLM emit during the reasoning phase — deltas that were previously silently dropped. The approach mirrors the existing OpenRouter reasoning-passthrough and merges native blocks with any inline <think> blocks at stream completion.

Introduces three state variables (reasoningBlocks, reasoningContent, hasReasoningStarted) to track streaming thinking content, streams it via onChunk at each delta, and merges it with post-hoc parseThinkingTags output at [DONE].
Content chunks during and after the reasoning phase pass the accumulated reasoningBlocks array alongside each text delta, keeping the client's in-progress thinking block up-to-date throughout streaming.

Confidence Score: 4/5

Safe to merge; the change is additive and isolated to the streaming parser, with no impact on models that don't emit native reasoning fields.

The implementation correctly follows the pattern already proven in the OpenRouter service. Minor gaps remain: the reasoning field is only handled as a string, and a model emitting reasoning in both the native field and inline tags could produce duplicate thinking blocks.

deprecated-claude-app/backend/src/services/openai-compatible.ts — specifically the reasoning field fallback parsing and the [DONE] merge logic.

Important Files Changed

Filename	Overview
deprecated-claude-app/backend/src/services/openai-compatible.ts	Adds native reasoning-delta passthrough; core logic correctly mirrors OpenRouter's implementation, but `reasoning` array/object formats aren't handled and concurrent native+inline reasoning could produce duplicate thinking blocks.

Sequence Diagram

sequenceDiagram
    participant vLLM as vLLM / OpenAI-Compatible Server
    participant Service as openai-compatible.ts
    participant onChunk as onChunk Handler

    vLLM->>Service: SSE delta (reasoning_content: "Let me think...")
    Service->>Service: "reasoningContent += text"
    Service->>Service: "reasoningBlocks[0] = {type:'thinking', thinking: reasoningContent}"
    Service->>onChunk: onChunk('', false, reasoningBlocks)

    vLLM->>Service: SSE delta (reasoning_content: " more thinking")
    Service->>Service: "reasoningContent += text"
    Service->>Service: reasoningBlocks[0] updated
    Service->>onChunk: onChunk('', false, reasoningBlocks)

    vLLM->>Service: SSE delta (content: "Hello!")
    Service->>Service: "fullContent += text"
    Service->>onChunk: onChunk('Hello!', false, reasoningBlocks)

    vLLM->>Service: SSE data: [DONE]
    Service->>Service: "inlineBlocks = parseThinkingTags(fullContent)"
    Service->>Service: "contentBlocks = [...reasoningBlocks, ...inlineBlocks]"
    Service->>onChunk: onChunk('', true, contentBlocks)

Prompt To Fix All With AI

Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
deprecated-claude-app/backend/src/services/openai-compatible.ts:158-160
**Narrower `reasoning` handling than OpenRouter reference**

The OpenRouter service handles `delta.reasoning` as a string, an array of `{text: string}` objects, or a plain object with a `.text` property. This service only handles the string form. Any OpenAI-compatible server that emits `delta.reasoning` in array or object format (e.g. servers following a newer streaming spec) would silently drop that reasoning content. Aligning with OpenRouter's multi-format handling would future-proof this code.

### Issue 2 of 3
deprecated-claude-app/backend/src/services/openai-compatible.ts:164
`reasoningBlocks.unshift()` on an empty array is functionally identical to `push()` here, but `unshift` implies prepending ahead of existing elements, which is misleading when the array is empty. Using `push` communicates intent more clearly.

```suggestion
                  reasoningBlocks.push({ type: 'thinking', thinking: '' });
```

### Issue 3 of 3
deprecated-claude-app/backend/src/services/openai-compatible.ts:140-145
**Duplicate thinking blocks possible when server emits both fields**

If a server emits native reasoning via `delta.reasoning_content` AND also wraps that same content in `<think>...</think>` inside `delta.content`, `parseThinkingTags(fullContent)` at `[DONE]` would extract those inline tags into `inlineBlocks` and they would be appended to `reasoningBlocks` in the merge. The client would then see the same reasoning content twice — once from the native field and once from the parsed tags. Adding a guard to skip `parseThinkingTags` when `hasReasoningStarted` is true (or when `reasoningBlocks.length > 0`) would prevent this.


greptile-apps · 2026-06-03T00:48:23Z

+              } else if (delta?.reasoning && typeof delta.reasoning === 'string') {
+                reasoningText = delta.reasoning;
+              }


Narrower reasoning handling than OpenRouter reference

The OpenRouter service handles delta.reasoning as a string, an array of {text: string} objects, or a plain object with a .text property. This service only handles the string form. Any OpenAI-compatible server that emits delta.reasoning in array or object format (e.g. servers following a newer streaming spec) would silently drop that reasoning content. Aligning with OpenRouter's multi-format handling would future-proof this code.
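One way the multi-format handling could look, as a sketch only. The helper names `extractReasoningText` and `textOf` are hypothetical, and the three accepted shapes are the ones listed in this comment rather than a copy of the OpenRouter service's code.

```typescript
// Normalizes one reasoning value into plain text.
// Accepts a string, an array of { text } objects, or a single object with .text.
function textOf(value: unknown): string {
  if (typeof value === 'string') return value;
  if (Array.isArray(value)) return value.map(textOf).join('');
  if (typeof value === 'object' && value !== null) {
    const text = (value as { text?: unknown }).text;
    return typeof text === 'string' ? text : '';
  }
  return '';
}

// Hypothetical helper: returns the reasoning text carried by one streaming delta,
// preferring reasoning_content and falling back to reasoning in any supported shape.
function extractReasoningText(delta: unknown): string {
  if (typeof delta !== 'object' || delta === null) return '';
  const fields = delta as Record<string, unknown>;
  const preferred = fields['reasoning_content'];
  if (typeof preferred === 'string' && preferred.length > 0) return preferred;
  return textOf(fields['reasoning']);
}
```

Unknown shapes fall through to an empty string, so a delta without usable reasoning is treated the same as one with no reasoning field at all.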


greptile-apps · 2026-06-03T00:48:24Z

+              const inlineBlocks = this.parseThinkingTags(fullContent);
+              const contentBlocks = [...reasoningBlocks, ...inlineBlocks];
+              if (hasReasoningStarted) {
+                console.log(`[OpenAI-Compatible] 🧠 Reasoning complete: ${reasoningContent.length} chars`);
+              }
              await onChunk('', true, contentBlocks.length > 0 ? contentBlocks : undefined);


Duplicate thinking blocks possible when server emits both fields

If a server emits native reasoning via delta.reasoning_content AND also wraps that same content in <think>...</think> inside delta.content, parseThinkingTags(fullContent) at [DONE] would extract those inline tags into inlineBlocks and they would be appended to reasoningBlocks in the merge. The client would then see the same reasoning content twice — once from the native field and once from the parsed tags. Adding a guard to skip parseThinkingTags when hasReasoningStarted is true (or when reasoningBlocks.length > 0) would prevent this.
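The suggested guard could be sketched as follows. `parseThinkingTagsSketch` is a simplified stand-in for the service's `parseThinkingTags`, whose actual behavior is not shown in this PR, and `finalizeBlocks` is a hypothetical name for the merge step at `[DONE]`.

```typescript
interface ThinkingBlock {
  type: 'thinking';
  thinking: string;
}

// Stand-in for the service's parseThinkingTags; the real method may behave differently.
function parseThinkingTagsSketch(content: string): ThinkingBlock[] {
  const blocks: ThinkingBlock[] = [];
  const pattern = /<think>([\s\S]*?)<\/think>/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(content)) !== null) {
    const thinking = (match[1] ?? '').trim();
    if (thinking.length > 0) blocks.push({ type: 'thinking', thinking });
  }
  return blocks;
}

// Builds the final block list, skipping inline-tag parsing once native reasoning was seen.
function finalizeBlocks(
  reasoningBlocks: ThinkingBlock[],
  fullContent: string,
): ThinkingBlock[] | undefined {
  const inlineBlocks: ThinkingBlock[] =
    reasoningBlocks.length > 0 ? [] : parseThinkingTagsSketch(fullContent);
  const contentBlocks = [...reasoningBlocks, ...inlineBlocks];
  return contentBlocks.length > 0 ? contentBlocks : undefined;
}
```

This guard trades completeness for safety: if a server ever sent different reasoning in the native field and in inline tags, the inline part would be dropped. Comparing block text before appending would be a less aggressive alternative.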


Surface native reasoning from openai-compatible streaming responses #129
simpolism wants to merge 1 commit into anima-research:main from simpolism:jake.openai-compatible-reasoning-passthrough
