if delta.content:
    stream_content_parts.append(delta.content)
if delta.tool_calls:
    stream_tool_calls = delta.tool_calls
Bug: Streaming tool calls overwritten instead of accumulated
In test_streaming_output_consistency, the streaming loop overwrites stream_tool_calls on every chunk (stream_tool_calls = delta.tool_calls) instead of accumulating the deltas. OpenAI's streaming API delivers tool calls as incremental deltas that must be merged by index across chunks. As written, only the last chunk's tool-call data survives, so the tool call comparison at line 2162 checks incomplete data against the non-streaming response and can produce false positives or false negatives.
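One way to accumulate the deltas is sketched below. It is a minimal illustration, not the fix applied in this PR: the helper name merge_tool_call_deltas, the dict layout, and the simulated chunks built with SimpleNamespace are all assumptions made for the example. It assumes each delta carries an index, that id and function.name arrive whole in the first delta for that index, and that function.arguments arrives as string fragments to be concatenated.

```python
from types import SimpleNamespace


def merge_tool_call_deltas(accumulated, delta_tool_calls):
    """Merge one chunk's tool-call deltas into `accumulated`, keyed by index.

    Hypothetical helper: `accumulated` maps a tool-call index to a dict
    holding the id, type, function name, and concatenated argument string.
    """
    for tc in delta_tool_calls or []:
        entry = accumulated.setdefault(
            tc.index, {"id": None, "type": None, "name": None, "arguments": ""}
        )
        if getattr(tc, "id", None):
            entry["id"] = tc.id
        if getattr(tc, "type", None):
            entry["type"] = tc.type
        fn = getattr(tc, "function", None)
        if fn is not None:
            # Assumption: the name arrives whole, so assign rather than append.
            if getattr(fn, "name", None):
                entry["name"] = fn.name
            # Arguments arrive as JSON string fragments and must be appended.
            if getattr(fn, "arguments", None):
                entry["arguments"] += fn.arguments
    return accumulated


def _delta(index, id=None, name=None, arguments=None):
    """Build a stand-in for one streamed tool-call delta (illustrative only)."""
    return SimpleNamespace(
        index=index,
        id=id,
        type="function" if id else None,
        function=SimpleNamespace(name=name, arguments=arguments),
    )


# Simulated stream: the first call's arguments are split across two chunks.
chunks = [
    [_delta(0, id="call_a", name="get_weather", arguments='{"ci')],
    [_delta(0, arguments='ty": "Paris"}')],
    [_delta(1, id="call_b", name="get_time", arguments="{}")],
]

stream_tool_calls = {}
for delta_tool_calls in chunks:
    merge_tool_call_deltas(stream_tool_calls, delta_tool_calls)

# Order by index so the result lines up with the non-streaming tool_calls list.
merged = [stream_tool_calls[i] for i in sorted(stream_tool_calls)]
```

In the test loop, the assignment stream_tool_calls = delta.tool_calls would become a call to such a helper, and the comparison against the non-streaming response would use the merged list after the stream ends.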
Description
Please include a summary of the change and which issue is fixed or feature is implemented. Please also include relevant motivation and context. List any dependencies that are required for this change.
Fixes # (issue)
Implements # (issue)
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.
Test Configuration:
Checklist:
black ., isort ., flake8 .)
Screenshots (if applicable)
If applicable, add screenshots to help showcase your changes.
Additional context
Add any other context about the PR here.
Note
Add comprehensive streaming compliance benchmark and workflow, and enhance rollout/models to record finish_reason/tool_call_count and reasoning/tool-call data.
Add eval_protocol/benchmarks/test_glm_streaming_compliance.py with streaming and non-streaming tests.
Add /.github/workflows/streaming_compliance.yml to run the benchmark (configurable inputs) and upload JSON artifacts.
Update eval_protocol/pytest/default_single_turn_rollout_process.py to:
    Pass reasoning_effort via extra_body, disable cache per request.
    Add reasoning_content and normalized tool_calls to assistant Message.
    Record row.execution_metadata.finish_reason and tool_call_count.
Extend ExecutionMetadata with finish_reason and tool_call_count fields.
Written by Cursor Bugbot for commit 688e87f.
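The metadata recording described above might look roughly like the following. This is a sketch under stated assumptions, not the code in this PR: ExecutionMetadataSketch is a stand-in dataclass (the real ExecutionMetadata may be defined differently), record_choice is a hypothetical helper, and the choice object is a hand-built stub shaped like a chat-completion choice.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Optional


@dataclass
class ExecutionMetadataSketch:
    """Stand-in for ExecutionMetadata with the two fields named above."""

    finish_reason: Optional[str] = None
    tool_call_count: Optional[int] = None


def record_choice(metadata, choice):
    """Copy finish_reason and the number of tool calls from a choice (hypothetical)."""
    metadata.finish_reason = choice.finish_reason
    # tool_calls may be None when the model answered with plain content.
    metadata.tool_call_count = len(choice.message.tool_calls or [])
    return metadata


# Stub choice with two tool calls, shaped like a chat-completion choice.
choice = SimpleNamespace(
    finish_reason="tool_calls",
    message=SimpleNamespace(
        content=None,
        tool_calls=[{"id": "call_a"}, {"id": "call_b"}],
    ),
)

meta = record_choice(ExecutionMetadataSketch(), choice)
```

Treating a missing tool_calls value as zero keeps the count comparable between responses that call tools and responses that do not.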