Add Realtime Reasoning API support for gpt-realtime-2 #284
Status: Open. richarddas wants to merge 1 commit into `AIProxyTeam:main` from `richarddas:feature/realtime-reasoning-api-parity`.
New file, `@@ -0,0 +1,92 @@`:

# Realtime API Schema Matrix

This matrix maps the current OpenAI Realtime `session.update.session` and `response.create.response` fields to AIProxySwift types and wire encoding behavior.

Reference: https://developers.openai.com/api/reference/resources/realtime

## Shared Realtime Session

These fields are used by Performance Realtime models, such as `gpt-realtime-1.5`, and are also the base session shape composed by Realtime Reasoning models.

| Wire field | AIProxySwift API | Wire shape emitted |
| --- | --- | --- |
| `type` | `OpenAIRealtimeSessionConfiguration.type` | string |
| `include` | `OpenAIRealtimeSessionConfiguration.include` | string array |
| `model` | `OpenAIRealtimeSessionConfiguration.model` | string |
| `instructions` | `OpenAIRealtimeSessionConfiguration.instructions` | string |
| `max_output_tokens` | `OpenAIRealtimeSessionConfiguration.maxOutputTokens` | int or `"inf"` |
| `output_modalities` | `OpenAIRealtimeSessionConfiguration.outputModalities` | enum string array |
| `prompt` | `OpenAIRealtimeSessionConfiguration.prompt` | object (`id`, optional `variables`, optional `version`) |
| `tracing` | `OpenAIRealtimeSessionConfiguration.tracing` | string `"auto"` or object (`group_id`, `metadata`, `workflow_name`) |
| `truncation` | `OpenAIRealtimeSessionConfiguration.truncation` | string (`"auto"`/`"disabled"`) or retention-ratio object |
| `tools` | `OpenAIRealtimeSessionConfiguration.tools` | union array (`function`, `mcp`, `web_search`) |
| `tool_choice` | `OpenAIRealtimeSessionConfiguration.toolChoice` | string (`auto`/`none`/`required`) or typed selector object |
| `audio.input.format` | `OpenAIRealtimeSessionConfiguration.inputAudioFormat` | object (`type`, optional `rate`) |
| `audio.input.noise_reduction` | `OpenAIRealtimeSessionConfiguration.inputAudioNoiseReduction` | object (`type`) |
| `audio.input.transcription` | `OpenAIRealtimeSessionConfiguration.inputAudioTranscription` | object (`language`, `model`, `prompt`) |
| `audio.input.turn_detection` | `OpenAIRealtimeSessionConfiguration.turnDetection` | typed object union (`server_vad` / `semantic_vad`) |
| `audio.output.format` | `OpenAIRealtimeSessionConfiguration.outputAudioFormat` | object (`type`, optional `rate`) |
| `audio.output.speed` | `OpenAIRealtimeSessionConfiguration.speed` | number (range 0.25...1.5) |
| `audio.output.voice` | `OpenAIRealtimeSessionConfiguration.voice` | string or object (`id`) |

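The `max_output_tokens` row is the one field in this table whose wire shape mixes a JSON number and a string. A minimal sketch of encoding such an int-or-`"inf"` union through a single-value container follows; `MaxOutputTokens`, `SessionFragment`, and `encodeSorted` are hypothetical names for illustration, not AIProxySwift's actual types.

```swift
import Foundation

// Hypothetical stand-in for however AIProxySwift models `max_output_tokens`;
// the matrix only fixes the wire shape: a JSON integer or the string "inf".
enum MaxOutputTokens: Encodable {
    case limit(Int)  // encoded as a bare JSON number
    case infinite    // encoded as the JSON string "inf"

    func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        switch self {
        case .limit(let count):
            try container.encode(count)
        case .infinite:
            try container.encode("inf")
        }
    }
}

// Hypothetical session fragment that carries only this one field.
struct SessionFragment: Encodable {
    let type = "realtime"
    let maxOutputTokens: MaxOutputTokens?  // nil omits the key entirely

    enum CodingKeys: String, CodingKey {
        case type
        case maxOutputTokens = "max_output_tokens"
    }
}

// Encodes with sorted keys so the output is deterministic.
func encodeSorted<T: Encodable>(_ value: T) throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = .sortedKeys
    let data = try encoder.encode(value)
    return String(decoding: data, as: UTF8.self)
}

try print(encodeSorted(SessionFragment(maxOutputTokens: .limit(4096))))
// {"max_output_tokens":4096,"type":"realtime"}
try print(encodeSorted(SessionFragment(maxOutputTokens: .infinite)))
// {"max_output_tokens":"inf","type":"realtime"}
```

Modeling the union as an enum keeps invalid states (such as a negative sentinel meaning "unlimited") out of the Swift API while still emitting exactly the two shapes the wire accepts.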
## Realtime Reasoning Session

Realtime Reasoning models, such as `gpt-realtime-2`, compose the shared session fields above and add Reasoning-only fields to the same `session.update.session` object.

| Wire field | AIProxySwift API | Wire shape emitted |
| --- | --- | --- |
| `reasoning` | `OpenAIRealtimeReasoningSessionConfiguration.reasoning` | object |
| `reasoning.effort` | `OpenAIRealtimeReasoningConfiguration.effort` | `minimal`, `low`, `medium`, `high`, or `xhigh` |
| `parallel_tool_calls` | `OpenAIRealtimeReasoningSessionConfiguration.parallelToolCalls` | boolean |

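Because Reasoning sessions put their extra fields in the same `session` object rather than a nested one, a separate Swift wrapper type has to flatten its output. The sketch below shows one way to do that, assuming hypothetical `SharedSession`, `ReasoningSettings`, and `ReasoningSession` types: the shared struct encodes first, then the Reasoning-only keys are appended. It relies on `JSONEncoder` handing back the existing keyed container when a second one is requested at the same coding path; the real AIProxySwift implementation may take a different route.

```swift
import Foundation

// Hypothetical, trimmed-down stand-ins; not AIProxySwift's actual types.
struct SharedSession: Encodable {
    let type = "realtime"
    let model: String
    let instructions: String?
}

struct ReasoningSettings: Encodable {
    enum Effort: String, Encodable { case minimal, low, medium, high, xhigh }
    let effort: Effort?
}

struct ReasoningSession: Encodable {
    let shared: SharedSession
    let reasoning: ReasoningSettings?
    let parallelToolCalls: Bool?

    // Only the Reasoning-only keys; the shared keys come from SharedSession.
    enum CodingKeys: String, CodingKey {
        case reasoning
        case parallelToolCalls = "parallel_tool_calls"
    }

    func encode(to encoder: Encoder) throws {
        // Write the shared fields into this encoder's keyed container first.
        try shared.encode(to: encoder)
        // JSONEncoder returns that same container here, so these keys land
        // in the same flat object instead of a nested one.
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encodeIfPresent(reasoning, forKey: .reasoning)
        try container.encodeIfPresent(parallelToolCalls, forKey: .parallelToolCalls)
    }
}

// Encodes with sorted keys so the output is deterministic.
func encodeSorted<T: Encodable>(_ value: T) throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = .sortedKeys
    let data = try encoder.encode(value)
    return String(decoding: data, as: UTF8.self)
}

try print(encodeSorted(ReasoningSession(
    shared: SharedSession(model: "gpt-realtime-2", instructions: nil),
    reasoning: ReasoningSettings(effort: .low),
    parallelToolCalls: true
)))
// {"model":"gpt-realtime-2","parallel_tool_calls":true,"reasoning":{"effort":"low"},"type":"realtime"}
```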
## Shared `response.create`

| Wire field | AIProxySwift API | Wire shape emitted |
| --- | --- | --- |
| `type` | `OpenAIRealtimeResponseCreate.type` | `"response.create"` |
| `event_id` | `OpenAIRealtimeResponseCreate.eventID` | optional string |
| `response.instructions` | `OpenAIRealtimeResponseCreate.Response.instructions` | optional string |
| `response.output_modalities` | `OpenAIRealtimeResponseCreate.Response.outputModalities` | optional enum string array |
| `response.tools` | `OpenAIRealtimeResponseCreate.Response.tools` | optional tool union array (`function`, `mcp`, `web_search`) |
| `response.tool_choice` | `OpenAIRealtimeResponseCreate.Response.toolChoice` | optional string/object union |

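`response.tool_choice` is a string-or-object union. A minimal sketch, assuming a hypothetical `ToolChoice` enum and assuming the function-selector object has the shape `{"type": "function", "name": ...}` as described in OpenAI's Realtime docs; the `mcp` selector is left out, and `ResponseFragment` and `encodeSorted` are likewise hypothetical.

```swift
import Foundation

// Hypothetical model of the `tool_choice` union: a bare mode string or a
// typed selector object. Not AIProxySwift's actual type.
enum ToolChoice: Encodable {
    enum Mode: String, Encodable { case auto, none, required }

    case mode(Mode)
    case function(name: String)

    private enum SelectorKeys: String, CodingKey { case type, name }

    func encode(to encoder: Encoder) throws {
        switch self {
        case .mode(let mode):
            // Bare string form: "auto", "none", or "required".
            var container = encoder.singleValueContainer()
            try container.encode(mode)
        case .function(let name):
            // Typed selector object that targets one function tool.
            var container = encoder.container(keyedBy: SelectorKeys.self)
            try container.encode("function", forKey: .type)
            try container.encode(name, forKey: .name)
        }
    }
}

// Hypothetical `response` fragment carrying only `tool_choice`.
struct ResponseFragment: Encodable {
    let toolChoice: ToolChoice?

    enum CodingKeys: String, CodingKey {
        case toolChoice = "tool_choice"
    }
}

// Encodes with sorted keys so the output is deterministic.
func encodeSorted<T: Encodable>(_ value: T) throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = .sortedKeys
    let data = try encoder.encode(value)
    return String(decoding: data, as: UTF8.self)
}

try print(encodeSorted(ResponseFragment(toolChoice: .mode(.required))))
// {"tool_choice":"required"}
try print(encodeSorted(ResponseFragment(toolChoice: .function(name: "get_weather"))))
// {"tool_choice":{"name":"get_weather","type":"function"}}
```

Nesting the string modes under `.mode(...)` also sidesteps the ambiguity a top-level `case none` would have with `Optional.none` when the property is optional.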
## Realtime Reasoning `response.create`

| Wire field | AIProxySwift API | Wire shape emitted |
| --- | --- | --- |
| `type` | `OpenAIRealtimeReasoningResponseCreate.type` | `"response.create"` |
| `event_id` | `OpenAIRealtimeReasoningResponseCreate.eventID` | optional string |
| `response.reasoning` | `OpenAIRealtimeReasoningResponseCreate.Response.reasoning` | object |
| `response.reasoning.effort` | `OpenAIRealtimeReasoningConfiguration.effort` | `minimal`, `low`, `medium`, `high`, or `xhigh` |
| `response.parallel_tool_calls` | `OpenAIRealtimeReasoningResponseCreate.Response.parallelToolCalls` | boolean |

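The envelope rows (`type`, `event_id`) and the Reasoning rows can be mirrored with nested structs. This sketch assumes hypothetical names (`ReasoningResponseCreate`, `encodeSorted`) and treats every Reasoning field as optional, which the table does not state; it shows that a constant `type` stored property is still written by the synthesized encoder and that a `nil` `event_id` is omitted.

```swift
import Foundation

// Hypothetical, trimmed-down mirror of the table above; AIProxySwift's real
// OpenAIRealtimeReasoningResponseCreate may be shaped differently.
struct ReasoningResponseCreate: Encodable {
    struct Response: Encodable {
        struct Reasoning: Encodable {
            enum Effort: String, Encodable { case minimal, low, medium, high, xhigh }
            let effort: Effort?
        }

        let reasoning: Reasoning?
        let parallelToolCalls: Bool?

        enum CodingKeys: String, CodingKey {
            case reasoning
            case parallelToolCalls = "parallel_tool_calls"
        }
    }

    // Constant event discriminator; the synthesized encoder still writes it.
    let type = "response.create"
    let eventID: String?  // nil omits "event_id"
    let response: Response

    enum CodingKeys: String, CodingKey {
        case type
        case eventID = "event_id"
        case response
    }
}

// Encodes with sorted keys so the output is deterministic.
func encodeSorted<T: Encodable>(_ value: T) throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = .sortedKeys
    let data = try encoder.encode(value)
    return String(decoding: data, as: UTF8.self)
}

// Placeholder values: minimal effort, no parallel tool calls for this response.
let quickTurn = ReasoningResponseCreate(
    eventID: "evt_1",
    response: .init(
        reasoning: ReasoningResponseCreate.Response.Reasoning(effort: .minimal),
        parallelToolCalls: false
    )
)
try print(encodeSorted(quickTurn))
// {"event_id":"evt_1","response":{"parallel_tool_calls":false,"reasoning":{"effort":"minimal"}},"type":"response.create"}
```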
## Realtime Reasoning Output Phases

Realtime Reasoning output can be split into commentary and final answer phases.

| Wire field | AIProxySwift API | Wire shape decoded |
| --- | --- | --- |
| `response.output[].phase` | `OpenAIRealtimeResponseOutputItem.phase` | `commentary` or `final_answer` |
| `response.output_item.*.item.phase` | `OpenAIRealtimeResponseOutputItemAddedEvent.phase` / `OpenAIRealtimeResponseOutputItemDoneEvent.phase` | `commentary` or `final_answer` |
| `conversation.item.*.item.phase` | `OpenAIRealtimeConversationItemCreatedEvent.phase` | `commentary` or `final_answer` |

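On the decode side, `phase` lets a client tell commentary apart from the final answer. A minimal sketch, assuming a hypothetical `OutputItem` with only `id`, `type`, and an optional `phase` (items from non-Reasoning models presumably carry none), plus a hypothetical display policy that hides commentary and keeps unphased items:

```swift
import Foundation

// Only the two values listed in the table are modeled.
enum OutputPhase: String, Decodable {
    case commentary
    case finalAnswer = "final_answer"
}

// Hypothetical, trimmed-down output item; real items carry more fields.
struct OutputItem: Decodable {
    let id: String
    let type: String
    let phase: OutputPhase?  // a missing or null "phase" decodes as nil
}

// Hypothetical policy: drop commentary, keep final answers and unphased items.
func answerItems(_ items: [OutputItem]) -> [OutputItem] {
    items.filter { $0.phase != .commentary }
}

let payload = Data(#"[{"id":"item_1","type":"message","phase":"commentary"},{"id":"item_2","type":"message","phase":"final_answer"}]"#.utf8)
let items = try JSONDecoder().decode([OutputItem].self, from: payload)
print(answerItems(items).map(\.id))  // ["item_2"]
```

Because `OutputPhase` is a plain `String`-backed enum, an unrecognized `phase` value makes decoding throw; a production decoder may prefer an `unknown` fallback case.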
## `conversation.item.create`

Reference: https://platform.openai.com/docs/api-reference/realtime-client-events/conversation/item/create

| Wire field | AIProxySwift API | Wire shape emitted |
| --- | --- | --- |
| `type` | `OpenAIRealtimeConversationItemCreate.type` | `"conversation.item.create"` |
| `item.type` | `OpenAIRealtimeConversationItemCreate.Item` | `"message"`, `"function_call"`, `"function_call_output"` |
| `item.role` | `OpenAIRealtimeConversationItemCreate.Item.role` | optional string for message items |
| `item.content[].type` | `OpenAIRealtimeConversationItemCreate.Item.Content.type` | `input_text`, `output_text`, `input_audio`, `item_reference`, `input_image` |
| `item.content[].text` | `OpenAIRealtimeConversationItemCreate.Item.Content.text` | optional string |
| `item.content[].audio` | `OpenAIRealtimeConversationItemCreate.Item.Content.audio` | optional string |
| `item.content[].item_id` | `OpenAIRealtimeConversationItemCreate.Item.Content.itemID` | optional string |
| `item.call_id` | `OpenAIRealtimeConversationItemCreate.Item.callID` | optional string |
| `item.name` | `OpenAIRealtimeConversationItemCreate.Item.name` | optional string |
| `item.arguments` | `OpenAIRealtimeConversationItemCreate.Item.arguments` | optional string |
| `item.output` | `OpenAIRealtimeConversationItemCreate.Item.output` | optional string |

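Every item field in this table except `type` is optional, which maps onto a flat struct whose `nil` properties the synthesized encoder drops. A sketch under that assumption, using hypothetical names, leaving out `content` for brevity, and sending a `function_call_output` back for an earlier call (`call_123` and `sunny` are placeholder values):

```swift
import Foundation

// Hypothetical flat mirror of the table above; not AIProxySwift's actual type.
struct ConversationItemCreate: Encodable {
    struct Item: Encodable {
        let type: String  // "message", "function_call", or "function_call_output"
        var role: String? = nil
        var callID: String? = nil
        var name: String? = nil
        var arguments: String? = nil
        var output: String? = nil

        enum CodingKeys: String, CodingKey {
            case type, role, name, arguments, output
            case callID = "call_id"
        }
    }

    let type = "conversation.item.create"
    let item: Item
}

// Encodes with sorted keys so the output is deterministic.
func encodeSorted<T: Encodable>(_ value: T) throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = .sortedKeys
    let data = try encoder.encode(value)
    return String(decoding: data, as: UTF8.self)
}

// Tool result for an earlier `function_call` item with call_id "call_123".
let toolResult = ConversationItemCreate(
    item: .init(type: "function_call_output", callID: "call_123", output: "sunny")
)
try print(encodeSorted(toolResult))
// {"item":{"call_id":"call_123","output":"sunny","type":"function_call_output"},"type":"conversation.item.create"}
```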
`Sources/AIProxy/OpenAI/OpenAIRealtimeReasoningConfiguration.swift`: 24 changes (24 additions, 0 deletions), new file, `@@ -0,0 +1,24 @@`:
```swift
//
// OpenAIRealtimeReasoningConfiguration.swift
// AIProxy
//

/// Configuration for OpenAI Realtime Reasoning models such as `gpt-realtime-2`.
nonisolated public struct OpenAIRealtimeReasoningConfiguration: Encodable, Sendable {
    /// Constrains effort on Realtime Reasoning models.
    public let effort: Effort?

    public init(effort: Effort? = nil) {
        self.effort = effort
    }
}

extension OpenAIRealtimeReasoningConfiguration {
    nonisolated public enum Effort: String, Encodable, Sendable {
        case minimal
        case low
        case medium
        case high
        case xhigh
    }
}
```
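A short usage sketch for the type above. To stay self-contained it re-declares a local copy without `nonisolated` and `public`, so it also builds as a standalone snippet on older Swift toolchains; `reasoningConfiguration(fromSetting:)` and `encodeToString` are hypothetical helpers. It shows that a `nil` effort encodes as `{}`, so the field is simply absent from the wire.

```swift
import Foundation

// Local copy of OpenAIRealtimeReasoningConfiguration from the diff above,
// minus `nonisolated`/`public`.
struct OpenAIRealtimeReasoningConfiguration: Encodable, Sendable {
    let effort: Effort?

    init(effort: Effort? = nil) {
        self.effort = effort
    }
}

extension OpenAIRealtimeReasoningConfiguration {
    enum Effort: String, Encodable, Sendable {
        case minimal, low, medium, high, xhigh
    }
}

// Hypothetical helper: turn a user-facing setting into a configuration.
// Unrecognized settings map to `effort: nil`, which drops the key entirely.
func reasoningConfiguration(fromSetting setting: String) -> OpenAIRealtimeReasoningConfiguration {
    OpenAIRealtimeReasoningConfiguration(
        effort: OpenAIRealtimeReasoningConfiguration.Effort(rawValue: setting.lowercased())
    )
}

func encodeToString<T: Encodable>(_ value: T) throws -> String {
    let data = try JSONEncoder().encode(value)
    return String(decoding: data, as: UTF8.self)
}

try print(encodeToString(reasoningConfiguration(fromSetting: "XHigh")))  // {"effort":"xhigh"}
try print(encodeToString(reasoningConfiguration(fromSetting: "turbo")))  // {}
```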
I think the main thing I'd like to understand before merging is whether we need this separate `ReasoningConfiguration` and the separate initializer in `OpenAIRealtimeSession`. IIUC, a more surgical change would be to modify `OpenAIRealtimeSessionConfiguration` by adding a member: `let reasoning: OpenAIRealtimeReasoning?`.

The `OpenAIRealtimeReasoning` type would have a single member, `effort`, much like your current type `OpenAIRealtimeReasoningConfiguration`.

I don't see any real control-flow or network-sequencing differences between the reasoning and non-reasoning versions right now, so I think this would be a simpler change. Let me know if I'm missing something @richarddas
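A minimal sketch of the shape described in that comment, assuming a trimmed-down hypothetical `SessionConfigurationSketch` in place of the real `OpenAIRealtimeSessionConfiguration` and using the proposed `OpenAIRealtimeReasoning` name. Both model families go through the same type; for a non-reasoning model the optional members stay `nil` and never reach the wire.

```swift
import Foundation

// The proposed single-member reasoning type.
struct OpenAIRealtimeReasoning: Encodable {
    enum Effort: String, Encodable { case minimal, low, medium, high, xhigh }
    let effort: Effort?
}

// Hypothetical, trimmed-down session configuration with optional members.
struct SessionConfigurationSketch: Encodable {
    let type = "realtime"
    let model: String
    var reasoning: OpenAIRealtimeReasoning? = nil  // left nil for non-reasoning models
    var parallelToolCalls: Bool? = nil

    enum CodingKeys: String, CodingKey {
        case type, model, reasoning
        case parallelToolCalls = "parallel_tool_calls"
    }
}

// Encodes with sorted keys so the output is deterministic.
func encodeSorted<T: Encodable>(_ value: T) throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = .sortedKeys
    let data = try encoder.encode(value)
    return String(decoding: data, as: UTF8.self)
}

try print(encodeSorted(SessionConfigurationSketch(model: "gpt-realtime-1.5")))
// {"model":"gpt-realtime-1.5","type":"realtime"}
try print(encodeSorted(SessionConfigurationSketch(
    model: "gpt-realtime-2",
    reasoning: OpenAIRealtimeReasoning(effort: .medium)
)))
// {"model":"gpt-realtime-2","reasoning":{"effort":"medium"},"type":"realtime"}
```

Since `model` is a plain string, this type cannot stop a caller from attaching `reasoning` to a non-reasoning model.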
And a nit: for any new types that you do create, can you use one file per public type and put them in a new folder, `OpenAI/Realtime` (see the existing `OpenAI/Conversations` folder for an example)? I want to start organizing the realtime files ahead of our eventual split of this repo into several single-purpose clients. That will make the work down the road a bit easier.
My original thinking was to keep Performance and Reasoning explicit at the callsite, but you make a solid point that the rest of the sequencing collapses the two anyway. Since models are provided as strings, the wrapper also doesn’t actually enforce that `gpt-realtime-2` uses the Reasoning config, so the wrapper is probably overkill.

I’ll fold `reasoning` and `parallelToolCalls` into the existing session and response-create types, while keeping `reasoning` as a grouped value so the Reasoning intent is still explicit at the callsite.