feat: add openai flex processing toggle #1465
Open
tasoo-oos wants to merge 3 commits into
PR Checklist
Summary
Add an Advanced Settings toggle for OpenAI Flex Processing on official OpenAI Chat Completions requests.
Related Issues
None.
Changes
- Adds the `openAIFlexProcessing` setting.
- Sends `service_tier: flex` only for official OpenAI Chat Completions requests when enabled.
Impact
Users can opt into lower-cost OpenAI Flex responses, which may be slower than regular responses.
Requests to other providers such as OpenRouter, and Responses API requests, are unchanged.
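The gating described above could be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `RequestTarget`, `ChatBody`, `isOfficialOpenAI`, and `buildChatBody` are hypothetical names, and detecting "official OpenAI" by hostname is an assumption. Only the `openAIFlexProcessing` setting name and the `service_tier: flex` field come from the description.

```typescript
// Hypothetical sketch: type and function names are illustrative,
// not taken from the PR's diff.
type ApiFormat = "chat-completions" | "responses";

interface RequestTarget {
  baseUrl: string;
  format: ApiFormat;
}

interface Settings {
  openAIFlexProcessing: boolean; // setting name from the PR description
}

interface ChatBody {
  model: string;
  messages: { role: string; content: string }[];
  service_tier?: "flex";
}

// Assumption: a request counts as "official OpenAI" when its host is api.openai.com.
function isOfficialOpenAI(baseUrl: string): boolean {
  try {
    return new URL(baseUrl).hostname === "api.openai.com";
  } catch {
    return false; // unparsable URL: treat as not official
  }
}

function buildChatBody(
  settings: Settings,
  target: RequestTarget,
  model: string,
  messages: { role: string; content: string }[],
): ChatBody {
  const body: ChatBody = { model, messages };
  // Attach the flex tier only when all three conditions hold.
  if (
    settings.openAIFlexProcessing &&
    target.format === "chat-completions" &&
    isOfficialOpenAI(target.baseUrl)
  ) {
    body.service_tier = "flex";
  }
  return body;
}
```

With the toggle off, or with an OpenRouter base URL, or with the Responses format, the body is returned without a `service_tier` field, so those requests are sent as before.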
Additional Notes
Tested with `pnpm check` and manually with the risu-official GPT endpoint and a Custom API pointed at `https://api.openai.com/v1`.
Also, as described in the pricing document, only o3, o4-mini, and the GPT-5 model family (except for the -Chat variant) can use this feature.
I could have implemented a per-model guard with `LLMFlags`, but I haven't. The end user gets a clear and explicit error when using an unsupported model with the flex feature, whereas an `LLMFlags`-based guard could cause end users who mistakenly do not set that flag to silently pay double the price (for example, with a Custom API for an unregistered OpenAI model).
If you have a better idea, I am open to suggestions!
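To make the trade-off concrete, here is a small sketch of the two approaches. Everything in it is hypothetical: `ModelInfo`, `supportsFlex`, and both functions are illustrative stand-ins, not RisuAI's actual `LLMFlags` definitions.

```typescript
// Hypothetical sketch: `supportsFlex` stands in for an LLMFlags-style
// per-model flag; it is not the real flag name.
interface ModelInfo {
  id: string;
  supportsFlex?: boolean;
}

// Guarded variant: flex is requested only when the model carries the flag.
function serviceTierWithGuard(flexEnabled: boolean, model: ModelInfo): "flex" | undefined {
  return flexEnabled && model.supportsFlex === true ? "flex" : undefined;
}

// Unguarded variant (the approach this PR takes): the toggle alone decides,
// and the API is left to reject unsupported models.
function serviceTierWithoutGuard(flexEnabled: boolean): "flex" | undefined {
  return flexEnabled ? "flex" : undefined;
}

// A custom model registered without the flag, even though it supports flex.
const customModel: ModelInfo = { id: "my-custom-openai-model" };

const guarded = serviceTierWithGuard(true, customModel); // undefined
const unguarded = serviceTierWithoutGuard(true); // "flex"
```

In the guarded case the request goes out without `service_tier`, so the user is billed at the standard rate with no visible signal. In the unguarded case a wrong model produces the explicit error mentioned above instead.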
UI
Footnotes
Modifies the behavior of prompting, requesting, or handling responses from AI models. ↩
Over 80% of the code is AI generated. ↩