
Add near-complete Codex VSCode Support, full OAI Responses bridge #3

Open: michaelw9999 wants to merge 2 commits into michaelw9999:full-openai-responses from krystophny:master



@michaelw9999 (Owner)

This brings in automatic compaction, web_search and file_search, and is super easy to configure, for example:

model = "qwen3.5-4B-NVFP4"
model_provider = "llamacpp"
personality = "friendly"
model_context_window = 128000
model_auto_compact_token_limit = 100000
model_supports_reasoning_summaries = true
model_reasoning_summary = "auto"
model_reasoning_effort = "medium"

[model_providers.llamacpp]
name = "Local llama.cpp"
model = "Qwen3.5-4B-NVFP4.gguf"
base_url = "http://192.168.50.50:43901/v1"
supports_websockets = false

[model_providers.llamacpp.http_headers]
X-Llama-Responses-Web-Search-Wrapper = "tvly"
X-Llama-Responses-File-Search-Wrapper = "rg"
X-Llama-Responses-Reasoning-Budget-Tokens = "minimal=2048,low=4096,medium=8192,high=16384,xhigh=32768"

For automatic compaction to work, you must set model_context_window and model_auto_compact_token_limit. Summary boxes and clickable diffs with the undo button usually need model_supports_reasoning_summaries = true and model_reasoning_summary = "auto".
Just install Tavily (its shell command is tvly) and rg, or any other preferred web-search MCP or file-search/locator tool; the bridge wraps it through the shell and exposes it to the model more natively and intuitively. If the corresponding wrapper header is left out, that tool is hidden from the model.
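The header-driven behavior described above could be sketched roughly as follows. This assumes a search tool is advertised only when its wrapper header is present and non-empty, and that X-Llama-Responses-Reasoning-Budget-Tokens is a comma-separated list of effort=tokens pairs, as in the example config. advertised_tools and parse_reasoning_budgets are hypothetical helpers for illustration, not functions from this PR.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Exact-case lookup for brevity; real HTTP header names are case-insensitive.
using Headers = std::map<std::string, std::string>;

// Hypothetical: a search tool is offered to the model only if its wrapper header is set.
std::vector<std::string> advertised_tools(const Headers& h) {
    std::vector<std::string> tools;
    auto has = [&](const std::string& key) {
        auto it = h.find(key);
        return it != h.end() && !it->second.empty();
    };
    if (has("X-Llama-Responses-Web-Search-Wrapper"))  tools.push_back("web_search");
    if (has("X-Llama-Responses-File-Search-Wrapper")) tools.push_back("file_search");
    return tools;
}

// Parses "minimal=2048,low=4096,..." into {effort -> token budget}.
// Skips entries without '=' or whose budget std::stoi cannot parse.
std::map<std::string, int> parse_reasoning_budgets(const std::string& value) {
    std::map<std::string, int> out;
    std::stringstream ss(value);
    std::string pair;
    while (std::getline(ss, pair, ',')) {
        auto eq = pair.find('=');
        if (eq == std::string::npos) continue;
        try {
            int budget = std::stoi(pair.substr(eq + 1)); // parse before touching the map
            out[pair.substr(0, eq)] = budget;
        } catch (const std::exception&) {
            // Unparseable budget: ignore this entry.
        }
    }
    return out;
}
```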

krystophny and others added 2 commits March 30, 2026 18:35
Codex CLI compatibility:
- Skip non-function tool types (web_search, code_interpreter)
- Merge developer/system messages into position 0 for Qwen templates
- Strip Responses-only request keys (store, include, prompt_cache_key)
- Restore refusal content type handling
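A minimal sketch of two of the compatibility steps listed above, using plain structs in place of the server's JSON types: folding every developer/system message into a single system message at position 0, and dropping non-function tool types such as web_search. The Message and Tool structs, the helper names and the blank-line separator between merged messages are assumptions for illustration only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical simplified chat message and tool definitions.
struct Message { std::string role; std::string content; };
struct Tool    { std::string type; std::string name; };

// Folds all system/developer messages into one system message at index 0,
// keeping every other message in its original order.
std::vector<Message> merge_system_messages(const std::vector<Message>& in) {
    std::string system_text;
    std::vector<Message> rest;
    for (const auto& m : in) {
        if (m.role == "system" || m.role == "developer") {
            if (!system_text.empty()) system_text += "\n\n"; // assumed separator
            system_text += m.content;
        } else {
            rest.push_back(m);
        }
    }
    std::vector<Message> out;
    if (!system_text.empty()) out.push_back({"system", system_text});
    out.insert(out.end(), rest.begin(), rest.end());
    return out;
}

// Keeps only "function" tools; built-in types (web_search, code_interpreter) are skipped.
std::vector<Tool> function_tools_only(const std::vector<Tool>& in) {
    std::vector<Tool> out;
    for (const auto& t : in)
        if (t.type == "function") out.push_back(t);
    return out;
}
```

Merging into position 0 matters because some chat templates (the commit mentions Qwen) expect at most one system message, at the start.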

Responses API compliance (ideas from ggml-org#19720 by riskywindow, adapted):
- Add 24 missing Response object fields per OpenAI spec
- Fix function_call id/call_id field mapping
- Add sequence_number, output_index, content_index to ALL streaming events
- Full response object in response.created/in_progress events
- Accept input_text type and EasyInputMessage for multi-turn input
- output_text convenience field, output_tokens_details
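The per-event bookkeeping can be pictured as a small stamper that hands out a strictly increasing sequence_number while the caller supplies output_index and content_index, plus a helper that builds the output_text convenience field by concatenating text parts. StreamEvent, EventStamper, output_text() and the starting value of 0 are hypothetical; the actual server code is not shown in this description.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, minimal model of one streamed Responses event.
struct StreamEvent {
    std::string type;
    int sequence_number;
    int output_index;
    int content_index;
};

// Every event gets the next sequence number; positions come from the caller.
class EventStamper {
public:
    StreamEvent make(const std::string& type, int output_index, int content_index) {
        return {type, seq_++, output_index, content_index};
    }
private:
    int seq_ = 0; // assumed starting value
};

// output_text convenience field: concatenation of all output_text parts.
std::string output_text(const std::vector<std::string>& text_parts) {
    std::string s;
    for (const auto& p : text_parts) s += p;
    return s;
}
```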

14 pytest tests, E2E tested with async OpenAI SDK and Codex CLI.

Refs: ggml-org#19138, ggml-org#19720, ggml-org#21174

Cherry-pick of ggml-org#20819 by European-tech.

Persist context checkpoints in a companion .checkpoints file alongside
slot saves. Without this, restoring a slot for hybrid/recurrent models
triggers full prompt reprocessing (23s for 26K tokens on Qwen3.5-27B).
With checkpoint persistence, restore takes 75ms.

Binary format with magic 0x4C4C4350 ("LLCP"), versioned, backward
compatible (old saves without companion file load normally).
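A sketch of a versioned sidecar header using the LLCP magic, under stated assumptions: the header is three uint32 fields (magic, version, checkpoint count) written in host byte order, and version 1 is a placeholder. Neither the field layout nor the byte order is given in the commit message, so on a little-endian machine these bytes would not literally spell "LLCP" on disk. A missing file is reported as Missing rather than as an error, matching the stated backward compatibility. All names here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

constexpr std::uint32_t kCheckpointMagic   = 0x4C4C4350; // "LLCP"
constexpr std::uint32_t kCheckpointVersion = 1;          // hypothetical version number

// Writes the header of a hypothetical <slot file>.checkpoints sidecar.
bool write_checkpoint_header(const std::string& path, std::uint32_t n_checkpoints) {
    std::ofstream f(path, std::ios::binary | std::ios::trunc);
    if (!f) return false;
    const std::uint32_t fields[3] = {kCheckpointMagic, kCheckpointVersion, n_checkpoints};
    f.write(reinterpret_cast<const char*>(fields), sizeof(fields)); // host byte order
    return static_cast<bool>(f);
}

enum class SidecarStatus { Missing, Loaded, Invalid };

// A missing sidecar is not an error: old saves simply load without checkpoints.
SidecarStatus read_checkpoint_header(const std::string& path, std::uint32_t& n_checkpoints) {
    std::ifstream f(path, std::ios::binary);
    if (!f) return SidecarStatus::Missing;
    std::uint32_t fields[3] = {};
    if (!f.read(reinterpret_cast<char*>(fields), sizeof(fields))) return SidecarStatus::Invalid;
    // Reject a wrong magic or a version newer than this reader understands.
    if (fields[0] != kCheckpointMagic || fields[1] > kCheckpointVersion) return SidecarStatus::Invalid;
    n_checkpoints = fields[2];
    return SidecarStatus::Loaded;
}
```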

@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d72b0819db


{"sequence_number", seq_num++},
{"output_index", output_idx++},
{"item", json {
{"id", oai_resp_fc_item_id},

P1: Emit a fresh function-call item id for each added tool call

server_task_result_cmpl_partial::update() assigns state.oai_resp_fc_item_id only after snapshotting state into the chunk fields, so to_json_oaicompat_resp() can emit response.output_item.added with {"id": oai_resp_fc_item_id} still holding the previous value (often empty on the first streamed tool call). The item_id on the streamed response.function_call_arguments.delta events and the item.id on the final response.output_item.done then disagree with the announced item, which breaks clients that stitch function-call argument deltas together by item_id.
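The ordering issue can be reduced to a toy model: if the chunk fields are snapshotted before the new id is assigned, the first added event carries an empty id. StreamState, ChunkFields, new_fc_item_id() and the "fc_N" id scheme are hypothetical stand-ins for the llama.cpp types, and the update functions are assumed to run once per newly added tool call.

```cpp
#include <cassert>
#include <string>

// Hypothetical slice of the streaming state named in the review.
struct StreamState {
    int n_tool_calls = 0;
    std::string oai_resp_fc_item_id;
};

// Hypothetical stand-in for the chunk fields snapshotted from state.
struct ChunkFields {
    std::string oai_resp_fc_item_id;
};

std::string new_fc_item_id(int n) { return "fc_" + std::to_string(n); } // hypothetical id scheme

// Order described in the review: snapshot first, assign the id afterwards.
ChunkFields update_buggy(StreamState& s) {
    ChunkFields snap{s.oai_resp_fc_item_id};            // captures the stale (or empty) id
    s.oai_resp_fc_item_id = new_fc_item_id(++s.n_tool_calls);
    return snap;
}

// Suggested order: assign the fresh id, then snapshot, so output_item.added,
// the argument deltas and output_item.done all carry the same item id.
ChunkFields update_fixed(StreamState& s) {
    s.oai_resp_fc_item_id = new_fc_item_id(++s.n_tool_calls);
    return ChunkFields{s.oai_resp_fc_item_id};
}
```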


Comment on lines +487 to +489
if (checkpoints.empty()) {
return true;
}

P2: Remove stale checkpoint sidecar when no checkpoints exist

When checkpoints is empty, slot_checkpoints_save() returns without touching <filepath>.checkpoints, so reusing the same save filename can leave an old sidecar file behind. A later restore will then load stale checkpoint metadata for a different KV snapshot, which can trigger invalid recurrent-state restore attempts or unnecessary full prompt reprocessing.
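One way to address this, sketched with std::filesystem under the assumption that the sidecar path is <filepath>.checkpoints: when there are no checkpoints, delete any existing sidecar instead of returning early. slot_checkpoints_save_sketch and Checkpoint are hypothetical, and the payload is written raw without the real format's header.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical stand-in for one saved context checkpoint.
struct Checkpoint { std::vector<char> data; };

bool slot_checkpoints_save_sketch(const std::string& filepath,
                                  const std::vector<Checkpoint>& checkpoints) {
    const fs::path sidecar = filepath + ".checkpoints";
    if (checkpoints.empty()) {
        // Delete a sidecar left by an earlier save under the same name.
        // If none exists, remove() returns false without setting ec.
        std::error_code ec;
        fs::remove(sidecar, ec);
        return !ec;
    }
    std::ofstream f(sidecar, std::ios::binary | std::ios::trunc);
    for (const auto& cp : checkpoints)
        f.write(cp.data.data(), static_cast<std::streamsize>(cp.data.size()));
    return static_cast<bool>(f);
}
```

Because a missing file is not treated as an error, a first-time save with no checkpoints still succeeds.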

