Add near-complete Codex VSCode Support, full OAI Responses bridge by michaelw9999 · Pull Request #3 · michaelw9999/llama.cpp

michaelw9999 · 2026-05-03T08:57:36Z

This brings in automatic compaction, web_search, and file_search, and is super easy to configure. For example:

model = "qwen3.5-4B-NVFP4"
model_provider = "llamacpp"
personality = "friendly"
model_context_window = 128000
model_auto_compact_token_limit = 100000
model_supports_reasoning_summaries = true
model_reasoning_summary = "auto"
model_reasoning_effort = "medium"

[model_providers.llamacpp]
name = "Local llama.cpp"
model = "Qwen3.5-4B-NVFP4.gguf"
base_url = "http://192.168.50.50:43901/v1"
supports_websockets = false

[model_providers.llamacpp.http_headers]
X-Llama-Responses-Web-Search-Wrapper = "tvly"
X-Llama-Responses-File-Search-Wrapper = "rg"
X-Llama-Responses-Reasoning-Budget-Tokens = "minimal=2048,low=4096,medium=8192,high=16384,xhigh=32768"

For automatic compaction to work, you must set model_context_window and model_auto_compact_token_limit. Summary boxes and clickable diffs with the undo button usually need model_supports_reasoning_summaries = true and model_reasoning_summary = "auto".
Just install Tavily (the shell command is tvly) and rg, or any other preferred web search MCP or file search/locator tool. The server wraps the tool through the shell and integrates it more natively and intuitively. If a wrapper header is left out, the corresponding tool is hidden from the model.
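To make the X-Llama-Responses-Reasoning-Budget-Tokens format concrete, here is a minimal sketch of how a comma-separated effort=tokens list could be parsed. The function name parse_reasoning_budgets and its error handling are assumptions for illustration, not the parser used in this PR.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper, not the PR's actual parser: turns
// "minimal=2048,low=4096" into {"minimal": 2048, "low": 4096}.
std::map<std::string, int> parse_reasoning_budgets(const std::string & header) {
    std::map<std::string, int> budgets;
    std::stringstream ss(header);
    std::string pair;
    while (std::getline(ss, pair, ',')) {
        if (pair.empty()) {
            continue; // tolerate trailing or doubled commas
        }
        const std::size_t eq = pair.find('=');
        if (eq == std::string::npos || eq == 0 || eq + 1 == pair.size()) {
            throw std::invalid_argument("expected effort=tokens, got: " + pair);
        }
        // std::stoi throws if the token count is not numeric
        budgets[pair.substr(0, eq)] = std::stoi(pair.substr(eq + 1));
    }
    return budgets;
}
```

With the header value from the config above, the map would hold five entries, and a request with model_reasoning_effort = "medium" would look up the "medium" key.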

Codex CLI compatibility:
- Skip non-function tool types (web_search, code_interpreter)
- Merge developer/system messages into position 0 for Qwen templates
- Strip Responses-only request keys (store, include, prompt_cache_key)
- Restore refusal content type handling

Responses API compliance (ideas from ggml-org#19720 by riskywindow, adapted):
- Add 24 missing Response object fields per OpenAI spec
- Fix function_call id/call_id field mapping
- Add sequence_number, output_index, content_index to ALL streaming events
- Full response object in response.created/in_progress events
- Accept input_text type and EasyInputMessage for multi-turn input
- output_text convenience field, output_tokens_details

14 pytest tests, E2E tested with async OpenAI SDK and Codex CLI.

Refs: ggml-org#19138, ggml-org#19720, ggml-org#21174
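Two of the compatibility steps in this commit message (merging developer/system messages into position 0, and stripping Responses-only keys) can be sketched as follows. This is an illustration only: the Message struct, both function names, and the flat string map standing in for the JSON request are assumptions, and the real server works on JSON objects.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one chat message.
struct Message {
    std::string role;
    std::string content;
};

// Fold every system/developer message into one system message at index 0,
// keeping the remaining messages in their original order.
std::vector<Message> merge_system_messages(const std::vector<Message> & in) {
    std::string merged;
    std::vector<Message> rest;
    for (const Message & m : in) {
        if (m.role == "system" || m.role == "developer") {
            if (!merged.empty()) {
                merged += "\n\n";
            }
            merged += m.content;
        } else {
            rest.push_back(m);
        }
    }
    std::vector<Message> out;
    if (!merged.empty()) {
        out.push_back({"system", merged});
    }
    out.insert(out.end(), rest.begin(), rest.end());
    return out;
}

// Drop the Responses-only keys named in the commit message.
void strip_responses_only_keys(std::map<std::string, std::string> & request) {
    static const std::set<std::string> keys = {"store", "include", "prompt_cache_key"};
    for (const std::string & k : keys) {
        request.erase(k);
    }
}
```

Joining the merged contents with a blank line is a choice made for this sketch; the commit message does not say which separator the PR uses.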

Cherry-pick of ggml-org#20819 by European-tech. Persist context checkpoints in a companion .checkpoints file alongside slot saves. Without this, restoring a slot for hybrid/recurrent models triggers full prompt reprocessing (23s for 26K tokens on Qwen3.5-27B). With checkpoint persistence, restore takes 75ms. Binary format with magic 0x4C4C4350 ("LLCP"), versioned, backward compatible (old saves without companion file load normally).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d72b0819db

chatgpt-codex-connector · 2026-05-03T09:03:12Z

+                    {"sequence_number", seq_num++},
+                    {"output_index",    output_idx++},
                    {"item", json {
+                        {"id",        oai_resp_fc_item_id},


Emit a fresh function-call item id for each added tool call

server_task_result_cmpl_partial::update() only assigns state.oai_resp_fc_item_id after snapshotting state into the chunk fields, so to_json_oaicompat_resp() can emit response.output_item.added with {"id": oai_resp_fc_item_id} from the previous value (often empty on the first streamed tool call). This makes streamed response.function_call_arguments.delta.item_id/final output_item.done.item.id inconsistent with the announced item, which breaks clients that stitch function-call argument deltas by item_id.
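The ordering fix this comment asks for can be sketched in isolation. StreamState, Chunk, and update_for_new_tool_call are hypothetical stand-ins, not the server's real types, and the "fc_" + counter id scheme is an assumption; the point is only that the id is assigned before the state is snapshotted.

```cpp
#include <string>

// Hypothetical stand-ins for the server's streaming state and chunk types.
struct StreamState {
    std::string fc_item_id;       // id of the function-call item being streamed
    int         n_tool_calls = 0; // counter used to build a fresh id
};

struct Chunk {
    std::string fc_item_id; // value the "output_item.added" event would emit
};

// Generate the id for a newly started tool call first, then snapshot,
// so the announced item and later argument deltas share the same id.
Chunk update_for_new_tool_call(StreamState & state) {
    state.fc_item_id = "fc_" + std::to_string(state.n_tool_calls++);
    Chunk chunk;
    chunk.fc_item_id = state.fc_item_id; // snapshot after assignment
    return chunk;
}
```

Swapping the two statements (snapshot first, assign second) would reproduce the reported symptom: the first chunk would carry an empty id and each later chunk the previous call's id.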

chatgpt-codex-connector · 2026-05-03T09:03:12Z

+    if (checkpoints.empty()) {
+        return true;
+    }


Remove stale checkpoint sidecar when no checkpoints exist

When checkpoints is empty, slot_checkpoints_save() returns without touching <filepath>.checkpoints, so reusing the same save filename can leave an old sidecar file behind. A later restore will then load stale checkpoint metadata for a different KV snapshot, which can trigger invalid recurrent-state restore attempts or unnecessary full prompt reprocessing.
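One way to address this is to delete the sidecar on the empty path instead of returning early. The sketch below assumes C++17 std::filesystem; save_checkpoints_sketch and its opaque-string checkpoint representation are hypothetical simplifications, not the signature of slot_checkpoints_save().

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical simplification: each checkpoint is an opaque byte string.
bool save_checkpoints_sketch(const std::string & filepath,
                             const std::vector<std::string> & checkpoints) {
    const std::filesystem::path sidecar = filepath + ".checkpoints";
    if (checkpoints.empty()) {
        // Nothing to persist: delete any sidecar left by an earlier save.
        // A missing file is not reported as an error by this overload.
        std::error_code ec;
        std::filesystem::remove(sidecar, ec);
        return !ec;
    }
    std::ofstream out(sidecar, std::ios::binary | std::ios::trunc);
    for (const std::string & cp : checkpoints) {
        out.write(cp.data(), static_cast<std::streamsize>(cp.size()));
    }
    return out.good();
}
```

Using the error_code overload keeps the empty case non-throwing, so a save with no checkpoints still succeeds when there was never a sidecar to remove.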

krystophny and others added 2 commits March 30, 2026 18:35

chatgpt-codex-connector Bot reviewed May 3, 2026

michaelw9999 wants to merge 2 commits into michaelw9999:full-openai-responses from krystophny:master
