feat(codex): native app-server plan and chat controls #3

Draft
SYU8384 wants to merge 44 commits into main from feat/codex-plan-controls-chat

Conversation

SYU8384 commented Jun 12, 2026

Owner

Summary

Consolidates the Codex plan-controls work into a single PR covering the native app-server conversation binding plus all chat-side plan and user-input controls. A follow-up PR (#4) adds interactive approval routing.

Companion PR: #4 — bound-conversation approval routing through the OpenClaw gateway.

Features

Native conversation binding

  • Codex app-server thread create/resume bound to an OpenClaw conversation
  • Bound turn loop with collector-based reply assembly
  • Tool call stubbing for dynamic OpenClaw tools
  • Config/runtime wiring (approvalPolicy, sandbox, serviceTier, model-backed reviewer)

Plan mode and reasoning effort

  • /codex plan [on|off|status] toggles collaborationMode between default and plan for bound threads
  • /codex think [plan|execute] [default|minimal|low|medium|high|xhigh|status] sets per-mode reasoning defaults
  • Local plan/think forms bypass the native execution guard
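
For reviewers, a minimal sketch of how the `/codex think` argument grammar above could be parsed. The names `parseThinkArgs`, `ThinkCommand`, and `isEffort` are hypothetical and do not come from this PR; only the mode and effort keywords are taken from the command synopsis. Treating a missing value as `status` is an assumption.

```typescript
// Hypothetical sketch: names and result shapes are illustrative, not from the PR.
type ThinkMode = "plan" | "execute";

const EFFORTS = ["default", "minimal", "low", "medium", "high", "xhigh"] as const;
type Effort = (typeof EFFORTS)[number];

type ThinkCommand =
  | { kind: "status"; mode: ThinkMode | undefined }
  | { kind: "set"; mode: ThinkMode | undefined; effort: Effort }
  | { kind: "invalid"; reason: string };

function isEffort(value: string): value is Effort {
  return EFFORTS.some((effort) => effort === value);
}

function parseThinkArgs(args: string): ThinkCommand {
  const tokens = args
    .trim()
    .toLowerCase()
    .split(/\s+/)
    .filter((token) => token.length > 0);

  // Optional leading mode selector ("plan" or "execute").
  let mode: ThinkMode | undefined;
  const first = tokens[0];
  if (first === "plan" || first === "execute") {
    mode = first;
    tokens.shift();
  }

  if (tokens.length > 1) {
    return { kind: "invalid", reason: `unexpected argument: ${tokens[1]}` };
  }

  const value = tokens[0];
  // Assumption: a missing value is treated like "status".
  if (value === undefined || value === "status") {
    return { kind: "status", mode };
  }
  if (isEffort(value)) {
    return { kind: "set", mode, effort: value };
  }
  return { kind: "invalid", reason: `unknown effort: ${value}` };
}
```

Returning a discriminated union keeps the handler free of string checks: it can switch on `kind` and only the `set` branch carries an `effort`.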

Live progress and progress delivery

  • /codex live [on|off|status] causes bound turns to emit standalone progress/commentary messages
  • Inter-tool commentary delivered durably as standalone channel messages
  • Plan-update payloads wired to deliver through progress + request_user_input channels

Chat plan approval controls

  • Detects <proposed_plan> in assistant text
  • Renders "Approve and execute / Approve and execute with clean context / Stay in plan mode" buttons
  • Plan-approval follow-up turns route through the bound turn loop

User-input chat controls

  • Bridges Codex item/userInput (or request_user_input) prompts to chat buttons and freeform replies
  • Sequential question rendering (one question at a time)
  • Typed freeform replies resolve against pending controls (numeric/label/prefix matches)
  • before_dispatch hook routes typed input replies before normal steering

Changes

  • extensions/codex/src/conversation-control.ts — plan/think/live command parsing and binding state
  • extensions/codex/src/conversation-binding.ts — bound turn loop, progress delivery, chat controls wiring
  • extensions/codex/src/conversation-chat-controls.ts — plan approval and user-input control state
  • extensions/codex/src/conversation-progress-reply.ts — standalone progress message delivery
  • extensions/codex/src/app-server/reasoning-defaults.ts — per-mode reasoning defaults
  • extensions/codex/src/app-server/session-binding.ts — collaboration mode and reasoning persistence
  • extensions/codex/src/app-server/user-input-bridge.ts — request_user_input to chat bridge
  • extensions/codex/src/app-server/user-input-shared.ts — shared user-input helpers
  • extensions/codex/src/command-handlers.ts — local plan/think command handlers
  • extensions/codex/index.ts — bound inbound claim, freeform dispatch hook, progress reply channel
  • src/auto-reply/reply/dispatch-from-config.ts — verbose progress lane, before_dispatch freeform hook
  • ui/src/ui/views/overview.ts — removed Codex demo widget usage (widgets preserved)
  • Plus tests for: conversation-control, conversation-binding, user-input-bridge, sequential question rendering, freeform dispatch, progress reply delivery

Verification

  • pnpm tsgo:extensions passes
  • pnpm build passes
  • Focused vitest passes for extensions/codex/src/conversation-binding.test.ts, extensions/codex/src/app-server/user-input-bridge.test.ts, and related modules
  • 16 pre-existing vitest failures in extensions/codex/index.test.ts, extensions/codex/src/app-server/config.test.ts, extensions/codex/src/app-server/thread-lifecycle.test.ts are documented as unrelated to this change (same on the current branch)

Hex and others added 30 commits June 13, 2026 03:35
(cherry picked from commit 222876a)
(cherry picked from commit 99357d7)
(cherry picked from commit 10290c8)
(cherry picked from commit 29df36c)
(cherry picked from commit dee57b3)
(cherry picked from commit 16bd0fe)
(cherry picked from commit 95db9a9)
(cherry picked from commit b2de8b8)
(cherry picked from commit 48f0687)
(cherry picked from commit 1a31656)
(cherry picked from commit c4d2c84)
(cherry picked from commit 2eac637)
(cherry picked from commit 553cad5)
(cherry picked from commit b9a825e)
(cherry picked from commit 9526523)
(cherry picked from commit 68a9071)
(cherry picked from commit 9bc351f)
(cherry picked from commit f70526d)
(cherry picked from commit 0acdd62)
(cherry picked from commit 21eb1e5)
(cherry picked from commit 96fdb29)
(cherry picked from commit af275ab)
(cherry picked from commit eb92be3)
(cherry picked from commit c825877)
(cherry picked from commit 61ca0cc)
(cherry picked from commit 0fb20af)
(cherry picked from commit 58c0903)
(cherry picked from commit 1c61902)
(cherry picked from commit 0fcd785)
(cherry picked from commit cfeb1a4)
(cherry picked from commit fa20b75)
(cherry picked from commit a4fc8fa)
(cherry picked from commit 0c32ad6)
(cherry picked from commit a06436f)
(cherry picked from commit 523e33f)
(cherry picked from commit 324ac59)
(cherry picked from commit fd49b27)
(cherry picked from commit 0214465)
(cherry picked from commit 981b58c)
(cherry picked from commit 1bced46)
(cherry picked from commit 18f8398)
(cherry picked from commit 9f54701)
(cherry picked from commit 1b3103a)
(cherry picked from commit 887d0f0)
(cherry picked from commit a5825fe)
When Codex's request_user_input tool sends 2-3 questions in one call, this
renders Q1 first and posts Q2 as a brand-new reply after Q1 is answered
(sequential path). The wire protocol still treats this as one request
answered by one merged response; only the channel rendering policy changes.

The legacy one-shot path (createCodexUserInputPrompt) is unchanged for N==1
and for channels without presentation support. The discord/telegram/slack
adapters that consume the result see consumed=false for partial clicks on
the legacy combined card so the user can keep clicking buttons for
remaining questions, and consumed=true on the sequential path so the
just-answered row gets disabled before the next question is posted.

Covers:
- per-question render (no header prefix when only one question on screen)
- partial-click advance + emit next prompt closure
- typed 'Other:' text advances similarly
- out-of-order clicks rejected as 'awaiting Q[n] header'
- partial answers dropped on cancel/abort per PR decision
- legacy multi-question (no sequential emit) freeform merge path preserved

Codex upstream's request_user_input tool spec allows 1-3 questions per call
(codex-rs/core/src/tools/handlers/request_user_input_spec.rs:55-67). This
change is purely channel-side rendering.
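
A minimal sketch of the sequential rendering policy described above, assuming a simplified question shape. `createSequentialPrompt`, `Question`, and `StepResult` are hypothetical names for illustration; the real state lives in the PR's chat-controls modules and may be structured differently.

```typescript
// Hypothetical sketch of the sequential policy; none of these names come from the PR.
interface Question {
  id: string;
  header: string;
}

type StepResult =
  | { status: "rejected"; reason: string }
  | { status: "next"; question: Question }
  | { status: "complete"; answers: Record<string, string> };

function createSequentialPrompt(questions: readonly Question[]) {
  let index = 0;
  let answers: Record<string, string> = {};

  return {
    // The only question currently rendered in the channel.
    current(): Question | undefined {
      return questions[index];
    },

    answer(questionId: string, value: string): StepResult {
      const active = questions[index];
      if (active === undefined) {
        return { status: "rejected", reason: "no pending question" };
      }
      // Out-of-order clicks are rejected instead of being buffered.
      if (questionId !== active.id) {
        return { status: "rejected", reason: `awaiting ${active.header}` };
      }
      answers[active.id] = value;
      index += 1;
      const next = questions[index];
      if (next !== undefined) {
        return { status: "next", question: next };
      }
      // All questions answered: one merged response for the single wire request.
      return { status: "complete", answers: { ...answers } };
    },

    // Partial answers are dropped on cancel/abort.
    cancel(): void {
      answers = {};
      index = questions.length;
    },
  };
}
```

The wire-level request is only resolved in the `complete` branch, which is why the change can stay purely channel-side.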

(cherry picked from commit d9d42613f3a59cc2778ae44031517512176b9c74)
(cherry picked from commit 5dd347b)
(cherry picked from commit 5e637a88841bb160b7d3a88a89e4a6aebdb2bbe7)
(cherry picked from commit c4b232d)
…ion freeform

- Telegram/Discord/Slack interactive handlers now skip the reply call
  when result.message is empty, so the sequential partial-click (which
  intentionally sends no acknowledgment because the next question is
  posted as a new reply) does not send an empty message to the chat API.
  disableComponents / clearButtons / editMessage still run, so the used
  row is correctly locked before Q2 arrives.

- The freeform path now matches sequential pending entries by the
  currently-shown question's isOther flag instead of falling through to
  the legacy all-questions merge rule. Upstream Codex normalizes
  request_user_input questions to isOther=true and the prompt tells users
  they may reply with their own answer, so a user typing a custom answer
  for Q1 of a 2-3 question sequential prompt now correctly advances to
  Q2 instead of being rejected as 'matched: false'.

(cherry picked from commit 8b043d18abffb4ec4be3ba158b92e8ec8ab0615a)
(cherry picked from commit 2f4c9d1)
…mo overview widgets

Two follow-ups to the codex plan controls PR (openclaw#88446), both surfaced
by autoreview on the rebased feat/codex-plan-controls branch:

(1) buildCommandInboundEvent / buildCommandInboundContext now thread
    the original conversation target through the synthetic inbound
    event/ctx so the follow-up turn's progress and any Codex
    request_user_input prompts it raises are deliverable to the chat
    that approved the plan. Previously the synthetic event dropped
    metadata.to / conversationId / parentConversationId and the
    synthetic ctx dropped pluginBinding, so the progress sender
    silently returned without sending and any request_user_input
    prompt would have sat unanswered until the 10-minute timeout
    expired. The two call sites in approveCurrentContextPlan and
    approveConversationPlanWithCleanContext read the current binding
    via ctx.getCurrentConversationBinding() and pass the routing
    fields from ctx (to, threadParentId, sessionId). Adds a unit
    test asserting the synthetic event + ctx carry the three fields.

(2) ui/src/ui/views/overview.ts no longer renders the
    openclaw-demo-button and openclaw-demo-status-widget placeholder
    elements. These were introduced by fix: preserve codex telegram
    plan context on the production overview route and shipped to
    users as product UI noise unrelated to the Codex plan controls
    change. The two element imports + the two render usages are
    removed, and ui/src/ui/views/overview-render.test.ts (which only
    asserted the demo widgets' presence) is deleted. The standalone
    custom element source files in ui/src/ui/components/ are kept for
    parity with the rest of the custom-element set but are no longer
    referenced from the production overview route.

(cherry picked from commit 34b681e)
…tParams

The previous patch narrowed buildTurnStartParams.collaborationMode to
the Codex wire object (CodexTurnCollaborationMode), but local callers
and tests still pass stored string modes (e.g. 'plan', 'default').
The TypeScript build failed at extensions/codex/src/app-server/run-attempt.test.ts,
where the test passes collaborationMode: 'plan' to buildTurnStartParams.

Match the same fix applied to buildTurnCollaborationMode earlier in
this rebase: accept CodexTurnCollaborationMode | string at the
outer option, and let buildTurnCollaborationMode normalize the string
into the wire object.
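
A sketch of the widening described above, assuming a simplified wire shape. `WireCollaborationMode` and `normalizeCollaborationMode` are hypothetical stand-ins for `CodexTurnCollaborationMode` and `buildTurnCollaborationMode`; the actual fields of the wire object are not shown in this PR text, and mapping unknown strings to `default` is an assumption.

```typescript
// Hypothetical wire shape; the real CodexTurnCollaborationMode may differ.
interface WireCollaborationMode {
  mode: "default" | "plan";
  settings: { model: string };
}

function normalizeCollaborationMode(
  input: WireCollaborationMode | string | undefined,
  model: string,
): WireCollaborationMode | undefined {
  if (input === undefined) {
    return undefined;
  }
  // Already a wire object: pass it through untouched.
  if (typeof input !== "string") {
    return input;
  }
  // Stored string mode: normalize into the wire object.
  // Assumption: anything other than "plan" maps to "default".
  const mode = input === "plan" ? "plan" : "default";
  return { mode, settings: { model } };
}
```

Accepting the union at the outer option keeps existing callers and tests compiling while the narrow type is still what reaches the wire.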

(cherry picked from commit 5eea14e)
…ad resume

The thread-resume writeCodexAppServerBinding call only carried
forward the original binding's collaborationMode and legacy
reasoningEffort. The new chat plan controls added two more
persisted per-binding preferences (reasoningEffortDefaults for
/codex think and liveProgress for /codex live) that are not part
of the resume's overridden runtime fingerprints, so a later
app-server resume on the same session silently dropped them and
the user had to re-set /codex think and /codex live after every
lifecycle operation.

Include both fields in the resumed binding write so per-binding
preferences survive across resume / reconnect operations.
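
A sketch of the carry-forward, assuming a simplified binding record. `StoredBinding` and `buildResumedBinding` are hypothetical; only the four field names are taken from the commit message above.

```typescript
// Hypothetical binding record; only the field names come from the commit message.
interface StoredBinding {
  threadId: string;
  collaborationMode?: string;
  reasoningEffort?: string;
  reasoningEffortDefaults?: { plan?: string; execute?: string };
  liveProgress?: boolean;
}

function buildResumedBinding(previous: StoredBinding, resumedThreadId: string): StoredBinding {
  return {
    threadId: resumedThreadId,
    // Carried forward before this change.
    collaborationMode: previous.collaborationMode,
    reasoningEffort: previous.reasoningEffort,
    // Newly carried forward: per-binding preferences set via /codex think and /codex live.
    reasoningEffortDefaults: previous.reasoningEffortDefaults,
    liveProgress: previous.liveProgress,
  };
}
```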

(cherry picked from commit 17d80cd)
The progress-reply path calls adapter.sendPayload directly without
invoking the adapter's afterDeliverPayload hook. Discord's
afterDeliverPayload registers the delivered message id with the
Codex user-input control tracker, so a typed/freeform answer can
resolve the Codex request but the original Discord buttons stay
live and stale until someone clicks them.

Invoke adapter.afterDeliverPayload?.({ cfg, target, payload,
results }) immediately after sendPayload returns, matching the
shape used by core's outbound deliver pipeline. The Discord
afterDeliverPayload then registers the delivered message id so a
later freeform answer disables the corresponding control token.

(cherry picked from commit 3b84c03)
Wrap the new direct adapter.afterDeliverPayload call in try/catch so
a hook crash after a successful platform send does not fail the whole
sendProgressReply. This matches the shared outbound pipeline's
maybeNotifyAfterDeliveredPayload helper which isolates and logs
hook failures, preserving successful delivery semantics for
downstream Codex user-input prompts.

Without this guard, a Discord afterDeliverPayload failure would
cancel the pending Codex user-input token even though the prompt
had already been delivered, leaving a visible prompt whose buttons
and freeform answers no longer resolve the Codex request.
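
A minimal sketch of the isolation pattern, assuming a reduced adapter interface. `DeliveryAdapter` and `sendWithIsolatedHook` are hypothetical; the real call passes `{ cfg, target, payload, results }`, which is trimmed here to `{ payload, results }` for brevity.

```typescript
// Hypothetical, reduced adapter interface for illustration only.
interface DeliveryAdapter<P, R> {
  sendPayload(payload: P): Promise<R>;
  afterDeliverPayload?(args: { payload: P; results: R }): Promise<void> | void;
}

async function sendWithIsolatedHook<P, R>(
  adapter: DeliveryAdapter<P, R>,
  payload: P,
  logWarn: (message: string, error: unknown) => void,
): Promise<R> {
  // A failed send still propagates to the caller.
  const results = await adapter.sendPayload(payload);

  try {
    // The hook runs only after a successful send.
    await adapter.afterDeliverPayload?.({ payload, results });
  } catch (error) {
    // A hook crash is logged but does not undo the delivery.
    logWarn("afterDeliverPayload failed after a successful send", error);
  }
  return results;
}
```

Keeping the `try` block around the hook alone is the point: wrapping `sendPayload` as well would hide real delivery failures.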

(cherry picked from commit 0150b7b)
SYU8384 and others added 14 commits June 13, 2026 03:36
buildBoundConversationCollaborationMode emitted settings.model: null
whenever the bound CodexAppServerThreadBinding did not have a stored
model field. Earlier versions of the binding type treated model as
optional, so legacy session files (and any binding created without
a model) would now fail the next turn/start with an invalid
collaborationMode payload because the Codex app-server contract
requires Settings.model to be a string.

Return undefined from the helper when binding.model is missing so
the turn request omits the collaboration mode object entirely. The
user can re-bind or set /codex model to pick a model; the existing
turn semantics (collaborationMode + reasoningEffort) continue to
work for bindings that have a model.

Adds a regression test that asserts the turn/start request has no
collaborationMode field when the binding omits model.
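
A sketch of the omit-when-missing behavior, assuming simplified request and binding shapes. `ThreadBinding`, `TurnStartRequest`, and `buildTurnStart` are hypothetical; only the requirement that `settings.model` be a string comes from the commit message.

```typescript
// Hypothetical shapes; only "settings.model must be a string" comes from the commit message.
interface ThreadBinding {
  threadId: string;
  model?: string;
  collaborationMode?: "default" | "plan";
}

interface TurnStartRequest {
  threadId: string;
  input: string;
  collaborationMode?: { mode: "default" | "plan"; settings: { model: string } };
}

function buildTurnStart(binding: ThreadBinding, input: string): TurnStartRequest {
  const request: TurnStartRequest = { threadId: binding.threadId, input };

  // Legacy bindings may have no model. Emitting settings.model: null would be
  // rejected, so the whole collaborationMode object is left out instead.
  if (binding.model !== undefined) {
    request.collaborationMode = {
      mode: binding.collaborationMode ?? "default",
      settings: { model: binding.model },
    };
  }
  return request;
}
```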

(cherry picked from commit 41bd8fd)
…l prompts

answerCodexUserInputFreeform rejected every pending request whose
currently-shown question lacked isOther, even when the user's
typed reply was a numeric prefix (e.g. '1') or the exact option
label. Channels that cannot render or keep buttons (plain text
relays, accessibility contexts) relied on this fallback to
resolve the active request_user_input; otherwise the message was
routed to a new bound turn while the original Codex turn waited
until the 10-minute timeout.

Add resolveFreeformOptionAnswer which normalizes the typed reply
against the rendered options: numeric prefix -> option label,
case-insensitive exact match -> option label, otherwise the raw
text. The sequential filter accepts the entry when normalization
matches or the question is isOther; replies that do not normalize
to any option stay rejected so stray chat messages do not consume
the request. The sequential branch records the normalized label
on the pending entry so the resolved merge uses the canonical
option label.

Updates the existing test that was encoding the regression to
exercise the correct fallback (numeric prefix on a labeled
question) and the rejection path (unrelated reply on a labeled
question stays matched: false). Adds a new test covering both
numeric and label forms for sequential prompts.

(cherry picked from commit c2ff571)
The previous commit returned a string from resolveFreeformOptionAnswer
and used string equality as a sentinel for 'no match'. That missed
exact same-case label replies because resolveFreeformOptionAnswer
returns the same label string when the user types the label
verbatim, so the filter treated it as 'no option matched' and
rejected a perfectly valid fallback.

Return { matched, answer } from the normalizer so callers can
distinguish a real match from a raw-typed reply. Update both the
sequential filter and the sequential branch to use the new shape.

Also fix a related gate: the legacy 'some question is isOther'
prefilter still ran before the sequential numeric/label
normalization, so all-option sequential prompts (where isOther
is false for every question) were rejected before the fallback
could try to resolve the typed reply. Skip the isOther prefilter
for sequential pending entries; the option-match check below
already validates the reply.

Add a regression test covering all-option sequential prompts
(where no question has isOther) and the case-insensitive exact
label fallback.
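
A sketch of why the `{ matched, answer }` shape is needed, assuming a simplified normalizer. `resolveFreeformOption` is a hypothetical stand-in for `resolveFreeformOptionAnswer`, and reading a numeric reply as a 1-based option index is an assumption.

```typescript
interface FreeformResolution {
  matched: boolean;
  answer: string;
}

// Hypothetical stand-in for resolveFreeformOptionAnswer.
function resolveFreeformOption(reply: string, labels: readonly string[]): FreeformResolution {
  const text = reply.trim();

  // Assumption: a purely numeric reply is a 1-based index into the rendered options.
  if (/^\d+$/.test(text)) {
    const byIndex = labels[Number(text) - 1];
    if (byIndex !== undefined) {
      return { matched: true, answer: byIndex };
    }
  }

  // Case-insensitive exact label match resolves to the canonical label.
  const lowered = text.toLowerCase();
  const exact = labels.find((label) => label.toLowerCase() === lowered);
  if (exact !== undefined) {
    return { matched: true, answer: exact };
  }

  // No option matched: hand back the raw text and say so explicitly.
  return { matched: false, answer: text };
}
```

With a string-only return value, typing `Yes` against the label `Yes` is indistinguishable from an unmatched reply because input and output are equal. The explicit `matched` flag removes that ambiguity.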

(cherry picked from commit ca50543)
…+ show active think defaults in /codex binding

Two P2/P3 follow-ups flagged by autoreview on the rebased
feat/codex-plan-controls branch:

(1) /codex plan and /codex think were added to
    CODEX_NATIVE_EXECUTION_SUBCOMMANDS, which made every
    '/codex plan ...' and '/codex think ...' invocation pass through
    resolveCodexNativeExecutionBlock. In sessions where native
    Codex execution is sandbox-blocked, users could not run local
    preference forms like '/codex plan off', '/codex plan status',
    '/codex plan stay <token>', or '/codex think status' even
    though those forms only read or update the stored binding.
    Existing controls (model, fast, permissions) explicitly return
    before the native execution gate for status/invalid/local
    forms. Match that pattern: plan [on|off|status|empty|stay
    <token>] is local; only plan [approve|approve-clean] <token>
    triggers a native Codex turn and must stay behind the guard.
    think is always local (it is a preference write against the
    bound binding).

(2) /codex binding still formatted threadBinding.reasoningEffort
    directly, but the new /codex think command stores into
    reasoningEffortDefaults and clears the legacy reasoningEffort
    field. After running /codex think xhigh, the binding status
    reported 'Think: default', making the new control appear not
    to work. Switch the binding status path to the same
    resolveCodexAppServerConversationReasoningEffort helper the
    turn start path uses, so /codex binding and /codex think agree
    on the active effort including the new per-mode defaults.

Adds two regression tests: one extends the existing 'local Codex
binding status forms in sandboxed sessions' test to cover
'/codex plan status' and '/codex think status'; one asserts the
binding status reports the active think effort resolved from the
new reasoningEffortDefaults field.
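
A sketch of the local-versus-native split from item (1), assuming the verb is the first token after `/codex plan`. `classifyPlanForm` is a hypothetical helper name; the verbs come from the commit message.

```typescript
type PlanFormClass = "local" | "native";

// Hypothetical helper: decides whether a "/codex plan ..." form must pass the
// native execution guard. Only approve and approve-clean start a Codex turn.
function classifyPlanForm(args: string): PlanFormClass {
  const verb = (args.trim().split(/\s+/)[0] ?? "").toLowerCase();
  if (verb === "approve" || verb === "approve-clean") {
    return "native";
  }
  // on, off, status, empty, stay <token>, and invalid forms only read or
  // update the stored binding, so they return before the guard.
  return "local";
}
```

Under the same rule, `/codex think` needs no classifier at all, since every form is a preference write.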

(cherry picked from commit 49d73e9)
…ng configured think defaults

The previous commit passed the full Codex plugin config to
readCodexAppServerConversationReasoningDefaults, but that helper
only reads a flat { execute, plan } object. Configured defaults
live at plugins.entries.codex.config.appServer.conversationReasoningDefaults
(not at the top level of the plugin config), so the configured
default was silently dropped on /codex binding status while the
turn start path used the unwrapped value via runtime config.

Read the configured defaults via readCodexPluginConfig(pluginConfig)
.appServer?.conversationReasoningDefaults before passing to the
helper, matching how /codex think status resolves them.

Adds a regression test that passes a pluginConfig with
appServer.conversationReasoningDefaults and asserts the binding
status reflects it.

(cherry picked from commit fe9d2b6)
…request_user_input prompts

Plan-decision callbacks (Approve / Approve with clean context) ran the
follow-up turn via runCodexBoundConversationPrompt, but no
sendProgressReply was wired through. runBoundTurn returns
emptyUserInputResponse() when sendProgressReply is undefined, so any
Codex request_user_input prompt in the follow-up turn timed out after
10 minutes and the user never saw the question.

CodexCommandDeps.buildPlanApprovalProgressReply now takes the
channel of the originating callback so the progress sender is
correctly targeted for telegram, discord, and slack. The factory is
threaded through all three handleCodexPlanDecisionCallbackLazy
call sites in extensions/codex/index.ts and the slash command path.

(cherry picked from commit 3d77573)
…presentation

- Use the originating event.channel for the inbound_claim progress
  sender so Discord/Slack bound conversations no longer get their
  request_user_input prompts routed through the Telegram adapter.
- Render the portable Codex presentation via
  normalizeMessagePresentation + adaptMessagePresentationForChannel +
  channel renderPresentation before sendPayload, so plan-approval
  follow-up prompts reach the user as native buttons/components
  instead of text-only payloads.

(cherry picked from commit c78067a)
…nversations

When the user types a freeform reply to a Codex request_user_input
prompt on a bound conversation and answerCodexUserInputFreeform
returns matched: false, the inbound_claim falls through to
`return { handled: true }` (the non-command-authorized path) and the
user sees no response. The most likely cause is a scope mismatch
between the pending user input (queued by sendProgressReply on the
synthetic inbound event from the plan-approval follow-up) and the
inbound ctx (the user's typed text dispatch context) on channel /
senderId / sessionKey / messageThreadId.

Add a single embeddedAgentLog.warn after the freeform check that
records:
- the typed prompt preview
- inputResult.matched and the resulting message
- the inbound event's channel / senderId / accountId / sessionKey /
  messageThreadId / commandAuthorized
- the binding's kind and sessionFile

This is a diagnostic-only change: no behavior is altered. After the
user re-tests on Discord, the log line will identify which field
mismatches and we can land the actual fix in a follow-up commit.

(cherry picked from commit cc97d6a)
…orm does not match

Previously, a non-command-authorized typed message on a bound
codex-app-server conversation was silently swallowed by
`return { handled: true }` after the freeform matcher check. The
matcher could return matched: false for a typed reply that did not
exactly match a pending user_input option label, scope-mismatched
on sessionKey / messageThreadId, or had no pending at all. The
result: the user typed text into a bound Codex chat, saw no
response, and the bound turn never started.

A bound chat session should always reach Codex as a fresh turn
prompt so the user sees a response, even if the typed text is plain
prose. Slash commands are still protected upstream by
answerCodexUserInputFreeform's "/" check (line 322 of
conversation-chat-controls.ts), and the codex plugin's own /codex
command router handles explicit /codex <verb> commands before this
inbound_claim hook is reached.

Update the two existing tests that were asserting the old
silent-drop behavior to assert the new fall-through-to-turn-start
behavior, including a binding sidecar file so the new turn can
locate the thread.

A diagnostic log line is kept for the case where the freeform
matcher returns matched: false on an app-server bound conversation.
This logs only scope fields (channel / senderId / accountId /
sessionKey / messageThreadId / commandAuthorized and the binding's
kind / sessionFile) — never prompt content — so future debugging
does not require a code change and authorized prompts (which can
contain secrets) are not captured.

(cherry picked from commit ac62d6d)
Previously, a typed freeform reply that was a prefix of the option
label but not the exact label was rejected by
resolveFreeformOptionAnswer. For example, a user who types
"CLI Cleanup" against a rendered option labeled
"CLI Cleanup    (Recommended)    - Keeps the fake plan small and
engineering-shaped." would not match (case-insensitive exact
match only), so the matcher returned matched: false and the user's
typed text was treated as an unmatched freeform.

Add a single-option prefix match: if exactly one option's label
starts with the typed text (case-insensitive), resolve to that
option. If the typed text is a prefix of two or more option labels,
fall through to the caller's freeform fallback (or the new
inbound_claim "couldn't match your reply" guard) so ambiguity is
handled elsewhere.

This unblocks the common pattern of typing a shortened option
label without copying the (Recommended) suffix or description that
the chat UI renders alongside the button.
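
A sketch of the unique-prefix rule on its own, separate from the numeric and exact-label checks. `matchUniquePrefix` is a hypothetical helper name, and rejecting empty input is an assumption made so that a blank reply cannot match every option.

```typescript
// Hypothetical helper: resolves a typed reply only when it is a
// case-insensitive prefix of exactly one option label.
function matchUniquePrefix(reply: string, labels: readonly string[]): string | undefined {
  const typed = reply.trim().toLowerCase();
  // Assumption: empty input never matches, otherwise it would prefix every label.
  if (typed.length === 0) {
    return undefined;
  }
  const candidates = labels.filter((label) => label.toLowerCase().startsWith(typed));
  // Zero or several candidates: leave the decision to the caller's fallback.
  return candidates.length === 1 ? candidates[0] : undefined;
}
```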

(cherry picked from commit 0ce4984)
…m the chat UI

The buildCurrentContextPlanApprovalPrompt and
buildCleanContextPlanApprovalPrompt builders used a generic
"The user approved the plan below" framing that did not name
Codex or attribute the approval to the chat-UI button click.
That left Codex without a clear mental model of how the approval
reached it, and led to a follow-up reply where Codex mis-attributed
the button to the surrounding OpenClaw shell rather than accepting
it as a real plan approval.

Update both builders to use first-person framing: "I (Codex)
just received an ... button click from the OpenClaw chat UI.
That button is OpenClaw routing the user's plan approval back
into this Codex thread — it is not a command the user typed."
The remainder of the prompt (execute the plan, re-read files,
verify) is unchanged.

Add tests that pin the new wording for both the current-context
and clean-context approval flows.

(cherry picked from commit 32272cf)
…atch hook

The 15eda36 commit (resolve typed input replies) removed a closing
paren and trailing comma in dispatch-from-config.ts, breaking the
traceReplyPhase arrow-function call site. Restore the missing paren
to make the function-call expression syntactically valid.

The 15eda36 commit (resolve typed input replies) added a before_dispatch
handler in codex/index.ts that calls answerCodexUserInputFreeform with
channel: event.channel ?? ctx.channelId. The freeform helper requires a
non-null channel string. The current branch has an explicit channel guard;
the cherry-pick lost the guard during conflict resolution. Restore the
guard, and reformat the test assertion block to match.
SYU8384 closed this Jun 12, 2026
SYU8384 reopened this Jun 12, 2026
SYU8384 changed the title from "feat(codex): channel-native plan and user-input controls (PR 2/3)" to "feat(codex): native app-server plan and chat controls" Jun 13, 2026
SYU8384 added the codex (Codex-related) label Jun 13, 2026
Labels

codex Codex-related

1 participant