agents(hermes): wire reasoning config for chat slot#669
Conversation
thinmintdev
left a comment
There was a problem hiding this comment.
VERDICT: APPROVE-ON-GREEN — merge-ready once the python matrix passes
Clean, self-contained: a Hermes config-template change (config.yaml.j2) + 5 unit tests. ui + γ-suite already green; python (3.11/3.12) pending — given this is a pure template + test change whose tests yaml.safe_load the rendered output, no risk, so APPROVE on a green python matrix.
- The 3 reasoning keys are correct and exactly match the known Hermes thinking gotcha:
• display.streaming: true + display.show_reasoning: true (replacing show_reasoning: false) — both required together; streaming is the transport prerequisite, show_reasoning the UI gate. Without streaming the TUI hangs on the block.
• model.max_tokens: 8192 — prevents Qwen3 from draining the budget inside and returning empty content (silent TUI). Correctly nested as the last key under model:. - base_url INTACT: the diff does not touch the model.base_url conditional; max_tokens is added after the {%- endif %} so it renders in BOTH the primary-slot branch and the no-slot fallback. test_rendered_config_model_base_url_set asserts base_url present in both branches — matches the bare-provider-custom-needs-base_url gotcha.
- #635 reconciliation is SOUND and documented. Verified the "single delegation block / no per-subagent routing" claim: the template renders exactly one flat delegation: {model, provider, base_url} mapping (no per-subagent-type structure), and delegate_tool.py is upstream Hermes (not in this repo) — it consumes that one block. So "advanced-reasoning subagents → chat-27b" is unsatisfiable as literally written; the correct resolution is reasoning-visible on the top-level chat model + delegation→agent MoE (thinking-off for fast agentic subtasks). This is documented in both the template delegation comment and test_delegation_targets_agent_slot_not_chat.
- delegation→agent (not chat) was already wired via _DELEGATION_SLOT_NAME="agent" in #654/#665; this PR only updates the doc comment agent-hermes→agent, consistent with the rename. The test pins delegation.model to the agent MoE and asserts it is NOT the chat model.
STANDARDS:
- Template renders valid YAML (all 5 tests parse-load it); 5 tests cover show_reasoning/streaming/max_tokens/base_url(both branches)/delegation-separation.
- No regression risk to existing Hermes routing — the only behavioral keys changed are the two display flags (false→true) + the new max_tokens; base_url logic and provider resolution untouched.
- Author reports slots 149 + ruff clean; ui/γ green corroborate the non-python side.
NOTE (non-blocking, author already disclosed): the CT105 live container e2e ("reasoning path verified end-to-end against the container slot", #661 AC2) is deferred — hal0-dev has no GPU. Config correctness is unit-verified and _smoke_chat_roundtrip handles the max_tokens/reasoning fallback. Recommend running the live smoke on CT105 post-merge to close #661 AC2 fully.
MERGE-READY: yes, on green python. The deferred live e2e is the only open item and is a post-merge verification, not a code blocker.
thinmintdev
left a comment
There was a problem hiding this comment.
Opus review — APPROVE on merits (HOLD merge: pre-existing main breakage)
Verdict: the change itself is correct, well-tested, and merge-ready. DO NOT merge yet — the python matrix is red on a pre-existing, unrelated failure on main, not on anything #669 introduced. Fix that first, re-run, then merge.
Spec (#661 + #635) — satisfied, reconciliation is sound
- Verified against the actual upstream
delegate_tool.py(_resolve_delegation_credentials(cfg, ...), src/hermes-agent): it reads exactly one delegation block —model/provider/base_url/api_key/api_mode. There is nosubagent_type/agent_typediscriminator anywhere, so per-subagent-type routing genuinely does not exist upstream. The #635→#661 reconciliation (reasoning-ON lives on the top-levelchatmodel; delegation →agentMoE thinking-off) is a real constraint, not a cop-out, and it's documented in the template delegation comment. _DELEGATION_SLOT_NAME = "agent"confirmed on the PR branch (post-rename #654/#665). Fixture_ROLE_SLOTSis internally consistent:chat→qwen3-coder-next-reap-40b-a3b-q4kxl,agent→hermes-4-14b-q5km; the newtest_delegation_targets_agent_slot_not_chatmeaningfully proves delegation routes toagent, notchat.
Config rendering — correct
- The #635 silent-TUI gotcha is handled correctly and completely:
display.streaming: true(transport prerequisite) +display.show_reasoning: true(UI gate) +model.max_tokens: 8192(prevents Qwen3 draining the budget in<think>→ empty content).model.base_urlrouting intact in both the primary-slot and no-slot fallback branches (guards the bare-provider: custom→OpenRouter trap). Comments accurately explain each. - Minor (non-blocking): the three reasoning flags render unconditionally on every config, not gated on whether the
chatslot actually holds a thinking model. Harmless (no-ops for a non-thinking model; max_tokens just bounds output) and intentional-by-assumption ("chat slot runs Qwen3-class") — flagging so it's a known choice, not an oversight.
Tests / CI — the #665 lesson, checked properly
- All 5 new tests PASSED in the real python 3.11 CI matrix (confirmed in the job log, not just locally).
- Arithmetic proof the red is not ours:
main@ acd27fc = 7 failed / 2870 passed / 5 skipped; this PR = 7 failed / 2875 passed / 5 skipped. Same 7 failures, same 5 skips, +5 passed (exactly the new tests). #669 adds zero new failures. - The 7 failures are all in
tests/api/test_slots_container_state.py(slot 'chat' is not configured/KeyError: 'container_status') — slot-rename fallout already broken onmain, outside #669's two-file scope.python (3.12)shows "cancel" only because fail-fast killed it after 3.11, not an independent failure. - ruff format/check is covered inside the python job (no separate run needed).
Action
- Fix the pre-existing
test_slots_container_state.pybreak onmain(its own PR). - Re-run #669 → python goes green.
- Merge #669 (no rebase concerns; ui + γ-suite already green; surface is clean).
🤖 Generated with Claude Code
Wire `display.show_reasoning: true`, `display.streaming: true`, and `model.max_tokens: 8192` into the Hermes config template so the chat slot's Qwen3-class thinking model is visible in the TUI. Three Hermes thinking-model gotchas applied (#635 memory note): - streaming: true is the transport prerequisite; without it the TUI hangs silently on the <think> block. - show_reasoning: true is the UI gate; without it the reasoning output is suppressed even when streaming is enabled. - max_tokens: 8192 caps the budget so Qwen3 cannot exhaust it entirely in <think> and return an empty content field (silent TUI). Delegation → `agent` slot (ace-saber MoE, thinking-off) is unchanged; _DELEGATION_SLOT_NAME was already renamed "agent" in #654/#665. Reconciliation (#635 vs #661): #635 asked for "advanced-reasoning subagents → chat-27b" but Hermes has a single delegation config — no per-subagent-type routing exists upstream. Reasoning visibility therefore lives on the top-level chat conversation (show_reasoning + streaming on the main model); the agent MoE handles delegation and is intentionally thinking-off for fast agentic subtasks. A note is added to the delegation comment in the template. Also updated the template docstring to fix the stale `agent-hermes` reference (post-rename: slot is now `agent`). Live container e2e deferred: hal0-dev has no GPU; CT105 runtime verification is out of scope for this PR (noted in acceptance criteria). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
85b1cf6 to
f0bab19
Compare
Summary
display.show_reasoning: true+display.streaming: trueto the Hermes config template so Qwen3-class thinking models on thechatslot are visible in the TUI (agents(hermes): point Hermes at chat-27b with reasoning ON for advanced-reasoning subagents #635 gotcha: both flags required together — streaming is the transport prerequisite, show_reasoning is the UI gate)model.max_tokens: 8192to prevent silent TUI: without it, Qwen3 can exhaust the full token budget in<think>and return an empty content fieldagent-hermes→agent(post-rename Slot rename: agent-hermes→agent, primary→chat + hidden back-compat aliases (#633) #654/slots: rename primary→chat, add hidden back-compat aliases #665) and documents the agents(hermes): point Hermes at chat-27b with reasoning ON for advanced-reasoning subagents #635 reconciliationReconciliation: #635 vs #661
#635 asked for "advanced-reasoning subagents → chat-27b" but Hermes has a single delegation config — no per-subagent-type routing exists upstream (
delegate_tool.pyreads onedelegation.{model,provider,base_url}block). Reasoning visibility therefore lives on the top-level chat conversation (show_reasoning + streamingon the main model); theagentMoE handles delegation and is intentionally thinking-off for fast agentic subtasks (tool use, decomposition). This is noted in the template delegation comment.Delegation →
agentslot was already wired via_DELEGATION_SLOT_NAME = "agent"in #654/#665.Live container e2e
hal0-dev has no GPU. CT105 runtime e2e verification ("reasoning path verified end-to-end against the container slot") is deferred — config correctness is verified by unit tests here; the smoke test in the provision pipeline (
_smoke_chat_roundtrip) already handles the max_tokens/reasoning-content fallback.Test plan
test_rendered_config_has_show_reasoning_true— display.show_reasoning is truetest_rendered_config_has_streaming_true— display.streaming is truetest_rendered_config_has_model_max_tokens— model.max_tokens set and positivetest_rendered_config_model_base_url_set— model.base_url present in all branchestest_delegation_targets_agent_slot_not_chat— delegation→agent MoE, not chat slotruff check src tests+ruff format --check src tests— cleanCloses #661, closes #635
Parent: #652
🤖 Generated with Claude Code