Skip to content

agents(hermes): wire reasoning config for chat slot#669

Merged
thinmintdev merged 1 commit into
mainfrom
issue-661-hermes
Jun 8, 2026
Merged

agents(hermes): wire reasoning config for chat slot#669
thinmintdev merged 1 commit into
mainfrom
issue-661-hermes

Conversation

@thinmintdev

Copy link
Copy Markdown
Contributor

Summary

Reconciliation: #635 vs #661

#635 asked for "advanced-reasoning subagents → chat-27b" but Hermes has a single delegation config — no per-subagent-type routing exists upstream (delegate_tool.py reads one delegation.{model,provider,base_url} block). Reasoning visibility therefore lives on the top-level chat conversation (show_reasoning + streaming on the main model); the agent MoE handles delegation and is intentionally thinking-off for fast agentic subtasks (tool use, decomposition). This is noted in the template delegation comment.

Delegation → agent slot was already wired via _DELEGATION_SLOT_NAME = "agent" in #654/#665.

Live container e2e

hal0-dev has no GPU. CT105 runtime e2e verification ("reasoning path verified end-to-end against the container slot") is deferred — config correctness is verified by unit tests here; the smoke test in the provision pipeline (_smoke_chat_roundtrip) already handles the max_tokens/reasoning-content fallback.

Test plan

  • test_rendered_config_has_show_reasoning_true — display.show_reasoning is true
  • test_rendered_config_has_streaming_true — display.streaming is true
  • test_rendered_config_has_model_max_tokens — model.max_tokens set and positive
  • test_rendered_config_model_base_url_set — model.base_url present in all branches
  • test_delegation_targets_agent_slot_not_chat — delegation→agent MoE, not chat slot
  • All 10 delegation/auxiliary/config_write tests pass (18 total selected)
  • Slots tests: 149 passed
  • ruff check src tests + ruff format --check src tests — clean

Closes #661, closes #635

Parent: #652

🤖 Generated with Claude Code

@thinmintdev thinmintdev left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VERDICT: APPROVE-ON-GREEN — merge-ready once the python matrix passes

Clean, self-contained: a Hermes config-template change (config.yaml.j2) + 5 unit tests. ui + γ-suite already green; python (3.11/3.12) pending — given this is a pure template + test change whose tests yaml.safe_load the rendered output, no risk, so APPROVE on a green python matrix.

SPEC (vs #661/#635): ACs met.

  • The 3 reasoning keys are correct and exactly match the known Hermes thinking gotcha:
    • display.streaming: true + display.show_reasoning: true (replacing show_reasoning: false) — both required together; streaming is the transport prerequisite, show_reasoning the UI gate. Without streaming the TUI hangs on the block.
    • model.max_tokens: 8192 — prevents Qwen3 from draining the budget inside and returning empty content (silent TUI). Correctly nested as the last key under model:.
  • base_url INTACT: the diff does not touch the model.base_url conditional; max_tokens is added after the {%- endif %} so it renders in BOTH the primary-slot branch and the no-slot fallback. test_rendered_config_model_base_url_set asserts base_url present in both branches — matches the bare-provider-custom-needs-base_url gotcha.
  • #635 reconciliation is SOUND and documented. Verified the "single delegation block / no per-subagent routing" claim: the template renders exactly one flat delegation: {model, provider, base_url} mapping (no per-subagent-type structure), and delegate_tool.py is upstream Hermes (not in this repo) — it consumes that one block. So "advanced-reasoning subagents → chat-27b" is unsatisfiable as literally written; the correct resolution is reasoning-visible on the top-level chat model + delegation→agent MoE (thinking-off for fast agentic subtasks). This is documented in both the template delegation comment and test_delegation_targets_agent_slot_not_chat.
  • delegation→agent (not chat) was already wired via _DELEGATION_SLOT_NAME="agent" in #654/#665; this PR only updates the doc comment agent-hermes→agent, consistent with the rename. The test pins delegation.model to the agent MoE and asserts it is NOT the chat model.

STANDARDS:

  • Template renders valid YAML (all 5 tests parse-load it); 5 tests cover show_reasoning/streaming/max_tokens/base_url(both branches)/delegation-separation.
  • No regression risk to existing Hermes routing — the only behavioral keys changed are the two display flags (false→true) + the new max_tokens; base_url logic and provider resolution untouched.
  • Author reports slots 149 + ruff clean; ui/γ green corroborate the non-python side.

NOTE (non-blocking, author already disclosed): the CT105 live container e2e ("reasoning path verified end-to-end against the container slot", #661 AC2) is deferred — hal0-dev has no GPU. Config correctness is unit-verified and _smoke_chat_roundtrip handles the max_tokens/reasoning fallback. Recommend running the live smoke on CT105 post-merge to close #661 AC2 fully.

MERGE-READY: yes, on green python. The deferred live e2e is the only open item and is a post-merge verification, not a code blocker.

@thinmintdev thinmintdev left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opus review — APPROVE on merits (HOLD merge: pre-existing main breakage)

Verdict: the change itself is correct, well-tested, and merge-ready. DO NOT merge yet — the python matrix is red on a pre-existing, unrelated failure on main, not on anything #669 introduced. Fix that first, re-run, then merge.

Spec (#661 + #635) — satisfied, reconciliation is sound

  • Verified against the actual upstream delegate_tool.py (_resolve_delegation_credentials(cfg, ...), src/hermes-agent): it reads exactly one delegation block — model/provider/base_url/api_key/api_mode. There is no subagent_type/agent_type discriminator anywhere, so per-subagent-type routing genuinely does not exist upstream. The #635#661 reconciliation (reasoning-ON lives on the top-level chat model; delegation → agent MoE thinking-off) is a real constraint, not a cop-out, and it's documented in the template delegation comment.
  • _DELEGATION_SLOT_NAME = "agent" confirmed on the PR branch (post-rename #654/#665). Fixture _ROLE_SLOTS is internally consistent: chatqwen3-coder-next-reap-40b-a3b-q4kxl, agenthermes-4-14b-q5km; the new test_delegation_targets_agent_slot_not_chat meaningfully proves delegation routes to agent, not chat.

Config rendering — correct

  • The #635 silent-TUI gotcha is handled correctly and completely: display.streaming: true (transport prerequisite) + display.show_reasoning: true (UI gate) + model.max_tokens: 8192 (prevents Qwen3 draining the budget in <think> → empty content). model.base_url routing intact in both the primary-slot and no-slot fallback branches (guards the bare-provider: custom→OpenRouter trap). Comments accurately explain each.
  • Minor (non-blocking): the three reasoning flags render unconditionally on every config, not gated on whether the chat slot actually holds a thinking model. Harmless (no-ops for a non-thinking model; max_tokens just bounds output) and intentional-by-assumption ("chat slot runs Qwen3-class") — flagging so it's a known choice, not an oversight.

Tests / CI — the #665 lesson, checked properly

  • All 5 new tests PASSED in the real python 3.11 CI matrix (confirmed in the job log, not just locally).
  • Arithmetic proof the red is not ours: main @ acd27fc = 7 failed / 2870 passed / 5 skipped; this PR = 7 failed / 2875 passed / 5 skipped. Same 7 failures, same 5 skips, +5 passed (exactly the new tests). #669 adds zero new failures.
  • The 7 failures are all in tests/api/test_slots_container_state.py (slot 'chat' is not configured / KeyError: 'container_status') — slot-rename fallout already broken on main, outside #669's two-file scope. python (3.12) shows "cancel" only because fail-fast killed it after 3.11, not an independent failure.
  • ruff format/check is covered inside the python job (no separate run needed).

Action

  1. Fix the pre-existing test_slots_container_state.py break on main (its own PR).
  2. Re-run #669 → python goes green.
  3. Merge #669 (no rebase concerns; ui + γ-suite already green; surface is clean).

🤖 Generated with Claude Code

Wire `display.show_reasoning: true`, `display.streaming: true`, and
`model.max_tokens: 8192` into the Hermes config template so the chat
slot's Qwen3-class thinking model is visible in the TUI.

Three Hermes thinking-model gotchas applied (#635 memory note):
- streaming: true is the transport prerequisite; without it the TUI
  hangs silently on the <think> block.
- show_reasoning: true is the UI gate; without it the reasoning output
  is suppressed even when streaming is enabled.
- max_tokens: 8192 caps the budget so Qwen3 cannot exhaust it entirely
  in <think> and return an empty content field (silent TUI).

Delegation → `agent` slot (ace-saber MoE, thinking-off) is unchanged;
_DELEGATION_SLOT_NAME was already renamed "agent" in #654/#665.

Reconciliation (#635 vs #661): #635 asked for "advanced-reasoning
subagents → chat-27b" but Hermes has a single delegation config — no
per-subagent-type routing exists upstream. Reasoning visibility
therefore lives on the top-level chat conversation (show_reasoning +
streaming on the main model); the agent MoE handles delegation and is
intentionally thinking-off for fast agentic subtasks. A note is added
to the delegation comment in the template.

Also updated the template docstring to fix the stale `agent-hermes`
reference (post-rename: slot is now `agent`).

Live container e2e deferred: hal0-dev has no GPU; CT105 runtime
verification is out of scope for this PR (noted in acceptance criteria).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@thinmintdev thinmintdev merged commit a4f4629 into main Jun 8, 2026
4 checks passed
@thinmintdev thinmintdev deleted the issue-661-hermes branch June 8, 2026 11:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Hermes → agent slot reasoning wiring (#635) agents(hermes): point Hermes at chat-27b with reasoning ON for advanced-reasoning subagents

1 participant