agents(hermes): wire reasoning config for chat slot by thinmintdev · Pull Request #669 · Hal0ai/hal0

thinmintdev · 2026-06-08T11:03:43Z

Summary

Adds display.show_reasoning: true + display.streaming: true to the Hermes config template so Qwen3-class thinking models on the chat slot are visible in the TUI (agents(hermes): point Hermes at chat-27b with reasoning ON for advanced-reasoning subagents #635 gotcha: both flags required together — streaming is the transport prerequisite, show_reasoning is the UI gate)
Adds model.max_tokens: 8192 to prevent silent TUI: without it, Qwen3 can exhaust the full token budget in <think> and return an empty content field
Updates the delegation comment in the template: renames agent-hermes → agent (post-rename Slot rename: agent-hermes→agent, primary→chat + hidden back-compat aliases (#633) #654/slots: rename primary→chat, add hidden back-compat aliases #665) and documents the agents(hermes): point Hermes at chat-27b with reasoning ON for advanced-reasoning subagents #635 reconciliation
Adds 5 new unit tests asserting the reasoning wiring is correct and stable

Reconciliation: #635 vs #661

#635 asked for "advanced-reasoning subagents → chat-27b" but Hermes has a single delegation config — no per-subagent-type routing exists upstream (delegate_tool.py reads one delegation.{model,provider,base_url} block). Reasoning visibility therefore lives on the top-level chat conversation (show_reasoning + streaming on the main model); the agent MoE handles delegation and is intentionally thinking-off for fast agentic subtasks (tool use, decomposition). This is noted in the template delegation comment.

Delegation → agent slot was already wired via _DELEGATION_SLOT_NAME = "agent" in #654/#665.

Live container e2e

hal0-dev has no GPU. CT105 runtime e2e verification ("reasoning path verified end-to-end against the container slot") is deferred — config correctness is verified by unit tests here; the smoke test in the provision pipeline (_smoke_chat_roundtrip) already handles the max_tokens/reasoning-content fallback.

Test plan

test_rendered_config_has_show_reasoning_true — display.show_reasoning is true
test_rendered_config_has_streaming_true — display.streaming is true
test_rendered_config_has_model_max_tokens — model.max_tokens set and positive
test_rendered_config_model_base_url_set — model.base_url present in all branches
test_delegation_targets_agent_slot_not_chat — delegation→agent MoE, not chat slot
All 10 delegation/auxiliary/config_write tests pass (18 total selected)
Slots tests: 149 passed
ruff check src tests + ruff format --check src tests — clean

Closes #661, closes #635

Parent: #652

🤖 Generated with Claude Code

thinmintdev

VERDICT: APPROVE-ON-GREEN — merge-ready once the python matrix passes

Clean, self-contained: a Hermes config-template change (config.yaml.j2) + 5 unit tests. ui + γ-suite already green; python (3.11/3.12) pending — given this is a pure template + test change whose tests yaml.safe_load the rendered output, no risk, so APPROVE on a green python matrix.

SPEC (vs #661/#635): ACs met.

The 3 reasoning keys are correct and exactly match the known Hermes thinking gotcha:
• display.streaming: true + display.show_reasoning: true (replacing show_reasoning: false) — both required together; streaming is the transport prerequisite, show_reasoning the UI gate. Without streaming the TUI hangs on the block.
• model.max_tokens: 8192 — prevents Qwen3 from draining the budget inside and returning empty content (silent TUI). Correctly nested as the last key under model:.
base_url INTACT: the diff does not touch the model.base_url conditional; max_tokens is added after the {%- endif %} so it renders in BOTH the primary-slot branch and the no-slot fallback. test_rendered_config_model_base_url_set asserts base_url present in both branches — matches the bare-provider-custom-needs-base_url gotcha.
#635 reconciliation is SOUND and documented. Verified the "single delegation block / no per-subagent routing" claim: the template renders exactly one flat delegation: {model, provider, base_url} mapping (no per-subagent-type structure), and delegate_tool.py is upstream Hermes (not in this repo) — it consumes that one block. So "advanced-reasoning subagents → chat-27b" is unsatisfiable as literally written; the correct resolution is reasoning-visible on the top-level chat model + delegation→agent MoE (thinking-off for fast agentic subtasks). This is documented in both the template delegation comment and test_delegation_targets_agent_slot_not_chat.
delegation→agent (not chat) was already wired via _DELEGATION_SLOT_NAME="agent" in #654/#665; this PR only updates the doc comment agent-hermes→agent, consistent with the rename. The test pins delegation.model to the agent MoE and asserts it is NOT the chat model.

STANDARDS:

Template renders valid YAML (all 5 tests parse-load it); 5 tests cover show_reasoning/streaming/max_tokens/base_url(both branches)/delegation-separation.
No regression risk to existing Hermes routing — the only behavioral keys changed are the two display flags (false→true) + the new max_tokens; base_url logic and provider resolution untouched.
Author reports slots 149 + ruff clean; ui/γ green corroborate the non-python side.

NOTE (non-blocking, author already disclosed): the CT105 live container e2e ("reasoning path verified end-to-end against the container slot", #661 AC2) is deferred — hal0-dev has no GPU. Config correctness is unit-verified and _smoke_chat_roundtrip handles the max_tokens/reasoning fallback. Recommend running the live smoke on CT105 post-merge to close #661 AC2 fully.

MERGE-READY: yes, on green python. The deferred live e2e is the only open item and is a post-merge verification, not a code blocker.

thinmintdev

Opus review — APPROVE on merits (HOLD merge: pre-existing main breakage)

Verdict: the change itself is correct, well-tested, and merge-ready. DO NOT merge yet — the python matrix is red on a pre-existing, unrelated failure on main, not on anything #669 introduced. Fix that first, re-run, then merge.

Spec (#661 + #635) — satisfied, reconciliation is sound

Verified against the actual upstream delegate_tool.py (_resolve_delegation_credentials(cfg, ...), src/hermes-agent): it reads exactly one delegation block — model/provider/base_url/api_key/api_mode. There is no subagent_type/agent_type discriminator anywhere, so per-subagent-type routing genuinely does not exist upstream. The #635→#661 reconciliation (reasoning-ON lives on the top-level chat model; delegation → agent MoE thinking-off) is a real constraint, not a cop-out, and it's documented in the template delegation comment.
_DELEGATION_SLOT_NAME = "agent" confirmed on the PR branch (post-rename #654/#665). Fixture _ROLE_SLOTS is internally consistent: chat→qwen3-coder-next-reap-40b-a3b-q4kxl, agent→hermes-4-14b-q5km; the new test_delegation_targets_agent_slot_not_chat meaningfully proves delegation routes to agent, not chat.

Config rendering — correct

The #635 silent-TUI gotcha is handled correctly and completely: display.streaming: true (transport prerequisite) + display.show_reasoning: true (UI gate) + model.max_tokens: 8192 (prevents Qwen3 draining the budget in <think> → empty content). model.base_url routing intact in both the primary-slot and no-slot fallback branches (guards the bare-provider: custom→OpenRouter trap). Comments accurately explain each.
Minor (non-blocking): the three reasoning flags render unconditionally on every config, not gated on whether the chat slot actually holds a thinking model. Harmless (no-ops for a non-thinking model; max_tokens just bounds output) and intentional-by-assumption ("chat slot runs Qwen3-class") — flagging so it's a known choice, not an oversight.

Tests / CI — the #665 lesson, checked properly

All 5 new tests PASSED in the real python 3.11 CI matrix (confirmed in the job log, not just locally).
Arithmetic proof the red is not ours: main @ acd27fc = 7 failed / 2870 passed / 5 skipped; this PR = 7 failed / 2875 passed / 5 skipped. Same 7 failures, same 5 skips, +5 passed (exactly the new tests). #669 adds zero new failures.
The 7 failures are all in tests/api/test_slots_container_state.py (slot 'chat' is not configured / KeyError: 'container_status') — slot-rename fallout already broken on main, outside #669's two-file scope. python (3.12) shows "cancel" only because fail-fast killed it after 3.11, not an independent failure.
ruff format/check is covered inside the python job (no separate run needed).

Action

Fix the pre-existing test_slots_container_state.py break on main (its own PR).
Re-run #669 → python goes green.
Merge #669 (no rebase concerns; ui + γ-suite already green; surface is clean).

🤖 Generated with Claude Code

Wire `display.show_reasoning: true`, `display.streaming: true`, and `model.max_tokens: 8192` into the Hermes config template so the chat slot's Qwen3-class thinking model is visible in the TUI. Three Hermes thinking-model gotchas applied (#635 memory note): - streaming: true is the transport prerequisite; without it the TUI hangs silently on the <think> block. - show_reasoning: true is the UI gate; without it the reasoning output is suppressed even when streaming is enabled. - max_tokens: 8192 caps the budget so Qwen3 cannot exhaust it entirely in <think> and return an empty content field (silent TUI). Delegation → `agent` slot (ace-saber MoE, thinking-off) is unchanged; _DELEGATION_SLOT_NAME was already renamed "agent" in #654/#665. Reconciliation (#635 vs #661): #635 asked for "advanced-reasoning subagents → chat-27b" but Hermes has a single delegation config — no per-subagent-type routing exists upstream. Reasoning visibility therefore lives on the top-level chat conversation (show_reasoning + streaming on the main model); the agent MoE handles delegation and is intentionally thinking-off for fast agentic subtasks. A note is added to the delegation comment in the template. Also updated the template docstring to fix the stale `agent-hermes` reference (post-rename: slot is now `agent`). Live container e2e deferred: hal0-dev has no GPU; CT105 runtime verification is out of scope for this PR (noted in acceptance criteria). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

thinmintdev commented Jun 8, 2026

View reviewed changes

thinmintdev force-pushed the issue-661-hermes branch from 85b1cf6 to f0bab19 Compare June 8, 2026 11:30

thinmintdev merged commit a4f4629 into main Jun 8, 2026
4 checks passed

thinmintdev deleted the issue-661-hermes branch June 8, 2026 11:42

thinmintdev mentioned this pull request Jun 8, 2026

feat(memory-map): container slot attribution via podman cgroup #672

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

agents(hermes): wire reasoning config for chat slot#669

agents(hermes): wire reasoning config for chat slot#669
thinmintdev merged 1 commit into
mainfrom
issue-661-hermes

thinmintdev commented Jun 8, 2026

Uh oh!

thinmintdev left a comment

Uh oh!

thinmintdev left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

thinmintdev commented Jun 8, 2026

Summary

Reconciliation: #635 vs #661

Live container e2e

Test plan

Uh oh!

thinmintdev left a comment

Choose a reason for hiding this comment

Uh oh!

thinmintdev left a comment

Choose a reason for hiding this comment

Opus review — APPROVE on merits (HOLD merge: pre-existing main breakage)

Spec (#661 + #635) — satisfied, reconciliation is sound

Config rendering — correct

Tests / CI — the #665 lesson, checked properly

Action

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant