Skip to content

feat(reasoning): per-session reasoning effort override (closes #2697)#2963

Open
Koraji95-coder wants to merge 2 commits into
nesquena:masterfrom
Koraji95-coder:feat/per-session-reasoning-effort
Open

feat(reasoning): per-session reasoning effort override (closes #2697)#2963
Koraji95-coder wants to merge 2 commits into
nesquena:masterfrom
Koraji95-coder:feat/per-session-reasoning-effort

Conversation

@Koraji95-coder
Copy link
Copy Markdown
Contributor

@Koraji95-coder Koraji95-coder commented May 26, 2026

Thinking Path

The CLI's /reasoning <level> writes to agent.reasoning_effort in the active profile's config.yaml. The WebUI followed the same shape: the chip and the slash command both mutated the profile default. That's the right primitive for "set my baseline" but the wrong one for "this one session needs deeper thinking, the rest don't" — switching profile-wide for a single conversation forces the user to remember to switch back.

Followed the 3-layer split from the issue thread:

  1. Storage — add reasoning_effort to the Session dataclass and to METADATA_FIELDS so it round-trips through the sidecar JSON. None means "inherit profile default" (the pre-PR behaviour, preserved end-to-end).
  2. Resolution — at stream construction in api/streaming.py, prefer session.reasoning_effort over agent.reasoning_effort. The same parse_reasoning_effort() helper parses both, so 'none'/'minimal'/.../'xhigh' route identically and unknown values silently become None (no behavioural drift between the two layers).
  3. Surfaces — chip dropdown gets a two-button scope picker ("Set for this session only" / "Set as profile default") at the top, the /reasoning slash command accepts an optional session qualifier, and the chip carries a .session-override class for the visual indicator (italic label + small leading accent dot). Maintainer's Q2 answered by carrying the override forward on /api/session/duplicate; Q3 by writing to the session sidecar (not a memory cache).

The existing /api/reasoning endpoint is untouched — it still writes config.yaml for CLI parity. The new /api/session/reasoning endpoint writes only the session sidecar and validates effort against api.config.VALID_REASONING_EFFORTS. Null/empty effort clears the override and lets the next stream fall back to the profile default.

What Changed

Backend (Python)

  • api/models.pySession.reasoning_effort: Optional[str] (None = inherit profile default). Added to METADATA_FIELDS for sidecar persistence and to compact() so the UI can sync the chip from session state.
  • api/streaming.py — resolve block now reads s.reasoning_effort first and falls back to _cfg['agent']['reasoning_effort'] when the session has no override. Same parse_reasoning_effort() helper for both branches; same _reasoning_config = None fallback when neither is set.
  • api/routes.py — new POST /api/session/reasoning endpoint, validates effort against the canonical set, accepts null/empty to clear the override. /api/session/duplicate carries reasoning_effort forward so the copy behaves identically.

Frontend (JS/HTML/CSS)

  • static/index.html — dropdown gains a .reasoning-scope-group (two buttons), a divider, and a #reasoningClearOverride row that is display-toggled by JS based on whether the current session has an override.
  • static/ui.js_currentReasoningSessionOverride + _reasoningWriteScope trackers, _applyReasoningChip(eff, sessionOverride) extended to set the .session-override class, _syncReasoningScopeUI() reflects active scope + clear-override row visibility, click handler routes to /api/session/reasoning or /api/reasoning based on scope.
  • static/commands.jscmdReasoning parses an optional session second token and routes the request through the session endpoint when present. Existing /reasoning <effort> and /reasoning show|hide calls are unchanged.
  • static/style.css.reasoning-scope-option (+ .selected), divider, clear-override row, and .composer-reasoning-chip.session-override (italic label + leading accent dot). Indicator is also applied to the mobile-config action so the override is visible from the mobile composer.

Tests — new tests/test_2697_per_session_reasoning_effort.py with 22 tests covering all three layers (storage round-trip, streaming precedence, route validation, duplicate-session carry-forward, slash-command parse, dropdown markup, CSS indicator, clear-override flow).

The existing test_reasoning_chip_btw_fixes.py "dropdowns grouped within 2 KB" check was bumped to 3 KB because the new scope picker + clear-override rows added ~380 bytes to the same dropdown container; the structural invariant the test really enforces (sibling of composer-footer, not nested inside composer-left's overflow-hidden) is unchanged and still asserted by the sibling test above it.

Why It Matters

Two concrete pain points the issue captures:

  • Bumping reasoning depth for one tricky session without polluting the profile default (or forgetting to switch it back).
  • Keeping the CLI surface unchanged — a user who lives in hermes at the terminal still gets the same agent.reasoning_effort they always had.

The override is per-session, so duplicating a session keeps the override (maintainer's Q2) and clearing it via the dropdown's clear-override row returns the session to the profile default with no config.yaml write at all.

Verification

pytest tests/test_2697_per_session_reasoning_effort.py on Windows 11 against Python 3.11. --noconftest is used because the repo's session conftest tries to create a Windows symlink that requires SeCreateSymbolicLinkPrivilege (pre-existing Windows-only issue, also affects the existing test_reasoning_show_hide.py and similar; unrelated to this PR).

collected 22 items

tests/test_2697_per_session_reasoning_effort.py::TestSessionReasoningEffortRoundTrip::test_default_is_none PASSED [  4%]
tests/test_2697_per_session_reasoning_effort.py::TestSessionReasoningEffortRoundTrip::test_explicit_value_is_normalised PASSED [  9%]
tests/test_2697_per_session_reasoning_effort.py::TestSessionReasoningEffortRoundTrip::test_empty_string_is_treated_as_none PASSED [ 13%]
tests/test_2697_per_session_reasoning_effort.py::TestSessionReasoningEffortRoundTrip::test_save_and_reload_preserves_override PASSED [ 18%]
tests/test_2697_per_session_reasoning_effort.py::TestSessionReasoningEffortRoundTrip::test_clear_override_persists_as_none PASSED [ 22%]
tests/test_2697_per_session_reasoning_effort.py::TestSessionReasoningEffortRoundTrip::test_compact_payload_emits_reasoning_effort PASSED [ 27%]
tests/test_2697_per_session_reasoning_effort.py::TestStreamingResolvesSessionOverFirst::test_session_override_branch_exists PASSED [ 31%]
tests/test_2697_per_session_reasoning_effort.py::TestStreamingResolvesSessionOverFirst::test_session_override_takes_precedence_over_profile_default PASSED [ 36%]
tests/test_2697_per_session_reasoning_effort.py::TestStreamingResolvesSessionOverFirst::test_parse_reasoning_effort_handles_session_value PASSED [ 40%]
tests/test_2697_per_session_reasoning_effort.py::TestSessionReasoningRoute::test_endpoint_registered PASSED [ 45%]
tests/test_2697_per_session_reasoning_effort.py::TestSessionReasoningRoute::test_endpoint_validates_effort PASSED [ 50%]
tests/test_2697_per_session_reasoning_effort.py::TestSessionReasoningRoute::test_endpoint_accepts_null_to_clear_override PASSED [ 54%]
tests/test_2697_per_session_reasoning_effort.py::TestSessionReasoningRoute::test_duplicate_session_carries_reasoning_effort_forward PASSED [ 59%]
tests/test_2697_per_session_reasoning_effort.py::TestSlashCommandSessionVariant::test_session_token_parsed PASSED [ 63%]
tests/test_2697_per_session_reasoning_effort.py::TestSlashCommandSessionVariant::test_session_branch_posts_to_session_endpoint PASSED [ 68%]
tests/test_2697_per_session_reasoning_effort.py::TestSlashCommandSessionVariant::test_unknown_scope_token_is_rejected PASSED [ 72%]
tests/test_2697_per_session_reasoning_effort.py::TestSlashCommandSessionVariant::test_command_arg_string_mentions_session PASSED [ 77%]
tests/test_2697_per_session_reasoning_effort.py::TestDropdownScopePicker::test_dropdown_has_two_scope_options PASSED [ 81%]
tests/test_2697_per_session_reasoning_effort.py::TestDropdownScopePicker::test_dropdown_has_clear_override_row PASSED [ 86%]
tests/test_2697_per_session_reasoning_effort.py::TestDropdownScopePicker::test_ui_js_tracks_write_scope_state PASSED [ 90%]
tests/test_2697_per_session_reasoning_effort.py::TestDropdownScopePicker::test_chip_carries_session_override_class PASSED [ 95%]
tests/test_2697_per_session_reasoning_effort.py::TestDropdownScopePicker::test_css_indicator_styles_present PASSED [100%]

======================== 22 passed, 1 warning in 1.51s ========================

Adjacent suites re-run to confirm no regressions:

  • tests/test_reasoning_show_hide.py — 28 passed
  • tests/test_reasoning_chip_btw_fixes.py — 17 passed (1 threshold bumped 2 KB -> 3 KB; same invariant)
  • tests/test_reasoning_chip_js_behaviour.py — 11 passed
  • tests/test_issue1103_reasoning_chip_visibility.py — 4 passed, 3 pre-existing Windows-cp1252 (master too)
  • tests/test_465_session_branching.py, tests/test_session_duplicate.py, tests/test_issue1431_toolsets_chip_responsive.py — identical pass/fail counts on master and this branch (Windows-only env failures, unrelated).

node --check clean on static/commands.js and static/ui.js. ruff check on the four touched Python files: zero new findings (the 77 pre-existing issues in api/streaming.py are master-side).

What is verified vs. what is NOT:

  • VERIFIED on Windows 11 + Python 3.11: storage round-trip, streaming precedence block presence + ordering, route handler shape, slash-command parse, dropdown markup, CSS rules.
  • NOT verified end-to-end against a live broker: I did not boot the WebUI server and click through the dropdown. The static + harness tests pin the invariants the issue's spec calls out, but a real-browser screenshot pass on a Linux/macOS dev machine would be a useful follow-up if you want it.

Risks

  • Storage migration. Pre-existing session sidecars don't have reasoning_effort in their JSON. Session.__init__ accepts the kwarg with a None default and the load path passes **data through, so old sidecars load with reasoning_effort=None (= inherit profile default, i.e. exactly the v0.51.137- behaviour). Once a session has been saved under this PR the field appears in the JSON; rolling back would simply see an unknown kwarg on Session.__init__ (which **kwargs absorbs today, but worth noting).
  • Scope picker UX. The default scope when the dropdown opens is session so a single-click on an effort writes the override (not the profile default). For a user who only ever wants the profile-write path this is one extra click. The slash command keeps the old default (/reasoning high -> profile) and the chip's behaviour is discoverable via the visible scope buttons, so the discoverability cost is low.
  • No new auth surface. The endpoint trusts the session-id lookup; per-session authorisation is the same as the existing /api/session/toolsets and /api/session/draft shapes.
  • Dropdown markup grew ~380 bytes. The grouped-dropdowns test was bumped from <2000 to <3000 bytes for the spread. The structural guarantee (dropdown is a sibling of composer-footer, not nested in composer-left) is still enforced by the test above it.

Model Used

claude-opus-4-7 (1M context) via Claude Code, directed by a human contributor. Final review + adversarial pass + commit message + PR body authored on the contributor's machine; all gh search prs/gh search issues pre-flight dup-checks performed before filing.

AI Usage Disclosure

This PR's code, tests, and PR body were drafted with substantial AI assistance (Claude Opus 4.7, via Claude Code). A human contributor read, ran, and reviewed the diff before commit and is responsible for the contents. Three pre-flight items per the contributor's checklist:

  • gh search prs --repo nesquena/hermes-webui "per-session reasoning effort" --state all — no canonical fix in flight.
  • gh search issues --repo nesquena/hermes-webui "session reasoning_effort"Add per-session reasoning effort support in WebUI #2697 is the live issue this PR closes.

…na#2697)

Three layers:

1. Session storage: new `reasoning_effort` field on Session, persisted in
   the session sidecar via METADATA_FIELDS. None means inherit profile
   default (agent.reasoning_effort in config.yaml) — the pre-PR behaviour.

2. Resolve-at-stream-start precedence in api/streaming.py:
   `session.reasoning_effort or config.agent.reasoning_effort`. When a
   session has its own value the broker uses it; otherwise the profile
   default applies. CLI/profile semantics are unchanged.

3. UI surfaces:
   - Reasoning chip dropdown gains a two-button scope picker:
     "Set for this session only" / "Set as profile default" — the
     session-only path writes /api/session/reasoning, the profile path
     keeps writing /api/reasoning (CLI parity).
   - "Clear session override" row appears when a session has an override.
   - Chip carries a `.session-override` class -> italic label + small
     leading dot, signalling the chip is showing a session-scoped value
     (Q1 from the maintainer's reply).
   - /reasoning slash command accepts an optional `session` qualifier:
     `/reasoning high session` writes the session override; the plain
     form (`/reasoning high`) still writes the profile default.

Other behaviour:
- /api/session/duplicate carries the override forward so the copy
  behaves identically (Q2 from the maintainer's reply).
- Override is persisted in the session sidecar JSON, not in-memory (Q3).
- /api/session/reasoning validates against api.config.VALID_REASONING_EFFORTS
  and accepts null/empty to clear the override (inherit profile default).

Tests: 22 new pass in tests/test_2697_per_session_reasoning_effort.py
covering Session round-trip, streaming precedence, route validation,
duplicate-session carry-forward, slash-command parse, dropdown markup,
CSS indicator, and the clear-override flow.

The existing test_reasoning_chip_btw_fixes.py "dropdowns grouped within
2 KB" check was bumped to 3 KB to accommodate the new scope picker /
clear-override rows inside the same dropdown container; the structural
invariant (sibling of composer-footer, not nested in composer-left) is
unchanged.
…asoningChip

PR nesquena#2963 / nesquena#2697 added a second `sessionOverride` argument to
`_applyReasoningChip` so `tests/test_mobile_layout.py::
test_reasoning_chip_updates_desktop_and_mobile_controls` could no longer
locate the function body via the literal substring
`"function _applyReasoningChip(eff)"`, producing
`ValueError: substring not found` before any of the six invariants
(`composerReasoningWrap`, `composerMobileReasoningAction`,
`composerReasoningLabel`, `composerMobileReasoningLabel`,
`label.textContent=text`, `mobileLabel.textContent=text`) could be
checked.

Switching the slice marker to `"function _applyReasoningChip("` keeps
the test signature-agnostic so future arg additions don't silently bury
the actual invariants the test guards. The six contracts themselves are
unchanged.
@nesquena-hermes
Copy link
Copy Markdown
Collaborator

Pulled the worktree and read the diff against api/models.py, api/routes.py:5230-5272, api/streaming.py:4283-4303, static/ui.js:1629-1830, static/commands.js:1128-1206, and the 22-test file. The three-layer split (storage → resolver → surfaces) is the right shape and matches the maintainer's Q2/Q3 answers on #2697. A few concrete items worth resolving before this lands.

Session switch leaves the chip showing the previous session's override

syncReasoningChip() at static/ui.js:1717-1720 reuses the cached _currentReasoningEffort / _currentReasoningSessionOverride and never re-reads S.session.reasoning_effort:

function syncReasoningChip(){
  if(_currentReasoningEffort===null){fetchReasoningChip();return;}
  _applyReasoningChip(_currentReasoningEffort,_currentReasoningSessionOverride);
}

syncReasoningChip() is the function syncTopbar() calls on every session switch (static/ui.js:5201, and sessions.js:633 after S.session=data.session). So after the first fetchReasoningChip() populates the cache, switching from session A (reasoning_effort='high', override) to session B (no override, profile default 'medium') will repaint with 'high' + override-class — the chip is now lying about session B.

Compare to _syncToolsetsChip() at static/ui.js:1865-1871, which reads from S.session.enabled_toolsets every call. That's the pattern this chip needs:

function syncReasoningChip(){
  const sid=S&&S.session&&S.session.session_id;
  if(sid){
    const sessionOverride=_normalizeReasoningEffort(S.session.reasoning_effort||'');
    const effective=sessionOverride||_lastProfileReasoningEffort||'';
    _applyReasoningChip(effective, sessionOverride||null);
    return;
  }
  if(_currentReasoningEffort===null){fetchReasoningChip();return;}
  _applyReasoningChip(_currentReasoningEffort,null);
}

This needs a small companion change: cache the profile default separately (_lastProfileReasoningEffort) inside fetchReasoningChip()'s .then so the merge can happen offline without an extra /api/reasoning round-trip on every session switch. Worth a test in tests/test_2697_per_session_reasoning_effort.py::TestDropdownScopePicker that asserts switching sessions repaints from S.session.reasoning_effort rather than the cache.

Scope picker default + missing session is silently wrong

In static/ui.js:1812-1820, when _reasoningWriteScope==='session' but no sid, the click falls through to the profile-default branch:

if(_reasoningWriteScope==='session'&&sid){
  api('/api/session/reasoning', ...)
}else{
  api('/api/reasoning', ...)

A user who explicitly clicked "Set for this session only" with no session loaded will write their config.yaml default and only see a toast saying "set to X (profile default)". That's a UX trap — quietly switching scopes opposite to what the picker says. Either disable the effort rows when scope=session and no session is active, or surface a toast like "No active session — open or create one first" and return without writing. The slash command path already does the latter (commands.js:1191: if(!sid){showToast(t('no_active_session'));return true;}); the dropdown should match.

CHANGELOG missing

AGENTS.md calls for a CHANGELOG entry on user-visible behavior changes. This adds a dropdown row, a slash-command qualifier, a chip indicator, and a new endpoint — all release-note material. git diff --name-only origin/master...HEAD shows no CHANGELOG.md. Worth a one-liner under the appropriate section so the release notes carry it.

Smaller items

  • api/streaming.py:4288-4297 — the override branch uses truthiness on _session_effort, so the string 'none' correctly takes effect (non-empty → _parse_reff('none') returns {"enabled": False}). Good. Worth a comment-line note that 'none' is intentional (disables reasoning at the session layer), to head off a future "simplify this truthiness check" cleanup that strips it back.
  • The test at tests/test_2697_per_session_reasoning_effort.py::TestStreamingResolvesSessionOverFirst::test_session_override_branch_exists is a literal string-match against "_session_effort = getattr(s, 'reasoning_effort', None)". Trivial whitespace edits to that line will trip the assertion. Consider a behavioral test using a fake _cfg + a fake session, or relax to a regex on getattr.*reasoning_effort.*None.
  • api/routes.py:5230 — the /api/session/reasoning endpoint validates effort but doesn't publish_session_list_changed(). The enabled_toolsets peer at /api/session/toolsets may or may not (worth checking) — but if a session-list refresh is expected when toolsets change, the override write should match.

The PR is well-scoped and the test suite is solid; the chip-cache bug is the one item I'd consider a blocker, since it lands a regression on the existing single-session chip behavior the moment a user has two sessions. The rest are polish.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants