feat(tlon): add core platform adapter#2
Conversation
Adds optional channel-context backfill for Discord shared-channel sessions so the agent can see recent messages it missed between its own turns (typically when require_mention=true filters out most traffic). Previously the agent only saw the @mention message that triggered it, which led to disorienting replies in active multi-user channels where the conversation context was invisible. With backfill enabled, a configurable number of recent messages are fetched per-turn and prepended to the trigger message as a context block, kept separate from sender-prefix logic so attribution remains clean. This re-opens the work from NousResearch#13063 (approved by @OutThisLife on 2026-04-20, closed when I closed the branch to address the simpolism:main head-branch issue plus an ordering bug I caught later in live use). Filing against the freshly-rewritten problem statement in NousResearch#13054 so the design is grounded in the failure mode rather than the implementation shape. The implementation follows the **push-mode last-self-anchored** design from the two options laid out in NousResearch#13054. See the issue for the trade-off discussion vs pull-mode (NousResearch#13120 was an earlier closed PR using that shape). Treating this as a reference implementation — happy to rewrite as last-trigger anchoring or as a hybrid with NousResearch#13120 if maintainers prefer. Changes: - gateway/platforms/discord.py: - new `_discord_history_backfill()` / `_discord_history_backfill_limit()` helpers (config.extra > env > default), mirroring the existing `_discord_require_mention()` shape - new `_fetch_channel_context()` that scans `channel.history()` backwards from the trigger to the bot's last message (or limit), formats as `[Recent channel messages] / [name] msg / ...`, respects DISCORD_ALLOW_BOTS, skips system messages - per-channel `_last_self_message_id` cache to narrow the fetch window on hot paths (avoids full history scan when the bot has spoken recently) - **IMPORTANT**: passes `oldest_first=False` explicitly to `channel.history()`. discord.py 2.x silently flips the default to True when `after=` is supplied, which would select the EARLIEST N messages after our last response instead of the LATEST N before the trigger. In high-traffic windows this would return stale tool traces and drop the actual final answer the user is asking about. See regression test below. Caught in live use during a Codex tool-trace burst on May 13 2026. - gateway/config.py: discord_history_backfill + discord_history_backfill_limit settings + yaml→env bridge - gateway/platforms/base.py: channel_context field on MessageEvent - gateway/run.py: prepend channel_context after sender-prefix so the [sender name] tag applies to the trigger message alone, not to the backfill - hermes_cli/config.py: defaults for new discord.history_backfill and discord.history_backfill_limit keys - cli-config.yaml.example: documented defaults - tests/gateway/test_discord_free_response.py: 7 new tests covering cold-start backfill, self-message stop boundary, other-bot filtering, cache hot-path narrowing, stale-cache fallback, shared-channel + per-user backfill paths, and the ordering regression test (`test_fetch_channel_context_cache_uses_latest_window_when_after_set`) - tests/gateway/test_config.py: yaml→env bridge tests - tests/gateway/test_session.py: prefix-order edge cases - website/docs/user-guide/messaging/discord.md: env vars + config keys + usage docs Tested on Ubuntu 24.04 — empirically validated in my own multi-bot Discord research server for the past three weeks. Fixes NousResearch#13054 Supersedes NousResearch#13063 (closed)
Follow-up to snav's PR NousResearch#25463 contribution: flip default to on, broaden scope so backfill fires whenever require_mention gates the bot (not just shared-session channels). Why: - The mention-gate creates a session-transcript gap regardless of whether the channel is shared or per-user. In per-user sessions, Alice's session is still missing other participants' messages and her own pre-mention messages — backfill fills both gaps. - Threads naturally scope to thread-only history because discord.py's channel.history() on a thread returns only that thread's messages. - DMs still skip — every DM triggers the bot, so the session transcript is already complete. Changes: - hermes_cli/config.py: discord.history_backfill default → true - gateway/platforms/discord.py: drop the _is_shared gate, keep _is_dm skip and _needed_mention gate; env var DISCORD_HISTORY_BACKFILL default → 'true' - cli-config.yaml.example + website docs: update defaults and prose; add the DISCORD_HISTORY_BACKFILL / _LIMIT env var rows that were documented in the PR description but missing from the env-var table - tests/gateway/test_discord_free_response.py: - flip test_discord_per_user_channel_does_not_backfill → test_discord_per_user_channel_backfills_too (new behavior) - add test_discord_dm_does_not_backfill (DM skip is invariant) - give FakeThread a no-op history() so existing thread tests don't hit a fake discord.Forbidden when backfill now fires on threads too Tests: 160/160 in target files; 400/400 across all tests/gateway/ -k discord.
The prebuild step used `rm -rf` and `cp -r`, which fail on Windows (`'rm' is not recognized`). Replace with an inline Node one-liner using fs.rmSync / fs.cpSync so the build works on Windows, macOS, and Linux without adding a dependency.
…Research#25978) Pre-existing diagnostics below an edit point used to surface as 'LSP diagnostics introduced by this edit' whenever the edit deleted or inserted lines. The delta-filter key included the diagnostic's range, so the same logical error reported at a different line in the post-edit snapshot looked like a brand new diagnostic. Concrete case: deleting 14 lines in cli.py caused Pyright errors at lines 9873, 10590, 12413, 13004 (unrelated to the edit) to be reported as introduced by it. Fix: build a piecewise-linear line-shift map (via difflib's SequenceMatcher) from pre and post content, and remap baseline diagnostics into post-edit coordinates before the set-difference. Diagnostics in deleted regions drop out cleanly; diagnostics below the edit shift by the right amount; diagnostics above are untouched. The strict (range-aware) equality key stays — so a genuinely new instance of an identical error class at a different line still surfaces as new. Pieces: - agent/lsp/range_shift.py — build_line_shift, shift_diagnostic_range, shift_baseline. Pure functions, no LSP state. - agent/lsp/manager.py — LSPService.get_diagnostics_sync gains an optional line_shift kwarg; baseline is shift_baseline'd before computing the seen-set. _diag_key keeps the strict range key. - tools/file_operations.py — write_file captures pre_content for any LSP-handled extension (not just LINTERS_INPROC) and passes pre/post to _maybe_lsp_diagnostics, which builds the shift map. - New _lsp_handles_extension helper guards the pre_content read. Trade-offs preserved: - Genuinely new same-class errors at different lines still surface (content-only key would have swallowed them). - Pre-existing errors at unshifted positions still get filtered (covered by the strict-key path with no shift). - Best-effort: when pre_content can't be captured (file didn't exist, permissions), the unshifted comparison still catches most pre-existing errors; the edge case it misses is a new file with a non-empty baseline, which is structurally impossible.
Three Windows-only bugs in the web-dashboard build path. Each is small, scoped, and verified end-to-end on Windows 11 — including under a stock cmd.exe / PowerShell console with its default cp1252 encoding. 1. `sync-assets` shells out to Unix-only commands web/package.json hard-codes `rm -rf … && cp -r …`. Neither exists on Windows cmd.exe. `hermes_cli/main.py::_build_web_ui` runs npm via subprocess (which on Windows defaults to cmd.exe), so the prebuild hook crashed before Vite ever ran and the dashboard never built. Fix: web/scripts/sync-assets.mjs — ~20 lines of Node using fs.rmSync + fs.cpSync (stdlib, Node >= 16.7). No new deps, identical behavior on POSIX and Windows. 2. Build failures were silent _build_web_ui ran both subprocess calls with capture_output=True and never relayed the captured buffers on failure. Users saw 'Web UI build failed' and nothing else — no stdout, no stderr, no hint that the real problem was 'rm is not recognized'. Fix: inner _relay() helper that decodes and prints stdout + stderr (utf-8, errors='replace') whenever a step returns non-zero. Replaces the existing stderr_tail-only relay on the build path; success path is unchanged. (stderr_tail is preserved for the stale-dist fallback branch added by NousResearch#23817.) Salvaged from NousResearch#13368 by @johnisag onto current main. Conflict resolution preserves main's improvements: - _run_npm_install_deterministic() (replaces bare subprocess.run for npm install) - npm-build retry-after-sleep for Windows boot-time races (NousResearch#23817) - stale-dist fallback for non-interactive callers (NousResearch#23817) Closes NousResearch#25073, NousResearch#13368.
Codex review pointed out that even with the sync-assets fix applied,
_build_web_ui still crashes on a stock Windows console before reaching
npm: Python stdout defaults to cp1252 (or similar) and raises
UnicodeEncodeError when print() hits the arrow/check glyphs used for
status messages (→, ✗, ⚠, ✓). Reproduced locally in PowerShell:
$ PYTHONIOENCODING=cp1252 python -c "from hermes_cli.main import _build_web_ui; _build_web_ui(Path('web'), fatal=True)"
UnicodeEncodeError: 'charmap' codec can't encode character '\u2192' ...
The previous PR body claimed "end-to-end verified on Windows 11", but
that was under the venv's default (utf-8) stdout. A plain `py` or
PowerShell invocation would still fail before sync-assets ever ran.
Fix: inner _say() helper that falls back to
text.encode(sys.stdout.encoding, errors="replace")
when print() raises UnicodeEncodeError. Glyphs degrade to '?' on
ASCII / cp1252 consoles; utf-8 consoles are unaffected. Verified the
full build pipeline runs to completion with PYTHONIOENCODING=cp1252.
Scoped tightly to _build_web_ui (the function this PR already touches);
other call sites in the codebase with the same risk are out of scope.
…gnal_handler The call site at line 246 is already wrapped in try/except NotImplementedError (added in NousResearch#25969). The checker just doesn't peek at surrounding context. Mark with the suppression comment so the blocking check passes.
The 'sessions' command has been registered in the central command registry since NousResearch#20805 (May 2025) and surfaces in /help and tab-completion, but the classic CLI's process_command() never had an elif branch for it. The canonical name fell through and printed 'Unknown command: sessions'. The TUI side was wired up correctly via the SessionPicker overlay; only the legacy CLI was missing the dispatch. Adds _handle_sessions_command() which mirrors /resume's no-arg behavior inline (the CLI has no overlay primitive equivalent to the TUI picker): - /sessions and /sessions list → print the recent-sessions table - /sessions <id_or_title> → delegates to _handle_resume_command Includes regression tests covering the dispatcher wiring (the original bug) plus the three handler branches.
…-ci-unblocker-after-21012 fix(ci): stabilize shared test state after 21012
AGENT_BROWSER_CHROME_FLAGS is not read by agent-browser CLI. The correct env var is AGENT_BROWSER_ARGS, with comma-separated values. This fixes Chrome 'No usable sandbox' crash on Ubuntu 23.10+ systems where AppArmor restricts unprivileged user namespaces. The detection logic was correct but the fix used the wrong environment variable name and space-separated instead of comma-separated args.
Follow-up to the sandbox-bypass env-var fix: - Update the opt-out gate so a user-provided AGENT_BROWSER_ARGS is also respected, not just the legacy AGENT_BROWSER_CHROME_FLAGS. Previously the gate only checked the broken legacy var, so a user who pre-set AGENT_BROWSER_ARGS would still get clobbered by Hermes's auto-injection. - Document AGENT_BROWSER_ARGS in .env.example, the browser feature page, and the env var reference, with notes about the auto-injection on AppArmor-restricted systems (Ubuntu 23.10+, DGX Spark, containers). - Add Anadi Jaggia to AUTHOR_MAP.
On macOS with uv-managed cPython 3.11, the default kqueue selector cannot
register fd 0, so prompt_toolkit's loop.add_reader raises
OSError(EINVAL) ("[Errno 22] Invalid argument") from kqueue.control()
and the agent crashes immediately on startup (NousResearch#5884, also reported in
NousResearch#6393).
Probe KqueueSelector.register(0, EVENT_READ) before launching
prompt_toolkit. If it fails, install an event-loop policy that returns a
SelectorEventLoop backed by SelectSelector — select() works fine on
stdin in this Python build, so add_reader succeeds and the agent
launches normally.
Also extend the existing NousResearch#6393 fallback handler to recognize EINVAL /
EBADF / "Invalid argument" so that any future selector failure on stdin
shows the friendly "reinstall Python via pyenv or Homebrew" guidance
instead of an opaque traceback.
Verified on macOS (Darwin 24.6.0) with uv-managed cPython 3.11.15: the
kqueue probe fails, the policy switch fires, and `hermes` launches
cleanly. No effect on platforms where kqueue can register fd 0.
When the auxiliary client falls through Nous (e.g. no stored auth, or runtime credential mint failed), users currently see only `debug`-level lines, so the next provider in the fallback chain takes over silently. Promote the no-auth path to a warning that tells operators to run `hermes auth`, and add a debug breadcrumb on the rarer mint-failed-but-stored-auth-still-present fallback path so the existing behavior (use the raw stored token) is preserved while staying investigable. Salvaged from NousResearch#23881 by @0xharryriddle. The contributor's original patch also short-circuited the second branch with a return, which broke the pool-entry fallback path covered by `test_try_nous_uses_pool_entry` — kept the warning intent, dropped the return so the fallback still works. Dropped the contributor's changes to `hermes_cli/goals.py` because the goal-pause path is unreachable when the auxiliary client is None (`judge_goal` returns `parse_failed=False`, which resets `consecutive_parse_failures`), so the reason string they added never surfaces in the pause message. Refs NousResearch#23876
The ACP Registry manifest (acp_registry/agent.json), the npm launcher package.json, and the launcher's HERMES_AGENT_VERSION constant must all match pyproject.toml exactly — tests/acp/test_registry_manifest.py enforces this lockstep. Without a release-script hook, the next weekly version bump fails that test until someone hand-edits four files. Extend update_version_files() to drive the ACP bump alongside __init__.py and pyproject.toml, and add tests covering the lockstep and the missing-files no-op path. Also map adam.manning@gmail.com -> am423 for the salvage commit.
…ousResearch#26106) * chore: remove Atropos RL environments, tools, tests, skill, and tinker-atropos submodule Delete: - environments/ (43 files — base env, agent loop, tool call parsers, benchmarks) - rl_cli.py (standalone RL training CLI) - tools/rl_training_tool.py (all 10 rl_* tools) - tests: test_rl_training_tool, test_tool_call_parsers, test_managed_server_tool_support, test_agent_loop, test_agent_loop_vllm, test_agent_loop_tool_calling, test_terminalbench2_env_security - optional-skills/mlops/hermes-atropos-environments/ - tinker-atropos git submodule + .gitmodules * chore: remove RL/Atropos references from Python source - toolsets.py: remove rl toolset block + update comment - model_tools.py: remove rl_tools group + update async bridging comment - hermes_cli/tools_config.py: remove RL display entry, _DEFAULT_OFF_TOOLSETS, setup block, and rl_training post-setup handler - tools/budget_config.py: remove RL environment reference in docstring - tests/test_model_tools.py: remove rl_tools from expected groups - tests/run_agent/test_streaming_tool_call_repair.py: fix stale cross-reference * chore: remove rl/yc-bench extras and tinker-atropos refs from pyproject.toml - Remove rl extra (atroposlib, tinker, fastapi, uvicorn, wandb) - Remove yc-bench extra - Remove rl_cli from py-modules - Remove [tool.ty.src] exclude for tinker-atropos - Remove [tool.ruff] exclude for tinker-atropos - Regenerate uv.lock * chore: remove tinker-atropos from install/setup scripts - setup-hermes.sh: remove entire tinker-atropos submodule install block - scripts/install.sh: remove both tinker-atropos blocks (Termux + standard) - scripts/install.ps1: remove tinker-atropos block - nix/hermes-agent.nix: remove tinker-atropos pip install line * chore: remove RL references from cli-config.yaml.example * docs: remove Atropos/RL references from README, CONTRIBUTING, AGENTS.md * docs: remove RL/Atropos references from website - Delete: environments.md, rl-training.md, mlops-hermes-atropos-environments.md - sidebars.ts: remove rl-training and environments sidebar entries - optional-skills-catalog.md: remove hermes-atropos-environments row - tools-reference.md: remove entire rl toolset section - toolsets-reference.md: remove rl row + update example - integrations/index.md: remove RL Training bullet - architecture.md: remove environments/ from tree + RL section - contributing.md: remove tinker-atropos setup - updating.md: remove tinker-atropos install + stale submodule update * chore: remove remaining RL/Atropos stragglers - hermes_cli/config.py: remove TINKER_API_KEY + WANDB_API_KEY env var defs - hermes_cli/doctor.py: remove Submodules check section (tinker-atropos) - hermes_cli/setup.py: remove RL Training status check - hermes_cli/status.py: remove Tinker + WandB from API key status display - agent/display.py: remove both rl_* tool preview/activity blocks - website/docs: remove RL references from providers.md + env-variables.md - tests: remove TINKER_API_KEY from conftest, set_config_value, setup_script * chore: remove RL training section from .env.example
The ACP Registry schema supports uvx as a first-class distribution method alongside npx and binary. Pointing the registry directly at the existing hermes-agent PyPI release removes: - the @NousResearch npm scope (we don't own it) - a separate npm publish step on every weekly release - 90 lines of Node launcher + tests in packages/hermes-agent-acp/ The Zed registry now installs Hermes via: uvx --from 'hermes-agent[acp]==<version>' hermes-acp This is the same command the npm launcher was shelling out to anyway, so end-user behavior is unchanged. Registry CI validates the PyPI URL + version-pin exact match automatically. Changes: - acp_registry/agent.json: distribution.npx -> distribution.uvx - delete packages/hermes-agent-acp/ entirely - scripts/release.py: drop npm-launcher bump paths, keep manifest lockstep - tests/acp/test_registry_manifest.py: assert uvx shape + version pin - tests/scripts/test_release_acp_registry.py: rewrite for uvx-only shape - docs (user-guide + dev-guide): drop all npm-launcher references - delete docs/plans/acp-registry-zed-integration.md (stale, npm-shaped) Validated against agentclientprotocol/registry agent.schema.json via jsonschema. hermes-agent==0.13.0 is already live on PyPI.
…chments
Discord's CDN serves attachments with Content-Encoding: br. aiohttp's
compression_utils tries 'import brotlicffi as brotli' first and falls back
to google's Brotli, but Brotli<1.2.0's Decompressor.process() is 1-arg
while aiohttp calls it with 2 args (data, max_length). Result: every
.txt/.md/.doc uploaded to a Discord-gateway session fails to decode at
att.read() with 'Can not decode content-encoding: br' / 'TypeError:
process() takes exactly 1 argument (2 given)', the agent never sees the
bytes, and falls back to filesystem guessing.
Pin brotlicffi==1.2.0.1 in both surfaces:
- tools/lazy_deps.py 'platform.discord' tuple: Discord users on the
lazy-install path get it on first discord.py import.
- pyproject.toml [messaging] extra: users who explicitly install
hermes-agent[messaging] (skipping the lazy path) get it eagerly.
brotlicffi wins aiohttp's import race regardless of what else is
installed (try brotlicffi / except: import brotli), so existing setups
that already pulled google's Brotli transitively don't change behavior
beyond the bug fix. ~1.5 MB wheel, manylinux/macOS/Windows coverage.
E2E verified: round-trip decode of Brotli-compressed payload via
aiohttp.compression_utils.brotli succeeds with brotlicffi pinned; same
test against Brotli==1.1.0 alone reproduces the reported TypeError.
Credit to @Korkyzer for the original diagnosis and fix shape in NousResearch#15744;
the lazy-deps gating layer was added on top to keep brotlicffi out of
the install path for users who don't run a Discord gateway.
Fixes NousResearch#12511.
Closes NousResearch#15744.
Co-authored-by: Korky <korkyzer@gmail.com>
Two long-standing prompt_toolkit bugs in the base hermes CLI: 1. Resize duplication. Column-shrink resize used to push 40+ rows of duplicate chrome (status bar, input rules) into terminal scrollback every resize. Same wall as pt issues NousResearch#29 (open since 2014), NousResearch#1675, NousResearch#1933 — aider/xonsh/ipython all use alt-screen to dodge it. Root cause (verified by reading prompt_toolkit/renderer.py): _output_screen_diff (renderer.py L232-242) deliberately moves the cursor to the bottom of the canvas after every paint 'to make sure the terminal scrolls up'. In non-fullscreen mode this scrolls chrome content into terminal scrollback on every render — not just on resize. Fix: monkey-patch prompt_toolkit.renderer._output_screen_diff to bypass the reserve-vertical-space cursor move. When pt's logic checks 'if current_height > previous_screen.height', we inflate the previous screen height so the branch falls through. ~30-line wrapper, no fork of pt, no alt-screen, no DECSTBM scroll region. Verified empirically in real Terminal.app: 10 resizes (mixed shrinks/widens 1300→500→1400) during streaming produced ZERO scrollback delta, full agent response preserved, status bar pinned at bottom, no visible duplicates. pt is pinned to ==3.0.52 so the private-function patch is safe; future pt bumps will need to re-verify the signature matches. 2. Light-mode terminal visibility. Hardcoded skin colors (#FFF8DC cornsilk, #FFD700 gold, #B8860B dark goldenrod) are tuned for dark Terminal.app — invisible on light/cream backgrounds. Port ui-tui/src/theme.ts detectLightMode() to Python so the base CLI adapts. Detection priority: HERMES_LIGHT/HERMES_TUI_LIGHT env → HERMES_TUI_THEME=light|dark → HERMES_TUI_BACKGROUND=#RRGGBB → COLORFGBG env (xterm/Konsole/urxvt) → OSC 11 query (\x1b]11;?\x1b\\) with 100ms timeout → default dark. OSC 11 is tty-gated so gateway/cron/batch/subagent code paths don't pay the timeout cost. When light mode is detected, dark-mode colors auto-remap to readable equivalents (#FFF8DC → #1A1A1A, #FFD700 → #9A6B00, etc). Hooked at three points: - _hex_to_ansi() — auto-remaps any color emitted via the ANSI helper - _build_tui_style_dict() — rewrites pt style strings (chrome bg/fg) - SkinConfig.get_color() — wrapped at module load so Rich Panel borders/body text get the remap too Status-bar foreground colors (#C0C0C0, #888888, etc.) are explicitly skipped because they're paired with a dark navy bg — remapping them would make them invisible in dark mode. 3. Other visibility fixes: [thinking] reasoning preview now uses ANSI dim+italic (\x1b[2;3m) instead of #B8860B so it inherits terminal default fg color. Input/prompt area defaults to terminal default fg (was #FFF8DC cornsilk → invisible on cream). Co-authored-by: Brooklyn Nicholson <brooklyn.bb.nicholson@gmail.com>
Adds 16 unit tests covering the light/dark terminal detection path introduced in the previous commit: - Env override priority (HERMES_LIGHT, HERMES_TUI_LIGHT, HERMES_TUI_THEME, HERMES_TUI_BACKGROUND, COLORFGBG) - Detection cache stickiness - _maybe_remap_for_light_mode() no-op in dark mode - Known dark-mode color remap (#FFF8DC -> #1A1A1A etc) - Case-insensitive lookup - Unknown color passthrough - Status-bar paired colors (#C0C0C0, #888888, #555555, #8B8682) are intentionally NOT remapped — regression guard for the patch-11 fix, since remapping them would produce dark-on-dark on the status bar's navy bg - SkinConfig.get_color() wrapper is installed and idempotent - SkinConfig.get_color() does remap in light mode and passes through in dark mode We don't try to fake an OSC 11 reply — that path is exercised end-to-end in real Terminal.app; the env-override path covers the algorithmic logic.
…store full-width borders (NousResearch#26163) NousResearch#25975 (salvaging NousResearch#24403) clamped decorative scrollback Panels and streaming box rules to `max(32, min(width, 56))` as a defense against terminal-emulator reflow when columns shrink. On any modern wide terminal this made the response/reasoning borders look stubby — 56 cols inside a 200-col viewport. NousResearch#26137 (salvaging NousResearch#25981, by @OutThisLife) landed a more fundamental fix: prompt_toolkit's `_output_screen_diff` is monkey-patched so its reserve-vertical-space cursor move no longer pushes chrome into scrollback at all. With that in place, the clamp is no longer load-bearing for the chrome-into-scrollback class of bugs — the remaining risk is purely cosmetic reflow of *already stamped* Panel borders during an aggressive column shrink, which we now accept as a tradeoff for restoring proper full-width rendering. Changes: - `_scrollback_box_width()` returns `max(32, width)` (just the floor, no upper cap). All 10 call sites stay valid. - Updated `test_scrollback_box_width_caps_to_resize_safe_value` to the new `test_scrollback_box_width_returns_viewport_width` asserting full-width passthrough above the 32-col floor. Floor of 32 is kept so `'─' * (w - 2)` math stays positive on tiny terminals. Refs NousResearch#18449 NousResearch#19280 NousResearch#22976 (the original reflow class) and NousResearch#25975 (the clamp this reverts).
The freeform /goal judge was capped at max_tokens=200, which reliably
truncated the JSON verdict on reasoning-heavy models (deepseek-v4-pro,
qwq, etc.) — the model burns tokens on hidden reasoning before emitting
visible content, and the first /goal turn's prompt is larger than later
turns, blowing past 200. Symptom: agent.log shows
`judge reply was not JSON: '{"done": true, "reason": "The agent successfully'`
followed by repeated `judge returned empty response` lines, then the
goal pauses with a misleading 'judge model isn't returning the required
JSON verdict' message.
Diagnosed live by @helix4u — empirically verified that raising the
budget on an unmodified worktree makes the failures go away on the
exact configs users were hitting on Nous Plus subscription paths.
Changes:
- DEFAULT_JUDGE_MAX_TOKENS = 4096 (up from 200)
- New auxiliary.goal_judge.max_tokens config knob for tuning in
specifically constrained setups
- _goal_judge_max_tokens() resolves the value with fail-open semantics
(non-int / non-positive / load failure → default). load_config() is
mtime-cached so per-turn lookup is cheap.
Scoped narrowly to the verified root cause — does not introduce a
submit_verdict tool-call schema (see NousResearch#26162 / NousResearch#23671 for that direction;
they can land separately if we want them).
Tests: tests/hermes_cli/test_goals.py + tests/cli/test_cli_goal_interrupt.py
+ tests/gateway/test_goal_verdict_send.py — 62/62 passing.
E2E verified: config override honored (8192), missing/garbage/zero
values fall back to 4096, no-auxiliary-section falls back to 4096.
Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>
Credits:
- @helix4u (Gille) — diagnosed the max_tokens=200 truncation via live
testing on an unmodified worktree, drafted the original fix shape
in NousResearch#26162.
- @AhmetArif0 — flagged the freeform judge fragility in NousResearch#23671 from
the tool-call angle.
- @0xharryriddle (HarryRiddle.eth) — reported the issue from a Nous
Plus subscription setup in NousResearch#23876 with full debug reports.
Closes NousResearch#23876
Supersedes NousResearch#26162, NousResearch#23671, NousResearch#23881
…sResearch#26148) * ci(pypi): add publish workflow for automated PyPI releases Triggered by CalVer tag pushes from scripts/release.py (v20* pattern). Three jobs: build (uv build) → publish (OIDC trusted publishing) → sign (Sigstore + attach to existing GitHub Release). - workflow_dispatch as manual escape hatch - skip-existing for safe re-runs - Graceful skip when GitHub Release not found (sign job) - Top-level permissions: contents: read (CodeQL compliant) Requires one-time setup: PyPI trusted publisher + GitHub pypi environment. Co-authored-by: dmahan93 <44207705+dmahan93@users.noreply.github.com> * fix(release): address review findings - Stage acp_registry/agent.json in version bump commit (was silently left unstaged) - Add missing return when no previous tags found without --first-release - Fix get_pr_number return type annotation (str -> str | None) - Prefer uv build over python -m build (matches CI workflow), with fallback - Use unit separator (%x1f) in git log format to handle | in author names - Add explicit encoding='utf-8' to .release_notes.md write Workflow hardening: - Gracefully skip signing when GitHub Release not found (env var gate instead of exit 1, so PyPI publish still shows green) * fix(ci): harden PyPI workflow — SHA-pin actions, guard workflow_dispatch, explicit build flags - Pin all actions to commit SHAs (supply-chain hardening for id-token:write) - workflow_dispatch now requires confirm_tag input + checks out that tag - Both uv build paths explicitly pass --sdist --wheel --------- Co-authored-by: dmahan93 <44207705+dmahan93@users.noreply.github.com>
Cron mutation operations (run/pause/resume/remove) and 'hermes cron edit'
now accept a job name in addition to the hex ID, with case-insensitive
matching. Before this, 'hermes cron run my_job_name' died with
'Job with ID my_job_name not found' and forced the user to look up the
hex ID first.
The original PR matched by name but silently picked the first match when
two jobs shared a name. This version refuses to act on an ambiguous name
and surfaces every matching job (id, name, schedule, next_run_at) so the
caller can pick a specific ID.
- cron/jobs.py:
- get_job() stays ID-only (preserves existing call-site semantics for
web_server/api_server/curator/scheduler/test code that always passes
real IDs).
- resolve_job_ref() is the new name-or-ID resolver, used by pause/
resume/trigger/remove_job. Exact ID match wins over a name match
even if a different job's name happens to equal that ID. Ambiguous
name match raises AmbiguousJobReference with all candidate IDs.
- tools/cronjob_tools.py: dispatch site uses resolve_job_ref, surfaces
ambiguous matches as a structured error with the matching IDs.
- hermes_cli/cron.py: 'cron edit' uses resolve_job_ref so editing by
name works and ambiguous names are reported with IDs.
- tests/cron/test_jobs.py: new TestResolveJobRef covering ID match,
case-insensitive name match, ID-wins-over-name, ambiguous refusal,
and that pause/resume/trigger/remove all refuse on ambiguity.
Closes NousResearch#2627
…gistry installs
The Zed ACP Registry path (uvx --from 'hermes-agent[acp]==X' hermes-acp)
gets a Python-only install. Browser tools depend on the agent-browser npm
package + Chromium, neither of which are in the wheel. Without an
explicit bootstrap, registry users have no path to working browser tools.
Ship a bundled, idempotent bootstrap script (Linux/macOS bash + Windows
PowerShell) inside acp_adapter/bootstrap/ as wheel package-data. New
entry points:
hermes acp --setup-browser # interactive; prompts before Chromium download
hermes acp --setup-browser --yes # non-interactive
hermes-acp --setup-browser
The terminal-auth flow (hermes acp --setup) also offers the browser
bootstrap as a follow-up after model selection, so first-run registry
users get the option without knowing the flag exists.
Key design choices:
- npm install -g --prefix $NODE_PREFIX so we never need sudo. System Node
on PATH is respected; only the install target is redirected to the
user-writable Hermes-managed Node prefix.
- tools/browser_tool.py::_browser_candidate_path_dirs() already walks
$HERMES_HOME/node/bin, so installed binaries are discovered with no
agent-side code change.
- System Chrome/Chromium detection short-circuits the ~400 MB Playwright
download when a suitable browser already exists.
- Bash + PowerShell live as ONE copy each under acp_adapter/bootstrap/.
Not duplicated under scripts/. install.sh and install.ps1 keep their
inline browser blocks for the source-checkout path.
E2E validated end-to-end:
bash bootstrap_browser_tools.sh --skip-chromium
→ installs agent-browser into ~/.hermes/node/bin/
tools.browser_tool._find_agent_browser()
→ returns the installed path
check_browser_requirements()
→ returns True (browser tools register)
Tests:
- tests/acp/test_entry.py: 11 tests covering --setup-browser dispatch
(linux + windows + --yes forwarding + failure propagation), the
terminal-auth follow-up prompt path, and a package-data wheel-shipping
assertion that catches any future pyproject.toml regression.
Docs: website/docs/user-guide/features/acp.md gains a 'Browser tools
(optional)' subsection with the two-line install + what-it-does.
SimpleX Chat (https://simplex.chat) is a private, decentralised messenger with no persistent user IDs — every contact is identified by an opaque internal ID generated at connection time. This adds it as a Hermes gateway platform via the plugin system. The adapter connects to a local simplex-chat daemon via WebSocket, listens for inbound messages, and sends replies. Originally proposed in PR NousResearch#2558 as a core-modifying integration; reshaped here as a self- contained plugin under plugins/platforms/simplex/ with no edits to any core file. Discovery is filesystem-based (scanned by gateway.config), and the platform identity is resolved on demand via Platform("simplex"). Plugin contract: - check_requirements() requires SIMPLEX_WS_URL AND the websockets package - validate_config() / is_connected() accept env or config.yaml input - _env_enablement() seeds PlatformConfig.extra (ws_url + home_channel) - _standalone_send() supports out-of-process cron delivery - interactive_setup() provides a stdin wizard for hermes gateway setup - register() wires the adapter into the registry with required_env, install_hint, cron_deliver_env_var, allowed_users_env, and a platform_hint for the LLM. Lazy dependency: the websockets Python package is imported inside the functions that need it. The plugin is importable and discoverable even when websockets is missing — check_requirements() simply returns False until `pip install websockets` is run. No new pyproject extras are introduced. Environment variables: SIMPLEX_WS_URL WebSocket URL of the daemon (required) SIMPLEX_ALLOWED_USERS Comma-separated allowed contact IDs SIMPLEX_ALLOW_ALL_USERS Set true to allow all contacts SIMPLEX_HOME_CHANNEL Default contact for cron delivery SIMPLEX_HOME_CHANNEL_NAME Human label for the home channel Closes NousResearch#2557.
- Adds plugins/platforms/simplex docs page to the messaging sidebar between LINE and Open WebUI. - Maps louismichalot@hotmail.com -> Mibayy in scripts/release.py so the attribution check on the salvage PR passes.
When running with --yolo, all dangerous command approvals are bypassed. Make this state visible so users don't forget: - Banner: '⚠ YOLO mode — all approval prompts bypassed' line in red, only shown when YOLO is active. Default case is silent (no extra line, no always-on 'restricted' label). - Status bar: '⚠ YOLO' fragment appended in red (#FF4444 bold) across all three width tiers (<52, <76, ≥76) in both the plain-text fallback and the fragments builder. Closes NousResearch#2663 Co-authored-by: Mibayy <Mibayy@users.noreply.github.com>
Replace O(n²) string concatenation of truncated_response_prefix in the length-continuation retry loop with a list + ''.join(). Functionally equivalent: same partial response on early return, same prepend on final assembly. The legacy retry path is capped at 3 iterations, so the practical wall-clock win is small, but the new idiom matches the rest of the codebase and removes a needless repeated allocation. Salvaged from PR NousResearch#2717 (the run_conversation portion only — trajectory refactor dropped because it silently rewrote </tool_response> to </think>). Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Some catalog endpoints (OpenCode Zen, etc.) sit behind a WAF that returns 403 for the default Python-urllib/<ver> User-Agent. The generic profile-based live fetch in providers/base.py was silently failing for any such provider — falling through to the static catalog and missing newly-launched models. Set a generic 'hermes-cli/<version>' UA on the catalog probe so every api_key provider profile benefits. Verified live against opencode-zen: before this change, profile.fetch_models() raised HTTP 403; after, it returns 42 models including gpt-5.5, gpt-5.5-pro, kimi-k2.6, glm-5.1 and the *-free variants the static catalog doesn't list. Also strip the now-stale comment in validate_requested_model() claiming opencode-zen's /models returns 404 against the HTML marketing site — the API endpoint at /zen/v1/models returns 200 with valid JSON. Surfaced by NousResearch#2651 (@aashizpoudel) — fixes the same user-facing gap their PR targeted, applied at the right layer so all api_key provider profiles get live catalogs through the same code path. Co-authored-by: Aashish Poudel <mr.aashiz@gmail.com>
Remove redundant inner `import re` and regex recompilation on every call in _interpolate_env_vars. Add module-level _ENV_VAR_PATTERN compiled once. Replace the separate _interpolate_value() in mcp_config.py (which used \w+ and would silently fail on env vars containing hyphens or dots) with the shared _ENV_VAR_PATTERN from mcp_tool.py. Remove now-unused import re.
Replaces bare `except Exception: pass` with debug-level logging so failures in local endpoint model discovery are diagnosable instead of silently hidden.
Three asyncio.gather() calls in tools/web_tools.py ran without return_exceptions=True. A single failing task (e.g. LLM rate limit on one URL) would raise out of gather() and discard every other successfully fetched/summarized result. Pass return_exceptions=True and filter BaseException entries with a warning log before unpacking. Affects: - chunk summarization gather (large web_extract pages) - firecrawl per-result LLM post-processing - tavily crawl per-result LLM post-processing Closes NousResearch#2744
PR NousResearch#2751 salvage. CI requires AUTHOR_MAP coverage for all contributor commit emails.
When a user sends a Slack message like '/hermes ' (trailing whitespace after the slash) the legacy subcommand router hit `text.split()[0]` with a truthy-but-whitespace-only `text`. `' '.split()` returns `[]` → IndexError, blowing up the slash handler before fallthrough to `/help`. Switch to a two-step guard that materializes the parts list first and indexes only if non-empty. Salvaged from PR NousResearch#2752 by @nidhi-singh02. The PR's other two hunks (`tools/file_operations.py`, `agent/anthropic_adapter.py`) are unreachable in current code — `LINTERS` is a hardcoded constant dict with no empty values, and the anthropic version-detection site is already guarded by a `result.stdout.strip()` truthy check — so only the slack hunk is taken. Closes NousResearch#2745 Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Wrap requests.post() in create_session() for browser_use, browserbase, and firecrawl providers with requests.RequestException handling. Connection timeouts and DNS resolution failures now surface as clean RuntimeError messages instead of raw requests exception tracebacks. Browser Use managed-gateway mode preserves raw exception propagation so the existing idempotency-key retry semantics keep working. Closes NousResearch#2746 Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
…_HOME into config.toml Builds on @steezkelly's Bug A fix (NousResearch#25857, top-level default_permissions via _insert_managed_block_at_top_level) by addressing the other two config-corruption bugs described in NousResearch#26250: Bug B (duplicate [plugins.X] tables) - Codex itself writes [plugins."<name>@<marketplace>"] tables to config.toml when the user runs `codex plugins enable` directly, before hermes-agent's managed block exists. On the next migrate run, _query_codex_plugins() re-discovers the same plugins via plugin/list and render_codex_toml_section() re-emits them inside the managed block. Codex's strict TOML parser then rejects the duplicate table header on startup. - Add _strip_unmanaged_plugin_tables() that drops [plugins.*] tables from the user-content portion of the file. Only run it when plugin/list succeeded — if the RPC failed we can't re-emit and must preserve the user's tables. plugin/list is the source of truth when it answers. Bug C (HERMES_HOME pytest-tempdir leak into ~/.codex/config.toml) - _build_hermes_tools_mcp_entry() read HERMES_HOME directly from os.environ, so a sibling pytest's monkeypatch.setenv("HERMES_HOME", tmp_path) silently burned a transient pytest tempdir into the user's real ~/.codex/config.toml. After pytest reaped the tempdir, every codex-routed hermes-tools tool call failed silently. - Derive HERMES_HOME from get_hermes_home() (the canonical resolver that goes through the profile-aware path) and refuse to emit obvious test-tempdir paths via _looks_like_test_tempdir() as belt-and-suspenders for any other callsite that forgets to patch migrate(). - test_enable_succeeds_when_codex_present in test_codex_runtime_switch.py invoked the real migrate() (no mock), writing to Path.home() / .codex using whatever HERMES_HOME the running pytest session had set. Add the same migrate patch the other apply() tests already use, so the suite stops touching the user's real ~/.codex/config.toml. E2E verification (replicating the issue's repro): - Pre-state config.toml with user [mcp_servers.omx_team_run] + codex-installed [plugins."tasks@openai-curated"], HERMES_HOME="/private/var/folders/.../pytest-of-.../..." - On origin/main: tomllib refuses to load the result with "Cannot declare ('plugins', 'tasks@openai-curated') twice" AND the pytest-tempdir HERMES_HOME is burned in. - On this branch: file parses cleanly, default_permissions is top-level, exactly one [plugins."tasks@openai-curated"] table inside the managed block, no HERMES_HOME in the MCP env. 7 new regression tests covering all three bugs + the test-leak guard. `bash scripts/run_tests.sh tests/hermes_cli/test_codex_runtime_*.py` — 95 passed, 0 failed. Closes NousResearch#26250
…2345 salvage (NousResearch#26319) PR NousResearch#22345 by @btorresgil authors commits as 'Brian Conklin <brian@dralth.com>' (git config carries a different name/email than the GitHub account). GitHub's commit-author mapping correctly attributes these commits to @btorresgil based on the public-key registration, but Hermes' release attribution audit reads the raw commit email, not the GitHub mapping. Without this AUTHOR_MAP entry, salvaging NousResearch#22345 would fail `scripts/contributor_audit.py` strict mode at release time. Prerequisite for the langfuse trace fix salvage that cherry-picks @btorresgil's commits onto current main.
…placeholder credentials (closes NousResearch#22342, NousResearch#22763) (NousResearch#26320) * fix(langfuse): reject placeholder credentials with one-shot warning When operators leave HERMES_LANGFUSE_PUBLIC_KEY / HERMES_LANGFUSE_SECRET_KEY at a template value like 'placeholder', 'test-key', or 'your-langfuse-key', the Langfuse SDK silently accepts the credentials at construction time and drops every trace at flush time. No warning, no error — just an empty Langfuse dashboard the operator only notices hours later. Add prefix-based validation in _get_langfuse() against the documented 'pk-lf-' / 'sk-lf-' prefixes that Langfuse always issues server-side. Anything else fires a single warning naming the offending env var(s) with a log-safe value preview (full string for short placeholders so the operator knows which template they left in place; truncated for long values so a real secret pasted into the wrong field never hits the log), then short-circuits via the existing _INIT_FAILED cache so the warning fires once per process, not once per hook invocation. The check sits after the 'Langfuse is None' SDK-installed guard so hosts without the optional langfuse SDK don't see misleading 'set real keys' hints when the actionable fix is 'pip install langfuse'. Missing credentials remains the documented opt-out path and stays silent — no log noise for unconfigured installs. Fixes NousResearch#22763 Fixes NousResearch#23823 * fix(langfuse): use actual API request messages for generation input on_pre_llm_request previously used the messages kwarg alone, which could be None when Hermes passes the payload via request_messages, conversation_history, or user_message instead. Add _coerce_request_messages to pick the first available list across all variants, falling back to a synthetic user message. Generations now show the real outbound payload rather than an empty input. * fix(langfuse): record tool call outputs in traces Tool observations showed input (arguments) but output was always undefined. Root cause: when tool_call_id is empty, pre_tool_call stored observations under a unique time-based key that post_tool_call could never reconstruct, so every tool span was closed without output by the _finish_trace sweep. Fix pre/post matching by routing empty-tool_call_id tools through a per-name FIFO queue (pending_tools_by_name) instead of the time-based key. Tools with a tool_call_id continue to use the id-keyed dict. Also: - Preserve OpenAI-style nested function shape in serialized tool calls so Langfuse renders name/arguments correctly - Keep name + tool_call_id on role:tool messages for proper pairing - Backfill tool results onto the matching turn_tool_calls entry so the generation's tool-call record carries the result alongside arguments - Coerce request messages from whichever field the runtime provides (request_messages, messages, conversation_history, user_message) * fix(langfuse): salvage-review polish — drop dead is_first_turn, shallow-copy request_messages, real threaded FIFO test Self-review of the combined NousResearch#22345 + NousResearch#23831 salvage surfaced three issues worth fixing in the same PR rather than as follow-ups: 1. Drop is_first_turn from the pre_api_request hook. The boolean expression `not bool(conversation_history)` was wrong: conversation_history is reassigned to None mid-run after compression (5 sites in run_agent.py), so the value flips False -> True mid-conversation on every post-compression API call. The langfuse plugin never consumed it, so the kwarg was both misleading AND dead. 2. Replace copy.deepcopy(request_messages) with shallow list() copy. The pre_api_request hook contract discards return values (invoke_hook never writes back to api_kwargs), and the langfuse plugin's _serialize_messages already builds its own snapshot dicts via _safe_value. A deepcopy on every API call would walk every tool result and base64 image — significant overhead for no real isolation benefit. Shallow copy of the outer list protects against later mutations of api_messages without paying for the inner-dict walk. 3. Rename test_empty_tool_call_id_concurrent_fifo_order -> test_empty_tool_call_id_observations_are_fifo_within_tool_name and add a real test_threaded_post_calls_preserve_fifo_under_lock that spawns 8 threads behind a barrier to actually exercise _STATE_LOCK on the pending_tools_by_name queue. The original test was sequential and only validated Python list semantics; this one validates the lock discipline. 4. Fix stale 'Cleared by reset_cache_for_tests()' comment on _INIT_FAILED — that function does not exist. Tests reload the module via sys.modules.pop + importlib.import_module instead. Tests: 37 langfuse plugin tests pass, 658 plugin tests overall pass. --------- Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com> Co-authored-by: Brian Conklin <brian@dralth.com>
…sResearch#26071) (NousResearch#26327) * feat(process-registry): add format_process_notification shared helper * feat(process-registry): add drain_notifications method * refactor(cli): use shared drain_notifications and format_process_notification * feat(tui): add background notification poller for completion_queue * feat(tui): wire notification poller into session init/finalize * refactor(tui): add post-turn drain using shared helper as safety net
…put renders correctly (NousResearch#26011) * fix(tui): restrict fast-echo bypass to ASCII so Vietnamese/CJK/IME input renders correctly The composer's fast-echo path (canFastAppend / canFastBackspace) writes characters straight to stdout to skip an Ink re-render on the hot typing path. The previous guard only checked 'stringWidth(text) === text.length', which lets a lot of non-ASCII through: - Vietnamese precomposed letters (ề, ắ, ờ, ự, ...) report width 1 and length 1, but a Vietnamese Telex / IME stack produces them across multiple keystrokes; the intermediate composition state must be drawn by Ink so the rendered cell, the stored value, and the cursor column stay in lockstep when the final commit replaces the preview. - NFD combining marks (U+0300..U+036F) are zero-width but length 1, so even a passing equality lets them slip and silently desync the cell column. - CJK/East-Asian wide and emoji rejected only because their length differs, but the boundary was shape-shaped, not intent-shaped. User-visible bug from the original report: Example: eê noiói nge neène -> the bypass committed the IME preview char before the diacritic replaced it, leaving doubled letters on screen. Fix: gate fast-echo on pure printable ASCII (0x20-0x7e). The performance-critical English typing path is unchanged; everything else goes through the normal Ink render path so layout stays accurate. Also extracts the shape preconditions as pure exported helpers (canFastAppendShape / canFastBackspaceShape) so the regression matrix is testable without spinning up a TextInput. Tests: ui-tui/src/__tests__/textInputFastEcho.test.ts adds 20 cases covering ASCII still works, Vietnamese precomposed + NFD, CJK, emoji, NBSP / Latin-1, ANSI / control bytes, multi-line, and end-of-line preconditions. Verified RED on the previous guard (11 of 20 fail) and GREEN on the new guard. Refs: NousResearch#5221, NousResearch#7443, NousResearch#17602, NousResearch#17603 (similar wide-char rendering bugs). * docs(tui): clarify Vietnamese char terminology in regression comment Address Copilot review: 'single byte width' implied UTF-8 byte semantics, but the relevant property is JS code units (`text.length === 1`) and display width (`stringWidth === 1`). Reworded to match.
🚨 CRITICAL Supply Chain Risk DetectedThis PR contains a pattern that has been used in real supply chain attacks. A maintainer must review the flagged code carefully before merging. 🚨 CRITICAL: Install-hook file added or modifiedThese files can execute code during package installation or interpreter startup. Files: Scanner only fires on high-signal indicators: .pth files, base64+exec/eval combos, subprocess with encoded commands, or install-hook files. Low-signal warnings were removed intentionally — if you're seeing this comment, the finding is worth inspecting. |
🔎 Lint report:
|
| Rule | Count |
|---|---|
unresolved-attribute |
14 |
unresolved-import |
11 |
unsupported-operator |
9 |
invalid-assignment |
3 |
invalid-argument-type |
3 |
unresolved-reference |
1 |
unused-type-ignore-comment |
1 |
call-non-callable |
1 |
invalid-method-override |
1 |
First entries
tests/tools/test_process_registry.py:902: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["watch pattern \"ERROR\""]` and `str | None`
tests/acp/test_entry.py:6: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
gateway/platforms/discord.py:3677: [unresolved-attribute] unresolved-attribute: Attribute `Object` is not defined on `None` in union `Unknown | None`
cli.py:1489: [unresolved-attribute] unresolved-attribute: Unresolved attribute `_hermes_light_mode_hook_installed` on type `<class 'SkinConfig'>`.
tests/tools/test_process_registry.py:889: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["Output:\ndone]"]` and `str | None`
gateway/platforms/tlon.py:2045: [unresolved-attribute] unresolved-attribute: Attribute `poke` is not defined on `None` in union `TlonSSEClient | None`
tests/gateway/test_tlon_adapter.py:376: [unresolved-attribute] unresolved-attribute: Object of type `bound method TlonAdapter.handle_message(event) -> CoroutineType[Any, Any, None]` has no attribute `assert_not_awaited`
acp_adapter/auth.py:40: [unresolved-import] unresolved-import: Cannot resolve imported module `acp.schema`
tests/tools/test_process_registry.py:925: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["[IMPORTANT: Watch disabled for proc_xyz"]` and `str | None`
tests/gateway/test_tlon_adapter.py:4: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/gateway/test_simplex_plugin.py:313: [unresolved-import] unresolved-import: Cannot resolve imported module `websockets.client`
tests/gateway/test_tlon_adapter.py:379: [unresolved-attribute] unresolved-attribute: Object of type `bound method TlonAdapter.send(chat_id: str, content: str, reply_to: str | None = None, metadata: dict[str, Any] | None = None) -> CoroutineType[Any, Any, SendResult]` has no attribute `assert_awaited_once`
tests/tools/test_process_registry.py:903: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["Matched output:\nERROR: disk full"]` and `str | None`
tests/gateway/test_tlon_adapter.py:359: [invalid-assignment] invalid-assignment: Object of type `AsyncMock` is not assignable to attribute `handle_message` of type `def handle_message(self, event) -> CoroutineType[Any, Any, None]`
gateway/platforms/tlon.py:920: [unresolved-attribute] unresolved-attribute: Attribute `delete` is not defined on `None` in union `Any | None`
tests/tools/test_process_registry.py:886: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["[IMPORTANT: Background process proc_abc completed"]` and `str | None`
tests/gateway/test_tlon_adapter.py:315: [unresolved-attribute] unresolved-attribute: Object of type `bound method TlonAdapter.handle_message(event) -> CoroutineType[Any, Any, None]` has no attribute `assert_awaited_once`
gateway/platforms/tlon.py:974: [unresolved-reference] unresolved-reference: Name `aiohttp` used when not defined
gateway/platforms/yuanbao.py:2621: [invalid-argument-type] invalid-argument-type: Argument is incorrect: Expected `MessageType`, found `Literal[MessageType.DOCUMENT] | Any | None`
gateway/platforms/tlon.py:871: [unresolved-attribute] unresolved-attribute: Attribute `put` is not defined on `None` in union `Any | None`
cli.py:13400: [unresolved-import] unresolved-import: Cannot resolve imported module `prompt_toolkit.renderer`
gateway/platforms/tlon.py:1819: [unresolved-attribute] unresolved-attribute: Attribute `scry` is not defined on `None` in union `TlonSSEClient | None`
tests/gateway/test_tlon_adapter.py:360: [invalid-assignment] invalid-assignment: Object of type `AsyncMock` is not assignable to attribute `send` of type `def send(self, chat_id: str, content: str, reply_to: str | None = None, metadata: dict[str, Any] | None = None) -> CoroutineType[Any, Any, SendResult]`
gateway/platforms/tlon_media.py:195: [unresolved-import] unresolved-import: Cannot resolve imported module `aiohttp`
tests/tools/test_process_registry.py:888: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["Command: sleep 5"]` and `str | None`
... and 19 more
✅ Fixed issues (107):
| Rule | Count |
|---|---|
unresolved-import |
62 |
invalid-assignment |
13 |
unresolved-attribute |
13 |
invalid-argument-type |
12 |
not-subscriptable |
2 |
invalid-parameter-default |
2 |
unsupported-operator |
1 |
invalid-return-type |
1 |
unresolved-reference |
1 |
First entries
tools/rl_training_tool.py:973: [invalid-assignment] invalid-assignment: Invalid subscript assignment with key of type `Literal["history"]` and value of type `list[dict[Unknown, Unknown]]` on object of type `dict[str, str]`
tools/rl_training_tool.py:1130: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(i: SupportsIndex, /) -> dict[str, str | int | float], (s: slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> list[dict[str, str | int | float]]]` cannot be called with key of type `Literal["max_num_workers"]` on object of type `list[dict[str, str | int | float]]`
tools/rl_training_tool.py:1276: [invalid-assignment] invalid-assignment: Invalid subscript assignment with key of type `Literal["accuracy"]` and value of type `int | float` on object of type `dict[str, str | list[Unknown] | int]`
tools/rl_training_tool.py:693: [unresolved-attribute] unresolved-attribute: Attribute `get` is not defined on `list[dict[str, str | int | float]]`, `bool` in union `dict[str, str | int | float] | list[dict[str, str | int | float]] | bool`
hermes_cli/tools_config.py:2469: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `dict[Unknown, Unknown]` and `str | list[dict[str, str | list[Unknown] | bool | list[str]] | dict[str, str | list[Unknown]] | dict[str, str | list[dict[str, str]]]] | list[dict[str, str | list[Unknown] | bool | list[str]] | dict[str, str | list[dict[str, str]]]] | ... omitted 4 union elements`
tools/rl_training_tool.py:337: [unresolved-attribute] unresolved-attribute: Unresolved attribute `api_log_file` on type `RunState`
tools/rl_training_tool.py:774: [invalid-assignment] invalid-assignment: Invalid subscript assignment with key of type `Literal["wandb_project"]` and value of type `Any` on object of type `list[dict[str, str | int | float]]`
tools/rl_training_tool.py:1129: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `Overload[(i: SupportsIndex, /) -> dict[str, str | int | float], (s: slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> list[dict[str, str | int | float]]]` cannot be called with key of type `Literal["max_token_length"]` on object of type `list[dict[str, str | int | float]]`
tools/rl_training_tool.py:1131: [not-subscriptable] not-subscriptable: Cannot subscript object of type `bool` with no `__getitem__` method
tests/hermes_cli/test_tools_config.py:534: [invalid-argument-type] invalid-argument-type: Argument to function `_configure_provider` is incorrect: Expected `dict[Unknown, Unknown]`, found `str | dict[str, str | list[Unknown] | bool | list[str]] | dict[str, str | list[Unknown]] | ... omitted 3 union elements`
tests/tools/test_managed_server_tool_support.py:22: [unresolved-import] unresolved-import: Cannot resolve imported module `atroposlib`
environments/web_research_env.py:67: [unresolved-import] unresolved-import: Cannot resolve imported module `atroposlib.envs.base`
environments/hermes_base_env.py:343: [unresolved-import] unresolved-import: Cannot resolve imported module `atroposlib.envs.server_handling.openai_server`
environments/web_research_env.py:68: [unresolved-import] unresolved-import: Cannot resolve imported module `atroposlib.envs.server_handling.server_manager`
environments/agentic_opd_env.py:83: [unresolved-import] unresolved-import: Cannot resolve imported module `atroposlib.envs.base`
environments/benchmarks/yc_bench/yc_bench_env.py:62: [unresolved-import] unresolved-import: Cannot resolve imported module `atroposlib.envs.base`
environments/benchmarks/terminalbench_2/terminalbench2_env.py:57: [unresolved-import] unresolved-import: Cannot resolve imported module `atroposlib.envs.base`
environments/benchmarks/yc_bench/yc_bench_env.py:63: [unresolved-import] unresolved-import: Cannot resolve imported module `atroposlib.envs.server_handling.server_manager`
environments/hermes_swe_env/hermes_swe_env.py:45: [unresolved-import] unresolved-import: Cannot resolve imported module `atroposlib.envs.base`
tests/hermes_cli/test_tools_config.py:720: [unresolved-attribute] unresolved-attribute: Attribute `get` is not defined on `str`, `int` in union `str | dict[str, str | list[Unknown] | bool | list[str]] | dict[str, str | list[Unknown]] | ... omitted 3 union elements`
environments/benchmarks/terminalbench_2/terminalbench2_env.py:58: [unresolved-import] unresolved-import: Cannot resolve imported module `atroposlib.envs.server_handling.server_manager`
environments/hermes_swe_env/hermes_swe_env.py:46: [unresolved-import] unresolved-import: Cannot resolve imported module `atroposlib.envs.server_handling.server_manager`
environments/web_research_env.py:51: [unresolved-import] unresolved-import: Cannot resolve imported module `pydantic`
tools/rl_training_tool.py:775: [invalid-assignment] invalid-assignment: Invalid subscript assignment with key of type `Literal["wandb_run_name"]` and value of type `str` on object of type `list[dict[str, str | int | float]]`
tests/run_agent/test_agent_loop_tool_calling.py:29: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
... and 82 more
Unchanged: 4299 pre-existing issues carried over.
Diagnostics are surfaced as warnings — this check never fails the build.
Three issues flagged by the Copilot review on this PR: 1. Double JSON emit on stage failure (Copilot #1, #2). When -Stage <name> ran a worker that threw, Invoke-Stage's finally emitted a JSON result frame AND the entry-point catch emitted a second error frame -- producing two concatenated JSON objects on stdout and breaking the one-line-per-invocation contract that drivers parse against. Same issue applied to -Json mode on a full install (every stage's finally plus a final error frame missing duration_ms/skipped). Fix: Invoke-Stage's finally now sets $script:_StageEmittedErrorFrame when it emits a failure frame; the entry-point catch checks the flag and skips its own emit, still exit 1. 2. $prevEAP uninitialized on early try-block throw (Copilot #3). In Install-Uv, Test-Python, Test-Node's winget fallback, _Run-NpmInstall, and the playwright block, '$prevEAP = $ErrorActionPreference' lived as the first statement INSIDE the try. If anything between 'try {' and that line threw (Write-Info on an unusual host, the npx-finding loop, etc.), the catch's 'if ($prevEAP) { ... }' restore was a no-op and EAP could remain relaxed. Fix: hoist '$prevEAP = $ErrorActionPreference' to the line immediately before 'try {' in all five sites. Catch's restore is now always meaningful regardless of where in the try the throw originated. No change to Invoke-Stage's success path or to the four lint-clean EAP sites (Test-Node was the only winget-related catch). All 19 metadata smoke tests still pass.
Split from NousResearch#26300.
Scope:
Validation:
scripts/run_tests.sh tests/gateway/test_tlon_adapter.py tests/tools/test_send_message_tool.pyResult: 127 passed.