Releases: voidborne-d/hermit-agent

v0.1.50 — chrome-launcher: deterministic ports + IPv4 lock + collision detect

15 May 13:34

Three-layer fix for the silent CDP-port-share failure

Root cause

find_free_port had a TOCTOU (time-of-check to time-of-use) race: in the window between lsof -i :PORT returning empty (for example, during a brief Chrome restart) and the checking agent actually binding PORT, a different agent could grab the same port. On macOS, Chrome does not fail on a bind conflict; it silently falls back to [::1]:PORT. Two agents therefore end up "running on port 19900," one on IPv4 and one on IPv6, while chrome.json records the requested port rather than the one Chrome actually bound. Tools connecting via 127.0.0.1:19900 then talk to whichever agent won the IPv4 race.

Layer 1 — Force IPv4-only

--remote-debugging-address=127.0.0.1

Chrome no longer silently rebinds to IPv6 on conflict; it hard-fails and the launcher knows.

Layer 2 — Deterministic per-agent port

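# Map an agent name to a stable port in 19900..19999.
deterministic_port() {
  # cksum prints "<crc> <byte count>"; reduce the CRC to an offset in 0..99.
  local offset=$(printf '%s' "$1" | cksum | awk '{print $1 % 100}')
  echo $((19900 + offset))
}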

Same agent name always gets the same port. The race window doesn't exist for the common case. Falls back to find_free_port only on real collision (hash collision or external process).

find_free_port now also cross-checks sibling agents' chrome.json files via sibling_owns_port, so a port owned by a live sibling whose Chrome is between restarts is still skipped.
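
The Layer 2 idea can be sketched in Python as a rough illustration. This is not the launcher's code: zlib.crc32 stands in for POSIX cksum (a different CRC variant, so the resulting ports will not match the shell function), and pick_port with its linear probe is a hypothetical stand-in for whatever find_free_port actually does. The taken set represents ports held by external processes or live siblings.

```python
import zlib

BASE_PORT = 19900  # base port from the release note
PORT_SPAN = 100    # offsets 0..99, as in the shell version


def deterministic_port(agent_name: str) -> int:
    """Map an agent name to a stable port in [BASE_PORT, BASE_PORT + PORT_SPAN)."""
    # Assumption: zlib.crc32 replaces cksum here, so values differ from the shell.
    offset = zlib.crc32(agent_name.encode("utf-8")) % PORT_SPAN
    return BASE_PORT + offset


def pick_port(agent_name: str, taken: set[int]) -> int:
    """Prefer the deterministic port; on collision, probe the rest of the range."""
    preferred = deterministic_port(agent_name)
    for step in range(PORT_SPAN):
        # Wrap around so every port in the range is tried exactly once.
        candidate = BASE_PORT + (preferred - BASE_PORT + step) % PORT_SPAN
        if candidate not in taken:
            return candidate
    raise RuntimeError("no free port in range")
```

Because the preferred port depends only on the agent name, two launches of the same agent never race each other for a port; the probe only runs when something else already holds it.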

Layer 3 — Detection

multi-agent-status-report.sh scans every sibling's chrome.json and emits:

🟥 chrome-cdp · collision · port 19900: agent-a ↔ agent-b

if two live agents claim the same port. Catches anything the first two layers miss.
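
As a hedged sketch of the detection step: given a mapping from live agent name to the port its chrome.json claims, group by port and report any port with more than one claimant. The find_collisions name and the dict input are assumptions for illustration; the real script is shell and reads the files itself.

```python
from collections import defaultdict


def find_collisions(claims: dict[str, int]) -> list[str]:
    """Return one report line per port claimed by more than one live agent."""
    by_port: dict[int, list[str]] = defaultdict(list)
    for agent, port in claims.items():
        by_port[port].append(agent)

    lines = []
    for port in sorted(by_port):
        agents = sorted(by_port[port])
        if len(agents) > 1:
            # Same line shape as the digest entry shown in the release note.
            lines.append(
                f"🟥 chrome-cdp · collision · port {port}: " + " ↔ ".join(agents)
            )
    return lines
```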

Files

  • template/scripts/chrome-launcher.sh — deterministic_port, sibling_owns_port, --remote-debugging-address=127.0.0.1
  • template/scripts/multi-agent-status-report.sh — collision check
  • test/smoke.mjs — +4 → 115/115

v0.1.49 — block AskUserQuestion (TUI hang fix)

15 May 03:27

Stop AskUserQuestion from silently hanging headless agents

Claude Code's built-in AskUserQuestion tool renders a TUI modal directly to stdin/stdout. In hermit-agent's headless tmux + Telegram setup the user is on Telegram and never sees the modal, so the agent hangs forever waiting on stdin. The Telegram channel process stays alive, so it looks like the agent has simply gone silent when in fact the question never reaches the user.

Fix: PreToolUse hook + AGENTS.md rule

scripts/hook-block-askuserquestion.sh denies any AskUserQuestion call and surfaces a permissionDecisionReason back to the model:

Send a Telegram reply via mcp__plugin_telegram_telegram__reply with the options as numbered lines, end the turn, and let the user answer in the next inbound message.

The model adapts mid-turn — no human intervention needed. AGENTS.md "Telegram Replies — Hard Rules" gains a third rule explicitly forbidding the tool, so the model avoids it in the first place.
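
The deny decision can be illustrated with a short Python sketch. The shipped hook is a shell script, so this is only an approximation of its logic. It assumes the hook receives a JSON event with a tool_name field and answers with a hookSpecificOutput object; that layout reflects my understanding of Claude Code's PreToolUse hooks and is not confirmed by this release note. The decide function name is hypothetical.

```python
import json

REASON = (
    "Send a Telegram reply via mcp__plugin_telegram_telegram__reply with the "
    "options as numbered lines, end the turn, and let the user answer in the "
    "next inbound message."
)


def decide(event: dict):
    """Return a deny decision for AskUserQuestion, or None to stay out of the way."""
    if event.get("tool_name") != "AskUserQuestion":
        return None
    return {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": "deny",
            # The reason is what the model sees, so it doubles as an instruction.
            "permissionDecisionReason": REASON,
        }
    }


def render(event_json: str) -> str:
    """Turn the raw event text into the text a hook would print (empty if no decision)."""
    decision = decide(json.loads(event_json))
    return "" if decision is None else json.dumps(decision)
```

A wrapper script would pass its stdin to render and print the result.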

Origin incident

On 2026-05-15, master-skill called AskUserQuestion to pick between two skill-regeneration strategies and sat silent in tmux for hours until the operator manually inspected the pane. The bot was alive (PID 30147) and the Telegram channel was healthy; the tool had simply rendered to a place no one was looking.

Files

  • template/scripts/hook-block-askuserquestion.sh
  • template/.claude/settings.local.json.tmpl — wires AskUserQuestion matcher
  • template/AGENTS.md — third hard rule under Telegram Replies
  • test/smoke.mjs — +3 assertions → 111/111

v0.1.48 — status-reporter accuracy fixes

15 May 03:07

Three digest bugs fixed

Surfaced after a kernel-panic reboot exposed them all at once.

1. Self-report noise

The agent's own status-reporter LaunchAgent was being picked up by the auto-discover loop, always reading "ran 0s ago" — useless signal (by definition, if you're reading the digest, the reporter just fired). Now skipped via SELF_STATUS_REPORTER_LABEL.

2. launchctl runs counter resets on every load

After reboot (or plist edit / bootstrap), every task that had been running for weeks suddenly reads "🟨 never ran" because the runs counter is back at 0. Fix: check_interval_agent now accepts an optional log_path and falls back to log mtime when runs == 0 but the log exists. Auto-discover loop derives <agent>/.claude/state/<task>.log from the label, so any plist following the standard log layout gets this for free.
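
A minimal sketch of the fallback, assuming only what the paragraph above states: trust the counter when it is non-zero, otherwise treat an existing log's mtime as evidence of a past run. The run_evidence name, its return strings, and the now parameter are hypothetical and not taken from check_interval_agent.

```python
import os
import time
from typing import Optional


def run_evidence(runs: int, log_path: Optional[str] = None,
                 now: Optional[float] = None) -> str:
    """Describe whether a task has run, falling back to log mtime when runs == 0."""
    if runs > 0:
        return "ran (launchctl counter)"
    if log_path and os.path.exists(log_path):
        current = time.time() if now is None else now
        # Clamp to zero in case the clock and the file timestamp disagree.
        age = max(0.0, current - os.path.getmtime(log_path))
        return f"ran {int(age)}s ago (log mtime)"
    return "never ran"
```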

3. Calendar tasks were silently invisible

reap-dead-sessions (added in 0.1.47) uses StartCalendarInterval, so the auto-discover loop's [ -z "$interval" ] && continue filtered it out — never appeared in the digest. New check_cron_mtime helper plus an explicit hookup that fires when the per-agent reaper plist is installed.

Files

  • template/scripts/multi-agent-status-report.sh
  • test/smoke.mjs — +3 → 108/108

v0.1.47 — dead-session reaper

15 May 02:55

Daily Trash sweep of stale Claude session JSONLs

Each Claude Code session writes a JSONL under ~/.claude/projects/<encoded-AGENTS_ROOT>-<agent>/ and these files accumulate fast. New master-only LaunchAgent (macOS) / systemd timer (Linux) runs daily at 04:10 and ships JSONLs (plus their sidecar subdirs) to the OS recycle bin once they're no longer needed.

Reap criteria — ALL three must hold

  1. session_id ≠ <agent>/.claude/state/session-status.json .session_id (active)
  2. session_id ≠ <agent>/.claude/state/paused.json .session_id (hibernated wake target)
  3. JSONL mtime older than REAP_AGE_DAYS (default 3 — buffer for claude --resume)

Protection check runs before the mtime check, so active sessions and hibernated wake targets are safe regardless of age.
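
The three criteria and their ordering can be sketched as a single predicate. This is an illustration under assumptions, not the script itself: should_reap and its parameters are hypothetical names, and timestamps are plain epoch seconds.

```python
import time
from typing import Optional

REAP_AGE_DAYS = 3  # default from the release note


def should_reap(session_id: str,
                active_id: Optional[str],
                paused_id: Optional[str],
                mtime: float,
                now: Optional[float] = None,
                age_days: float = REAP_AGE_DAYS) -> bool:
    """True only if the session is unprotected AND older than the age buffer."""
    # Protection check first: active and hibernated sessions are safe at any age.
    if session_id == active_id or session_id == paused_id:
        return False
    current = time.time() if now is None else now
    return (current - mtime) > age_days * 86400
```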

Recycle bin, not rm

  • macOS: /usr/bin/trash
  • Linux: gio trash

If neither is available, the script refuses to run rather than silently falling back to mv.

Manual ops

scripts/reap-dead-sessions.sh --dry-run        # preview
scripts/reap-dead-sessions.sh --age-days 14    # wider buffer
tail -f .claude/state/reap-dead-sessions.log

Origin incident

Backlog of ~833 stale JSONLs / 916 MB across 8 sibling agents in the local asst hub before this landed. After daily sweeps the working set should stay flat.

Files

  • template/scripts/reap-dead-sessions.sh
  • template/launchd/reap-dead-sessions.plist.tmpl
  • template/systemd/reap-dead-sessions.{service,timer}.tmpl
  • bin/create-hermit-agent.js — installDeadSessionReaper() (master-only)
  • template/AGENTS.md — new "Dead-session reaper" section
  • test/smoke.mjs — +7 assertions → 105/105

v0.1.46: fleet hibernation (idle-hibernator + wake-poller)

14 May 19:34

Fleet-wide agent hibernation. Long-idle hermits get paused (claude + bun + chrome killed, session JSONL preserved); inbound Telegram traffic auto-wakes them within 60s.

What's new

  • scripts/hibernate-agent.sh — pauses an idle agent: saves session_id + tmux remain-on-exit + kills claude/bun/chrome, leaves paused.json for the wake side
  • scripts/wake-agent.sh — tmux respawn-pane with claude --resume <session_id>, auto-dismisses the "Resume from summary" modal that fires for old/large sessions (picks summary; full resume can cost several USD per wake)
  • scripts/idle-hibernator.sh — LaunchAgent / systemd timer (10 min); hibernates siblings idle beyond IDLE_THRESHOLD_SEC (default 48h). Self-excludes via HIBERNATOR_SELF
  • scripts/wake-poller.sh — LaunchAgent / systemd timer (60s); peeks each paused agent's Telegram bot queue with getUpdates (no ACK), wakes on any pending update. Fast-path exits in <100ms when nothing is hibernated
  • multi-agent-status-report.sh — distinguishes hibernated (💤) from down (⚫) by checking paused.json
  • template/AGENTS.md — new Hibernation section with state machine + manual ops
  • installHibernationSystem in create-hermit-agent — auto-installs on macOS launchd + Linux systemd-user for master role only (workers skip; status-reporter convention is mirrored)

States

  • agent.pid present, paused.json absent → alive
  • agent.pid absent, paused.json present → hibernated (💤)
  • both absent → down (⚫)
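
The state table reads naturally as a small classifier. The sketch below assumes both marker files live in one state directory, which the notes do not confirm. It also returns "unknown" when both files are present, a combination the table leaves undefined. The agent_state name is hypothetical.

```python
from pathlib import Path


def agent_state(state_dir: Path) -> str:
    """Classify an agent from the presence of agent.pid and paused.json."""
    pid = (state_dir / "agent.pid").exists()
    paused = (state_dir / "paused.json").exists()
    if pid and not paused:
        return "alive"
    if paused and not pid:
        return "hibernated"
    if not pid and not paused:
        return "down"
    # Both present: not covered by the table above, so flag it instead of guessing.
    return "unknown"
```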

Tested

  • 98/98 smoke checks (+13 hibernation-specific)
  • End-to-end on the port-source workspace (asst): hibernate → status-reporter shows 💤 → wake → resume-modal auto-dismiss → back to idle. The wake-poller fast path was exercised live; the "pending update triggers wake" path is straightforward but only fires with a real Telegram inbound (verifies organically on next use)

v0.1.42: codex-hermit timeout fix — process-group kill + 30 min ceiling

09 May 10:44

Fixes a wedge condition in the codex-flavor bridge daemon discovered when a 30-minute README-generation turn hung the daemon for 26+ minutes with no error reaching the user.

Root cause

`subprocess.run(timeout=600)` only signals the immediate child. `codex exec` spawns helpers underneath, so SIGKILL to the top process leaves the helpers running. `subprocess.run`'s internal wait() never returns; the daemon stays stuck inside the catch-the-timeout path. The orphan codex process gets reparented to PPID=1 and keeps talking to the OpenAI backend silently.

Fixes

  1. `run_codex` now uses `Popen + start_new_session=True` so the daemon owns a fresh process group. On `TimeoutExpired` we `os.killpg(SIGKILL)` to take the whole tree, with a bounded 10s reap. The exception is re-raised so `handle()` can message the user.
  2. `CODEX_TIMEOUT_SEC` bumped from 600s (10 min) to 1800s (30 min). For prompts that legitimately need codex to write files, run git, install deps, etc., 10 min was too tight. Cron still wraps with `with-timeout.sh 1200` (20 min) per AGENTS.md Cron Safety.
  3. Timeout reply text now tells the user the thread is preserved — they can send another message to continue from where codex got stuck.
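
A minimal sketch of the pattern described in fix 1, using only standard-library calls. The function name run_in_own_group and the choice to merge stderr into stdout are assumptions for illustration; the real run_codex may differ in both.

```python
import os
import signal
import subprocess


def run_in_own_group(cmd, timeout_sec, reap_sec=10.0):
    """Run cmd in a fresh session; on timeout, SIGKILL its whole process group."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        start_new_session=True,  # child becomes leader of a new session and group
    )
    try:
        out, _ = proc.communicate(timeout=timeout_sec)
    except subprocess.TimeoutExpired:
        try:
            # With start_new_session=True the group id equals the child's pid.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group already exited on its own
        try:
            proc.communicate(timeout=reap_sec)  # bounded reap of the dead child
        except subprocess.TimeoutExpired:
            pass
        raise  # re-raise the original timeout so the caller can report it
    return proc.returncode, out
```

Killing the group rather than the single child is what takes out helpers that would otherwise keep the output pipe open and outlive the timeout. This sketch is POSIX-only, since os.killpg is not available on Windows.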

Smoke

88/88 (one new assertion verifies `Popen` + `start_new_session` + `killpg` + 1800s ceiling).

Verified

Killed the orphan codex on the dev workspace (pid 40702, etime 26 min) and restarted the daemon. Next turn comes back clean. The new `Popen` + pgroup-kill path will catch this case automatically going forward.

v0.1.41: codex-hermit gets full sandbox + approval bypass

09 May 08:55

Fixes a silent capability gap on codex flavor: agents could read but not write or hit the network.

What was wrong

Codex CLI defaults --sandbox to read-only and --ask-for-approval to a mode that prompts for any write. The bridge daemon was inheriting both defaults, so codex hermits could load AGENTS.md, recall persona, and reply — but they couldn't write memory/YYYY-MM-DD.md, run scripts, install packages, generate images into the workspace, or fetch URLs. Symptom: agents repeatedly explained "current environment is read-only sandbox" and gave up halfway through any task that required a side effect.

Fix

scripts/tg-bridge.py and scripts/run-cron.sh now pass --dangerously-bypass-approvals-and-sandbox (alias --yolo) to every codex exec invocation. This is the codex-side equivalent of claude flavor's --dangerously-skip-permissions:

  • --sandbox=danger-full-access (full filesystem + network)
  • --ask-for-approval=never (no prompts mid-turn)

Same threat model as claude flavor — the user owns the machine and runs the agent as themselves; the agent inherits the user's authority.

Smoke

87/87 (one new assertion verifies both tg-bridge.py and run-cron.sh carry the flag).

v0.1.40: README documents Codex CLI host (no code change)

08 May 13:24

README-only update: makes the Codex CLI host (--host codex, shipped in v0.1.38) first-class on the README instead of buried in release notes.

Changes

Both README.md and README.zh-CN.md:

  • Hero tagline names both hosts: Claude Code (default) and OpenAI Codex CLI
  • Two host badges instead of one
  • "Why a hermit crab?" credits table adds Codex CLI row
  • 30-second quickstart shows both npx create-hermit-agent (claude) and npx create-hermit-agent <name> --host codex (codex)
  • Feature matrix gains a top "Host choice" row; each capability row notes how claude vs codex flavor implements it
  • New "Host choice: Claude Code or Codex" section — side-by-side comparison table covering CLI / cost / Telegram bridge / hooks / slash commands / skills / MCP / cron / state, plus when-to-pick-which guidance
  • Install section: dedicated "codex flavor" prereq block (codex CLI, python3, no bun), scaffold + start commands shown for both flavors

No code change

This is a docs-only release. v0.1.39 already shipped the actual sendPhoto fix; v0.1.40 just makes sure people landing on the npm page or GitHub repo can see the codex story up front.

v0.1.39: codex-hermit relays generated images via sendPhoto

08 May 13:17

Fixes a silent failure for codex-flavor hermits: when Codex's built-in image_gen tool produced a png in ~/.codex/generated_images/<thread>/, the bridge daemon only posted the text reply — the user saw "image generated successfully" but never received the actual file. Looked like a generation error, was actually a missing transport.

Fix

  • snapshot_generated() walks ~/.codex/generated_images/ recursively and returns the set of png/jpg/webp/gif files. Called before and after each codex exec turn; the difference is the artifact set this turn produced.
  • send_photo() builds multipart/form-data manually (stdlib only, no requests dep) and POSTs to /sendPhoto. The bot token stays in env, never on argv — token-safety hard rule preserved.
  • handle() now dispatches new images via send_photo with an upload_photo chat_action pulse first so users see "is uploading photo…" before the bytes arrive. Text reply follows.
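
The snapshot-diff and the hand-built multipart body might look roughly like the sketch below. It borrows the name snapshot_generated from the notes but is not the shipped implementation; build_photo_request is a hypothetical helper that only builds the request and does not send it. The chat_id and photo field names are those of the Telegram Bot API's sendPhoto method.

```python
import uuid
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".webp", ".gif"}  # the types named in the notes


def snapshot_generated(root: Path) -> set[Path]:
    """Recursively collect image files under root (empty set if root is missing)."""
    if not root.is_dir():
        return set()
    return {
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    }


def build_photo_request(chat_id: int, photo: Path) -> tuple[str, bytes]:
    """Return (Content-Type header value, body) for a multipart sendPhoto POST."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="chat_id"\r\n\r\n'
        f"{chat_id}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="photo"; filename="{photo.name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return f"multipart/form-data; boundary={boundary}", head + photo.read_bytes() + tail
```

Taking one snapshot before the turn and one after, then subtracting the sets, yields only the files the turn created. The body can then be POSTed with urllib.request, with the bot token read from the environment when the URL is built.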

Compatibility

No breaking changes. Pure addition to the codex flavor's bridge daemon. Claude flavor untouched.

Smoke

86/86 (added 1 assertion covering send_photo + snapshot_generated + upload_photo wiring).

Verified end-to-end

Tested on the dev workspace /Users/mac/claudeclaw/codex-demo with a live ChatGPT image-generation request before porting into the template.

v0.1.38: --host codex flag (run hermit on ChatGPT subscription)

08 May 13:11

A new flag for npx create-hermit-agent lets users opt into a Codex CLI flavor instead of Claude Code. The codex hermit uses the user's ChatGPT subscription (via the codex CLI) instead of Claude API spend, while keeping the same hermit pattern: persona files (SOUL/IDENTITY/USER/AGENTS/TOOLS/MEMORY), daily logs, hooks, cron, Telegram bridge.

Usage

# Default (unchanged):
npx create-hermit-agent my-agent --bot-token <T> --user-id <CID> --persona "..." --yes

# Codex flavor (new):
npx create-hermit-agent my-agent --host codex --bot-token <T> --user-id <CID> --persona "..." --yes

Default stays --host claude. Codex is opt-in only.

What's different in --host codex

  • No Claude Code dependency. Prereqs are codex CLI + python3 + tmux + curl + jq. No bun, no claude.
  • Python Telegram bridge (scripts/tg-bridge.py) replaces the --channels plugin:telegram@claude-plugins-official model. Polls getUpdates, runs codex exec [resume <thread_id>] per message, posts captured --output-last-message back via sendMessage.
  • Hooks in scripts/hooks/{boot,pre-run,post-run}.sh recreate the equivalents of Claude Code's SessionStart / UserPromptSubmit / Stop. pre-run does image safety; post-run strips markdown + writes daily log; boot fires FIRST_RUN.md welcome on first launch.
  • Admin commands intercepted by daemon: /help, /status, /reset (new codex thread), /restart (full bridge respawn).
  • Cron uses scripts/run-cron.sh wrapping codex exec in with-timeout.sh 1200. Pattern mirrors the claude-side cron.
  • AGENTS.md auto-loaded by Codex (native behavior — no need for CLAUDE.md entry file).
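
Two pieces of the bridge loop described above lend themselves to a short sketch: deciding whether a message is an admin command or goes to codex, and advancing the getUpdates offset. The route and next_offset names are hypothetical and the daemon's real structure is not shown in these notes. The offset rule (highest update_id plus one) is standard Telegram Bot API behavior.

```python
ADMIN_COMMANDS = {"/help", "/status", "/reset", "/restart"}


def route(text: str) -> tuple[str, str]:
    """Return ("admin", command) for intercepted commands, else ("codex", text)."""
    parts = text.strip().split(maxsplit=1)
    first = parts[0] if parts else ""
    # Telegram may append "@botname" to commands, so strip it before matching.
    command = first.split("@", 1)[0]
    if command in ADMIN_COMMANDS:
        return ("admin", command)
    return ("codex", text)


def next_offset(updates: list[dict], current: int) -> int:
    """Offset for the next getUpdates call: one past the highest update_id seen."""
    return max([u["update_id"] + 1 for u in updates] + [current])
```

Passing the advanced offset on the next poll is what acknowledges the processed updates. Calling getUpdates without advancing it leaves them queued.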

Provision skills

  • provision-agent: passes --host codex only when the user explicitly asks for it. Default stays claude.
  • provision-clone: clones inherit parent's host. Codex-flavored --clone-of is not yet supported (the symlink layout assumes Claude Code's .claude/ tree); fall back to a fresh provision-agent --host codex.

Why

asst (a hermit) was burning ~$350/day on Claude API to run 8 agents. ChatGPT Pro 20x ($200/mo) gives 300-1600 messages per 5h window. At ~$350/day, Claude API spend comes to roughly $10,500 per 30-day month against $200/mo, so the codex flavor can cover most hermit workloads at ~50× lower cost. Default stays claude because (a) Anthropic has been the upstream investment, (b) the official telegram plugin's --channels integration is genuinely nicer than a polling daemon, and (c) some hermit workflows (cron -p, MCP plugin marketplace) lean on Claude Code features that codex doesn't have direct equivalents for.

Smoke

85/85 (68 claude flavor + 17 codex flavor assertions). Codex flavor verified end-to-end on a separate workspace (codex-demo) before extraction into template-codex/.

Notes

  • The Codex bridge daemon is intentionally simple (~250 lines Python). Future iterations may grow it.
  • ~/.codex/sessions/ accumulates jsonl per thread; users on long conversations can /reset for fresh threads to avoid token bloat.
  • ChatGPT subscription rate-limits are per-window (5h); users running heavy crons + chat might want to monitor codex login status and the daily ccusage breakdown.