Releases: amariichi/MinimumHeadroom

v1.18.2 — AtomS3R: auto-patch WebServer to remove 5s audio POST tail

24 May 23:24
4a07b64

Patch release

Removes a fixed ~5 second tail from every audio POST to AtomS3R, so chunked TTS playback is smooth instead of choppy.

The bug

The Arduino ESP32 `WebServer` raw-upload loop in
`framework-arduinoespressif32/libraries/WebServer/src/Parsing.cpp` reads
`HTTP_RAW_BUFLEN` (1436) bytes per iteration regardless of how many body bytes remain.
The final iteration almost always asks for more than is left, so
`WiFiClient::readBytes` blocks waiting for bytes that will never arrive until
`HTTP_MAX_SEND_WAIT` (5000 ms) elapses. This added a fixed ~5 s tail to every
audio POST, so each AtomS3R chunk took ~6 s end-to-end while face-app produced
chunks every ~3 s. The bridge queue grew, and the listener heard the audio
break up.
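
The read pattern described above can be modeled in a few lines of Python. This is only a sketch of the arithmetic, not the C++ in `Parsing.cpp`; the function names are hypothetical, and the only value taken from the notes is the 1436-byte buffer size. It assumes a non-empty body.

```python
HTTP_RAW_BUFLEN = 1436  # bytes requested per loop iteration (from the notes)


def read_requests(body_len: int, cap_to_remaining: bool) -> list[tuple[int, int]]:
    """Return (requested, available) byte counts for each raw-upload read."""
    requests = []
    remaining = body_len
    while remaining > 0:
        if cap_to_remaining:
            requested = min(HTTP_RAW_BUFLEN, remaining)  # patched behavior
        else:
            requested = HTTP_RAW_BUFLEN  # unpatched: always a full buffer
        requests.append((requested, remaining))
        remaining -= min(requested, remaining)
    return requests


def final_read_blocks(body_len: int, cap_to_remaining: bool) -> bool:
    """True when the last read asks for more bytes than will ever arrive."""
    requested, available = read_requests(body_len, cap_to_remaining)[-1]
    return requested > available
```

For a 30,000-byte body the uncapped loop's last read requests 1436 bytes with only 1280 left, which is the case that waits out the timeout; with the cap, the last read requests exactly 1280.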

The fix

  • New `firmware/atoms3r-headroom/scripts/apply_webserver_patch.py` is a PlatformIO pre-build hook (registered as `extra_scripts` in `platformio.ini`). It caps each `readBytes()` request to the actual remaining body bytes, so the last read returns immediately.
  • The hook is idempotent (self-marks the patched line with a `PATCH(minimum-headroom):` comment) and fails loudly if the upstream library no longer matches the expected snippet, so a future framework upgrade cannot silently regress the fix.
  • The equivalent unified diff is checked in at `firmware/atoms3r-headroom/patches/webserver_raw_read_cap.patch` for transparency.
  • README documents the rationale, manual application, and measured effect in both English and Japanese.
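
The idempotent, fail-loud behavior described above can be sketched as a pure function. This is not the contents of `apply_webserver_patch.py`; `apply_patch`, `EXPECTED`, and `REPLACEMENT` are hypothetical names, and the two snippet strings are placeholders rather than the real C++ lines. Only the marker text comes from the notes.

```python
PATCH_MARKER = "PATCH(minimum-headroom):"  # marker text quoted in the notes


def apply_patch(source: str, expected: str, replacement: str) -> tuple[str, str]:
    """Return (new_source, status); status is "applied" or "already applied".

    Raises RuntimeError when the file is unpatched and the expected snippet
    is not present exactly once, so an upstream change cannot go unnoticed.
    """
    if PATCH_MARKER in source:
        return source, "already applied"
    if source.count(expected) != 1:
        raise RuntimeError("expected upstream snippet not found; refusing to patch")
    marked = f"{replacement}  // {PATCH_MARKER} cap read to remaining bytes"
    return source.replace(expected, marked), "applied"


# Hypothetical stand-ins for the real C++ lines, for illustration only.
EXPECTED = "n = read(buf, BUFLEN);"
REPLACEMENT = "n = read(buf, min(BUFLEN, remaining));"
```

Running the function twice on the same text is a no-op the second time, and a file that matches neither the marker nor the expected snippet raises instead of being silently skipped.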

Measured effect

| Payload | Before | After | Speedup |
| --- | --- | --- | --- |
| 30 KB | 5.17 s | 0.18 s | 28× |
| 60 KB | 5.23 s | 0.37 s | 14× |
| 120 KB | 5.38 s | 0.48 s | 11× |
| 180 KB | 5.58 s | 0.68 s | 8× |

`RAW_START` → `handleAudio` gap reduced from 5070 ms to 78–430 ms (measured via temporary serial instrumentation, now removed).

Compatibility

  • No protocol change.
  • No firmware behavior change beyond the speedup; the audio queue path itself is unchanged.
  • No host-side (face-app / tts-worker / mcp-server) behavior change.
  • The patch modifies a file under `~/.platformio/packages/` (system library). The first `pio run` after pulling this release prints `[webserver-patch] applied: …` once; subsequent builds print `[webserver-patch] already applied: …`. The patch is reapplied automatically after a framework upgrade.

🤖 Generated with Claude Code

v1.18.1 — TTS: strip JA full-stops before Kokoro misaki phonemizer

24 May 11:15
0ea6511

Patch release

Removes an audible artifact at the end of Japanese sentences spoken through the default Kokoro engine.

The bug

Misaki's pyopenjtalk-backed Japanese G2P maps the JA full-stop 「。」 (and its fullwidth ASCII twin 「．」) to an actual phoneme rather than silence, so Kokoro rendered chunk endings as a short "ye"-like sound. Confirmed on hardware: あ。 produced "a" plus an extra sound, and あ。。。。。 produced "a" plus five repeated artifacts (one per period), proving each was reaching misaki and being phonemized.

The fix

  • New module tts-worker/src/tts_worker/kokoro_text.py exposes strip_japanese_silent_punctuation, which removes 「。」 and 「．」 runs.
  • KokoroEngine._to_ja_phonemes strips the input before calling misaki and returns an empty string if the chunk becomes empty after stripping.
  • KokoroEngine.synthesize_chunks now skips chunks that come back empty (this also prevents the pre-existing "misaki ja g2p returned empty phoneme output" error path on punctuation-only chunks).
  • Other JA punctuation (「、」「！」「？」「・」「…」) is intentionally left in place, because those marks either drive prosodic pausing or have not been observed to produce artifacts.
  • Shared text normalization (tts_worker.shared_text) is untouched, so the existing "preserve JA punctuation" contract there still holds; the strip is engine-local to Kokoro, parallel to how qwen3_text.py houses Qwen3-specific text prep.
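
A minimal sketch of the stripping step, assuming it amounts to deleting runs of the ideographic full stop (U+3002) and the fullwidth full stop (U+FF0E). The function name `strip_full_stops` is hypothetical and this is not the code in kokoro_text.py.

```python
import re

# Runs of the ideographic full stop (U+3002) or fullwidth full stop (U+FF0E).
_FULL_STOP_RUN = re.compile("[\u3002\uff0e]+")


def strip_full_stops(text: str) -> str:
    """Delete JA full-stop runs; every other character is left untouched."""
    return _FULL_STOP_RUN.sub("", text)


def non_empty_chunks(chunks: list[str]) -> list[str]:
    """Strip each chunk and drop the ones that become empty."""
    stripped = (strip_full_stops(chunk) for chunk in chunks)
    return [chunk for chunk in stripped if chunk]
```

The ASCII period is deliberately outside the character class, so English text passes through unchanged, and a chunk made only of full stops disappears instead of reaching the phonemizer.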

Tests

tts-worker/tests/test_kokoro_text.py adds 8 unit tests: trailing 「。」, repeated 「。」 runs, internal + trailing, fullwidth 「．」, preservation of 「、」「！」「？」「・」「…」, punctuation-only input → empty, empty input passthrough, ASCII passthrough. All tts_worker tests pass (25/25).

Compatibility

  • No protocol change.
  • No firmware change.
  • No API change.
  • Affects only the Kokoro TTS path (default engine). Qwen3 path is unchanged.

Verification

  • CI green on PR #59 (28 s) and on the merged main commit.
  • Live hardware test on AtomS3R after a fresh Kokoro restart: あ。 now sounds like "a" only, with no trailing artifact.

🤖 Generated with Claude Code

v1.18.0 — Atom TTS prefetch+FIFO, deferred idle TTS, Tailscale router guide

24 May 10:14
f60ca6c

Minor release

Three independently useful changes land together in this release.

1. Atom TTS prefetch + FIFO (primary feature)

Long multi-sentence answers now play with much shorter inter-chunk gaps on every remote audio sink (PC browser tab, mobile browser, and AtomS3R). Three coordinated changes:

  • TTS worker (tts-worker/src/tts_worker/__main__.py): new MH_TTS_REMOTE_PREFETCH_MS (default 900) returns play_stop early for browser-only audio targets so the next chunk's synthesis can overlap the current chunk's playback. Gated to MH_AUDIO_TARGET=browser because the local host speaker uses the worker's own audio clock and must not be cut short.
  • face-app browser audio (face-app/public/app.js): same-session chunks now queue sequentially instead of replacing the active source. Interrupt, drop, and error paths still stop and clear the queue.
  • AtomS3R firmware (firmware/atoms3r-headroom/src/headroom_audio.{cpp,h}, headroom_transport.{cpp,h}): bounded one-pending-WAV FIFO. stop(), stopForRecording(), PTT, and interrupt all flush both the active and queued audio.
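
One way to picture the prefetch gate is as a single timing decision. The sketch below assumes "returns play_stop early" means signalling completion MH_TTS_REMOTE_PREFETCH_MS before the chunk's audio would finish; that reading, and the function name `play_stop_delay_ms`, are assumptions, not the worker's actual code.

```python
def play_stop_delay_ms(duration_ms: int, audio_target: str, prefetch_ms: int = 900) -> int:
    """How long to wait before reporting play_stop for one chunk.

    Only the browser-only target is shortened; any other target waits for
    the full playback duration because the local speaker must not be cut.
    """
    if audio_target != "browser":
        return duration_ms
    # Never wait a negative time when the chunk is shorter than the prefetch.
    return max(0, duration_ms - prefetch_ms)
```

With the default of 900 ms, a 3000 ms chunk sent to a browser-only target would release the next synthesis after 2100 ms, while the same chunk with a local speaker attached still waits the full 3000 ms.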

2. Bridge generation watermark fix

Hardware testing surfaced a regression where AtomS3R cut off mid-utterance and then spoke an idle notification. Root cause: scripts/atoms3r-http-bridge.mjs advanced its generation watermark on every relayed WebSocket payload, so a deferred-idle tts_state at generation N+1 (deferred idle was introduced earlier in this release) caused the tail chunks of the in-flight gen=N utterance to be dropped as "stale". Fix: scope observeGeneration to tts_audio / tts_audio_ref only, so state, event, and mouth payloads no longer advance the watermark. Verified end-to-end on hardware with both Kokoro and Qwen3 TTS engines.
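
The scoping fix can be sketched as follows. The bridge itself is a Node script; this Python model only illustrates the rule, and the payload field names `type` and `generation`, the class name, and the choice to pass non-audio payloads through untouched are assumptions.

```python
AUDIO_TYPES = {"tts_audio", "tts_audio_ref"}  # the only types that advance the watermark


class GenerationWatermark:
    def __init__(self) -> None:
        self.latest = 0

    def should_drop(self, payload: dict) -> bool:
        """Observe one relayed payload; True means stale audio to drop."""
        if payload.get("type") not in AUDIO_TYPES:
            # State, event, and mouth payloads are relayed without
            # touching the watermark (assumed pass-through).
            return False
        generation = payload.get("generation", 0)
        if generation < self.latest:
            return True
        self.latest = generation
        return False
```

Under this rule a tts_state at generation N+1 arriving mid-utterance leaves the watermark at N, so the remaining gen=N audio chunks still play; they become stale only once audio for N+1 has actually been observed.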

3. Deferred idle TTS + Tailscale travel router guide

Already on the branch from earlier work this cycle:

  • Defer idle TTS notifications until speech drains — the hook bridge's idle_after_response notification now queues behind the in-flight utterance (with the bridge fix above, this is no longer just a controller-side guarantee).
  • Tailscale travel router setup guide (doc/guides/tailscale-travel-router-setup.md) — how to place AtomS3R behind a GL.iNet travel router and reach it from your PC via Tailscale subnet routing.

Operational note: prefer --audio-target browser for remote sinks

The new prefetch path only engages when MH_AUDIO_TARGET=browser. In `both` mode the worker keeps waiting through the full local-playback duration, so remote sinks fall back to the older synthesize-then-send-then-play pacing. README, doc/guides/operator-stack.md (EN + JA), and examples/rmh-voice-mode/README.md now recommend `browser` for PC browser tabs, mobile, and AtomS3R, with a short explanation. The scripts/restart-operator-stack-in-place.sh default changed from `both` to `browser` to match the existing default in scripts/run-operator-stack.sh.

Verification

  • Full npm test green (375 tests), Node syntax checks, Python compile checks.
  • PlatformIO firmware build green: RAM 51,380 bytes (15.7%) / Flash 1,231,493 bytes (36.8%).
  • Live end-to-end test on AtomS3R hardware: gap reduction confirmed audibly with both Kokoro (default) and Qwen3 (--profile qwen3) TTS engines.
  • CI green on PR #58 (29 s) and on the merged main commit.

🤖 Generated with Claude Code

v1.17.4 — docs: add AtomS3R + atoms3r-http-bridge to architecture diagrams

23 May 12:54
2c3926a

Patch release

Documentation-only update that closes a long-standing gap: AtomS3R has been a first-class hardware face since v1.17.0 but the architecture diagrams still only depicted the browser-based Frontend UI. The README prose described AtomS3R correctly, the diagrams did not — confusing for anyone trying to understand the runtime topology.

What changed

  • High-level flow (doc/diagrams/high-level-flow.{mmd,svg,png} + inline README blocks in both languages): adds an AtomS3R Device (2D face LCD + Echo speaker + PTT mic) and atoms3r-http-bridge node. AtomS3R uplinks (mic WAV → /api/operator/asr, operator_response → /api/operator/response) go direct to face-app over HTTP, in parallel with the Frontend UI's uplinks. Downlinks (face / TTS payloads + audio) fan out through the bridge to AtomS3R's /api/headroom/{payload,audio} endpoints.
  • Sequence timeline (doc/diagrams/sequence-timeline.{mmd,svg,png} + inline README blocks): adds Input path D — AtomS3R PTT alongside paths A/B/C, and adds the bridge fan-out for event/say/state payloads and tts_audio/tts_mouth in the output section. New participants: ATOM and ATOMBR.
  • README: keeps English and Japanese inline mermaid blocks in sync with the .mmd sources (4 blocks total).
  • PNG/SVG fallbacks re-rendered at --width 2400 --scale 2 so the static images stay legible at higher zoom levels.

What did not change

The "3D face" references for the browser-side Frontend UI are intentionally untouched — that face really is 3D (Three.js-rendered head, eye/eyebrow/mouth/head animation). Only the AtomS3R LCD face is labeled as 2D, which matches the actual firmware behavior.

No code change, no API change, no runtime behavior change.

Verification

  • Diagrams re-rendered via npx @mermaid-js/mermaid-cli@11.12.0 with --puppeteerConfigFile (no-sandbox) and verified to contain AtomS3R and atoms3r-http-bridge text nodes.
  • CI green (33 s).

🤖 Generated with Claude Code

v1.17.3 — TTS: scope hai-filler to Qwen3 + fullwidth-ify JA halfwidth digits

23 May 12:35
dd0dec1

Patch release

Two independent cleanups to the shared TTS text pipeline that affect how Kokoro (the default engine) reads Japanese utterances. Verified end-to-end through the live Kokoro pipeline before tagging.

What changed

  • はい、 leading filler is now Qwen3-only. It was added to the shared normalizer as a Mandarin-drift countermeasure for Qwen3, but Kokoro+misaki has no such drift and was getting an unwanted "Yes," before every sentence that opened with a halfwidth ASCII or numeric token (e.g. execplanを作成しました。 was rendering as はい、execplanを…). apply_japanese_leading_numeric_filler and apply_japanese_leading_unknown_ascii_filler now live in the Japanese branch of prepare_qwen3_text. Qwen3 behavior is unchanged; Kokoro stops getting the prefix.
  • Halfwidth digits inside Japanese text are now fullwidth-ified. Misaki's English G2P used to fire on halfwidth digits embedded in Japanese, so 今日は5月23日です。 rendered as ファイブ月トウェンティースリー日です. normalize_japanese_tts_text now translates 0-9 → ０-９ so misaki keeps the digits on the Japanese G2P path (今日は５月２３日です。). Pure-English utterances are routed through normalize_english_tts_text instead, so The build runs at 5:30 on port 8080. still reads as English.
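
The digit translation can be sketched with `str.translate`. This is an illustration, not the body of normalize_japanese_tts_text: the function name is hypothetical, and the "contains JA script" check is approximated here by a hiragana/katakana/CJK-ideograph range, which is an assumption about how routing is decided.

```python
import re

# Map ASCII 0-9 to the fullwidth digits U+FF10..U+FF19.
_HALF_TO_FULL = {ord(str(d)): chr(0xFF10 + d) for d in range(10)}

# Hiragana, katakana, and the main CJK ideograph block (rough heuristic).
_JA_SCRIPT = re.compile("[\u3040-\u30ff\u4e00-\u9fff]")


def fullwidth_digits_if_japanese(text: str) -> str:
    """Fullwidth-ify digits only when the text contains Japanese script."""
    if not _JA_SCRIPT.search(text):
        return text  # pure-English text keeps its halfwidth digits
    return text.translate(_HALF_TO_FULL)
```

Only the ten digit code points are remapped, so separators such as the dots in a version number stay as they are while the digits around them move to the Japanese reading path.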

Verified end-to-end through the live Kokoro pipeline

| Input | Reading |
| --- | --- |
| 今日は5月23日です。 | Japanese date reading |
| execplanを作成しました。 | No spurious leading はい、 |
| 23日までに完了します。 | Japanese digit reading, no filler |
| The build runs at 5:30 on port 8080. | English reading preserved (untouched by JA pipeline) |
| 現在のバージョンは1.2.3です。 | Japanese reading of 1.2.3 |

Tests

  • shared_text: 6 new cases (date phrase, dotted version, mixed sentence with thousands separator, pure-English routing, decoration). 3 existing cases updated.
  • qwen3_text: 3 new cases asserting Qwen3-Japanese-mode still receives the filler so the existing Mandarin-drift mitigation is preserved.
  • Python suite: 42/42 pass. Node suite: 372/372 pass.

Notes

  • Mixed JA/English sentences such as API 2.0 を試した are now read as API ２.０ を試した because the text contains JA script and is therefore JA-routed. Misaki is already in Japanese G2P mode in that case, so the fullwidth digits read more naturally than the previous English-numeral fallback.
  • No MCP API change. No agent-runtime behavior change. Operator stack must be restarted in-place after upgrading (./scripts/restart-operator-stack-in-place.sh) so the running Python tts-worker picks up the new code.

🤖 Generated with Claude Code

v1.17.2 — stuck-detector: codex MCP approval + agy trust folder + fixture harness

23 May 12:14
8a1f40d

Patch release

Closes two real gaps in the helper stuck-detector surfaced by spawning live codex and agy helpers and capturing their tmux panes through every state (idle / running / approval / picker / interrupt). v1.17.1 was a snapshot of regex patterns; v1.17.2 adds verbatim fixtures so the next regression is caught by npm test, not by an operator wondering why a helper went silent.

What changed

  • New pattern codex_mcp_approval: /Allow the .+ MCP server to run tool/. Codex's MCP tool-call approval modal is a separate path from the shell-command modal that codex_approval already catches. Previously a helper's first MCP tool call would stall silently with no inbox alert.
  • New pattern agy_trust_folder: /Do you trust the contents of this project\?/. Antigravity's first-run workspace trust prompt blocks before any mission is injected; without this pattern, helper bring-up would hang on a brand-new worktree.
  • codex_picker regex fixed: replaced /Switch to (gpt|claude|gemini)-/ with /Select Model and Effort/. The old wording only appeared on a transient confirm screen and missed the primary /model picker that operators actually see.
  • Fixture-driven test harness: test/face-app/fixtures/stuck_detector/{codex,agy}/ holds 13 verbatim ANSI-stripped tmux pane snapshots from real helpers (codex × 7, agy × 6). A new FIXTURE_CASES loop in helper_stuck_detector.test.mjs asserts that each positive fixture fires exactly one expected pattern and that each negative fixture (idle, running, slash-command completion, conversational interrupt recovery) fires nothing. Adding coverage for a newly discovered modal is now drop-fixture + one row.
  • Documentation: the minimum-headroom-ops skill coverage table has been regenerated and points operators at the fixture directory for future extensions.

Coverage as of 1.17.2

| Pattern | CLI |
| --- | --- |
| claude_approval (Do you want to proceed?) | Claude Code, Antigravity (incl. MCP modal) |
| codex_approval (Would you like to run the following command?) | Codex |
| codex_mcp_approval (Allow the … MCP server to run tool) | Codex |
| agy_trust_folder (Do you trust the contents of this project?) | Antigravity |
| codex_picker (Select Model and Effort) | Codex |
| codex_quota (You've hit your usage limit) | Codex |
| agy_survey (How's the CLI experience) | Antigravity |
| generic_press_enter (Press enter to confirm) | any |
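
The coverage list can be read as a table of (pattern id, regex) pairs applied to each line of a pane tail. The sketch below is a Python model of that idea, not the JavaScript in helper_stuck_detector.js. Four of the regexes are quoted in these notes; the other four are written from the matched wording shown in the table and are assumptions, as is the function name.

```python
import re

STUCK_PATTERNS = [
    # Regexes quoted in the release notes.
    ("codex_approval", re.compile(r"Would you like to run the following command\?")),
    ("codex_mcp_approval", re.compile(r"Allow the .+ MCP server to run tool")),
    ("agy_trust_folder", re.compile(r"Do you trust the contents of this project\?")),
    ("codex_picker", re.compile(r"Select Model and Effort")),
    # Assumed regexes, derived from the wording in the coverage table.
    ("claude_approval", re.compile(r"Do you want to proceed\?")),
    ("codex_quota", re.compile(r"You've hit your usage limit")),
    ("agy_survey", re.compile(r"How's the CLI experience")),
    ("generic_press_enter", re.compile(r"Press enter to confirm")),
]


def match_stuck_patterns(pane_tail: str) -> list[tuple[str, str]]:
    """Return (pattern_id, matched_line) for every line that matches."""
    hits = []
    for line in pane_tail.splitlines():
        for pattern_id, regex in STUCK_PATTERNS:
            if regex.search(line):
                hits.append((pattern_id, line.strip()))
    return hits
```

A fixture-style check then reduces to asserting that a positive snapshot yields exactly one hit with the expected id and that an idle or running snapshot yields an empty list.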

Notes

  • 372 tests passing (14 new fixture-driven cases on top of the existing inline-literal coverage).
  • No MCP API change. agent.pane_snapshot and agent.pane_send_key are unchanged.
  • The detector is still post-only; it never auto-presses keys. Extend by appending to DEFAULT_STUCK_PATTERNS in face-app/dist/helper_stuck_detector.js after dropping a real-pane fixture into test/face-app/fixtures/stuck_detector/<cli>/ and adding a row to FIXTURE_CASES.

🤖 Generated with Claude Code

v1.17.1 — Codex shell-approval modal coverage

23 May 11:49
e2dcd59

Patch release

Adds a missing helper stuck-detector pattern so an operator (claude / codex / antigravity) is alerted when a Codex helper pauses on its shell-command approval modal. v1.17.0 only matched Claude Code's Do you want to proceed? wording — which incidentally also catches Antigravity, but Codex uses different opening text and was silently missed.

What changed

  • New pattern codex_approval with regex /Would you like to run the following command\?/ covering Codex's approval modal. Same response shape as the other approval patterns: operator decides, then sends keys via agent.pane_send_key (e.g. ["1","Enter"] for "Yes, proceed").
  • Documented coverage: doc/examples/skills/minimum-headroom-ops/SKILL.md now has a coverage table listing every shipping regex, which CLI it matches, and what is intentionally not caught.
  • No MCP API change. agent.pane_snapshot and agent.pane_send_key are unchanged.

Coverage as of 1.17.1

| Pattern | CLI |
| --- | --- |
| claude_approval (Do you want to proceed?) | Claude Code, Antigravity |
| codex_approval (Would you like to run the following command?) | Codex |
| codex_picker (Switch to gpt\|claude\|gemini-…) | Codex |
| codex_quota (You've hit your usage limit) | Codex |
| agy_survey (How's the CLI experience) | Antigravity |
| generic_press_enter (Press enter to confirm) | any |

Notes

  • 359 tests passing (2 new patterns added).
  • The detector is still post-only; it never auto-presses keys. Extend by appending to DEFAULT_STUCK_PATTERNS in face-app/dist/helper_stuck_detector.js.

🤖 Generated with Claude Code

v1.17.0 — AtomS3R hardware face + multi-agent stuck recovery

23 May 07:48
44d8138

Highlights

AtomS3R hardware face frontend

End-to-end firmware for the M5Stack AtomS3R + Atomic Echo Base as a physical face for minimum-headroom: 128×128 parametric face driven over WebSocket, local TTS playback through the Echo Base ES8311 codec, push-to-talk recording into face-app operator ASR, on-device Wi-Fi setup portal with force-entry (hold the screen button at boot), up to 3 Wi-Fi slots, USB-CDC RMHCFG provisioning from the PC, and IMU triple-tap reorient. The HTTP bridge is auto-started by the operator stack in its own tmux session.

See firmware/atoms3r-headroom/README.md for hardware links and setup.

Helper stuck detection and pane control

Two new MCP tools, plus a background subsystem inside face-app that drives them. Together they let an operator recover a helper stuck inside a CLI-level modal (tool approval, model picker, usage-limit notice, feedback survey) before the helper's LLM reads any input.

New MCP tools:

  • agent.pane_snapshot — returns the helper pane tail with ANSI stripped, so an operator can read the modal verbatim from any MCP client.
  • agent.pane_send_key — injects named keys (Enter, Escape, Up, Down, …) or literal text into the helper pane, with a named-key allowlist and ASCII / length caps on literal mode.

Background subsystem (not an MCP tool):

  • Helper stuck detector — a timer inside face-app, scheduled by setInterval every ~5 s. On each tick it lists active helpers, reads each helper's pane via the same paneSnapshot runtime that backs the MCP tool, and matches the tail against CLI-specific regex patterns (Claude Do you want to proceed?, Codex Switch to gpt|claude|gemini-… model picker and You've hit your usage limit quota notice, Antigravity How's the CLI experience survey, plus a generic Press enter to confirm). On a fresh match it posts {kind: "blocked", from_agent_id, summary, detail: "<matched line>\n---\n<pane tail>"} to the owner inbox via submitReport. Dedupes by (agent_id, pattern_id, matched_line) for ~30 s so the alarm re-arms only when the line changes. Validated end-to-end against claude / codex / antigravity helpers.
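
The dedupe rule can be sketched as a small time-windowed set keyed by (agent_id, pattern_id, matched_line). This is a Python model, not the face-app code; the class name is hypothetical, and it assumes that once the ~30 s window has passed the same line is allowed to alert again.

```python
class AlertDeduper:
    """Suppress repeat alerts for the same key inside a time window."""

    def __init__(self, window_s: float = 30.0) -> None:
        self.window_s = window_s
        self._last_posted: dict[tuple[str, str, str], float] = {}

    def should_post(self, agent_id: str, pattern_id: str, matched_line: str, now_s: float) -> bool:
        key = (agent_id, pattern_id, matched_line)
        last = self._last_posted.get(key)
        if last is not None and now_s - last < self.window_s:
            return False  # same helper, pattern, and line seen recently
        self._last_posted[key] = now_s
        return True
```

Because the matched line is part of the key, a modal whose text changes (for example a different command awaiting approval) posts immediately, while an unchanged modal polled every ~5 s stays quiet within the window.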

The detector posts; it never auto-presses keys. The operator decides every response via agent.pane_send_key, so a regex match cannot pick "No, and always deny" for you. Disable with MH_HELPER_STUCK_DETECTOR=off; tune cadence with MH_HELPER_STUCK_DETECTOR_INTERVAL_MS (default 5000, minimum 250).
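
The two environment variables could be interpreted roughly as below. The variable names, the default, and the minimum come from the paragraph above; the function name, the clamping of too-small values, and the fallback to the default on non-numeric input are assumptions rather than documented behavior.

```python
DEFAULT_INTERVAL_MS = 5000
MIN_INTERVAL_MS = 250


def detector_settings(env: dict) -> tuple[bool, int]:
    """Return (enabled, interval_ms) from an environment-like mapping."""
    enabled = env.get("MH_HELPER_STUCK_DETECTOR", "").strip().lower() != "off"
    raw = env.get("MH_HELPER_STUCK_DETECTOR_INTERVAL_MS", "")
    try:
        interval = int(raw)
    except ValueError:
        interval = DEFAULT_INTERVAL_MS  # unset or non-numeric (assumed fallback)
    # Values under the documented minimum are raised to it (assumed clamp).
    return enabled, max(MIN_INTERVAL_MS, interval)
```

Passing `os.environ` as `env` would read the live process environment; a plain dict keeps the sketch easy to exercise.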

Operator stack: Antigravity CLI + GUI

The operator stack and example agent config have been updated for Antigravity CLI and GUI, with a refreshed doc/examples/antigravity/ example and a new examples/rmh-voice-mode/ voice-first launcher that idempotently regenerates per-CLI rule files from a single shared source.

TTS and audio path

  • Sentence-bounded FIFO chunking for long TTS replies (keeps the AtomS3R speaker responsive on long answers).
  • TTS audio served by HTTP reference for non-browser sinks.
  • Operator PTT barge-in flushes the TTS chunk queue.
  • ES8311 codec mutex serializes mic and speaker on the shared codec.
  • AtomS3R-friendly chunk-size cap (MH_TTS_CHUNK_MAX_CHARS=64).
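
A rough sketch of sentence-bounded chunking under a character cap. The notes give only the cap (MH_TTS_CHUNK_MAX_CHARS=64) and the phrase "sentence-bounded"; the splitting rules here (one sentence per chunk, hard split when a single sentence exceeds the cap, ASCII punctuation counted only when followed by whitespace) are assumptions, and `chunk_sentences` is a hypothetical name.

```python
import re

# Split after JA sentence enders (U+3002, U+FF01, U+FF1F) anywhere, and after
# ASCII . ! ? only when whitespace follows, so "1.2.3" stays in one piece.
_SENTENCE_BREAK = re.compile(r"(?<=[\u3002\uff01\uff1f])\s*|(?<=[.!?])\s+")


def chunk_sentences(text: str, max_chars: int = 64) -> list[str]:
    """Return chunks in speaking order, none longer than max_chars."""
    chunks = []
    for sentence in _SENTENCE_BREAK.split(text):
        sentence = sentence.strip()
        # Hard-split a sentence that alone exceeds the cap.
        for start in range(0, len(sentence), max_chars):
            chunks.append(sentence[start:start + max_chars])
    return chunks
```

Keeping chunks in a plain list preserves FIFO order, so a consumer can pop from the front and a barge-in can simply discard whatever remains.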

Compatibility

  • mcp-server advertises version 1.17.0 (previously drifted at 1.16.0 while every other site was already on 1.16.1 — corrected this release).
  • Two new MCP tools (agent.pane_snapshot, agent.pane_send_key) added; all existing tools unchanged.
  • StackChan Minimal sidecar removed (was a third-party adapter, not a core feature). It can live in a separate companion repo if needed.

Notes

  • 357 tests passing.
  • Public-readability cleanup: removed personal absolute paths, replaced internal milestone/phase labels with feature descriptions, added a Japanese mirror to the AtomS3R firmware README.

🤖 Generated with Claude Code

v1.16.1

10 May 10:06
f96c2f0

What's Changed

  • Removed public README and hook-bridge documentation references to an internal hook design plan.
  • Bumped package and worker metadata from 1.16.0 to 1.16.1.

Verification

v1.16.0 — Hook-driven face_say safety net

10 May 08:24
4f93eed

Choose a tag to compare

Adds a runtime-agnostic hook bridge so the 3D face speaks even when an agent forgets to call face_say voluntarily. Wires Claude Code's Notification + Stop, Codex's PermissionRequest + Stop, and Gemini CLI's Notification + AfterAgent into a single wrapper script that emits face_say + face_event and (for helpers) an owner-inbox entry tagged source=mh_hook. End-to-end verified on Claude Code, Codex 0.130.0, and Gemini CLI 0.41.2.

Highlights

  • New face.hook MCP tool and face-app/dist/hook_bridge.js orchestrator with per-agent CJK-heuristic language detection, variant-rotation template selection, and owner-inbox routing via the existing assignment state.
  • New scripts/mh-hook.mjs wrapper invoked by each runtime's hook config. Strict "stdout silent / exit 0 always" discipline (required because Gemini's AfterAgent treats exit 2 as "retry this turn with stderr as feedback prompt").
  • New scripts/grant-codex-hook-trust.sh automates Codex's one-time /hooks browser trust grant via a private tmux server. Trust is per-user, persistent, and inherited by every helper Codex spawns thereafter — no need to enter individual helper panes.
  • Forwards MH_FACE_AUTH_TOKEN as ?auth_token= from mh-hook.mjs so it works against face-app instances that require auth.
  • Templates default to ja+en built-in, overridable via ~/.minimum-headroom/face-templates.json.
  • 22 new unit tests; full suite stays green at 325 / 325.
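
The token forwarding mentioned above amounts to appending one query parameter when the environment variable is set. The wrapper itself is a Node script; this Python sketch only models the URL handling. The variable name MH_FACE_AUTH_TOKEN and the parameter name auth_token come from the notes; the function name and the example URL are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_auth_token(url: str, env: dict) -> str:
    """Append ?auth_token=... when MH_FACE_AUTH_TOKEN is set in env."""
    token = env.get("MH_FACE_AUTH_TOKEN", "")
    if not token:
        return url  # instances without auth get the URL unchanged
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("auth_token", token))
    # urlencode percent-escapes the token so special characters are safe.
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, a hypothetical endpoint `ws://127.0.0.1:8765/ws` with the token `s3cret` becomes `ws://127.0.0.1:8765/ws?auth_token=s3cret`, and any query parameters already present are kept in front of it.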

Setup

  1. Merge doc/hook-bridge/{claude-settings.json.example, codex-config.toml.example, gemini-settings.json.example} into your ~/.claude/settings.json, ~/.codex/config.toml, ~/.gemini/settings.json (substitute /ABS/PATH/).
  2. For Codex only: run ./scripts/grant-codex-hook-trust.sh once after editing ~/.codex/config.toml. Re-run only when you change a hook command or matcher.
  3. Restart the operator stack so the operator pane picks up MH_FACE_AGENT_ID.

Documentation

  • Top-level README.md Hook Bridge section (English + 日本語)
  • doc/hook-bridge/README.md (English + 日本語)
  • Per-runtime drop-in examples under doc/hook-bridge/ and doc/examples/{claude-code,codex,antigravity}/

Compatibility note

Codex [features].codex_hooks = true is the deprecated alias of [features].hooks = true as of Codex 0.131; Codex < 0.131 still accepts both with a startup warning.

Full Changelog: v1.15.0...v1.16.0