Releases: amariichi/MinimumHeadroom
v1.18.2 — AtomS3R: auto-patch WebServer to remove 5s audio POST tail
Patch release
Removes a fixed ~5 second tail from every audio POST to AtomS3R, so chunked TTS playback is smooth instead of choppy.
The bug
The Arduino ESP32 `WebServer` raw-upload loop in
`framework-arduinoespressif32/libraries/WebServer/src/Parsing.cpp` reads
`HTTP_RAW_BUFLEN` (1436) bytes per iteration regardless of how many body bytes remain.
The final iteration almost always asks for more than is left, so
`WiFiClient::readBytes` blocks waiting for bytes that will never arrive until
`HTTP_MAX_SEND_WAIT` (5000 ms) elapses. This added a fixed ~5 s tail to every
audio POST, so each AtomS3R chunk took ~6 s end-to-end while face-app produced
chunks every ~3 s. The bridge queue grew, and the listener heard the audio
break up.
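The over-read on the final iteration can be illustrated with a small model of the loop. This is a simplified sketch, not the actual `Parsing.cpp` code: the function names are hypothetical, and only the `HTTP_RAW_BUFLEN` value comes from the description above.

```python
HTTP_RAW_BUFLEN = 1436  # per-iteration request size quoted above


def read_requests(body_len, cap_to_remaining):
    """Return (requested, remaining_before_read) for each loop iteration."""
    plan = []
    remaining = body_len
    while remaining > 0:
        # Uncapped: always ask for a full buffer. Capped: never ask for more than is left.
        want = min(HTTP_RAW_BUFLEN, remaining) if cap_to_remaining else HTTP_RAW_BUFLEN
        plan.append((want, remaining))
        remaining -= min(want, remaining)
    return plan


def final_read_overshoots(body_len, cap_to_remaining):
    """True when the last read asks for more bytes than the body still holds."""
    want, remaining = read_requests(body_len, cap_to_remaining)[-1]
    return want > remaining
```

For a 30 KB body (30720 bytes) the uncapped loop asks for 1436 bytes on its last iteration while only 564 remain, which is the read that would block until the timeout. With the cap, the last request equals the remaining byte count.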
The fix
- New `firmware/atoms3r-headroom/scripts/apply_webserver_patch.py` is a PlatformIO pre-build hook (registered as `extra_scripts` in `platformio.ini`). It caps each `readBytes()` request to the actual remaining body bytes, so the last read returns immediately.
- The hook is idempotent (self-marks the patched line with a `PATCH(minimum-headroom):` comment) and fails loudly if the upstream library no longer matches the expected snippet, so a future framework upgrade cannot silently regress the fix.
- The equivalent unified diff is checked in at `firmware/atoms3r-headroom/patches/webserver_raw_read_cap.patch` for transparency.
- README documents the rationale, manual application, and measured effect in both English and Japanese.
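The idempotent, fail-loud behavior described above could be sketched roughly as follows. This is an illustration under assumptions, not the contents of `apply_webserver_patch.py`: the `EXPECTED` and `PATCHED` snippets and the `apply_patch` function are hypothetical stand-ins, and only the marker text comes from the release note.

```python
MARKER = "PATCH(minimum-headroom):"

# Hypothetical upstream line and replacement; the real ones live in the checked-in .patch file.
EXPECTED = "size_t got = client.readBytes(buf, HTTP_RAW_BUFLEN);"
PATCHED = (
    "size_t got = client.readBytes(buf, min(remaining, (size_t)HTTP_RAW_BUFLEN)); // "
    + MARKER
    + " cap read to remaining body bytes"
)


def apply_patch(source):
    """Return (new_source, status) for the text of the library file."""
    if MARKER in source:
        # Idempotent: a previously patched file is left untouched.
        return source, "already applied"
    if source.count(EXPECTED) != 1:
        # Fail loudly rather than silently skipping an unrecognized upstream file.
        raise RuntimeError("upstream snippet not found exactly once; refusing to patch")
    return source.replace(EXPECTED, PATCHED), "applied"
```

Checking for the marker first is what makes repeated builds safe, and raising on a mismatch is what surfaces a framework upgrade that changed the surrounding code.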
Measured effect
| Payload | Before | After | Speedup |
|---|---|---|---|
| 30 KB | 5.17 s | 0.18 s | 28× |
| 60 KB | 5.23 s | 0.37 s | 14× |
| 120 KB | 5.38 s | 0.48 s | 11× |
| 180 KB | 5.58 s | 0.68 s | 8× |
`RAW_START` → `handleAudio` gap reduced from 5070 ms to 78–430 ms (measured via temporary serial instrumentation, now removed).
Compatibility
- No protocol change.
- No firmware behavior change beyond the speedup; the audio queue path itself is unchanged.
- No host-side (face-app / tts-worker / mcp-server) behavior change.
- The patch modifies a file under `~/.platformio/packages/` (system library). The first `pio run` after pulling this release prints `[webserver-patch] applied: …` once; subsequent builds print `[webserver-patch] already applied: …`. The patch is reapplied automatically after a framework upgrade.
🤖 Generated with Claude Code
v1.18.1 — TTS: strip JA full-stops before Kokoro misaki phonemizer
Patch release
Removes an audible artifact at the end of Japanese sentences spoken through the default Kokoro engine.
The bug
Misaki's pyopenjtalk-backed Japanese G2P maps the JA full-stop 「。」 (and its fullwidth ASCII twin 「．」) to an actual phoneme rather than silence, so Kokoro rendered chunk endings as a short "ye"-like sound. Confirmed on hardware: `あ。` produced あ plus an extra sound, and `あ。。。。。` produced あ plus five repeated artifacts (one per period), showing that each 。 was reaching misaki and being phonemized.
The fix
- New module `tts-worker/src/tts_worker/kokoro_text.py` exposes `strip_japanese_silent_punctuation`, which removes 「。」 and 「．」 runs.
- `KokoroEngine._to_ja_phonemes` strips the input before calling misaki and returns an empty string if the chunk becomes empty after stripping.
- `KokoroEngine.synthesize_chunks` now skips chunks that come back empty (this also prevents the pre-existing `misaki ja g2p returned empty phoneme output` error path on punctuation-only chunks).
- Other JA punctuation (「、」「!」「?」「・」「…」) is intentionally left in place: those either drive prosodic pausing or have not been observed to produce artifacts.
- Shared text normalization (`tts_worker.shared_text`) is untouched, so the existing "preserve JA punctuation" contract there still holds; the strip is engine-local to Kokoro, parallel to how `qwen3_text.py` houses Qwen3-specific text prep.
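A minimal sketch of the stripping and empty-chunk skip, assuming the two removed characters are U+3002 (「。」) and U+FF0E (the fullwidth full stop). The function name matches the release note, but the body and the `chunks_to_synthesize` helper are illustrative guesses rather than the shipped code.

```python
import re

# Assumed character set: JA full stop U+3002 and fullwidth full stop U+FF0E.
_SILENT_RUN = re.compile("[\u3002\uff0e]+")


def strip_japanese_silent_punctuation(text):
    """Remove runs of JA full stops; all other punctuation is left alone."""
    return _SILENT_RUN.sub("", text)


def chunks_to_synthesize(chunks):
    """Hypothetical helper: drop chunks that are empty after stripping."""
    kept = []
    for chunk in chunks:
        stripped = strip_japanese_silent_punctuation(chunk)
        if stripped:
            kept.append(stripped)
    return kept
```

Under these assumptions, `あ。。。。。` reduces to `あ`, and a punctuation-only chunk such as `。。` is skipped instead of reaching the phonemizer.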
Tests
`tts-worker/tests/test_kokoro_text.py` adds 8 unit tests: trailing 「。」, repeated 「。」 runs, internal + trailing, fullwidth 「．」, preservation of 「、」「!」「?」「・」「…」, punctuation-only input → empty, empty input passthrough, ASCII passthrough. All `tts_worker` tests pass (25/25).
Compatibility
- No protocol change.
- No firmware change.
- No API change.
- Affects only the Kokoro TTS path (default engine). Qwen3 path is unchanged.
Verification
- CI green on PR #59 (28 s) and on the merged main commit.
- Live hardware test on AtomS3R after a fresh Kokoro restart: `あ。` now sounds like `あ` only, no trailing artifact.
v1.18.0 — Atom TTS prefetch+FIFO, deferred idle TTS, Tailscale router guide
Minor release
Three independently useful changes land together in this release.
1. Atom TTS prefetch + FIFO (primary feature)
Long multi-sentence answers now play with much shorter inter-chunk gaps on every remote audio sink (PC browser tab, mobile browser, and AtomS3R). Three coordinated changes:
- TTS worker (`tts-worker/src/tts_worker/__main__.py`): new `MH_TTS_REMOTE_PREFETCH_MS` (default `900`) returns `play_stop` early for browser-only audio targets so the next chunk's synthesis can overlap the current chunk's playback. Gated to `MH_AUDIO_TARGET=browser` because the local host speaker uses the worker's own audio clock and must not be cut short.
- face-app browser audio (`face-app/public/app.js`): same-session chunks now queue sequentially instead of replacing the active source. Interrupt, drop, and error paths still stop and clear the queue.
- AtomS3R firmware (`firmware/atoms3r-headroom/src/headroom_audio.{cpp,h}`, `headroom_transport.{cpp,h}`): bounded one-pending-WAV FIFO. `stop()`, `stopForRecording()`, PTT, and interrupt all flush both the active and queued audio.
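The prefetch gating in the TTS worker can be pictured as a small timing rule. The sketch below is an assumption about how such a rule might look: `playback_wait_ms` is a hypothetical name, and the release note does not say that the worker computes the wait exactly this way.

```python
def playback_wait_ms(duration_ms, audio_target, prefetch_ms=900):
    """How long to wait before reporting play_stop for one chunk.

    prefetch_ms mirrors the MH_TTS_REMOTE_PREFETCH_MS default;
    audio_target mirrors MH_AUDIO_TARGET.
    """
    if audio_target != "browser":
        # Local playback uses the worker's own audio clock, so wait the full duration.
        return duration_ms
    # Browser-only targets: finish early so the next chunk's synthesis can overlap playback.
    return max(0, duration_ms - prefetch_ms)
```

With the default of 900 ms, a 3000 ms chunk would release the worker after 2100 ms on a browser-only target, while any other target still waits the full 3000 ms.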
2. Bridge generation watermark fix
Hardware testing surfaced a regression where AtomS3R cut off mid-utterance and then spoke an idle notification. Root cause: `scripts/atoms3r-http-bridge.mjs` advanced its generation watermark on every relayed WebSocket payload, so a deferred-idle `tts_state` at generation N+1 (deferred idle was introduced earlier in this release) caused the in-flight gen=N utterance's tail chunks to be dropped as "stale". Fix: scope `observeGeneration` to `tts_audio` / `tts_audio_ref` only, so state, event, and mouth payloads no longer advance the watermark. Verified end-to-end on hardware with both Kokoro and Qwen3 TTS engines.
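The scoped watermark can be sketched in a few lines. The bridge itself is a Node script; this Python model only illustrates the rule, and the payload field names `type` and `generation` are assumptions.

```python
AUDIO_TYPES = {"tts_audio", "tts_audio_ref"}


class GenerationWatermark:
    """Drops stale audio, letting only audio payloads advance the watermark."""

    def __init__(self):
        self.latest = 0

    def should_relay(self, payload):
        kind = payload.get("type")
        gen = payload.get("generation")
        if kind in AUDIO_TYPES and isinstance(gen, int):
            if gen < self.latest:
                return False  # audio from an older generation is stale
            self.latest = gen
        # State, event, and mouth payloads pass through without touching the watermark.
        return True
```

In this model a `tts_state` at generation N+1 is relayed but leaves the watermark at N, so the remaining generation-N audio chunks still play.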
3. Deferred idle TTS + Tailscale travel router guide
Already on the branch from earlier work this cycle:
- Defer idle TTS notifications until speech drains: the hook bridge's `idle_after_response` notification now queues behind the in-flight utterance (with the bridge fix above, this is no longer just a controller-side guarantee).
- Tailscale travel router setup guide (`doc/guides/tailscale-travel-router-setup.md`): how to place AtomS3R behind a GL.iNet travel router and reach it from your PC via Tailscale subnet routing.
Operational note: prefer --audio-target browser for remote sinks
The new prefetch path only engages when `MH_AUDIO_TARGET=browser`. In `both` mode the worker keeps waiting through the full local-playback duration, so remote sinks fall back to the older synthesize-then-send-then-play pacing. README, `doc/guides/operator-stack.md` (EN + JA), and `examples/rmh-voice-mode/README.md` now recommend `browser` for PC browser tabs, mobile, and AtomS3R, with a short explanation. The `scripts/restart-operator-stack-in-place.sh` default changed from `both` to `browser` to match the existing default of `scripts/run-operator-stack.sh`.
Verification
- Full `npm test` green (375 tests), Node syntax checks, Python compile checks.
- PlatformIO firmware build green: RAM 51,380 bytes (15.7%) / Flash 1,231,493 bytes (36.8%).
- Live end-to-end test on AtomS3R hardware: gap reduction confirmed audibly with both Kokoro (default) and Qwen3 (`--profile qwen3`) TTS engines.
- CI green on PR #58 (29 s) and on the merged main commit.
v1.17.4 — docs: add AtomS3R + atoms3r-http-bridge to architecture diagrams
Patch release
Documentation-only update that closes a long-standing gap: AtomS3R has been a first-class hardware face since v1.17.0, but the architecture diagrams still depicted only the browser-based Frontend UI. The README prose described AtomS3R correctly and the diagrams did not, which was confusing for anyone trying to understand the runtime topology.
What changed
- High-level flow (`doc/diagrams/high-level-flow.{mmd,svg,png}` + inline README blocks in both languages): adds an `AtomS3R Device` (2D face LCD + Echo speaker + PTT mic) and `atoms3r-http-bridge` node. AtomS3R uplinks (mic WAV → `/api/operator/asr`, `operator_response` → `/api/operator/response`) go direct to face-app over HTTP, in parallel with the Frontend UI's uplinks. Downlinks (face / TTS payloads + audio) fan out through the bridge to AtomS3R's `/api/headroom/{payload,audio}` endpoints.
- Sequence timeline (`doc/diagrams/sequence-timeline.{mmd,svg,png}` + inline README blocks): adds Input path D (AtomS3R PTT) alongside paths A/B/C, and adds the bridge fan-out for `event` / `say` / `state` payloads and `tts_audio` / `tts_mouth` in the output section. New participants: `ATOM` and `ATOMBR`.
- README: keeps English and Japanese inline `mermaid` blocks in sync with the `.mmd` sources (4 blocks total).
- PNG/SVG fallbacks re-rendered at `--width 2400 --scale 2` so the static images stay legible at higher zoom levels.
What did not change
The "3D face" references for the browser-side Frontend UI are intentionally untouched — that face really is 3D (Three.js-rendered head, eye/eyebrow/mouth/head animation). Only the AtomS3R LCD face is labeled as 2D, which matches the actual firmware behavior.
No code change, no API change, no runtime behavior change.
Verification
- Diagrams re-rendered via `npx @mermaid-js/mermaid-cli@11.12.0` with `--puppeteerConfigFile` (no-sandbox) and verified to contain `AtomS3R` and `atoms3r-http-bridge` text nodes.
- CI green (33 s).
v1.17.3 — TTS: scope hai-filler to Qwen3 + fullwidth-ify JA halfwidth digits
Patch release
Two independent cleanups to the shared TTS text pipeline that affect how Kokoro (the default engine) reads Japanese utterances. Verified end-to-end through the live Kokoro pipeline before tagging.
What changed
- `はい、` leading filler is now Qwen3-only. It was added to the shared normalizer as a Mandarin-drift countermeasure for Qwen3, but Kokoro+misaki has no such drift and was getting an unwanted "Yes," before every sentence that opened with a halfwidth ASCII or numeric token (e.g. `execplanを作成しました。` was rendering as `はい、execplanを…`). `apply_japanese_leading_numeric_filler` and `apply_japanese_leading_unknown_ascii_filler` now live in the Japanese branch of `prepare_qwen3_text`. Qwen3 behavior is unchanged; Kokoro stops getting the prefix.
- Halfwidth digits inside Japanese text are now fullwidth-ified. Misaki's English G2P used to fire on halfwidth digits embedded in Japanese, so `今日は5月23日です。` rendered as `ファイブ月とウェンティースリー日です`. `normalize_japanese_tts_text` now translates `0-9 → ０-９` so misaki keeps the digits on the Japanese G2P path (`今日は５月２３日です。`). Pure-English utterances are routed through `normalize_english_tts_text` instead, so `The build runs at 5:30 on port 8080.` still reads as English.
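The digit conversion can be sketched with `str.translate`. This is a minimal illustration, not the shipped `normalize_japanese_tts_text`: the function name `fullwidth_digits_if_japanese` and the script-detection regex are assumptions about how JA routing might be decided.

```python
import re

# Map ASCII digits to their fullwidth forms (U+FF10 through U+FF19).
_FULLWIDTH_DIGITS = str.maketrans("0123456789", "０１２３４５６７８９")

# Assumed heuristic for "contains JA script": hiragana, katakana, or common kanji.
_JA_SCRIPT = re.compile("[\u3040-\u30ff\u4e00-\u9fff]")


def fullwidth_digits_if_japanese(text):
    """Convert halfwidth digits only when the text is JA-routed."""
    if _JA_SCRIPT.search(text):
        return text.translate(_FULLWIDTH_DIGITS)
    return text  # pure-English text keeps its ASCII digits
```

Under this heuristic, `今日は5月23日です。` becomes `今日は５月２３日です。`, while `The build runs at 5:30 on port 8080.` is returned unchanged.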
Verified end-to-end through the live Kokoro pipeline
| Input | Reading |
|---|---|
| `今日は5月23日です。` | Japanese date reading |
| `execplanを作成しました。` | No spurious leading `はい、` |
| `23日までに完了します。` | Japanese digit reading, no filler |
| `The build runs at 5:30 on port 8080.` | English reading preserved (untouched by JA pipeline) |
| `現在のバージョンは1.2.3です。` | Japanese reading of 1.2.3 |
Tests
- shared_text: 6 new cases (date phrase, dotted version, mixed sentence with thousands separator, pure-English routing, decoration). 3 existing cases updated.
- qwen3_text: 3 new cases asserting Qwen3-Japanese-mode still receives the filler so the existing Mandarin-drift mitigation is preserved.
- Python suite: 42/42 pass. Node suite: 372/372 pass.
Notes
- Mixed JA/English sentences such as `API 2.0 を試した` are now read as `API ２.０ を試した` because the text contains JA script and is therefore JA-routed. Misaki is already in Japanese G2P mode in that case, so the fullwidth digits read more naturally than the previous English-numeral fallback.
- No MCP API change. No agent-runtime behavior change. Operator stack must be restarted in-place after upgrading (`./scripts/restart-operator-stack-in-place.sh`) so the running Python tts-worker picks up the new code.
v1.17.2 — stuck-detector: codex MCP approval + agy trust folder + fixture harness
Patch release
Closes two real gaps in the helper stuck-detector surfaced by spawning live codex and agy helpers and capturing their tmux panes through every state (idle / running / approval / picker / interrupt). v1.17.1 was a snapshot of regex patterns; v1.17.2 adds verbatim fixtures so the next regression is caught by npm test, not by an operator wondering why a helper went silent.
What changed
- New pattern `codex_mcp_approval`: `/Allow the .+ MCP server to run tool/`. Codex's MCP tool-call approval modal is a separate path from the shell-command modal that `codex_approval` already catches. Previously a helper's first MCP tool call would stall silently with no inbox alert.
- New pattern `agy_trust_folder`: `/Do you trust the contents of this project\?/`. Antigravity's first-run workspace trust prompt blocks before any mission is injected; without this pattern, helper bring-up would hang on a brand-new worktree.
- `codex_picker` regex fixed: replaced `/Switch to (gpt|claude|gemini)-/` with `/Select Model and Effort/`. The old wording only appeared on a transient confirm screen and missed the primary `/model` picker that operators actually see.
- Fixture-driven test harness: `test/face-app/fixtures/stuck_detector/{codex,agy}/` holds 13 verbatim ANSI-stripped tmux pane snapshots from real helpers (codex × 7, agy × 6). A new `FIXTURE_CASES` loop in `helper_stuck_detector.test.mjs` asserts that each positive fixture fires exactly one expected pattern and that each negative fixture (idle, running, slash-command completion, conversational interrupt recovery) fires nothing. Adding coverage for a newly discovered modal is now drop-fixture + one row.
- Documentation: `minimum-headroom-ops` skill coverage table regenerated; points operators at the fixture directory for future extensions.
Coverage as of 1.17.2
| Pattern | CLI |
|---|---|
| `claude_approval` (`Do you want to proceed?`) | Claude Code, Antigravity (incl. MCP modal) |
| `codex_approval` (`Would you like to run the following command?`) | Codex |
| `codex_mcp_approval` (`Allow the … MCP server to run tool`) | Codex |
| `agy_trust_folder` (`Do you trust the contents of this project?`) | Antigravity |
| `codex_picker` (`Select Model and Effort`) | Codex |
| `codex_quota` (`You've hit your usage limit`) | Codex |
| `agy_survey` (`How's the CLI experience`) | Antigravity |
| `generic_press_enter` (`Press enter to confirm`) | any |
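The coverage table maps naturally onto an ordered list of regexes matched against the pane tail. The sketch below is a Python rendering for illustration only (the detector itself is JavaScript). Only the `codex_mcp_approval`, `agy_trust_folder`, `codex_picker`, and `codex_approval` regexes are quoted in these notes; the others are assumed from the wording in the table, and `match_pane` is a hypothetical helper.

```python
import re

STUCK_PATTERNS = [
    ("claude_approval", re.compile(r"Do you want to proceed\?")),  # assumed from wording
    ("codex_approval", re.compile(r"Would you like to run the following command\?")),
    ("codex_mcp_approval", re.compile(r"Allow the .+ MCP server to run tool")),
    ("agy_trust_folder", re.compile(r"Do you trust the contents of this project\?")),
    ("codex_picker", re.compile(r"Select Model and Effort")),
    ("codex_quota", re.compile(r"You've hit your usage limit")),  # assumed from wording
    ("agy_survey", re.compile(r"How's the CLI experience")),  # assumed from wording
    ("generic_press_enter", re.compile(r"Press enter to confirm")),  # assumed from wording
]


def match_pane(tail):
    """Return (pattern_id, matched_line) for every line of the pane tail that matches."""
    hits = []
    for line in tail.splitlines():
        for pattern_id, regex in STUCK_PATTERNS:
            if regex.search(line):
                hits.append((pattern_id, line.strip()))
    return hits
```

A fixture-style check then amounts to asserting that a positive snapshot yields exactly one hit and that an idle or running snapshot yields none.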
Notes
- 372 tests passing (14 new fixture-driven cases on top of the existing inline-literal coverage).
- No MCP API change. `agent.pane_snapshot` and `agent.pane_send_key` are unchanged.
- The detector is still post-only; it never auto-presses keys. Extend by appending to `DEFAULT_STUCK_PATTERNS` in `face-app/dist/helper_stuck_detector.js` after dropping a real-pane fixture into `test/face-app/fixtures/stuck_detector/<cli>/` and adding a row to `FIXTURE_CASES`.
v1.17.1 — Codex shell-approval modal coverage
Patch release
Adds a missing helper stuck-detector pattern so an operator (claude / codex / antigravity) is alerted when a Codex helper pauses on its shell-command approval modal. v1.17.0 only matched Claude Code's Do you want to proceed? wording — which incidentally also catches Antigravity, but Codex uses different opening text and was silently missed.
What changed
- New pattern `codex_approval` with regex `/Would you like to run the following command\?/` covering Codex's approval modal. Same response shape as the other approval patterns: operator decides, then sends keys via `agent.pane_send_key` (e.g. `["1","Enter"]` for "Yes, proceed").
- Documented coverage: `doc/examples/skills/minimum-headroom-ops/SKILL.md` now has a coverage table listing every shipping regex, which CLI it matches, and what is intentionally not caught.
- No MCP API change. `agent.pane_snapshot` and `agent.pane_send_key` are unchanged.
Coverage as of 1.17.1
| Pattern | CLI |
|---|---|
| `claude_approval` (`Do you want to proceed?`) | Claude Code, Antigravity |
| `codex_approval` (`Would you like to run the following command?`) | Codex |
| `codex_picker` (`Switch to gpt\|claude\|gemini-…`) | Codex |
| `codex_quota` (`You've hit your usage limit`) | Codex |
| `agy_survey` (`How's the CLI experience`) | Antigravity |
| `generic_press_enter` (`Press enter to confirm`) | any |
Notes
- 359 tests passing (2 new patterns added).
- The detector is still post-only; it never auto-presses keys. Extend by appending to `DEFAULT_STUCK_PATTERNS` in `face-app/dist/helper_stuck_detector.js`.
v1.17.0 — AtomS3R hardware face + multi-agent stuck recovery
Highlights
AtomS3R hardware face frontend
End-to-end firmware for the M5Stack AtomS3R + Atomic Echo Base as a physical face for minimum-headroom: 128×128 parametric face driven over WebSocket, local TTS playback through the Echo Base ES8311 codec, push-to-talk recording into face-app operator ASR, on-device Wi-Fi setup portal with force-entry (hold the screen button at boot), up to 3 Wi-Fi slots, USB-CDC RMHCFG provisioning from the PC, and IMU triple-tap reorient. The HTTP bridge is auto-started by the operator stack in its own tmux session.
See firmware/atoms3r-headroom/README.md for hardware links and setup.
Helper stuck detection and pane control
Two new MCP tools, plus a background subsystem inside face-app that drives them. Together they let an operator recover a helper stuck inside a CLI-level modal (tool approval, model picker, usage-limit notice, feedback survey) before the helper's LLM reads any input.
New MCP tools:
- `agent.pane_snapshot`: returns the helper pane tail with ANSI stripped, so an operator can read the modal verbatim from any MCP client.
- `agent.pane_send_key`: injects named keys (`Enter`, `Escape`, `Up`, `Down`, …) or literal text into the helper pane, with a named-key allowlist and ASCII / length caps on literal mode.
Background subsystem (not an MCP tool):
- Helper stuck detector: a timer inside face-app, scheduled by `setInterval` every ~5 s. On each tick it lists active helpers, reads each helper's pane via the same `paneSnapshot` runtime that backs the MCP tool, and matches the tail against CLI-specific regex patterns (Claude `Do you want to proceed?`, Codex `Switch to gpt|claude|gemini-…` model picker and `You've hit your usage limit` quota notice, Antigravity `How's the CLI experience` survey, plus a generic `Press enter to confirm`). On a fresh match it posts `{kind: "blocked", from_agent_id, summary, detail: "<matched line>\n---\n<pane tail>"}` to the owner inbox via `submitReport`. Dedupes by `(agent_id, pattern_id, matched_line)` for ~30 s so the alarm re-arms only when the line changes. Validated end-to-end against claude / codex / antigravity helpers.
The detector posts; it never auto-presses keys. The operator decides every response via `agent.pane_send_key`, so a regex match cannot pick `No, and always deny` for you. Disable with `MH_HELPER_STUCK_DETECTOR=off`; tune cadence with `MH_HELPER_STUCK_DETECTOR_INTERVAL_MS` (default `5000`, minimum `250`).
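The dedupe step can be modeled as a keyed time window. This Python sketch is one possible reading of the behavior, not the face-app implementation: the class name is hypothetical, and it assumes the window is measured from the last posted alert (the notes do not say whether suppressed ticks refresh the timestamp).

```python
import time


class StuckAlertDeduper:
    """Suppress repeat alerts for the same (agent_id, pattern_id, matched_line) key."""

    def __init__(self, window_s=30.0, clock=time.monotonic):
        self.window_s = window_s
        self._clock = clock  # injectable so tests can control time
        self._last_posted = {}

    def should_post(self, agent_id, pattern_id, matched_line):
        key = (agent_id, pattern_id, matched_line)
        now = self._clock()
        last = self._last_posted.get(key)
        if last is not None and now - last < self.window_s:
            return False  # same modal line seen recently: stay quiet
        self._last_posted[key] = now
        return True
```

Because the matched line is part of the key, a modal whose text changes produces a new key and posts immediately, while an unchanged modal polled every ~5 s stays quiet inside the window.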
Operator stack: Antigravity CLI + GUI
The operator stack and example agent config have been updated for Antigravity CLI and GUI, with a refreshed doc/examples/antigravity/ example and a new examples/rmh-voice-mode/ voice-first launcher that idempotently regenerates per-CLI rule files from a single shared source.
TTS and audio path
- Sentence-bounded FIFO chunking for long TTS replies (keeps the AtomS3R speaker responsive on long answers).
- TTS audio served by HTTP reference for non-browser sinks.
- Operator PTT barge-in flushes the TTS chunk queue.
- ES8311 codec mutex serializes mic and speaker on the shared codec.
- AtomS3R-friendly chunk-size cap (`MH_TTS_CHUNK_MAX_CHARS=64`).
Compatibility
- `mcp-server` advertises version `1.17.0` (previously drifted at `1.16.0` while every other site was already on `1.16.1`; corrected this release).
- Two new MCP tools (`agent.pane_snapshot`, `agent.pane_send_key`) added; all existing tools unchanged.
- StackChan Minimal sidecar removed (was a third-party adapter, not a core feature). It can live in a separate companion repo if needed.
Notes
- 357 tests passing.
- Public-readability cleanup: removed personal absolute paths, replaced internal milestone/phase labels with feature descriptions, added a Japanese mirror to the AtomS3R firmware README.
v1.16.1
v1.16.0 — Hook-driven face_say safety net
Adds a runtime-agnostic hook bridge so the 3D face speaks even when an agent forgets to call face_say voluntarily. Wires Claude Code's Notification + Stop, Codex's PermissionRequest + Stop, and Gemini CLI's Notification + AfterAgent into a single wrapper script that emits face_say + face_event and (for helpers) an owner-inbox entry tagged source=mh_hook. End-to-end verified on Claude Code, Codex 0.130.0, and Gemini CLI 0.41.2.
Highlights
- New `face.hook` MCP tool and `face-app/dist/hook_bridge.js` orchestrator with per-agent CJK-heuristic language detection, variant-rotation template selection, and owner-inbox routing via the existing assignment state.
- New `scripts/mh-hook.mjs` wrapper invoked by each runtime's hook config. Strict "stdout silent / exit 0 always" discipline (required because Gemini's `AfterAgent` treats exit 2 as "retry this turn with stderr as feedback prompt").
- New `scripts/grant-codex-hook-trust.sh` automates Codex's one-time `/hooks` browser trust grant via a private tmux server. Trust is per-user, persistent, and inherited by every helper Codex spawns thereafter, so there is no need to enter individual helper panes.
- Forwards `MH_FACE_AUTH_TOKEN` as `?auth_token=` from `mh-hook.mjs` so it works against face-app instances that require auth.
- Templates default to ja+en built-in, overridable via `~/.minimum-headroom/face-templates.json`.
- 22 new unit tests; full suite stays green at 325 / 325.
Setup
- Merge `doc/hook-bridge/{claude-settings.json.example, codex-config.toml.example, gemini-settings.json.example}` into your `~/.claude/settings.json`, `~/.codex/config.toml`, `~/.gemini/settings.json` (substitute `/ABS/PATH/`).
- For Codex only: run `./scripts/grant-codex-hook-trust.sh` once after editing `~/.codex/config.toml`. Re-run only when you change a hook command or matcher.
- Restart the operator stack so the operator pane picks up `MH_FACE_AGENT_ID`.
Documentation
- Top-level `README.md` Hook Bridge section (English + Japanese)
- `doc/hook-bridge/README.md` (English + Japanese)
- Per-runtime drop-in examples under `doc/hook-bridge/` and `doc/examples/{claude-code,codex,antigravity}/`
Compatibility note
Codex [features].codex_hooks = true is the deprecated alias of [features].hooks = true as of Codex 0.131; Codex < 0.131 still accepts both with a startup warning.
Full Changelog: v1.15.0...v1.16.0