v1.17.0 - AtomS3R hardware face + multi-agent stuck recovery #53
Merged
Hardware-validated AtomS3R frontend work:
- firmware: AtomS3R has no internal speaker; use Atomic Echo Base (cfg.internal_spk=false, external_speaker.atomic_echo=true), set speaker volume 130
- face_renderer: success face uses normal eyes + brown raised "\/" brows + closed-mouth smile arc (lip-sync preserved while speaking); blink is a dark downward-convex eyelid arc with randomized 3/4/5 s scheduling (1-in-10 quick 0.3 s re-blink); Thinking sweeps the pupils left/right
- headroom_transport: Failed background persists until the next state event (like Permission) instead of auto-reverting after 8 s
- scripts: default restart-operator-stack-in-place.sh to FACE_AUDIO_TARGET=both so the PC->Atom bridge receives tts_audio; add atoms3r-http-bridge.mjs and the stackchan-minimal sidecar
Local commit only; not pushed. ExecPlan PLANS_48 updated on disk (.agent/ is gitignored by repo design, so it is not in this commit).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Closed "∪" eyelid: larger radius (15->19) with a narrower sweep (25..155 -> 45..135) for a gentler, flatter curve, dropped a touch lower.
- Shift the whole composed face down by kFaceOffsetY (4px) so the head looks vertically centered despite the visual weight of the hair.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Add tts_audio_store: TTL-bounded server-side store of generated TTS audio, exposed over HTTP with a lightweight WS reference payload and WAV-duration parsing.
- tts_controller: always stash audio and broadcast a reference; only broadcast the base64 body when browser audio is enabled (previously audio was dropped entirely when browser audio was off), so the AtomS3R/Echo Base bridge and Stack-chan sidecar can fetch it by URL.
- index.js: wire the store (MH_TTS_AUDIO_REF_TTL_MS, default 60s) into the HTTP router and controller; clear it on worker stop.
- package.json: add stackchan:run and atoms3r:bridge scripts.
- .gitignore: ignore .venv-platformio/ and esp-web-tools-logs.txt.
- Tests for the store (reference metadata, TTL expiry, WAV duration) and the controller reference path.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
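The TTL-bounded store described in this commit can be pictured roughly as follows. This is a minimal sketch, assuming an in-memory Map and an injectable clock; `createAudioStore`, `put`, `get`, and the `/tts-audio/` URL shape are hypothetical names for illustration and are not taken from the repo.

```javascript
// Hypothetical TTL-bounded store: entries expire ttlMs after insertion.
function createAudioStore({ ttlMs = 60_000, now = () => Date.now() } = {}) {
  const entries = new Map(); // id -> { wav, expiresAt }
  let nextId = 1;

  // Drop every entry whose deadline has passed.
  function sweep() {
    const t = now();
    for (const [id, entry] of entries) {
      if (entry.expiresAt <= t) entries.delete(id);
    }
  }

  return {
    // Stash audio and return a lightweight reference (no audio body).
    put(wav) {
      sweep();
      const id = String(nextId++);
      entries.set(id, { wav, expiresAt: now() + ttlMs });
      return { id, url: `/tts-audio/${id}`, bytes: wav.length };
    },
    // Return the audio bytes, or null if unknown or expired.
    get(id) {
      sweep();
      const entry = entries.get(id);
      return entry ? entry.wav : null;
    },
    // Used on worker stop (and, later, on barge-in).
    clear() {
      entries.clear();
    },
  };
}
```

Broadcasting only the small reference object lets a sink that cannot take a base64 body over WebSocket fetch the audio by URL while the entry is still live.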
…fix)
Long agent replies exceeded the AtomS3R firmware's per-utterance
base64/HTTP WAV cap, so the audio was dropped while the mouth kept
animating from the independent tts_mouth stream ("mouth-only, no
sound" on long local-LLM output).
- Add segmentTtsText: JA/EN hard sentence boundaries, late comma soft
split, greedy packing, default 120 chars (MH_TTS_CHUNK_MAX_CHARS).
Text <= limit is returned verbatim (unchanged single-chunk path).
- Replace the single `pending` slot with a real ordered FIFO queue;
one logical utterance occupies it, a newer say flushes the
remainder, interrupt/auto-interrupt/stop clear the whole queue.
- Each chunk is its own worker `speak` with the parent generation and
a #k/N utterance/message suffix, dispatched sequentially on
play_stop, keeping every WAV under the Atom size cap.
- Tests: segmentTtsText units + sequential-dispatch + interrupt-flush;
full node --test suite green (333).
Step 1 of 3 (Step 2: PTT-clear wiring; Step 3: firmware playback queue).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
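The segmentation strategy in this commit (hard sentence boundaries, a late comma soft split, then greedy packing) might look roughly like the sketch below. It is an illustration under assumptions, not the repo's `segmentTtsText`: the function names, the exact punctuation sets, and the fallback hard cut at the limit are all hypothetical.

```javascript
// Hypothetical sketch; the real segmentTtsText may differ in every detail.
const HARD = /[。！？.!?]/; // assumed JA/EN sentence terminators
const SOFT = /[、，,]/;     // assumed comma-like soft split points

// Break one over-long sentence, preferring the latest comma inside the window.
function splitOversized(sentence, maxChars) {
  const parts = [];
  let rest = sentence;
  while (rest.length > maxChars) {
    let cut = maxChars; // fallback: hard cut at the limit
    for (let i = maxChars - 1; i > 0; i--) {
      if (SOFT.test(rest[i])) { cut = i + 1; break; }
    }
    parts.push(rest.slice(0, cut));
    rest = rest.slice(cut);
  }
  if (rest.length > 0) parts.push(rest);
  return parts;
}

function segmentText(text, maxChars = 120) {
  if (text.length <= maxChars) return [text]; // single-chunk path, verbatim

  // 1. Cut after each hard sentence terminator.
  const sentences = [];
  let start = 0;
  for (let i = 0; i < text.length; i++) {
    if (HARD.test(text[i])) {
      sentences.push(text.slice(start, i + 1));
      start = i + 1;
    }
  }
  if (start < text.length) sentences.push(text.slice(start));

  // 2. Soft-split any sentence that is still over the limit.
  const pieces = sentences.flatMap((s) => splitOversized(s, maxChars));

  // 3. Greedy packing: append pieces until the next one would overflow.
  const chunks = [];
  let current = '';
  for (const piece of pieces) {
    if (current.length > 0 && current.length + piece.length > maxChars) {
      chunks.push(current);
      current = piece;
    } else {
      current += piece;
    }
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

In this sketch the chunks always concatenate back to the original text, so nothing is dropped or reordered; a production splitter would also need to handle cases such as decimal points, which this one treats as sentence ends.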
When the operator takes the turn, queued/active agent speech should stop instead of talking over the just-spoken input.
- tts_controller: add flushForBargeIn(reason): clear the chunk FIFO, interrupt + release the active chunk, emit play_stop, advance the generation so late worker audio/mouth for the old utterance is ignored, and clear the audio store so a memory-constrained sink cannot pull a stale chunk.
- operator_asr_proxy: new onBargeIn option, invoked as soon as a POST /api/operator/asr audio upload arrives. This is the earliest cross-transport "user took the turn" signal (the Atom posts here too; it has no usable WebSocket client). Handler errors are caught so ASR still proceeds.
- index.js: wire onBargeIn -> ttsController.flushForBargeIn.
- Tests: controller flush behavior + audio-store clear; proxy onBargeIn invocation, non-ASR negative, and throw-safety. Full node --test green (338).
Step 2 of 3 (Step 3: AtomS3R firmware playback queue + stop-on-PTT).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
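The generation counter is what makes the barge-in flush safe against late worker output. A minimal sketch of that idea follows, assuming a controller that tags each utterance with a generation; `createTtsController`, `acceptWorkerAudio`, and `pendingCount` are hypothetical names, and only `flushForBargeIn(reason)` is taken from the commit message.

```javascript
// Hypothetical controller sketch: bumping the generation invalidates late events.
function createTtsController({ onPlayStop = () => {} } = {}) {
  let generation = 0;
  let queue = [];    // chunks still waiting for the current utterance
  let active = null; // chunk currently being spoken, if any

  return {
    // Start a new utterance; returns the generation its worker events carry.
    say(chunks) {
      queue = [...chunks];
      active = queue.shift() ?? null;
      return generation;
    },
    // Operator took the turn: drop everything and advance the generation.
    flushForBargeIn(reason) {
      queue = [];
      active = null;
      generation += 1;
      onPlayStop({ reason });
    },
    // Worker audio or mouth data tagged with an old generation is ignored.
    acceptWorkerAudio(event) {
      return event.generation === generation;
    },
    pendingCount: () => queue.length,
  };
}
```

Because stale events are rejected by comparison rather than by cancelling the worker, audio that was already in flight when the operator pressed the button cannot restart playback.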
Preservation commit of codex's in-flight, build-passing, hardware-validated PTT firmware (Milestone 6 in PLANS_48): button hold-to-record, M5.Mic capture via the Atomic Echo Base, 16 kHz mono WAV wrapping, POST to /api/operator/asr?lang=<asrLanguage>, and operator_response submission over the authenticated HTTP fallback.
Includes persisted asrLanguage (ja/en) with setup-portal selection, HeadroomTransport::sendOperatorText(), the HeadroomPtt record/process/submit module, longer ASR HTTP read timeout, and ingress/settings/config wiring.
`pio run` succeeds (RAM 15.6%, Flash 36.2%); transcript verified on real hardware per PLANS_48.
Committed by Claude to preserve the work while codex is paused; no behavioral changes were made to codex's firmware in this commit.
Co-Authored-By: Codex (OpenAI) <noreply@openai.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
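The firmware does its WAV wrapping on-device in C++; the Node.js sketch below only illustrates the standard 44-byte RIFF/WAVE header that such wrapping produces for raw PCM. `wrapPcmAsWav` is a hypothetical name, and the 16-bit sample width is an assumption (the commit states only 16 kHz mono).

```javascript
// Hypothetical helper: prepend a canonical 44-byte PCM WAV header.
// Assumes 16-bit samples; the source only specifies 16 kHz mono.
function wrapPcmAsWav(pcm, { sampleRate = 16000, channels = 1, bitsPerSample = 16 } = {}) {
  const blockAlign = channels * (bitsPerSample / 8); // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second
  const header = Buffer.alloc(44);
  header.write('RIFF', 0, 'ascii');
  header.writeUInt32LE(36 + pcm.length, 4); // total file size minus 8
  header.write('WAVE', 8, 'ascii');
  header.write('fmt ', 12, 'ascii');
  header.writeUInt32LE(16, 16);             // fmt chunk size for PCM
  header.writeUInt16LE(1, 20);              // audio format 1 = PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write('data', 36, 'ascii');
  header.writeUInt32LE(pcm.length, 40);     // payload size in bytes
  return Buffer.concat([header, pcm]);
}
```

Under these assumptions one second of audio is 32,000 bytes of PCM, so a one-second recording wraps to 32,044 bytes.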
Server-side sentence chunking delivers an ordered burst of small WAV refs. The firmware played each one via playOwnedWav, which calls M5.Speaker.stop() first, so a newly arrived chunk truncated the one still playing (choppy / only the last chunk audible).
- headroom_audio: add a bounded FIFO (kMaxQueued=8). playBase64Wav / playHttpWavRef / playWavBytes route through playOrEnqueue: play immediately when idle, else enqueue the owned buffer. loop() starts the next queued chunk when the speaker goes idle.
- stop()/stopForRecording() (the PTT-press path) clear the queue and free buffers, so buffered agent speech does not resume after the user takes the turn.
- busy() stays true while chunks are queued so the face holds the Speaking expression across inter-chunk gaps instead of flickering.
- headroom_transport: zero mouthOpen when a chunk's audio is dropped, killing phantom lip-sync from the independent tts_mouth stream.
Built (RAM 15.7%, Flash 36.3%) and flashed to the real AtomS3R.
Step 3 of 3. Completes the long-TTS chunking + PTT-clear feature (Step 1 2d26c1c, Step 2 882889e).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Field bug after the chunking work: long replies were still mouth-only with a burst of static at the end. The atoms3r bridge log showed the Atom rejecting ~800 KB WAVs with HTTP 413 payload_too_large; the rare small sentence slipped through and a near-cap one played truncated (the static).
The 120-char chunk default was never calibrated to the Atom HTTP ingress cap (~250 KB accepted in practice), and one Hermes sentence (<=120 chars) was not split at all.
restart-operator-stack-in-place.sh now exports MH_TTS_CHUNK_MAX_CHARS (default 24 ~= ~3 s ~= ~150 KB WAV) into the stack so the Atom-facing pipeline chunks small enough to be accepted. The global code default stays 120 for browser/PC.
Verified live: long utterance now splits into ~110-165 KB WAV chunks, all forwarded with no 413.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
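The sizing figures in this commit can be sanity-checked with a little arithmetic. The sketch below rests on two assumptions that the source does not state: that the TTS output is 24 kHz, 16-bit, mono PCM (48,000 bytes per second), and that speech runs at roughly 8 characters per second (which is what "24 chars ~= ~3 s" implies). The function names are hypothetical.

```javascript
// Assumptions (not stated in the source): 24 kHz, 16-bit, mono PCM and
// roughly 8 characters of text per second of synthesized speech.
const BYTES_PER_SECOND = 24000 * 2 * 1;
const CHARS_PER_SECOND = 8;
const WAV_HEADER_BYTES = 44;

// Estimated WAV size in bytes for a chunk of the given character count.
function estimateWavBytes(chars) {
  const seconds = chars / CHARS_PER_SECOND;
  return WAV_HEADER_BYTES + Math.round(seconds * BYTES_PER_SECOND);
}

// Largest character budget whose estimated WAV stays under a byte cap.
function maxCharsForCap(capBytes) {
  const seconds = (capBytes - WAV_HEADER_BYTES) / BYTES_PER_SECOND;
  return Math.floor(seconds * CHARS_PER_SECOND);
}

console.log(estimateWavBytes(24));    // 144044
console.log(estimateWavBytes(120));   // 720044
console.log(maxCharsForCap(250_000)); // 41
```

Under these assumptions a 24-character chunk comes to about 144 KB, in line with the ~150 KB quoted above, while a 120-character chunk comes to about 720 KB, the same order as the ~800 KB WAVs the Atom rejected.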
Once server-side chunking made WAVs small enough to be accepted (no
more 413), every chunk played as loud static with faint voice. The
regression was introduced solely by the Step 3 FIFO; the validated
Milestone 5 single-play path was clean. Serial showed no firmware
rejection, so the WAV was accepted and "played" but corrupted -
consistent with a buffer-lifetime/scheduling bug around the async
M5.Speaker.playWav in the queue.
Restore headroom_audio.{cpp,h} and headroom_transport.cpp to the
validated 2579dcc state, rebuilt and reflashed. Server-side chunking
(Steps 1/2) and the small MH_TTS_CHUNK_MAX_CHARS budget are kept, so
the Atom receives small WAVs played one-at-a-time by the known-good
path. A correct Atom playback queue is deferred to an isolated rework
with on-device serial validation (see PLANS_48).
This reverts the firmware portion of ed6f4bf only.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…uning
Local-only (public repo; not pushed per standing instruction). Consolidates this session's later hardware-validated work:
- ES8311 codec mutual-exclusion: HeadroomAudio `recording_` inhibit set in stopForRecording()/cleared in restoreAfterRecording(); play*() return Ignored during the PTT mic window; restoreAfterRecording() forces M5.Speaker.end()->begin() so the codec is re-init'd to DAC. ingress treats Ignored as a benign 202. Fixes the persistent static-until-power-cycle latch when PTT interrupts playback.
- mcp-server: face_say/face_event/face_ping auto-fill session_id from MH_FACE_SESSION_ID (default "operator"); removed from required so weak local models no longer flail.
- scripts/ensure-atoms3r-bridge.sh (idempotent, MH_SKIP opt-out) called from run-operator-stack.sh so every operator bring-up keeps the Atom PC->bridge alive (no more silent Atom after a stack restart).
- run-asr-worker.sh honors MH_ASR_DEVICE over ~/.bashrc's ASR_DEVICE.
- TTS chunking: MH_TTS_CHUNK_MAX_CHARS default 64 at the run-operator-stack.sh / run-face-app.sh chokepoints; firmware HEADROOM_MAX_BASE64_TTS_SECONDS 10->15 (ingress cap ~641KB->~954KB).
- atoms3r-http-bridge: drop stale-generation audio on barge-in.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ent, PTT arming cue
- headroom_settings: 3 Wi-Fi slots (slot 1 keeps existing NVS keys; no migration). connectWifi() tries slot 1->2->3 and uses the first that works.
- headroom_serial_provision (new) + scripts/atoms3r-provision.mjs: one-shot USB-serial RMHCFG provisioning (wifi x3 / token / urls), token resolved from MH_FACE_AUTH_TOKEN env or shared env file, redacted RMHCFG? state (incl. live IMU accel for calibration), no npm deps.
- main.cpp: screen-button tap classifier. Triple-tap rotates the face; IMU auto-uprights to the nearest of 4 when tilted (calibrated offset=90/sign=-1), +90 step fallback when flat/no-IMU. cfg.internal_imu enabled (verified to coexist with ES8311 audio).
- PTT now arms on a 500 ms hold (tap vs hold disambiguation) with an audible "pi" cue drained before the mic window (ES8311-safe ordering).
- setup portal: SSID/password 2 & 3 fields. README + config example updated.
- Bundles in-progress branch audio-tuning changes already present on atoms3r-face-audio-tuning (intermingled in the same files).
Hardware-validated on a real AtomS3R: provisioning round-trip, multi-slot connect, IMU 4-way snap correct, cue audible, TTS audio clean.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…end_key
A helper can stall inside a CLI modal (tool-approval prompt, model picker, usage-limit notice, feedback survey) where the underlying LLM is not even reading input. The injected mission never reaches the model and the operator had no way to notice short of running tmux capture-pane themselves.
This change ships three pieces inside the existing minimum_headroom MCP server so any operator (claude, codex, agy, automation) can see and recover from stalls without shell access:
- helper_stuck_detector: a 5s background loop inside face-app scans every active helper pane against a small regex set (Do you want to proceed?, Switch to gpt-..., You've hit your usage limit, How's the CLI experience, Press enter to confirm) and posts kind=blocked reports into the owner inbox. Dedupes the same (agent, pattern, line) tuple for ~30s. Env knobs: MH_HELPER_STUCK_DETECTOR=off, MH_HELPER_STUCK_DETECTOR_INTERVAL_MS.
- agent.pane_snapshot: returns the last N lines of a helper pane (default 40, max 400) with ANSI stripped. Read-only.
- agent.pane_send_key: sends raw tmux keys to a helper pane. Named-key allowlist plus printable ASCII; optional literal mode for free-text. Capped at 32 keys per call.
Detection and response are split on purpose: the detector posts, the operator decides. A regex match must never choose "No, and always deny".
E2E validated against claude-e2e / codex-e2e / agy-e2e helpers; all three modal categories were caught and cleared. Full doc rewrite covers README.md, doc/guides/multi-agent.md, doc/multi-agent-orchestration-spec.md (new section 7.7), and the minimum-headroom-ops skill. 357 tests pass (+14 new).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
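The detect-and-dedupe behavior described in this commit could be sketched as below. The pattern strings are the ones quoted in the commit message; everything else (the `createStuckDetector` name, the pattern labels, the report shape beyond `kind=blocked`, and the use of a prefix match for the elided "Switch to gpt-..." pattern) is a hypothetical illustration rather than the repo's implementation.

```javascript
// Patterns quoted from the commit message; labels and code are hypothetical.
const MODAL_PATTERNS = [
  { name: 'proceed_prompt', re: /Do you want to proceed\?/ },
  { name: 'model_switch', re: /Switch to gpt-/ },
  { name: 'usage_limit', re: /You've hit your usage limit/ },
  { name: 'cli_survey', re: /How's the CLI experience/ },
  { name: 'confirm_prompt', re: /Press enter to confirm/ },
];

function createStuckDetector({ dedupeMs = 30_000, now = () => Date.now() } = {}) {
  const lastReported = new Map(); // "agent|pattern|line" -> last report time

  // Scan one pane's lines; return only reports not seen within dedupeMs.
  return function scan(agent, paneLines) {
    const reports = [];
    for (const line of paneLines) {
      for (const { name, re } of MODAL_PATTERNS) {
        if (!re.test(line)) continue;
        const key = `${agent}|${name}|${line}`;
        const seenAt = lastReported.get(key);
        if (seenAt !== undefined && now() - seenAt < dedupeMs) continue;
        lastReported.set(key, now());
        reports.push({ kind: 'blocked', agent, pattern: name, line });
      }
    }
    return reports;
  };
}
```

Note that the sketch only produces reports. It never sends a key, which mirrors the stated split between detection and response.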
Rename Gemini CLI → Antigravity CLI+GUI across docs, hook bridge, and helper spawn paths. The CLI is now agy and the GUI is the Electron Antigravity desktop app; they share ~/.gemini/ but read MCP servers and skills from different paths, so the install instructions split per surface.
README highlights:
- Replace the old short Gemini section with a full Antigravity matrix that covers both the agy CLI (agy plugin install + ~/.gemini/antigravity-cli/) and the GUI (~/.gemini/config/mcp_config.json + plugins/).
- Add hook integration via doc/examples/antigravity/hooks.json with a shared settings-hooks.snippet.json fallback.
Hook bridge:
- doc/hook-bridge/gemini-settings.json.example → renamed to doc/examples/antigravity/settings-hooks.snippet.json so the example lives next to its config files.
- doc/hook-bridge/antigravity-hooks.json.example added for the new hooks.json plugin path.
- doc/hook-bridge/README.md rewritten for Antigravity stdout semantics (PreToolUse / Stop) instead of Gemini Notification / AfterAgent.
mh-hook.mjs:
- KNOWN_RUNTIMES drops 'gemini', adds 'antigravity'.
- New --stdout-mode flag with values 'silent' (Claude/Codex default) and 'antigravity-flow' (writes the minimal JSON Antigravity expects for PreToolUse and Stop flow control while still forwarding face hook payloads). Exit code stays 0 in all paths.
- 'Stop' alone now maps to idle_after_response; 'AfterAgent' is dropped.
examples/rmh-voice-mode/:
- New voice-first launcher that turns claude / codex / agy into push-to-talk agents talking through the AtomS3R desk device (Real Minimum Headroom).
- start-rmh.sh resolves the checkout path, installs the antigravity-cli plugin, and renders templates under runtime/ (gitignored). Per-CLI templates for codex config.toml, Claude MCP config, and Antigravity plugin.
- AGENTS.md / GEMINI.md inside this directory are repo-global .gitignored (per-project local instructions), so only README.md, CLAUDE.md, and the template/tool scaffolding are committed.
Operator-stack quality of life:
- run-operator-once.sh: ASR_GPU=1 maps to MH_ASR_DEVICE=cuda so the asr-worker uses CUDA without ~/.bashrc clobbering ASR_DEVICE=cpu when tmux spawns a new shell. MH_ASR_DEVICE override takes precedence.
- test/connectivity/phase1.test.mjs: face.event now accepts the default session_id="operator" instead of erroring; updated assertion. All test references to 'gemini' updated to 'antigravity'.
Existing PLAN_51 helper-stuck-detection commit (b5b1059) already references the new tool names; this commit adds them to the Antigravity-flavored docs and example.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
CLAUDE.md follows the same convention as AGENTS.md / GEMINI.md: it is the per-project agent instruction file that is local to each checkout and never committed. Adding it to the root .gitignore fixes the asymmetry where the previous Antigravity migration commit (244ecfc) accidentally tracked examples/rmh-voice-mode/CLAUDE.md while AGENTS.md and GEMINI.md in the same directory were ignored.
Risk check: examples/rmh-voice-mode/CLAUDE.md was the only tracked CLAUDE.md in the repo. Its content is an auto-generated copy of tools/voice-first-rules.md with a CLI-name comment: no secrets, paths, tokens, or personal info. Removing it from tracking has no public-exposure impact; the canonical source remains under tools/voice-first-rules.md.
Changes:
- .gitignore: add CLAUDE.md and CLAUDE.local.md (mirror existing AGENTS / GEMINI patterns).
- examples/rmh-voice-mode/CLAUDE.md: git rm --cached (file stays on disk for the current checkout; will be regenerated by start-rmh.sh on fresh clones).
- examples/rmh-voice-mode/start-rmh.sh: auto-run tools/regenerate-rules.sh when any of CLAUDE.md / AGENTS.md / GEMINI.md is missing, so a fresh `cd examples/rmh-voice-mode && ./start-rmh.sh --agent claude` works without a manual regenerate step.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Add Hardware section linking M5Stack AtomS3R and Atomic Echo Base, and note that Echo Base is required because AtomS3R has no built-in speaker/microphone.
- Drop internal "Milestone 1/2" shorthand; replace with a plain feature list (face rendering, TTS playback, PTT, setup AP, triple-tap reorient).
- Add an explicit "Enter the setup portal" section covering both the automatic-on-no-Wi-Fi flow and the hold-screen-button-at-boot force flow.
- Mirror the entire English content as a Japanese section following the main README's [English] | [日本語] anchor convention.
- Replace `/home/amari1/.config/...` with `$HOME/.config/...` in the shared env file examples.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
These leaked the developer's username into a soon-to-be-public repo and, in the scripts, would have prevented them from working out-of-the-box on anyone else's machine.
- integrations/stackchan-minimal/README.md, stackchan.env.example: drop hardcoded `/home/amari1/models/...` Qwen GGUF path; document the env var and require users to set their own absolute path.
- scripts/atoms3r-provision.mjs, ensure-atoms3r-bridge.sh: resolve the shared env file via `os.homedir()` / `$HOME` instead of a hardcoded `/home/amari1/.config/minimum-headroom.env`.
- scripts/run-bound-mcp-server.sh: drop the user-specific `/home/amari1/.nvm/.../node` fallback and fall back to `$HOME/.nvm/versions/node/default/bin/node` (the generic nvm default symlink) before `/usr/bin/node`.
- test/face-app/agent_lifecycle.test.mjs: replace `amari1@host` shell-prompt fixture string with `user@host`.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous README was a single line ("Phase 3 TTS worker for Minimum
Headroom.") that exposed an internal milestone label and gave a public
reader no idea what the worker does or how to run it.
- README: describe the worker as a stdio child spawned by face-app, list
the Kokoro/Qwen3 engine choice, document setup (`uv sync`), the smoke
command, and the most common environment knobs.
- pyproject.toml: replace the "phase3" project description with a plain
one-liner describing the engines.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This was a convenience bridge that proxied this repo's local ASR/TTS to StackChan Minimal devices using whisper.cpp / piper / VOICEVOX-shaped APIs. It is not a minimum-headroom feature; it is an adapter for a third-party project, so it does not belong in this repo's public release.
Removed:
- integrations/stackchan-minimal/ (README, env example, ASR adapter, TTS adapter): the whole directory; integrations/ is now empty and dropped.
- scripts/run-stackchan-sidecar.sh: the launcher for the same bridge.
- "stackchan:run" script entry in package.json.
If/when this lives again, it should be a separate companion repo so the support surface and external-project dependency are explicit.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Public-release version bump. Six sites must move together; the previous bump (to 1.16.1) missed the SERVER_VERSION constant in mcp-server, leaving the MCP server advertising 1.16.0 while every other site was already 1.16.1.
Bumped:
- package.json: 1.16.1 -> 1.17.0
- mcp-server/dist/index.js SERVER_VERSION: 1.16.0 -> 1.17.0 (also clears the prior drift)
- tts-worker/pyproject.toml: 1.16.1 -> 1.17.0
- asr-worker/pyproject.toml: 1.16.1 -> 1.17.0
- tts-worker/uv.lock + asr-worker/uv.lock: refreshed via `uv lock`
Minor bump is appropriate: this release adds new MCP tools (agent.pane_snapshot, agent.pane_send_key, the helper stuck detector) plus the AtomS3R firmware end-to-end (face render, ES8311 audio, PTT, setup portal, IMU triple-tap reorient, USB serial provisioning, multi-Wi-Fi), the rmh-voice-mode example, and Antigravity CLI/GUI support in the operator stack.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
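Since the earlier drift came from one site being missed, a small consistency check is one way to catch it before tagging. The helper below is a hypothetical sketch and not part of the repo; it assumes the caller has already read each site's version string into a plain object.

```javascript
// Hypothetical helper: report every site whose version differs from expected.
function findVersionDrift(sites, expected) {
  return Object.entries(sites)
    .filter(([, version]) => version !== expected)
    .map(([site, version]) => ({ site, version }));
}

// The state described above, before this bump fixed it.
const before = {
  'package.json': '1.16.1',
  'mcp-server SERVER_VERSION': '1.16.0',
  'tts-worker/pyproject.toml': '1.16.1',
  'asr-worker/pyproject.toml': '1.16.1',
};
console.log(findVersionDrift(before, '1.16.1'));
```

Run against the pre-bump state, the check flags only the SERVER_VERSION site, which is the drift this commit clears.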
Summary
This release adds two large feature areas plus public-release hygiene
work, and bumps the version to 1.17.0 across all six hardcoded sites.
AtomS3R hardware face frontend
End-to-end firmware support for the M5Stack AtomS3R + Atomic Echo Base
as a physical face for minimum-headroom:
- … `event` / `tts_state` / `tts_mouth` over WebSocket.
- … ~0.5 s arm + audible "pi" beep cue; records up to 8 s of 16 kHz mono PCM and submits it to face-app operator ASR.
- … `RMH-SETUP-xxxx`) with captive portal. … the device on.
- … `RMHCFG` line protocol + `scripts/atoms3r-provision.mjs` to push Wi-Fi/token/URLs from the PC without typing them into the portal.
- … physically turned, with serial-only on-device calibration.
- … `scripts/atoms3r-http-bridge.mjs` + `scripts/ensure-atoms3r-bridge.sh`) is auto-started by the operator stack and runs in its own detached tmux session.
Helper stuck detection and pane control (new MCP tools)
Three pieces that let an operator (claude / codex / antigravity) recover
a helper stuck inside a CLI-level modal (tool approval, model picker,
usage-limit notice, feedback survey) before the helper's LLM reads any
input:
- `helper_stuck_detector`: scans every active helper pane every ~5 s, matches known modal patterns, and posts an auto-generated `blocked` report into the owner inbox.
- `agent.pane_snapshot`: operator-callable read of the helper pane tail with ANSI stripped.
- `agent.pane_send_key`: operator-callable named-key / literal-text injection into the helper pane (with an allowlist).
Detection never auto-presses keys; the operator always decides the response. Disable with `MH_HELPER_STUCK_DETECTOR=off`; tune cadence with `MH_HELPER_STUCK_DETECTOR_INTERVAL_MS`.
Operator stack: Antigravity CLI + GUI migration
The operator stack and example agent config have been updated for
Antigravity CLI and GUI (replacing the previous Gemini-CLI assumption),
including a refreshed `doc/examples/antigravity/` example and an `examples/rmh-voice-mode/` voice-first launcher that idempotently regenerates per-CLI rule files from a single shared source.
TTS and audio path improvements
- … speaker stays responsive on long answers.
- … `MH_TTS_CHUNK_MAX_CHARS=64`) tuned for the AtomS3R.
Public-release hygiene (final cleanup before tagging 1.17.0)
- … `/home/amari1/...`) from public docs and scripts; scripts now resolve `$HOME` / `os.homedir()` and use the generic `~/.nvm/.../default` symlink.
- … user-readable feature descriptions.
- … `firmware/atoms3r-headroom/README.md` with a Hardware section linking the M5Stack product pages, a setup-portal section explaining both auto and force entry, and a full Japanese mirror.
- … `tts-worker/README.md` to actually describe the worker.
- … not a minimum-headroom feature; it can live in a separate companion repo if needed.
- … `CLAUDE.md` and added a bootstrap loop in `examples/rmh-voice-mode/start-rmh.sh` that auto-regenerates rule files from `voice-first-rules.md`.
Version bump
Six hardcoded sites moved together to 1.17.0:
`package.json`, `mcp-server/dist/index.js` SERVER_VERSION (was 1.16.0; fixed the prior drift), `tts-worker/pyproject.toml`, `asr-worker/pyproject.toml`, `tts-worker/uv.lock`, `asr-worker/uv.lock`.
Test plan
- `npm test`: 357/357 pass locally.
- … portal force-entry) verified on hardware.
- … `agent.pane_snapshot` / `agent.pane_send_key` against a real claude / codex / antigravity helper.
🤖 Generated with Claude Code