AtomS3R long-reply TTS: server chunking + PTT barge-in (+ face tuning, codex PTT preserved) by amariichi · Pull Request #52 · amariichi/MinimumHeadroom

amariichi · 2026-05-17T02:00:14Z

Summary

Fixes the AtomS3R "long agent reply = mouth-only / loud static" problem and tunes the face, on top of the preserved codex first-pass PTT firmware.

face_renderer: flatter closed-eye ∪ arc, whole-face recentred.
TTS audio HTTP reference store (tts_audio_store) for non-browser sinks.
Step 1: server chunking. segmentTtsText splits long utterances into ordered, sentence-bounded chunks played through a real FIFO; short text is unchanged. Chunk size is set by MH_TTS_CHUNK_MAX_CHARS: the code default is 120, and the operator stack defaults to 24 so that each WAV stays under the Atom HTTP ingress cap (~250 KB accepted in practice). Larger WAVs are rejected with HTTP 413.
Step 2: PTT barge-in. An operator ASR upload flushes the chunk queue, interrupts the active chunk, bumps the generation, and clears the audio store.
codex PTT firmware preserved (2579dcc, hardware-validated Milestone 6) as its own commit.
Step 3 (Atom-side FIFO) reverted (5b0b3ef): it corrupted playback (loud static); restored to the validated single-play path. Server chunking + small budget keep audio clean on hardware (user-confirmed).

Status

face-app: full node --test green (338).
Firmware: builds, flashed, user-confirmed clean audio on the real AtomS3R.
Known follow-ups (non-blocking, tracked in PLANS_48): proper Atom playback queue rework with on-device serial validation; same-agent sequential append vs replace for rapid multi-sentence face_say.

Coordination

codex is paused (~3h) and its in-flight firmware was committed here unchanged; on resume it should pull/rebase since the base advances.

🤖 Generated with Claude Code

Hardware-validated AtomS3R frontend work:

- firmware: AtomS3R has no internal speaker; use Atomic Echo Base (cfg.internal_spk=false, external_speaker.atomic_echo=true), set speaker volume 130
- face_renderer: success face uses normal eyes + brown raised "\/" brows + closed-mouth smile arc (lip-sync mouth movement, 口パク, preserved while speaking); blink is a dark downward-convex eyelid arc with randomized 3/4/5 s scheduling (1-in-10 quick 0.3 s re-blink); Thinking sweeps the pupils left/right
- headroom_transport: Failed background persists until the next state event (like Permission) instead of auto-reverting after 8 s
- scripts: default restart-operator-stack-in-place.sh to FACE_AUDIO_TARGET=both so the PC->Atom bridge receives tts_audio; add atoms3r-http-bridge.mjs and the stackchan-minimal sidecar

Local commit only; not pushed. ExecPlan PLANS_48 updated on disk (.agent/ is gitignored by repo design, so it is not in this commit).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- Closed "∪" eyelid: larger radius (15->19) with a narrower sweep (25..155 -> 45..135) for a gentler, flatter curve, dropped a touch lower.
- Shift the whole composed face down by kFaceOffsetY (4px) so the head looks vertically centered despite the visual weight of the hair.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- Add tts_audio_store: TTL-bounded server-side store of generated TTS audio, exposed over HTTP with a lightweight WS reference payload and WAV-duration parsing.
- tts_controller: always stash audio and broadcast a reference; only broadcast the base64 body when browser audio is enabled (previously audio was dropped entirely when browser audio was off), so the AtomS3R/Echo Base bridge and Stack-chan sidecar can fetch it by URL.
- index.js: wire the store (MH_TTS_AUDIO_REF_TTL_MS, default 60s) into the HTTP router and controller; clear it on worker stop.
- package.json: add stackchan:run and atoms3r:bridge scripts.
- .gitignore: ignore .venv-platformio/ and esp-web-tools-logs.txt.
- Tests for the store (reference metadata, TTL expiry, WAV duration) and the controller reference path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
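As a rough illustration of the store described in this commit, the sketch below shows one way a TTL-bounded audio store and WAV-duration parsing could work. It is not the PR's code: `createAudioStore`, `wavDurationMs`, the injected `now` clock, and the shape of the returned reference are all hypothetical names chosen for the example. Only the 60 s default TTL comes from the commit message.

```javascript
// Hypothetical sketch of a TTL-bounded audio store; not the actual tts_audio_store.
function createAudioStore({ ttlMs = 60_000, now = () => Date.now() } = {}) {
  const entries = new Map(); // id -> { bytes, expiresAt }
  let nextId = 1;

  // Drop every entry whose TTL has elapsed.
  function sweep() {
    const t = now();
    for (const [id, entry] of entries) {
      if (entry.expiresAt <= t) entries.delete(id);
    }
  }

  return {
    // Stash audio and return a small reference that a WS payload could carry.
    put(bytes) {
      sweep();
      const id = String(nextId++);
      const expiresAt = now() + ttlMs;
      entries.set(id, { bytes, expiresAt });
      return { id, byteLength: bytes.length, expiresAt };
    },
    // Return the stored bytes, or null once the entry has expired.
    get(id) {
      sweep();
      const entry = entries.get(id);
      return entry ? entry.bytes : null;
    },
    clear() {
      entries.clear();
    },
  };
}

// Duration of a PCM WAV (Uint8Array) in ms, found by walking RIFF chunks.
function wavDurationMs(bytes) {
  const tag = (off) => String.fromCharCode(...bytes.subarray(off, off + 4));
  if (bytes.length < 12 || tag(0) !== "RIFF" || tag(8) !== "WAVE") return null;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let byteRate = 0;
  for (let off = 12; off + 8 <= bytes.length; ) {
    const id = tag(off);
    const size = view.getUint32(off + 4, true);
    if (id === "fmt " && off + 20 <= bytes.length) {
      byteRate = view.getUint32(off + 16, true); // byte rate sits 8 bytes into the fmt body
    }
    if (id === "data") {
      return byteRate > 0 ? Math.round((size / byteRate) * 1000) : null;
    }
    off += 8 + size + (size % 2); // chunks are padded to an even length
  }
  return null;
}
```

Injecting the clock keeps TTL expiry testable without real timers, which is presumably how a "TTL expiry" unit test would be written, though the PR does not say.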

…fix)

Long agent replies exceeded the AtomS3R firmware's per-utterance base64/HTTP WAV cap, so the audio was dropped while the mouth kept animating from the independent tts_mouth stream ("mouth-only, no sound" on long local-LLM output).

- Add segmentTtsText: JA/EN hard sentence boundaries, late comma soft split, greedy packing, default 120 chars (MH_TTS_CHUNK_MAX_CHARS). Text <= limit is returned verbatim (unchanged single-chunk path).
- Replace the single `pending` slot with a real ordered FIFO queue; one logical utterance occupies it, a newer say flushes the remainder, interrupt/auto-interrupt/stop clear the whole queue.
- Each chunk is its own worker `speak` with the parent generation and a #k/N utterance/message suffix, dispatched sequentially on play_stop, keeping every WAV under the Atom size cap.
- Tests: segmentTtsText units + sequential-dispatch + interrupt-flush; full node --test suite green (333).

Step 1 of 3 (Step 2: PTT-clear wiring; Step 3: firmware playback queue).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
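The chunking strategy named above (hard sentence boundaries, a late comma soft split, greedy packing) could look roughly like the following. This is a minimal sketch under assumptions, not the PR's segmentTtsText: the function names, the exact boundary characters, and the hard-cut fallback for comma-free text are all invented for illustration. Lengths are counted in UTF-16 code units.

```javascript
// Illustrative only: a simplified stand-in for the PR's segmentTtsText.

// Split on JA/EN sentence terminators, keeping each terminator with its sentence.
function splitSentences(text) {
  return text.match(/[^。！？!?.]+[。！？!?.]*\s*|[。！？!?.]+\s*/g) ?? [];
}

// Break one over-long sentence, preferring the latest comma inside the limit.
function splitOversized(sentence, maxChars) {
  const parts = [];
  let rest = sentence;
  while (rest.length > maxChars) {
    const head = rest.slice(0, maxChars);
    const comma = Math.max(head.lastIndexOf("、"), head.lastIndexOf(","));
    const cut = comma > 0 ? comma + 1 : maxChars; // assumed fallback: hard cut
    parts.push(rest.slice(0, cut));
    rest = rest.slice(cut);
  }
  if (rest.length > 0) parts.push(rest);
  return parts;
}

// Greedily pack pieces into chunks of at most maxChars characters.
function segmentTtsTextSketch(text, maxChars = 120) {
  if (maxChars < 1) throw new RangeError("maxChars must be >= 1");
  if (text.length <= maxChars) return [text]; // short text: single-chunk path
  const pieces = splitSentences(text).flatMap((s) => splitOversized(s, maxChars));
  const chunks = [];
  let current = "";
  for (const piece of pieces) {
    if (current.length > 0 && current.length + piece.length > maxChars) {
      chunks.push(current);
      current = "";
    }
    current += piece;
  }
  if (current.length > 0) chunks.push(current);
  return chunks.map((c) => c.trim()).filter((c) => c.length > 0);
}
```

A known weakness of this simplified boundary rule is that it also splits on the period in decimals such as "3.5"; a real implementation would likely need a more careful rule.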

When the operator takes the turn, queued/active agent speech should stop instead of talking over the just-spoken input.

- tts_controller: add flushForBargeIn(reason): clear the chunk FIFO, interrupt + release the active chunk, emit play_stop, advance the generation so late worker audio/mouth for the old utterance is ignored, and clear the audio store so a memory-constrained sink cannot pull a stale chunk.
- operator_asr_proxy: new onBargeIn option, invoked as soon as a POST /api/operator/asr audio upload arrives. This is the earliest cross-transport "user took the turn" signal (the Atom posts here too; it has no usable WebSocket client). Handler errors are caught so ASR still proceeds.
- index.js: wire onBargeIn -> ttsController.flushForBargeIn.
- Tests: controller flush behavior + audio-store clear; proxy onBargeIn invocation, non-ASR negative, and throw-safety.

Full node --test green (338). Step 2 of 3 (Step 3: AtomS3R firmware playback queue + stop-on-PTT).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
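The barge-in sequence described in this commit can be sketched as a small queue with a generation counter. Everything here is an assumption made for illustration: `createChunkController`, `safeBargeIn`, and the injected `speak`, `interrupt`, `emit`, and `clearAudioStore` callbacks are hypothetical, and the real tts_controller will differ. The point is only to show why advancing the generation makes late worker events harmless.

```javascript
// Hypothetical sketch; names echo the commit message but this is not the PR's code.
function createChunkController({ speak, interrupt, emit, clearAudioStore }) {
  let generation = 0;
  let queue = [];
  let active = null;

  function dispatchNext() {
    active = queue.shift() ?? null;
    if (active) speak(active);
  }

  return {
    // Queue one logical utterance as ordered chunks tagged with the generation.
    say(chunks) {
      queue = chunks.map((text, i) => ({
        text,
        generation,
        label: `#${i + 1}/${chunks.length}`,
      }));
      if (!active) dispatchNext();
    },
    // A chunk finished playing: advance only for the current generation.
    onPlayStop(eventGeneration) {
      if (eventGeneration !== generation) return false; // stale, ignore
      dispatchNext();
      return true;
    },
    // Operator took the turn: drop queued and active speech.
    flushForBargeIn(reason) {
      queue = [];
      if (active) {
        interrupt(active);
        active = null;
      }
      emit({ type: "play_stop", reason });
      generation += 1; // late events for the old utterance no longer match
      clearAudioStore();
    },
    pendingCount: () => queue.length,
  };
}

// Call the barge-in handler without letting its errors block the ASR request.
function safeBargeIn(onBargeIn, reason) {
  try {
    if (onBargeIn) onBargeIn(reason);
    return true;
  } catch (err) {
    return false; // swallow: ASR must still proceed
  }
}
```

In a wiring like the one the commit describes, the ASR proxy would call something like `safeBargeIn(() => controller.flushForBargeIn("operator_asr"))` before forwarding the upload; the reason string is made up for this example.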

Preservation commit of codex's in-flight, build-passing, hardware-validated PTT firmware (Milestone 6 in PLANS_48): button hold-to-record, M5.Mic capture via the Atomic Echo Base, 16 kHz mono WAV wrapping, POST to /api/operator/asr?lang=<asrLanguage>, and operator_response submission over the authenticated HTTP fallback.

Includes persisted asrLanguage (ja/en) with setup-portal selection, HeadroomTransport::sendOperatorText(), the HeadroomPtt record/process/submit module, longer ASR HTTP read timeout, and ingress/settings/config wiring.

`pio run` succeeds (RAM 15.6%, Flash 36.2%); transcript verified on real hardware per PLANS_48. Committed by Claude to preserve the work while codex is paused; no behavioral changes were made to codex's firmware in this commit.

Co-Authored-By: Codex (OpenAI) <noreply@openai.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Server-side sentence chunking delivers an ordered burst of small WAV refs. The firmware played each one via playOwnedWav, which calls M5.Speaker.stop() first, so a newly arrived chunk truncated the one still playing (choppy / only the last chunk audible).

- headroom_audio: add a bounded FIFO (kMaxQueued=8). playBase64Wav / playHttpWavRef / playWavBytes route through playOrEnqueue: play immediately when idle, else enqueue the owned buffer. loop() starts the next queued chunk when the speaker goes idle.
- stop()/stopForRecording() (the PTT-press path) clear the queue and free buffers, so buffered agent speech does not resume after the user takes the turn.
- busy() stays true while chunks are queued so the face holds the Speaking expression across inter-chunk gaps instead of flickering.
- headroom_transport: zero mouthOpen when a chunk's audio is dropped, killing phantom lip-sync mouth movement (口パク) from the independent tts_mouth stream.

Built (RAM 15.7%, Flash 36.3%) and flashed to the real AtomS3R.

Step 3 of 3. Completes the long-TTS chunking + PTT-clear feature (Step 1 2d26c1c, Step 2 882889e).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Field bug after the chunking work: long replies were still mouth-only with a burst of static at the end. The atoms3r bridge log showed the Atom rejecting ~800 KB WAVs with HTTP 413 payload_too_large; the rare small sentence slipped through and a near-cap one played truncated (the static).

The 120-char chunk default was never calibrated to the Atom HTTP ingress cap (~250 KB accepted in practice), and one Hermes sentence (<=120 chars) was not split at all.

restart-operator-stack-in-place.sh now exports MH_TTS_CHUNK_MAX_CHARS (default 24, which is roughly 3 s of audio or a ~150 KB WAV) into the stack so the Atom-facing pipeline chunks small enough to be accepted. The global code default stays 120 for browser/PC.

Verified live: a long utterance now splits into ~110-165 KB WAV chunks, all forwarded with no 413.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
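The size budget above can be checked with simple arithmetic. The commit message gives about 3 s for about 150 KB, roughly 50 KB per second of audio. The PR does not state the TTS output format, so the sketch below assumes 24 kHz, 16-bit, mono PCM (48,000 bytes per second) because it lands near that rate. The function names and the 250,000-byte cap (reading "~250 KB" as decimal kilobytes) are likewise assumptions.

```javascript
// Back-of-envelope WAV sizing; the audio format here is assumed, not confirmed.
function pcmWavBytes(
  seconds,
  { sampleRate = 24_000, channels = 1, bitsPerSample = 16 } = {}
) {
  const HEADER_BYTES = 44; // canonical RIFF/WAVE PCM header
  const bytesPerSecond = sampleRate * channels * (bitsPerSample / 8);
  return HEADER_BYTES + Math.round(seconds * bytesPerSecond);
}

// True when a WAV of this size would fit under the assumed ingress cap.
function fitsAtomIngress(bytes, capBytes = 250_000) {
  return bytes <= capBytes;
}

const threeSeconds = pcmWavBytes(3); // 144044 bytes, close to the ~150 KB quoted
const longReply = pcmWavBytes(17); // 816044 bytes, similar to the rejected ~800 KB WAVs
console.log(threeSeconds, fitsAtomIngress(threeSeconds)); // 144044 true
console.log(longReply, fitsAtomIngress(longReply)); // 816044 false
```

Under these assumptions the cap corresponds to a little over 5 s of audio per chunk, which leaves headroom for the 24-character default but not for a 120-character sentence.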

Once server-side chunking made WAVs small enough to be accepted (no more 413), every chunk played as loud static with faint voice. The regression was introduced solely by the Step 3 FIFO; the validated Milestone 5 single-play path was clean.

Serial showed no firmware rejection, so the WAV was accepted and "played" but corrupted, which is consistent with a buffer-lifetime/scheduling bug around the async M5.Speaker.playWav in the queue.

Restore headroom_audio.{cpp,h} and headroom_transport.cpp to the validated 2579dcc state, rebuilt and reflashed. Server-side chunking (Steps 1/2) and the small MH_TTS_CHUNK_MAX_CHARS budget are kept, so the Atom receives small WAVs played one-at-a-time by the known-good path. A correct Atom playback queue is deferred to an isolated rework with on-device serial validation (see PLANS_48).

This reverts the firmware portion of ed6f4bf only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

amariichi and others added 9 commits May 16, 2026 23:13

amariichi closed this May 17, 2026

amariichi deleted the atoms3r-face-audio-tuning branch May 17, 2026 02:01

amariichi wants to merge 9 commits into main from atoms3r-face-audio-tuning
