AtomS3R long-reply TTS: server chunking + PTT barge-in (+ face tuning, codex PTT preserved) #52
Closed
amariichi wants to merge 9 commits into
Hardware-validated AtomS3R frontend work:
- firmware: AtomS3R has no internal speaker; use Atomic Echo Base (cfg.internal_spk=false, external_speaker.atomic_echo=true), set speaker volume 130
- face_renderer: success face uses normal eyes + brown raised "\/" brows + closed-mouth smile arc (lip-sync mouth animation, 口パク, preserved while speaking); blink is a dark downward-convex eyelid arc with randomized 3/4/5 s scheduling (1-in-10 quick 0.3 s re-blink); Thinking sweeps the pupils left/right
- headroom_transport: Failed background persists until the next state event (like Permission) instead of auto-reverting after 8 s
- scripts: default restart-operator-stack-in-place.sh to FACE_AUDIO_TARGET=both so the PC->Atom bridge receives tts_audio; add atoms3r-http-bridge.mjs and the stackchan-minimal sidecar
Local commit only; not pushed. ExecPlan PLANS_48 updated on disk (.agent/ is gitignored by repo design, so it is not in this commit).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Closed "∪" eyelid: larger radius (15->19) with a narrower sweep (25..155 -> 45..135) for a gentler, flatter curve, dropped a touch lower.
- Shift the whole composed face down by kFaceOffsetY (4px) so the head looks vertically centered despite the visual weight of the hair.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Add tts_audio_store: TTL-bounded server-side store of generated TTS audio, exposed over HTTP with a lightweight WS reference payload and WAV-duration parsing.
- tts_controller: always stash audio and broadcast a reference; only broadcast the base64 body when browser audio is enabled (previously audio was dropped entirely when browser audio was off), so the AtomS3R/Echo Base bridge and Stack-chan sidecar can fetch it by URL.
- index.js: wire the store (MH_TTS_AUDIO_REF_TTL_MS, default 60s) into the HTTP router and controller; clear it on worker stop.
- package.json: add stackchan:run and atoms3r:bridge scripts.
- .gitignore: ignore .venv-platformio/ and esp-web-tools-logs.txt.
- Tests for the store (reference metadata, TTL expiry, WAV duration) and the controller reference path.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
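To make the store idea concrete, here is a minimal sketch of a TTL-bounded audio store with WAV-duration parsing. It is not the PR's `tts_audio_store`: the function names (`createAudioStore`, `wavDurationMs`), the reference shape, and the `/api/tts/audio/<id>` URL are all hypothetical, chosen only to illustrate the behaviour the commit message describes (stash audio, hand out a lightweight reference, expire it after a TTL, clear on demand).

```javascript
// Hypothetical sketch; names, URL path and reference shape are assumptions,
// not the actual tts_audio_store API from this PR.

// Walk the RIFF chunks to find "fmt " (for the byte rate) and "data" (for the size).
function wavDurationMs(buf) {
  if (buf.length < 12) return null;
  if (buf.toString('ascii', 0, 4) !== 'RIFF') return null;
  if (buf.toString('ascii', 8, 12) !== 'WAVE') return null;
  let byteRate = 0;
  let offset = 12;
  while (offset + 8 <= buf.length) {
    const id = buf.toString('ascii', offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    if (id === 'fmt ' && offset + 20 <= buf.length) {
      byteRate = buf.readUInt32LE(offset + 16); // byteRate sits 8 bytes into the fmt body
    }
    if (id === 'data') {
      return byteRate > 0 ? Math.round((size / byteRate) * 1000) : null;
    }
    offset += 8 + size + (size % 2); // RIFF chunks are padded to even sizes
  }
  return null;
}

function createAudioStore({ ttlMs = 60000, now = Date.now } = {}) {
  const entries = new Map(); // id -> { wav, expiresAt }
  let nextId = 1;

  // Drop every entry whose TTL has elapsed.
  function sweep() {
    const t = now();
    for (const [id, entry] of entries) {
      if (entry.expiresAt <= t) entries.delete(id);
    }
  }

  return {
    // Store the audio and return a small reference suitable for a WS broadcast.
    put(wav) {
      sweep();
      const id = String(nextId++);
      entries.set(id, { wav, expiresAt: now() + ttlMs });
      return { id, url: `/api/tts/audio/${id}`, bytes: wav.length, durationMs: wavDurationMs(wav) };
    },
    // Return the stored audio, or null once it has expired or been cleared.
    get(id) {
      sweep();
      const entry = entries.get(id);
      return entry ? entry.wav : null;
    },
    clear() {
      entries.clear();
    },
  };
}
```

Injecting `now` keeps TTL expiry testable without real timers, which is one plausible way to cover the "TTL expiry" case the commit lists among its tests.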
…fix)
Long agent replies exceeded the AtomS3R firmware's per-utterance
base64/HTTP WAV cap, so the audio was dropped while the mouth kept
animating from the independent tts_mouth stream ("mouth-only, no
sound" on long local-LLM output).
- Add segmentTtsText: JA/EN hard sentence boundaries, late comma soft
split, greedy packing, default 120 chars (MH_TTS_CHUNK_MAX_CHARS).
Text <= limit is returned verbatim (unchanged single-chunk path).
- Replace the single `pending` slot with a real ordered FIFO queue;
one logical utterance occupies it, a newer say flushes the
remainder, interrupt/auto-interrupt/stop clear the whole queue.
- Each chunk is its own worker `speak` with the parent generation and
a #k/N utterance/message suffix, dispatched sequentially on
play_stop, keeping every WAV under the Atom size cap.
- Tests: segmentTtsText units + sequential-dispatch + interrupt-flush;
full node --test suite green (333).
Step 1 of 3 (Step 2: PTT-clear wiring; Step 3: firmware playback queue).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
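The segmentation rules above (hard sentence boundaries, late comma soft split, greedy packing, verbatim pass-through for short text) can be sketched roughly as follows. This is an illustration under assumptions, not the PR's `segmentTtsText`: the exact boundary characters, whitespace handling, and the helper names `splitSentences` and `splitLong` are guesses made for the example.

```javascript
// Illustrative sketch only; the real segmentTtsText may use different rules.

// Cut after JA sentence enders, and after EN enders followed by whitespace or end of text.
function splitSentences(text) {
  const out = [];
  let start = 0;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    const next = text[i + 1];
    const jaEnd = '。！？'.includes(ch);
    const enEnd = '.!?'.includes(ch) && (next === undefined || /\s/.test(next));
    if (jaEnd || enEnd) {
      out.push(text.slice(start, i + 1));
      start = i + 1;
    }
  }
  if (start < text.length) out.push(text.slice(start));
  return out;
}

// Break an over-long sentence at the last comma inside the window ("late comma"),
// falling back to a hard cut at maxChars when no comma is available.
function splitLong(sentence, maxChars) {
  const pieces = [];
  let rest = sentence;
  while (rest.length > maxChars) {
    const window = rest.slice(0, maxChars);
    const comma = Math.max(
      window.lastIndexOf('、'),
      window.lastIndexOf('，'),
      window.lastIndexOf(',')
    );
    const cut = comma > 0 ? comma + 1 : maxChars;
    pieces.push(rest.slice(0, cut));
    rest = rest.slice(cut);
  }
  if (rest) pieces.push(rest);
  return pieces;
}

function segmentTtsText(text, maxChars = 120) {
  if (text.length <= maxChars) return [text]; // short text: unchanged single-chunk path
  const pieces = splitSentences(text)
    .flatMap((s) => splitLong(s, maxChars))
    .filter((p) => p.trim() !== '');
  const chunks = [];
  let current = '';
  for (const piece of pieces) {
    // Greedy packing: keep appending pieces until the next one would overflow.
    if (current && current.length + piece.length > maxChars) {
      chunks.push(current.trimEnd());
      current = '';
    }
    current = current ? current + piece : piece.trimStart();
  }
  if (current) chunks.push(current.trimEnd());
  return chunks;
}
```

For example, `segmentTtsText('One two. Three four. Five six.', 12)` yields three chunks, one per sentence, because no two sentences fit together in 12 characters. Each chunk would then be dispatched as its own `speak` in FIFO order.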
When the operator takes the turn, queued/active agent speech should stop instead of talking over the just-spoken input.
- tts_controller: add flushForBargeIn(reason): clear the chunk FIFO, interrupt + release the active chunk, emit play_stop, advance the generation so late worker audio/mouth for the old utterance is ignored, and clear the audio store so a memory-constrained sink cannot pull a stale chunk.
- operator_asr_proxy: new onBargeIn option, invoked as soon as a POST /api/operator/asr audio upload arrives. This is the earliest cross-transport "user took the turn" signal (the Atom posts here too; it has no usable WebSocket client). Handler errors are caught so ASR still proceeds.
- index.js: wire onBargeIn -> ttsController.flushForBargeIn.
- Tests: controller flush behavior + audio-store clear; proxy onBargeIn invocation, non-ASR negative, and throw-safety.
Full node --test green (338).
Step 2 of 3 (Step 3: AtomS3R firmware playback queue + stop-on-PTT).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
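The barge-in flush relies on a generation counter so that worker output requested before the flush is ignored when it arrives late. Below is a small sketch of that pattern. It is a hypothetical miniature, not the PR's `tts_controller`: `createTtsController`, `onWorkerAudio`, `notifyBargeIn`, and the event payloads are assumed names used only for illustration.

```javascript
// Hypothetical miniature of the controller; method and event names are assumptions.
function createTtsController({ emit, audioStore }) {
  let generation = 0;
  let queue = []; // chunks of the current utterance still waiting to be spoken
  let active = null; // chunk currently being spoken, if any

  return {
    // Start a new logical utterance; returns the generation its chunks carry.
    say(chunks) {
      generation += 1;
      queue = chunks.map((text) => ({ text, generation }));
      active = queue.shift() ?? null;
      return generation;
    },
    // Worker results are tagged with the generation they were requested under.
    onWorkerAudio(gen, wav) {
      if (gen !== generation) return false; // stale: that utterance was flushed
      emit('tts_audio', wav);
      return true;
    },
    flushForBargeIn(reason) {
      queue = []; // drop chunks that have not started yet
      active = null; // release the chunk that was playing
      generation += 1; // invalidate any in-flight worker output
      audioStore.clear(); // a slow sink must not fetch a stale chunk
      emit('play_stop', { reason });
    },
    pending: () => queue.length,
    isSpeaking: () => active !== null,
  };
}

// Proxy-side hook: a failing barge-in handler must not block the ASR request.
function notifyBargeIn(onBargeIn, reason) {
  try {
    onBargeIn?.(reason);
    return true;
  } catch (err) {
    return false; // swallow the error so ASR still proceeds
  }
}
```

Bumping the generation inside the flush is what makes late audio harmless: the worker does not need to be cancelled synchronously, because its reply no longer matches the current generation.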
Preservation commit of codex's in-flight, build-passing, hardware-validated PTT firmware (Milestone 6 in PLANS_48): button hold-to-record, M5.Mic capture via the Atomic Echo Base, 16 kHz mono WAV wrapping, POST to /api/operator/asr?lang=<asrLanguage>, and operator_response submission over the authenticated HTTP fallback.
Includes persisted asrLanguage (ja/en) with setup-portal selection, HeadroomTransport::sendOperatorText(), the HeadroomPtt record/process/submit module, longer ASR HTTP read timeout, and ingress/settings/config wiring.
`pio run` succeeds (RAM 15.6%, Flash 36.2%); transcript verified on real hardware per PLANS_48.
Committed by Claude to preserve the work while codex is paused; no behavioral changes were made to codex's firmware in this commit.
Co-Authored-By: Codex (OpenAI) <noreply@openai.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Server-side sentence chunking delivers an ordered burst of small WAV refs. The firmware played each one via playOwnedWav, which calls M5.Speaker.stop() first, so a newly arrived chunk truncated the one still playing (choppy / only the last chunk audible).
- headroom_audio: add a bounded FIFO (kMaxQueued=8). playBase64Wav / playHttpWavRef / playWavBytes route through playOrEnqueue: play immediately when idle, else enqueue the owned buffer. loop() starts the next queued chunk when the speaker goes idle.
- stop()/stopForRecording() (the PTT-press path) clear the queue and free buffers, so buffered agent speech does not resume after the user takes the turn.
- busy() stays true while chunks are queued so the face holds the Speaking expression across inter-chunk gaps instead of flickering.
- headroom_transport: zero mouthOpen when a chunk's audio is dropped, killing phantom lip-sync (口パク) from the independent tts_mouth stream.
Built (RAM 15.7%, Flash 36.3%) and flashed to the real AtomS3R.
Step 3 of 3. Completes the long-TTS chunking + PTT-clear feature (Step 1 2d26c1c, Step 2 882889e).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Field bug after the chunking work: long replies were still mouth-only with a burst of static at the end. The atoms3r bridge log showed the Atom rejecting ~800 KB WAVs with HTTP 413 payload_too_large; the rare small sentence slipped through and a near-cap one played truncated (the static).
The 120-char chunk default was never calibrated to the Atom HTTP ingress cap (~250 KB accepted in practice), and one Hermes sentence (<=120 chars) was not split at all.
restart-operator-stack-in-place.sh now exports MH_TTS_CHUNK_MAX_CHARS (default 24, about 3 s of speech, about 150 KB of WAV) into the stack so the Atom-facing pipeline chunks small enough to be accepted. The global code default stays 120 for browser/PC.
Verified live: long utterance now splits into ~110-165 KB WAV chunks, all forwarded with no 413.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
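The budget arithmetic behind the 24-character default can be checked with a rough estimate. The sketch below assumes 24 kHz, 16-bit, mono PCM with a 44-byte header; the PR does not state the TTS output format, so that is a guess that merely happens to match the reported figures. The 8 chars/s speaking rate is derived from the commit's own "24 chars ≈ 3 s" estimate, and both function names are hypothetical.

```javascript
// Assumed format: 24 kHz, 16-bit, mono PCM, 44-byte header (not confirmed by the PR).
function wavBytes(seconds, { sampleRate = 24000, bitsPerSample = 16, channels = 1 } = {}) {
  const bytesPerSecond = sampleRate * (bitsPerSample / 8) * channels;
  return 44 + Math.round(seconds * bytesPerSecond);
}

// Assumed speaking rate: 8 chars/s, taken from the "24 chars is about 3 s" figure.
function estimateWavBytesForChars(chars, charsPerSecond = 8) {
  return wavBytes(chars / charsPerSecond);
}

const ATOM_INGRESS_CAP_BYTES = 250 * 1000; // "~250 KB accepted in practice"

for (const chars of [24, 120]) {
  const bytes = estimateWavBytesForChars(chars);
  const verdict = bytes <= ATOM_INGRESS_CAP_BYTES ? 'fits' : 'exceeds cap (413)';
  console.log(`${chars} chars -> ${bytes} bytes: ${verdict}`);
}
```

Under these assumptions a 24-character chunk comes to 144,044 bytes, comfortably under the cap, while a 120-character chunk comes to 720,044 bytes, the same order as the ~800 KB WAVs the bridge log showed being rejected.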
Once server-side chunking made WAVs small enough to be accepted (no
more 413), every chunk played as loud static with faint voice. The
regression was introduced solely by the Step 3 FIFO; the validated
Milestone 5 single-play path was clean. Serial showed no firmware
rejection, so the WAV was accepted and "played" but corrupted -
consistent with a buffer-lifetime/scheduling bug around the async
M5.Speaker.playWav in the queue.
Restore headroom_audio.{cpp,h} and headroom_transport.cpp to the
validated 2579dcc state, rebuilt and reflashed. Server-side chunking
(Steps 1/2) and the small MH_TTS_CHUNK_MAX_CHARS budget are kept, so
the Atom receives small WAVs played one-at-a-time by the known-good
path. A correct Atom playback queue is deferred to an isolated rework
with on-device serial validation (see PLANS_48).
This reverts the firmware portion of ed6f4bf only.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary
Fixes the AtomS3R "long agent reply = mouth-only / loud static" problem and tunes the face, on top of the preserved codex first-pass PTT firmware.
- … `∪` arc, whole-face recentred.
- … (`tts_audio_store`) for non-browser sinks.
- `segmentTtsText` splits long utterances into ordered sentence-bounded chunks played through a real FIFO; short text unchanged.
- `MH_TTS_CHUNK_MAX_CHARS` (operator stack defaults to 24 so each WAV stays under the Atom HTTP ingress cap ~250 KB; 413 otherwise).
- … (`2579dcc`, hardware-validated Milestone 6) as its own commit.
- … (`5b0b3ef`): it corrupted playback (loud static); restored to the validated single-play path. Server chunking + small budget keep audio clean on hardware (user-confirmed).
Status
- `node --test` green (338).
- … `face_say`.
Coordination
codex is paused (~3h) and its in-flight firmware was committed here unchanged; on resume it should pull/rebase since the base advances.
🤖 Generated with Claude Code