Merged
58 changes: 58 additions & 0 deletions README.md
@@ -64,6 +64,8 @@ flowchart LR
MCP[MCP Server<br/>face_event / face_say / face_ping]
WS[face-app<br/>WebSocket + HTTP :8765]
FE[Frontend UI<br/>Browser]
ATOM[AtomS3R Device<br/>2D face LCD + Echo speaker + PTT mic]
ATOMBR[atoms3r-http-bridge]
BR[operator-bridge]
ASRP[/POST /api/operator/asr/]
ASR[asr-worker<br/>Parakeet ASR<br/>JA/EN]
@@ -73,13 +75,18 @@ flowchart LR
U -- Direct prompt --> TMUX
U -- PTT recording --> FE
U -- Text input --> FE
U -- PTT button + voice --> ATOM
ATOM -- 2D face + Echo audio --> U

FE -- Audio binary --> ASRP
ATOM -- Mic WAV (POST /api/operator/asr) --> ASRP
ASRP -- JSON (audioBase64,mimeType,lang) --> ASR
ASR -- JSON transcript --> ASRP
ASRP -- Transcript --> FE
ASRP -- Transcript --> ATOM

FE -- operator_response JSON --> WS
ATOM -- operator_response (POST /api/operator/response) --> WS
WS -- relay --> BR
BR -- tmux send-keys --> TMUX
TMUX --> C
@@ -96,6 +103,10 @@ flowchart LR
WS -- say payload --> TTS
TTS -- audio + tts state --> FE

WS -- face/tts payloads (WS) --> ATOMBR
ATOMBR -- POST /api/headroom/payload --> ATOM
ATOMBR -- POST /api/headroom/audio --> ATOM

FE <-- HTTPS/WS --> TS
TS <---> WS
```
@@ -108,6 +119,8 @@ sequenceDiagram
participant U as User
participant TS as Tailscale (optional)
participant FE as Frontend UI
participant ATOM as AtomS3R Device
participant ATOMBR as atoms3r-http-bridge
participant FA as face-app (:8765, /ws, /api/operator/asr)
participant ASR as asr-worker (Parakeet)
participant BR as operator-bridge
@@ -123,6 +136,7 @@ sequenceDiagram

FE->>FA: Connect WebSocket /ws
BR->>FA: Connect WebSocket /ws
ATOMBR->>FA: Connect WebSocket /ws

alt Input path A: direct terminal prompt
U->>TM: Type prompt
@@ -144,6 +158,16 @@ sequenceDiagram
FA-->>BR: Relay payload
BR->>TM: tmux send-keys(text + Enter)
TM->>CX: Prompt arrives
else Input path D: AtomS3R PTT
U->>ATOM: Hold PTT button
ATOM->>FA: POST /api/operator/asr?lang=ja|en (WAV)
FA->>ASR: /v1/asr/ja|en (audioBase64,mimeType)
ASR-->>FA: Transcript JSON
FA-->>ATOM: Transcript response
ATOM->>FA: POST /api/operator/response (text)
FA-->>BR: Relay payload
BR->>TM: tmux send-keys(text + Enter)
TM->>CX: Prompt arrives
end

loop During work
@@ -156,10 +180,15 @@ sequenceDiagram
CX->>MCP: face_event / face_say / face_ping
MCP->>FA: Forward WebSocket JSON
FA-->>FE: event/say/state payloads
FA-->>ATOMBR: event/say/state payloads (WS)
ATOMBR->>ATOM: POST /api/headroom/payload

FA->>TTS: TTS request
TTS-->>FA: tts_audio / tts_mouth / say_result
FA-->>FE: Realtime status + audio
FA-->>ATOMBR: tts_audio / tts_mouth (WS)
ATOMBR->>ATOM: POST /api/headroom/audio + /payload
ATOM-->>U: 2D face on LCD + Echo speaker
FE-->>U: Voice, facial state, and status updates
```

@@ -535,6 +564,8 @@ flowchart LR
MCP[MCP Server<br/>face_event / face_say / face_ping]
WS[face-app<br/>WebSocket + HTTP :8765]
FE[Frontend UI<br/>Browser]
ATOM[AtomS3R Device<br/>2D face LCD + Echo speaker + PTT mic]
ATOMBR[atoms3r-http-bridge]
BR[operator-bridge]
ASRP[/POST /api/operator/asr/]
ASR[asr-worker<br/>Parakeet ASR<br/>JA/EN]
@@ -544,13 +575,18 @@ flowchart LR
U -- Direct prompt --> TMUX
U -- PTT recording --> FE
U -- Text input --> FE
U -- PTT button + voice --> ATOM
ATOM -- 2D face + Echo audio --> U

FE -- Audio binary --> ASRP
ATOM -- Mic WAV (POST /api/operator/asr) --> ASRP
ASRP -- JSON (audioBase64,mimeType,lang) --> ASR
ASR -- JSON transcript --> ASRP
ASRP -- Transcript --> FE
ASRP -- Transcript --> ATOM

FE -- operator_response JSON --> WS
ATOM -- operator_response (POST /api/operator/response) --> WS
WS -- relay --> BR
BR -- tmux send-keys --> TMUX
TMUX --> C
@@ -567,6 +603,10 @@ flowchart LR
WS -- say payload --> TTS
TTS -- audio + tts state --> FE

WS -- face/tts payloads (WS) --> ATOMBR
ATOMBR -- POST /api/headroom/payload --> ATOM
ATOMBR -- POST /api/headroom/audio --> ATOM

FE <-- HTTPS/WS --> TS
TS <---> WS
```
@@ -579,6 +619,8 @@ sequenceDiagram
participant U as User
participant TS as Tailscale (optional)
participant FE as Frontend UI
participant ATOM as AtomS3R Device
participant ATOMBR as atoms3r-http-bridge
participant FA as face-app (:8765, /ws, /api/operator/asr)
participant ASR as asr-worker (Parakeet)
participant BR as operator-bridge
@@ -594,6 +636,7 @@ sequenceDiagram

FE->>FA: Connect WebSocket /ws
BR->>FA: Connect WebSocket /ws
ATOMBR->>FA: Connect WebSocket /ws

alt Input path A: direct terminal prompt
U->>TM: Type prompt
@@ -615,6 +658,16 @@ sequenceDiagram
FA-->>BR: Relay payload
BR->>TM: tmux send-keys(text + Enter)
TM->>CX: Prompt arrives
else Input path D: AtomS3R PTT
U->>ATOM: Press PTT button
ATOM->>FA: POST /api/operator/asr?lang=ja|en (WAV)
FA->>ASR: /v1/asr/ja|en (audioBase64,mimeType)
ASR-->>FA: Transcript JSON
FA-->>ATOM: Transcript response
ATOM->>FA: POST /api/operator/response (text)
FA-->>BR: Relay payload
BR->>TM: tmux send-keys(text + Enter)
TM->>CX: Prompt arrives
end

loop During work
@@ -627,10 +680,15 @@ sequenceDiagram
CX->>MCP: face_event / face_say / face_ping
MCP->>FA: Forward WebSocket JSON
FA-->>FE: event/say/state payloads
FA-->>ATOMBR: event/say/state payloads (WS)
ATOMBR->>ATOM: POST /api/headroom/payload

FA->>TTS: TTS request
TTS-->>FA: tts_audio / tts_mouth / say_result
FA-->>FE: Realtime status + audio
FA-->>ATOMBR: tts_audio / tts_mouth (WS)
ATOMBR->>ATOM: POST /api/headroom/audio + /payload
ATOM-->>U: 2D face (LCD) + Echo speaker
FE-->>U: Display voice, facial expression, and status
```

2 changes: 1 addition & 1 deletion asr-worker/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "asr-worker"
-version = "1.17.3"
+version = "1.17.4"
description = "Local ASR worker for english-trainer (Parakeet EN/JA routing)"
readme = "README.md"
requires-python = ">=3.10"
2 changes: 1 addition & 1 deletion asr-worker/uv.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

11 changes: 11 additions & 0 deletions doc/diagrams/high-level-flow.mmd
@@ -5,6 +5,8 @@ flowchart LR
MCP["MCP Server\nface_event / face_say / face_ping"]
WS["face-app\nWebSocket + HTTP :8765"]
FE["Frontend UI\nBrowser"]
ATOM["AtomS3R Device\n2D face LCD + Echo speaker + PTT mic"]
ATOMBR[atoms3r-http-bridge]
BR[operator-bridge]
ASRP[/POST /api/operator/asr/]
ASR["asr-worker\nParakeet ASR\nJA/EN"]
@@ -14,13 +16,18 @@ flowchart LR
U -- Direct prompt --> TMUX
U -- PTT recording --> FE
U -- Text input --> FE
U -- PTT button + voice --> ATOM
ATOM -- 2D face + Echo audio --> U

FE -- Audio binary --> ASRP
ATOM -- Mic WAV (POST /api/operator/asr) --> ASRP
ASRP -- JSON (audioBase64,mimeType,lang) --> ASR
ASR -- JSON transcript --> ASRP
ASRP -- Transcript --> FE
ASRP -- Transcript --> ATOM

FE -- operator_response JSON --> WS
ATOM -- operator_response (POST /api/operator/response) --> WS
WS -- relay --> BR
BR -- tmux send-keys --> TMUX
TMUX --> C
@@ -37,5 +44,9 @@ flowchart LR
WS -- say payload --> TTS
TTS -- audio + tts state --> FE

WS -- face/tts payloads (WS) --> ATOMBR
ATOMBR -- POST /api/headroom/payload --> ATOM
ATOMBR -- POST /api/headroom/audio --> ATOM

FE <-- HTTPS/WS --> TS
TS <---> WS
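
The new `atoms3r-http-bridge` edges in this flowchart can be read as a small routing step: messages arriving over the face-app WebSocket are forwarded to one of two device endpoints. The sketch below is illustrative only and is not taken from the bridge's source. It assumes each message carries a `type` field, that only `tts_audio` goes to `/api/headroom/audio`, and that everything else shown in the diagrams goes to `/api/headroom/payload`. The function name `route_ws_message` is hypothetical.

```python
from typing import Any

# Endpoint paths as labeled in the flowchart.
PAYLOAD_PATH = "/api/headroom/payload"
AUDIO_PATH = "/api/headroom/audio"

# Assumed message type names, based on the diagram labels.
PAYLOAD_TYPES = {"event", "say", "state", "tts_mouth"}


def route_ws_message(message: dict[str, Any]) -> list[str]:
    """Return the device endpoints a WebSocket message would be POSTed to.

    Assumption: the "type" field decides the route. The real bridge may
    classify messages differently.
    """
    msg_type = message.get("type")
    if msg_type == "tts_audio":
        return [AUDIO_PATH]
    if msg_type in PAYLOAD_TYPES:
        return [PAYLOAD_PATH]
    # Unknown or missing types are dropped in this sketch.
    return []
```

Keeping the routing decision in a pure function like this would make it testable without a device or a running face-app.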
Binary file modified doc/diagrams/high-level-flow.png
2 changes: 1 addition & 1 deletion doc/diagrams/high-level-flow.svg
18 changes: 18 additions & 0 deletions doc/diagrams/sequence-timeline.mmd
@@ -3,6 +3,8 @@ sequenceDiagram
participant U as User
participant TS as Tailscale (optional)
participant FE as Frontend UI
participant ATOM as AtomS3R Device
participant ATOMBR as atoms3r-http-bridge
participant FA as face-app (:8765, /ws, /api/operator/asr)
participant ASR as asr-worker (Parakeet)
participant BR as operator-bridge
@@ -18,6 +20,7 @@ sequenceDiagram

FE->>FA: Connect WebSocket /ws
BR->>FA: Connect WebSocket /ws
ATOMBR->>FA: Connect WebSocket /ws

alt Input path A: direct terminal prompt
U->>TM: Type prompt
@@ -39,6 +42,16 @@ sequenceDiagram
FA-->>BR: Relay payload
BR->>TM: tmux send-keys(text + Enter)
TM->>CX: Prompt arrives
else Input path D: AtomS3R PTT
U->>ATOM: Hold PTT button
ATOM->>FA: POST /api/operator/asr?lang=ja|en (WAV)
FA->>ASR: /v1/asr/ja|en (audioBase64,mimeType)
ASR-->>FA: Transcript JSON
FA-->>ATOM: Transcript response
ATOM->>FA: POST /api/operator/response (text)
FA-->>BR: Relay payload
BR->>TM: tmux send-keys(text + Enter)
TM->>CX: Prompt arrives
end

loop During work
@@ -51,8 +64,13 @@ sequenceDiagram
CX->>MCP: face_event / face_say / face_ping
MCP->>FA: Forward WebSocket JSON
FA-->>FE: event/say/state payloads
FA-->>ATOMBR: event/say/state payloads (WS)
ATOMBR->>ATOM: POST /api/headroom/payload

FA->>TTS: TTS request
TTS-->>FA: tts_audio / tts_mouth / say_result
FA-->>FE: Realtime status + audio
FA-->>ATOMBR: tts_audio / tts_mouth (WS)
ATOMBR->>ATOM: POST /api/headroom/audio + /payload
ATOM-->>U: 2D face on LCD + Echo speaker
FE-->>U: Voice, facial state, and status updates
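
Input path D above has face-app receive a WAV upload on `POST /api/operator/asr?lang=ja|en` and forward it to asr-worker at `/v1/asr/ja|en` as JSON with `audioBase64` and `mimeType`. A minimal sketch of that conversion follows. It is an assumption about the shape, not the project's actual handler: the helper name `build_asr_forward`, the `audio/wav` MIME value, and the error handling are all hypothetical.

```python
import base64
import json

# Languages named in the diagram (lang=ja|en).
SUPPORTED_LANGS = ("ja", "en")


def build_asr_forward(wav_bytes: bytes, lang: str) -> tuple[str, str]:
    """Return (worker_path, json_body) for forwarding a WAV upload.

    Assumption: the worker accepts base64 audio plus a MIME type, as the
    diagram's "(audioBase64,mimeType)" label suggests.
    """
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported lang: {lang!r}")
    body = {
        "audioBase64": base64.b64encode(wav_bytes).decode("ascii"),
        "mimeType": "audio/wav",  # assumed value for AtomS3R mic recordings
    }
    return f"/v1/asr/{lang}", json.dumps(body)
```

For example, `build_asr_forward(wav, "ja")` would yield the path `/v1/asr/ja` and a JSON body ready to POST to the worker.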
Binary file modified doc/diagrams/sequence-timeline.png
2 changes: 1 addition & 1 deletion doc/diagrams/sequence-timeline.svg
2 changes: 1 addition & 1 deletion mcp-server/dist/index.js
@@ -3,7 +3,7 @@ import { randomUUID } from 'node:crypto';
import { createFramedMessageParser, writeMessage } from './mcp_stdio.js';

const SERVER_NAME = 'minimum-headroom';
-const SERVER_VERSION = '1.17.3';
+const SERVER_VERSION = '1.17.4';
const PROTOCOL_VERSION = '2024-11-05';
const FACE_WS_URL = process.env.FACE_WS_URL ?? 'ws://127.0.0.1:8765/ws';
const FACE_AUTH_TOKEN = (() => {
2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "minimum-headroom",
-"version": "1.17.3",
+"version": "1.17.4",
"private": true,
"type": "module",
"scripts": {
2 changes: 1 addition & 1 deletion tts-worker/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "minimum-headroom-tts-worker"
-version = "1.17.3"
+version = "1.17.4"
description = "Minimum Headroom TTS worker (Kokoro ONNX default, optional Qwen3-TTS)"
readme = "README.md"
requires-python = ">=3.12"
2 changes: 1 addition & 1 deletion tts-worker/uv.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
