From d763d1ab6bc2f20b36ae9373e88a880f46dee2d6 Mon Sep 17 00:00:00 2001
From: "google-labs-jules[bot]" <161369871+google-labs-jules[bot]@users.noreply.github.com>
Date: Fri, 10 Apr 2026 08:14:43 +0000
Subject: [PATCH] docs: Sync ROADMAP.md, README.md, and INSTALL guides with recent code changes

Updates documentation files to reflect the latest codebase state:
- ROADMAP.md: Moves completed short/medium tasks (gpt-4o-mini switch, max_tokens reduction, Groq LLaMA 3 support) to the "Completed" section.
- ROADMAP.md: Updates latency breakdown to reflect 1.0s `speech_timeout` and modifies related impact/descriptions.
- ROADMAP.md: Adjusts the short-term goal to pre-warm cache for 13 languages instead of 8.
- README.md: Updates Mermaid diagram and labels to reflect 13+ languages and 54 phone prefix detections instead of 11+ languages and 52 prefixes.
- docs/INSTALL_EN.md & docs/INSTALL_PL.md: Updates system architecture diagrams from 11+ languages to 13+ languages.
---
 README.md          |  4 ++--
 ROADMAP.md         | 33 ++++++++-------------------------
 docs/INSTALL_EN.md |  2 +-
 docs/INSTALL_PL.md |  2 +-
 4 files changed, 12 insertions(+), 29 deletions(-)

diff --git a/README.md b/README.md
index 6405c27..6a785c2 100644
--- a/README.md
+++ b/README.md
@@ -26,7 +26,7 @@ graph TB
         TTS["tts.py<br/>ElevenLabs → OpenAI<br/>→ Polly (fallback)<br/>Cache (MD5)<br/>Circuit breaker"]
         Owner["owner_channel.py<br/>Signal notify<br/>Signal poll (3s)<br/>Slash commands<br/>Instructions"]
         Contact["contact_lookup.py<br/>contacts.json<br/>Twilio CNAM<br/>E.164 normalize<br/>Lang from prefix"]
-        I18n["i18n.py<br/>11+ languages<br/>Signal templates<br/>Polly voices<br/>Twilio codes"]
+        I18n["i18n.py<br/>13+ languages<br/>Signal templates<br/>Polly voices<br/>Twilio codes"]
     end
 
     SignalCLI["signal-cli :8080<br/>REST API<br/>Native mode<br/>Self-hosted"]
@@ -185,7 +185,7 @@ sequenceDiagram
 
 ```mermaid
 flowchart TD
-    Start([CALL START]) --> Prefix["Phone prefix detection<br/>+41 → de-CH<br/>+48 → pl-PL<br/>+44 → en-GB<br/>(52 prefixes)"]
+    Start([CALL START]) --> Prefix["Phone prefix detection<br/>+41 → de-CH<br/>+48 → pl-PL<br/>+44 → en-GB<br/>(54 prefixes)"]
     Prefix --> ContactCheck{Contact has<br/>lang override?}
 
     ContactCheck -->|Yes| ContactLang["Use contact language<br/>contacts.json<br/>e.g. {name: ..., lang: pl}"]
diff --git a/ROADMAP.md b/ROADMAP.md
index 1e42eef..4642d50 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -8,10 +8,13 @@
 
 - [x] Core call flow: Twilio STT → GPT-4o → ElevenLabs TTS
 - [x] Signal integration: notifications, live updates, owner instructions mid-call
-- [x] Multilingual support: 8+ languages, auto-detection, mid-call switching
+- [x] Multilingual support: 13+ languages, auto-detection, mid-call switching
 - [x] Maya persona with full OWNER_CONTEXT customization
 - [x] TTS provider chain with circuit breaker (ElevenLabs → OpenAI → Polly)
 - [x] TTS disk cache (MD5-keyed, persistent)
+- [x] Switch to gpt-4o-mini for conversation
+- [x] Reduce max_tokens and system prompt
+- [x] Groq LLaMA 3 support
 - [x] Contact lookup (local JSON + Twilio CNAM)
 - [x] Streaming GPT-4o with first-sentence TTS pipelining
 - [x] Twilio signature validation + rate limiting
@@ -19,7 +22,7 @@
 - [x] Call recording (on/off via Signal `/recording-on`)
 - [x] API usage & cost tracking per call (GPT tokens, TTS chars, estimated cost)
 - [x] `/stats` with session costs and per-call cost history
-- [x] `speech_timeout` reduced to 2s (from 5s)
+- [x] `speech_timeout` reduced to 1s (from 5s)
 
 ---
 
@@ -31,20 +34,8 @@ _Nothing currently in progress._
 
 ## Short-term (quick wins)
 
-### Switch to gpt-4o-mini for conversation
-- [ ] Set `OPENAI_MODEL=gpt-4o-mini` (env var, no code changes)
-- [ ] Test quality for Polish/German/English phone conversations
-- [ ] Benchmark latency improvement (~0.5-0.8s vs ~2s for gpt-4o)
-- [ ] Keep gpt-4o for summarization only (separate model config)
-- **Impact**: ~1.2s latency reduction, 10x cheaper tokens
-
-### Reduce max_tokens and system prompt
-- [ ] Lower `max_tokens` from 350 to 150 (2-3 sentences is enough)
-- [ ] Trim system prompt (remove redundant rules, shorten examples)
-- **Impact**: ~0.3-0.5s latency reduction
-
 ### Pre-warm TTS cache on startup
-- [ ] Generate and cache all greetings (8 languages) at boot
+- [ ] Generate and cache all greetings (13 languages) at boot
 - [ ] Cache all clarification phrases and no-input prompts
 - **Impact**: eliminates 1-2s TTS delay on first call per language
 
@@ -52,14 +43,6 @@ _Nothing currently in progress._
 
 ## Medium-term (new capabilities)
 
-### Groq LLaMA 3 support
-- [ ] Add Groq as alternative LLM backend (OpenAI-compatible API)
-- [ ] Make LLM provider configurable via env var (`LLM_PROVIDER=openai|groq`)
-- [ ] Test Polish/German quality on LLaMA 3 70B/405B
-- [ ] Benchmark: expected ~0.2-0.3s inference time
-- **Impact**: fastest possible LLM response, free tier available
-- **Risk**: weaker multilingual support vs GPT
-
 ### Configurable speech_timeout
 - [ ] Make `speech_timeout` configurable via env var
 - [ ] Consider per-language tuning (some languages have longer pauses)
@@ -102,12 +85,12 @@ Current response time breakdown (user stops speaking → hears response):
 
 | Stage | Current | With gpt-4o-mini | With Groq | With Realtime API |
 |-------|---------|-------------------|-----------|-------------------|
-| speech_timeout | 2.0s | 2.0s | 2.0s | N/A |
+| speech_timeout | 1.0s | 1.0s | 1.0s | N/A |
 | Twilio STT | 0.5s | 0.5s | 0.5s | N/A |
 | LLM inference | 2.0s | 0.7s | 0.3s | — |
 | TTS generation | 1.5s | 1.5s | 1.5s | — |
 | Network/playback | 0.5s | 0.5s | 0.5s | — |
-| **Total** | **~6.5s** | **~5.2s** | **~4.8s** | **~1-2s** |
+| **Total** | **~5.5s** | **~4.2s** | **~3.8s** | **~1-2s** |
 
 _With complementary optimizations (shorter prompt, lower max_tokens, pre-warm cache): subtract ~0.5-1s._
 
diff --git a/docs/INSTALL_EN.md b/docs/INSTALL_EN.md
index e746913..e82b9d7 100644
--- a/docs/INSTALL_EN.md
+++ b/docs/INSTALL_EN.md
@@ -657,7 +657,7 @@ AVA includes the following security mechanisms:
 │ │ │ │ │
 │ │ owner_channel.py ─── contact_lookup.py ─── i18n.py │
 │ │ │ │ │ │ │
-│ │ Signal notify contacts.json 11+ langs │
+│ │ Signal notify contacts.json 13+ langs │
 │ │ Signal poll (3s) CNAM lookup Signal │
 │ │ Slash commands Lang from prefix templates │
 │ │ Owner instructions Per-contact lang │
diff --git a/docs/INSTALL_PL.md b/docs/INSTALL_PL.md
index fbe8d98..68206df 100644
--- a/docs/INSTALL_PL.md
+++ b/docs/INSTALL_PL.md
@@ -655,7 +655,7 @@ AVA posiada nastepujace mechanizmy bezpieczenstwa:
 │ │ │ │ │
 │ │ owner_channel.py ─── contact_lookup.py ─── i18n.py │
 │ │ │ │ │ │ │
-│ │ Signal powiad. contacts.json 11+ jezykow │
+│ │ Signal powiad. contacts.json 13+ jezykow │
 │ │ Signal poll (3s) CNAM lookup Signal │
 │ │ Slash komendy Jezyk z prefiksu szablony │
 │ │ Instrukcje Per-kontakt jezyk │