dzaczek · google-labs-jules · Apr 10, 2026
diff --git a/README.md b/README.md
@@ -26,7 +26,7 @@ graph TB
             TTS["tts.py<br/>ElevenLabs → OpenAI<br/>→ Polly (fallback)<br/>Cache (MD5)<br/>Circuit breaker"]
             Owner["owner_channel.py<br/>Signal notify<br/>Signal poll (3s)<br/>Slash commands<br/>Instructions"]
             Contact["contact_lookup.py<br/>contacts.json<br/>Twilio CNAM<br/>E.164 normalize<br/>Lang from prefix"]
-            I18n["i18n.py<br/>11+ languages<br/>Signal templates<br/>Polly voices<br/>Twilio codes"]
+            I18n["i18n.py<br/>13+ languages<br/>Signal templates<br/>Polly voices<br/>Twilio codes"]
         end
 
         SignalCLI["signal-cli :8080<br/>REST API<br/>Native mode<br/>Self-hosted"]
@@ -185,7 +185,7 @@ sequenceDiagram
 
 ```mermaid
 flowchart TD
-    Start([CALL START]) --> Prefix["Phone prefix detection<br/>+41 → de-CH<br/>+48 → pl-PL<br/>+44 → en-GB<br/>(52 prefixes)"]
+    Start([CALL START]) --> Prefix["Phone prefix detection<br/>+41 → de-CH<br/>+48 → pl-PL<br/>+44 → en-GB<br/>(54 prefixes)"]
 
     Prefix --> ContactCheck{Contact has<br/>lang override?}
     ContactCheck -->|Yes| ContactLang["Use contact language<br/>contacts.json<br/>e.g. {name: ..., lang: pl}"]

diff --git a/ROADMAP.md b/ROADMAP.md
@@ -8,18 +8,21 @@
 
 - [x] Core call flow: Twilio STT → GPT-4o → ElevenLabs TTS
 - [x] Signal integration: notifications, live updates, owner instructions mid-call
-- [x] Multilingual support: 8+ languages, auto-detection, mid-call switching
+- [x] Multilingual support: 13+ languages, auto-detection, mid-call switching
 - [x] Maya persona with full OWNER_CONTEXT customization
 - [x] TTS provider chain with circuit breaker (ElevenLabs → OpenAI → Polly)
 - [x] TTS disk cache (MD5-keyed, persistent)
+- [x] Switch to gpt-4o-mini for conversation
+- [x] Reduce max_tokens and system prompt
+- [x] Groq LLaMA 3 support
 - [x] Contact lookup (local JSON + Twilio CNAM)
 - [x] Streaming GPT-4o with first-sentence TTS pipelining
 - [x] Twilio signature validation + rate limiting
 - [x] Cloudflare Tunnel as alternative to Caddy for HTTPS ingress
 - [x] Call recording (on/off via Signal `/recording-on`)
 - [x] API usage & cost tracking per call (GPT tokens, TTS chars, estimated cost)
 - [x] `/stats` with session costs and per-call cost history
-- [x] `speech_timeout` reduced to 2s (from 5s)
+- [x] `speech_timeout` reduced to 1s (from 5s)
 
 ---
 
@@ -31,35 +34,15 @@ _Nothing currently in progress._
 
 ## Short-term (quick wins)
 
-### Switch to gpt-4o-mini for conversation
-- [ ] Set `OPENAI_MODEL=gpt-4o-mini` (env var, no code changes)
-- [ ] Test quality for Polish/German/English phone conversations
-- [ ] Benchmark latency improvement (~0.5-0.8s vs ~2s for gpt-4o)
-- [ ] Keep gpt-4o for summarization only (separate model config)
-- **Impact**: ~1.2s latency reduction, 10x cheaper tokens
-
-### Reduce max_tokens and system prompt
-- [ ] Lower `max_tokens` from 350 to 150 (2-3 sentences is enough)
-- [ ] Trim system prompt (remove redundant rules, shorten examples)
-- **Impact**: ~0.3-0.5s latency reduction
-
 ### Pre-warm TTS cache on startup
-- [ ] Generate and cache all greetings (8 languages) at boot
+- [ ] Generate and cache all greetings (13 languages) at boot
 - [ ] Cache all clarification phrases and no-input prompts
 - **Impact**: eliminates 1-2s TTS delay on first call per language
 
 ---
 
 ## Medium-term (new capabilities)
 
-### Groq LLaMA 3 support
-- [ ] Add Groq as alternative LLM backend (OpenAI-compatible API)
-- [ ] Make LLM provider configurable via env var (`LLM_PROVIDER=openai|groq`)
-- [ ] Test Polish/German quality on LLaMA 3 70B/405B
-- [ ] Benchmark: expected ~0.2-0.3s inference time
-- **Impact**: fastest possible LLM response, free tier available
-- **Risk**: weaker multilingual support vs GPT
-
 ### Configurable speech_timeout
 - [ ] Make `speech_timeout` configurable via env var
 - [ ] Consider per-language tuning (some languages have longer pauses)
@@ -102,12 +85,12 @@ Current response time breakdown (user stops speaking → hears response):
 
 | Stage | Current | With gpt-4o-mini | With Groq | With Realtime API |
 |-------|---------|-------------------|-----------|-------------------|
-| speech_timeout | 2.0s | 2.0s | 2.0s | N/A |
+| speech_timeout | 1.0s | 1.0s | 1.0s | N/A |
 | Twilio STT | 0.5s | 0.5s | 0.5s | N/A |
 | LLM inference | 2.0s | 0.7s | 0.3s | — |
 | TTS generation | 1.5s | 1.5s | 1.5s | — |
 | Network/playback | 0.5s | 0.5s | 0.5s | — |
-| **Total** | **~6.5s** | **~5.2s** | **~4.8s** | **~1-2s** |
+| **Total** | **~5.5s** | **~4.2s** | **~3.8s** | **~1-2s** |
 
 _With complementary optimizations (shorter prompt, lower max_tokens, pre-warm cache): subtract ~0.5-1s._
 

diff --git a/docs/INSTALL_EN.md b/docs/INSTALL_EN.md
@@ -657,7 +657,7 @@ AVA includes the following security mechanisms:
 │  │     │                                                  │       │
 │  │  owner_channel.py ─── contact_lookup.py ─── i18n.py   │       │
 │  │     │                      │                           │       │
-│  │  Signal notify          contacts.json             11+ langs   │
+│  │  Signal notify          contacts.json             13+ langs   │
 │  │  Signal poll (3s)       CNAM lookup                Signal     │
 │  │  Slash commands         Lang from prefix           templates  │
 │  │  Owner instructions     Per-contact lang                      │

diff --git a/docs/INSTALL_PL.md b/docs/INSTALL_PL.md
@@ -655,7 +655,7 @@ AVA posiada nastepujace mechanizmy bezpieczenstwa:
 │  │     │                                                  │       │
 │  │  owner_channel.py ─── contact_lookup.py ─── i18n.py   │       │
 │  │     │                      │                           │       │
-│  │  Signal powiad.         contacts.json             11+ jezykow │
+│  │  Signal powiad.         contacts.json             13+ jezykow │
 │  │  Signal poll (3s)       CNAM lookup                Signal     │
 │  │  Slash komendy          Jezyk z prefiksu           szablony   │
 │  │  Instrukcje             Per-kontakt jezyk                     │
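
Review note on the ROADMAP.md latency table: the new totals can be checked by summing the per-stage figures in each column. The sketch below does that for both the old (2s) and new (1s) `speech_timeout`. The names `FIXED_STAGES` and `total_latency` are hypothetical and exist only for this check; they are not part of the repository. The Realtime API column is left out because the table gives no per-stage numbers for it.

```python
# Per-stage latencies in seconds, copied from the ROADMAP.md table in this diff.
# speech_timeout is kept separate because it is the only value this PR changes.
# FIXED_STAGES and total_latency are illustrative names, not project code.
FIXED_STAGES = {
    "Current": {
        "Twilio STT": 0.5,
        "LLM inference": 2.0,
        "TTS generation": 1.5,
        "Network/playback": 0.5,
    },
    "With gpt-4o-mini": {
        "Twilio STT": 0.5,
        "LLM inference": 0.7,
        "TTS generation": 1.5,
        "Network/playback": 0.5,
    },
    "With Groq": {
        "Twilio STT": 0.5,
        "LLM inference": 0.3,
        "TTS generation": 1.5,
        "Network/playback": 0.5,
    },
}


def total_latency(speech_timeout: float) -> dict[str, float]:
    """Return the end-to-end latency per column for a given speech_timeout."""
    return {
        column: round(speech_timeout + sum(stages.values()), 1)
        for column, stages in FIXED_STAGES.items()
    }


if __name__ == "__main__":
    print("speech_timeout=2.0:", total_latency(2.0))
    print("speech_timeout=1.0:", total_latency(1.0))
```

With these inputs the sums agree with both the removed row (~6.5s, ~5.2s, ~4.8s) and the added row (~5.5s, ~4.2s, ~3.8s), so the table edit is arithmetically consistent with the `speech_timeout` change.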