diff --git a/README.md b/README.md
index 6405c27..6a785c2 100644
--- a/README.md
+++ b/README.md
@@ -26,7 +26,7 @@ graph TB
TTS["tts.py
ElevenLabs → OpenAI
→ Polly (fallback)
Cache (MD5)
Circuit breaker"]
Owner["owner_channel.py
Signal notify
Signal poll (3s)
Slash commands
Instructions"]
Contact["contact_lookup.py
contacts.json
Twilio CNAM
E.164 normalize
Lang from prefix"]
- I18n["i18n.py
11+ languages
Signal templates
Polly voices
Twilio codes"]
+ I18n["i18n.py
13+ languages
Signal templates
Polly voices
Twilio codes"]
end
SignalCLI["signal-cli :8080
REST API
Native mode
Self-hosted"]
@@ -185,7 +185,7 @@ sequenceDiagram
```mermaid
flowchart TD
- Start([CALL START]) --> Prefix["Phone prefix detection
+41 → de-CH
+48 → pl-PL
+44 → en-GB
(52 prefixes)"]
+ Start([CALL START]) --> Prefix["Phone prefix detection
+41 → de-CH
+48 → pl-PL
+44 → en-GB
(54 prefixes)"]
Prefix --> ContactCheck{Contact has
lang override?}
ContactCheck -->|Yes| ContactLang["Use contact language
contacts.json
e.g. {name: ..., lang: pl}"]
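The detection flow in the flowchart above (prefix lookup, overridden by a per-contact `lang` in contacts.json) could be sketched roughly as below. This is an illustrative sketch, not the actual `i18n.py`/`contact_lookup.py` code: the prefix table shows only three of the 54 prefixes, and the function names are assumptions.

```python
# Illustrative subset of the prefix table from the flowchart (54 entries in the real map).
PREFIX_LANGS = {
    "+41": "de-CH",  # Switzerland
    "+48": "pl-PL",  # Poland
    "+44": "en-GB",  # United Kingdom
}
DEFAULT_LANG = "en-US"  # assumed fallback

def detect_language(caller: str, contacts: dict) -> str:
    """Per-contact lang override wins; otherwise fall back to prefix lookup."""
    contact = contacts.get(caller)
    if contact and contact.get("lang"):  # ContactCheck: has lang override?
        return contact["lang"]
    # Longest-prefix match against the E.164-normalized caller number
    for prefix, lang in sorted(PREFIX_LANGS.items(), key=lambda kv: -len(kv[0])):
        if caller.startswith(prefix):
            return lang
    return DEFAULT_LANG
```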
diff --git a/ROADMAP.md b/ROADMAP.md
index 1e42eef..4642d50 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -8,10 +8,13 @@
- [x] Core call flow: Twilio STT → GPT-4o → ElevenLabs TTS
- [x] Signal integration: notifications, live updates, owner instructions mid-call
-- [x] Multilingual support: 8+ languages, auto-detection, mid-call switching
+- [x] Multilingual support: 13+ languages, auto-detection, mid-call switching
- [x] Maya persona with full OWNER_CONTEXT customization
- [x] TTS provider chain with circuit breaker (ElevenLabs → OpenAI → Polly)
- [x] TTS disk cache (MD5-keyed, persistent)
+- [x] Switch to gpt-4o-mini for conversation
+- [x] Reduce max_tokens and system prompt
+- [x] Groq LLaMA 3 support
- [x] Contact lookup (local JSON + Twilio CNAM)
- [x] Streaming GPT-4o with first-sentence TTS pipelining
- [x] Twilio signature validation + rate limiting
@@ -19,7 +22,7 @@
- [x] Call recording (on/off via Signal `/recording-on`)
- [x] API usage & cost tracking per call (GPT tokens, TTS chars, estimated cost)
- [x] `/stats` with session costs and per-call cost history
-- [x] `speech_timeout` reduced to 2s (from 5s)
+- [x] `speech_timeout` reduced to 1s (from 5s)
---
@@ -31,20 +34,8 @@ _Nothing currently in progress._
## Short-term (quick wins)
-### Switch to gpt-4o-mini for conversation
-- [ ] Set `OPENAI_MODEL=gpt-4o-mini` (env var, no code changes)
-- [ ] Test quality for Polish/German/English phone conversations
-- [ ] Benchmark latency improvement (~0.5-0.8s vs ~2s for gpt-4o)
-- [ ] Keep gpt-4o for summarization only (separate model config)
-- **Impact**: ~1.2s latency reduction, 10x cheaper tokens
-
-### Reduce max_tokens and system prompt
-- [ ] Lower `max_tokens` from 350 to 150 (2-3 sentences is enough)
-- [ ] Trim system prompt (remove redundant rules, shorten examples)
-- **Impact**: ~0.3-0.5s latency reduction
-
### Pre-warm TTS cache on startup
-- [ ] Generate and cache all greetings (8 languages) at boot
+- [ ] Generate and cache all greetings (13 languages) at boot
- [ ] Cache all clarification phrases and no-input prompts
- **Impact**: eliminates 1-2s TTS delay on first call per language
@@ -52,14 +43,6 @@ _Nothing currently in progress._
## Medium-term (new capabilities)
-### Groq LLaMA 3 support
-- [ ] Add Groq as alternative LLM backend (OpenAI-compatible API)
-- [ ] Make LLM provider configurable via env var (`LLM_PROVIDER=openai|groq`)
-- [ ] Test Polish/German quality on LLaMA 3 70B/405B
-- [ ] Benchmark: expected ~0.2-0.3s inference time
-- **Impact**: fastest possible LLM response, free tier available
-- **Risk**: weaker multilingual support vs GPT
-
### Configurable speech_timeout
- [ ] Make `speech_timeout` configurable via env var
- [ ] Consider per-language tuning (some languages have longer pauses)
@@ -102,12 +85,12 @@ Current response time breakdown (user stops speaking → hears response):
| Stage | Current | With gpt-4o-mini | With Groq | With Realtime API |
|-------|---------|-------------------|-----------|-------------------|
-| speech_timeout | 2.0s | 2.0s | 2.0s | N/A |
+| speech_timeout | 1.0s | 1.0s | 1.0s | N/A |
| Twilio STT | 0.5s | 0.5s | 0.5s | N/A |
| LLM inference | 2.0s | 0.7s | 0.3s | — |
| TTS generation | 1.5s | 1.5s | 1.5s | — |
| Network/playback | 0.5s | 0.5s | 0.5s | — |
-| **Total** | **~6.5s** | **~5.2s** | **~4.8s** | **~1-2s** |
+| **Total** | **~5.5s** | **~4.2s** | **~3.8s** | **~1-2s** |
_With complementary optimizations (shorter prompt, lower max_tokens, pre-warm cache): subtract ~0.5-1s._
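The "Groq LLaMA 3 support" item moved to done above relies on Groq exposing an OpenAI-compatible API, so provider selection can be a pure configuration switch. A minimal sketch of the `LLM_PROVIDER=openai|groq` env-var scheme named in the roadmap bullet (the model defaults and `llm_config` helper are assumptions, not the project's actual code):

```python
import os

# Registry of OpenAI-compatible backends; the same client code works for both
# since Groq mirrors the OpenAI chat-completions API. Model names are assumed defaults.
PROVIDERS = {
    "openai": {
        "base_url": "https://api.openai.com/v1",
        "model": os.getenv("OPENAI_MODEL", "gpt-4o-mini"),
        "key_env": "OPENAI_API_KEY",
    },
    "groq": {
        "base_url": "https://api.groq.com/openai/v1",
        "model": os.getenv("GROQ_MODEL", "llama3-70b-8192"),
        "key_env": "GROQ_API_KEY",
    },
}

def llm_config() -> dict:
    """Resolve the active backend from LLM_PROVIDER (defaults to openai)."""
    provider = os.getenv("LLM_PROVIDER", "openai")
    if provider not in PROVIDERS:
        raise ValueError(f"Unknown LLM_PROVIDER: {provider!r}")
    return PROVIDERS[provider]
```

Keeping a separate model setting for summarization (as the done items imply) then just means reading a second env var instead of reusing `model`.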
diff --git a/docs/INSTALL_EN.md b/docs/INSTALL_EN.md
index e746913..e82b9d7 100644
--- a/docs/INSTALL_EN.md
+++ b/docs/INSTALL_EN.md
@@ -657,7 +657,7 @@ AVA includes the following security mechanisms:
│ │ │ │ │
│ │ owner_channel.py ─── contact_lookup.py ─── i18n.py │ │
│ │ │ │ │ │
-│ │ Signal notify contacts.json 11+ langs │
+│ │ Signal notify contacts.json 13+ langs │
│ │ Signal poll (3s) CNAM lookup Signal │
│ │ Slash commands Lang from prefix templates │
│ │ Owner instructions Per-contact lang │
diff --git a/docs/INSTALL_PL.md b/docs/INSTALL_PL.md
index fbe8d98..68206df 100644
--- a/docs/INSTALL_PL.md
+++ b/docs/INSTALL_PL.md
@@ -655,7 +655,7 @@ AVA posiada nastepujace mechanizmy bezpieczenstwa:
│ │ │ │ │
│ │ owner_channel.py ─── contact_lookup.py ─── i18n.py │ │
│ │ │ │ │ │
-│ │ Signal powiad. contacts.json 11+ jezykow │
+│ │ Signal powiad. contacts.json 13+ jezykow │
│ │ Signal poll (3s) CNAM lookup Signal │
│ │ Slash komendy Jezyk z prefiksu szablony │
│ │ Instrukcje Per-kontakt jezyk │