4 changes: 2 additions & 2 deletions README.md
@@ -26,7 +26,7 @@ graph TB
TTS["tts.py<br/>ElevenLabs → OpenAI<br/>→ Polly (fallback)<br/>Cache (MD5)<br/>Circuit breaker"]
Owner["owner_channel.py<br/>Signal notify<br/>Signal poll (3s)<br/>Slash commands<br/>Instructions"]
Contact["contact_lookup.py<br/>contacts.json<br/>Twilio CNAM<br/>E.164 normalize<br/>Lang from prefix"]
-I18n["i18n.py<br/>11+ languages<br/>Signal templates<br/>Polly voices<br/>Twilio codes"]
+I18n["i18n.py<br/>13+ languages<br/>Signal templates<br/>Polly voices<br/>Twilio codes"]
end

SignalCLI["signal-cli :8080<br/>REST API<br/>Native mode<br/>Self-hosted"]
@@ -185,7 +185,7 @@ sequenceDiagram

```mermaid
flowchart TD
-Start([CALL START]) --> Prefix["Phone prefix detection<br/>+41 → de-CH<br/>+48 → pl-PL<br/>+44 → en-GB<br/>(52 prefixes)"]
+Start([CALL START]) --> Prefix["Phone prefix detection<br/>+41 → de-CH<br/>+48 → pl-PL<br/>+44 → en-GB<br/>(54 prefixes)"]

Prefix --> ContactCheck{Contact has<br/>lang override?}
ContactCheck -->|Yes| ContactLang["Use contact language<br/>contacts.json<br/>e.g. {name: ..., lang: pl}"]
33 changes: 8 additions & 25 deletions ROADMAP.md
@@ -8,18 +8,21 @@

- [x] Core call flow: Twilio STT → GPT-4o → ElevenLabs TTS
- [x] Signal integration: notifications, live updates, owner instructions mid-call
-- [x] Multilingual support: 8+ languages, auto-detection, mid-call switching
+- [x] Multilingual support: 13+ languages, auto-detection, mid-call switching
- [x] Maya persona with full OWNER_CONTEXT customization
- [x] TTS provider chain with circuit breaker (ElevenLabs → OpenAI → Polly)
- [x] TTS disk cache (MD5-keyed, persistent)
+- [x] Switch to gpt-4o-mini for conversation
+- [x] Reduce max_tokens and system prompt
+- [x] Groq LLaMA 3 support
- [x] Contact lookup (local JSON + Twilio CNAM)
- [x] Streaming GPT-4o with first-sentence TTS pipelining
- [x] Twilio signature validation + rate limiting
- [x] Cloudflare Tunnel as alternative to Caddy for HTTPS ingress
- [x] Call recording (on/off via Signal `/recording-on`)
- [x] API usage & cost tracking per call (GPT tokens, TTS chars, estimated cost)
- [x] `/stats` with session costs and per-call cost history
-- [x] `speech_timeout` reduced to 2s (from 5s)
+- [x] `speech_timeout` reduced to 1s (from 5s)
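
The "TTS provider chain with circuit breaker" item above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `CircuitBreaker` class, the `synthesize` helper, the failure threshold, and the cooldown are hypothetical and are not taken from `tts.py`. It only shows the general pattern of trying providers in order (ElevenLabs → OpenAI → Polly) and skipping one that has failed repeatedly.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class CircuitBreaker:
    """Hypothetical breaker: opens after max_failures consecutive errors."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def available(self) -> bool:
        # Closed breaker: provider may be used.
        if self.opened_at is None:
            return True
        # Open breaker: allow a trial call only once the cooldown has elapsed.
        return self.clock() - self.opened_at >= self.cooldown_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


def synthesize(text: str,
               providers: List[Tuple[str, Callable[[str], bytes]]],
               breakers: Dict[str, CircuitBreaker]) -> Tuple[str, bytes]:
    """Try each provider in order; return (provider_name, audio_bytes)."""
    for name, call in providers:
        breaker = breakers[name]
        if not breaker.available():
            continue  # provider is temporarily disabled
        try:
            audio = call(text)
        except Exception:
            breaker.record_failure()
            continue  # fall through to the next provider
        breaker.record_success()
        return name, audio
    raise RuntimeError("all TTS providers failed or are disabled")
```

With this shape, a provider that keeps failing stops costing a round trip on every call, which matters when the caller is waiting on the line.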

---

@@ -31,35 +34,15 @@ _Nothing currently in progress._

## Short-term (quick wins)

-### Switch to gpt-4o-mini for conversation
-- [ ] Set `OPENAI_MODEL=gpt-4o-mini` (env var, no code changes)
-- [ ] Test quality for Polish/German/English phone conversations
-- [ ] Benchmark latency improvement (~0.5-0.8s vs ~2s for gpt-4o)
-- [ ] Keep gpt-4o for summarization only (separate model config)
-- **Impact**: ~1.2s latency reduction, 10x cheaper tokens
-
-### Reduce max_tokens and system prompt
-- [ ] Lower `max_tokens` from 350 to 150 (2-3 sentences is enough)
-- [ ] Trim system prompt (remove redundant rules, shorten examples)
-- **Impact**: ~0.3-0.5s latency reduction
-
### Pre-warm TTS cache on startup
-- [ ] Generate and cache all greetings (8 languages) at boot
+- [ ] Generate and cache all greetings (13 languages) at boot
- [ ] Cache all clarification phrases and no-input prompts
- **Impact**: eliminates 1-2s TTS delay on first call per language
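
A minimal sketch of what pre-warming an MD5-keyed disk cache could look like. The key layout (`lang:text`), the `.mp3` suffix, and the `cache_path` / `prewarm` helpers are assumptions for illustration; the actual cache format in `tts.py` is not shown in this diff. The `synthesize` argument stands in for whichever TTS call the project uses.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict, Iterable


def cache_path(cache_dir: Path, lang: str, text: str) -> Path:
    """Map (language, phrase) to a file named by the MD5 of both (assumed key)."""
    key = hashlib.md5(f"{lang}:{text}".encode("utf-8")).hexdigest()
    return cache_dir / f"{key}.mp3"


def prewarm(cache_dir: Path,
            phrases: Dict[str, Iterable[str]],
            synthesize: Callable[[str, str], bytes]) -> int:
    """Generate audio for every phrase not yet cached; return how many were made."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    generated = 0
    for lang, texts in phrases.items():
        for text in texts:
            path = cache_path(cache_dir, lang, text)
            if path.exists():
                continue  # already cached from a previous boot or call
            path.write_bytes(synthesize(lang, text))
            generated += 1
    return generated
```

Because the cache is persistent, a second boot with the same phrases generates nothing, so the startup cost is paid only once per phrase.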

---

## Medium-term (new capabilities)

-### Groq LLaMA 3 support
-- [ ] Add Groq as alternative LLM backend (OpenAI-compatible API)
-- [ ] Make LLM provider configurable via env var (`LLM_PROVIDER=openai|groq`)
-- [ ] Test Polish/German quality on LLaMA 3 70B/405B
-- [ ] Benchmark: expected ~0.2-0.3s inference time
-- **Impact**: fastest possible LLM response, free tier available
-- **Risk**: weaker multilingual support vs GPT
-
### Configurable speech_timeout
- [ ] Make `speech_timeout` configurable via env var
- [ ] Consider per-language tuning (some languages have longer pauses)
@@ -102,12 +85,12 @@ Current response time breakdown (user stops speaking → hears response):

| Stage | Current | With gpt-4o-mini | With Groq | With Realtime API |
|-------|---------|-------------------|-----------|-------------------|
-| speech_timeout | 2.0s | 2.0s | 2.0s | N/A |
+| speech_timeout | 1.0s | 1.0s | 1.0s | N/A |
| Twilio STT | 0.5s | 0.5s | 0.5s | N/A |
| LLM inference | 2.0s | 0.7s | 0.3s | — |
| TTS generation | 1.5s | 1.5s | 1.5s | — |
| Network/playback | 0.5s | 0.5s | 0.5s | — |
-| **Total** | **~6.5s** | **~5.2s** | **~4.8s** | **~1-2s** |
+| **Total** | **~5.5s** | **~4.2s** | **~3.8s** | **~1-2s** |

_With complementary optimizations (shorter prompt, lower max_tokens, pre-warm cache): subtract ~0.5-1s._
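
The totals in the table are plain sums of the per-stage figures, so the effect of the `speech_timeout` change can be checked directly. The sketch below uses only numbers from the table; the `total` helper and the dictionary names are hypothetical.

```python
# Per-stage latencies in seconds, taken from the table above.
BASE = {
    "speech_timeout": 1.0,
    "Twilio STT": 0.5,
    "LLM inference": 2.0,
    "TTS generation": 1.5,
    "Network/playback": 0.5,
}

# LLM inference time per scenario, also from the table.
LLM_SECONDS = {"Current": 2.0, "With gpt-4o-mini": 0.7, "With Groq": 0.3}


def total(llm_s: float, speech_timeout_s: float = 1.0) -> float:
    """Sum all stages, overriding the LLM and speech_timeout figures."""
    stages = {**BASE, "LLM inference": llm_s, "speech_timeout": speech_timeout_s}
    return round(sum(stages.values()), 1)


for scenario, llm_s in LLM_SECONDS.items():
    # Compare the old 2.0s timeout with the new 1.0s timeout.
    print(f"{scenario}: {total(llm_s, 2.0)}s -> {total(llm_s, 1.0)}s")
```

Each scenario drops by exactly the 1.0s removed from `speech_timeout`, which matches the updated totals row.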

2 changes: 1 addition & 1 deletion docs/INSTALL_EN.md
@@ -657,7 +657,7 @@ AVA includes the following security mechanisms:
│ │ │ │ │
│ │ owner_channel.py ─── contact_lookup.py ─── i18n.py │ │
│ │ │ │ │ │
-│ │ Signal notify contacts.json 11+ langs │
+│ │ Signal notify contacts.json 13+ langs │
│ │ Signal poll (3s) CNAM lookup Signal │
│ │ Slash commands Lang from prefix templates │
│ │ Owner instructions Per-contact lang │
2 changes: 1 addition & 1 deletion docs/INSTALL_PL.md
@@ -655,7 +655,7 @@ AVA posiada nastepujace mechanizmy bezpieczenstwa:
│ │ │ │ │
│ │ owner_channel.py ─── contact_lookup.py ─── i18n.py │ │
│ │ │ │ │ │
-│ │ Signal powiad. contacts.json 11+ jezykow │
+│ │ Signal powiad. contacts.json 13+ jezykow │
│ │ Signal poll (3s) CNAM lookup Signal │
│ │ Slash komendy Jezyk z prefiksu szablony │
│ │ Instrukcje Per-kontakt jezyk │