21 commits
0d5b822
feat(voice): pluggable voice backend with Gemini Live API support
Overbaker Apr 4, 2026
09520c2
feat(voice): switch default language to Chinese
Overbaker Apr 4, 2026
99a60e5
fix(voice): use inline Blob URL for AudioWorklet instead of ?url import
Overbaker Apr 4, 2026
4749454
fix(voice): switch to gemini-2.5-flash-native-audio-latest model
Overbaker Apr 4, 2026
964ea2e
feat(voice): add Qwen Realtime as third voice backend option
Overbaker Apr 4, 2026
a297748
fix(pwa): add skipWaiting + clientsClaim for immediate SW activation
Overbaker Apr 4, 2026
7a57ba3
fix(voice): Qwen WebSocket proxy + switch to qwen3-omni-flash-realtime
Overbaker Apr 4, 2026
8f85268
fix(voice): switch default TTS to qwen-realtime + increase socket buffer
Overbaker Apr 5, 2026
0cfe63e
fix(voice): fix Gemini Live barge-in and tool call issues
Overbaker Apr 5, 2026
bde20fa
fix(voice): greeting via system prompt to preserve tool call ability
Overbaker Apr 5, 2026
f5cbd0e
fix(voice): address PR review — add JWT auth to WS proxy, stop leakin…
Overbaker Apr 21, 2026
e32c1f6
fix(voice): gate voice button on backend discovery readiness
Overbaker Apr 21, 2026
fbf315b
fix(voice): fix Qwen URL duplication, defer ready until session regis…
Overbaker Apr 21, 2026
c68366a
fix(voice): use request origin for proxy URL, connect worklet to sink…
Overbaker Apr 21, 2026
759bf35
fix(voice): create playback AudioContext in user gesture for mobile a…
Overbaker Apr 21, 2026
74aa4c2
fix(voice): store playback AudioContext in state and close on cleanup
Overbaker Apr 21, 2026
296dc85
fix(pwa,voice): remove forced SW activation, guard voice debug logs
Overbaker Apr 21, 2026
f108c0f
fix(voice): reject startup promise on early WS close, fix debug flag
Overbaker Apr 21, 2026
7111b67
fix(voice): queue Gemini proxy messages during connect, fix Qwen star…
Overbaker Apr 22, 2026
5c60488
fix(voice): pin browser wsUrl to proxy, separate ElevenLabs language …
Overbaker Apr 22, 2026
aa9802d
fix(voice): apply initial mic mute state after recorder starts
Overbaker Apr 22, 2026
95 changes: 95 additions & 0 deletions .claude/plan/hapi-web-loading-fix.md
@@ -0,0 +1,95 @@
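# 📋 Implementation Plan: Hapi Web Load Failure + Voice Backend Fix

## Diagnosis

### Root Cause Analysis

| Problem | Root cause | Severity |
|---------|------------|----------|
| The web version fails to load | The Hub process was not restarted, so it is running with stale environment variables; the Service Worker is also caching old assets | Critical |
| Problems appear after changing the voice option | The Hub does not hot-reload `~/.hapi/env` after it is modified; a restart is required | Critical |
| Is the database split by version? | **There is only one database**, `~/.hapi/hapi.db`, with no dev/prod separation, so this is ruled out | ✅ Ruled out |

### Key Evidence

1. **Hub process**: PID 44317, started **4/3 16:09**
2. **env file**: last modified **4/5 06:13** (roughly 2 days after the Hub started)
3. **Environment variables out of sync**:
   - `~/.hapi/env` contains `VOICE_BACKEND=gemini-live`
   - The running Hub actually returns `{"backend":"qwen-realtime"}` (the Hub process's `process.env` has no `VOICE_BACKEND`, so it falls back to `DEFAULT_VOICE_BACKEND = 'qwen-realtime'`)
4. **Web static files**: all assets return 200; HTML/JS/CSS are reachable
5. **Database**: a single SQLite database, `~/.hapi/hapi.db`, schema v6, WAL mode working normally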
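
The fallback described in item 3 can be illustrated with a minimal TypeScript sketch. The function name `resolveVoiceBackend` and its `fallback` parameter are hypothetical and exist only for illustration, and the three-member `VoiceBackendType` union is an assumption inferred from the backends named in this PR. The fallback value passed in mirrors the `DEFAULT_VOICE_BACKEND = 'qwen-realtime'` quoted above.

```typescript
// Assumed union of backend names; the real type lives in the shared protocol package.
type VoiceBackendType = 'elevenlabs' | 'gemini-live' | 'qwen-realtime'

// Hypothetical helper: maps the raw env value to a backend, using `fallback`
// when the variable is missing or is not one of the explicitly recognised values.
function resolveVoiceBackend(
    raw: string | undefined,
    fallback: VoiceBackendType
): VoiceBackendType {
    if (raw === 'gemini-live') return 'gemini-live'
    if (raw === 'qwen-realtime') return 'qwen-realtime'
    return fallback
}

// A Hub started before the env file was edited has no VOICE_BACKEND in its
// process.env, so it reports the fallback even though the file says gemini-live.
const staleHub = resolveVoiceBackend(undefined, 'qwen-realtime')
const restartedHub = resolveVoiceBackend('gemini-live', 'qwen-realtime')
console.log(staleHub, restartedHub)
```

Under these assumptions, the mismatch in item 3 comes from when the environment was read, not from the resolution logic itself.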

### Updated User Requirement

The user has explicitly stated that they **want to use Gemini TTS**, so `VOICE_BACKEND` must be set to `gemini-live`.

---

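## Task Type
- [x] Backend (→ Hub restart + env fix)
- [x] Frontend (→ Service Worker cleanup + confirm the Gemini Live component works)

## Technical Approach

**Core fix**: Restart the Hub process so that it loads the latest environment variables from `~/.hapi/env`.

**Supporting fix**: Clear the old build artifacts in `web/dist` so that the Service Worker does not cache stale assets.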

---

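## Implementation Steps

### Step 1: Confirm and fix the env configuration
- File: `/home/ubuntu/.hapi/env`
- Ensure `VOICE_BACKEND=gemini-live` (the user wants Gemini TTS)
- Ensure `GEMINI_API_KEY` is configured
- Expected outcome: env file ready

### Step 2: Clean the web build artifacts
- Delete `/home/ubuntu/hapi/web/dist/` and rebuild
- Command: `cd /home/ubuntu/hapi/web && rm -rf dist && bun run build`
- Expected outcome: a clean `web/dist/` directory

### Step 3: Restart the Hub process
- Stop the current Hub (PID 44317)
- Start the Hub again so that it reads the latest env
- Command: `hapi runner restart`, or manually kill and start the process
- Expected outcome: the Hub process runs with the new env

### Step 4: Verify the fix
- Call `GET /api/voice/backend` and confirm it returns `gemini-live`
- Visit `https://ccg.aimo3d.org/` and confirm the page loads
- Test the Gemini Live voice feature
- Expected outcome: the web app loads normally and the voice backend is Gemini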
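
The first check in Step 4 could be scripted along the following lines. This is a sketch under stated assumptions: `parseBackend`, `verifyBackend`, and the injected `getJson` callback are hypothetical names, `getJson` stands in for whatever authenticated HTTP client is available, and the real endpoint may require credentials that are not shown here.

```typescript
// Hypothetical helper: extracts the `backend` field from the JSON body
// returned by GET /api/voice/backend, or null if the shape is unexpected.
function parseBackend(body: unknown): string | null {
    if (typeof body !== 'object' || body === null) return null
    const backend = (body as { backend?: unknown }).backend
    return typeof backend === 'string' ? backend : null
}

// Hypothetical check: `getJson` is an injected function that performs the
// GET request and resolves with the parsed JSON body.
async function verifyBackend(
    getJson: (path: string) => Promise<unknown>,
    expected: string
): Promise<boolean> {
    const body = await getJson('/api/voice/backend')
    return parseBackend(body) === expected
}
```

Injecting `getJson` keeps the sketch independent of any particular HTTP client; after the restart in Step 3, calling `verifyBackend(getJson, 'gemini-live')` is expected to resolve to `true`.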

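### Step 5: (Optional) Client-side Service Worker cleanup
- If the user's browser still shows old content:
  - Clear the browser's Service Worker cache
  - Or force a refresh (Ctrl+Shift+R)
- `sw.ts` already has `skipWaiting + clientsClaim`, so it should update automatically after the rebuild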

---

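## Key Files

| File | Action | Notes |
|------|--------|-------|
| `~/.hapi/env` | Confirm | VOICE_BACKEND=gemini-live |
| `web/dist/` | Rebuild | Clear old build artifacts |
| Hub process (PID 44317) | Restart | Load the latest env |
| `shared/src/voice.ts:272` | No change needed | DEFAULT_VOICE_BACKEND is only a fallback |
| `hub/src/web/routes/voice.ts:122-128` | No change needed | Logic is correct; the env just needs to take effect |
| `~/.hapi/hapi.db` | No action | The only database; no change needed |

## Risks and Mitigations

| Risk | Mitigation |
|------|------------|
| Restarting the Hub interrupts active Claude sessions | Sessions can be resumed with `--resume` |
| The Gemini API key may be invalid or expired | Verify the token endpoint in Step 4 |
| The browser's SW cache is not updated | skipWaiting mechanism + manual clearing instructions |

## SESSION_ID (for /ccg:execute)
- CODEX_SESSION: N/A (diagnostic task, not invoked)
- GEMINI_SESSION: N/A (diagnostic task, not invoked)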
394 changes: 394 additions & 0 deletions .claude/team-plan/pluggable-voice-backend.md

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions bun.lock

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions hub/src/socket/server.ts
@@ -63,6 +63,7 @@ export function createSocketServer(deps: SocketServerDeps): {
    const engine = new Engine({
        path: '/socket.io/',
        cors: corsOptions,
        maxHttpBufferSize: 55 * 1024 * 1024, // 55MB to match upload limit
        allowRequest: async (req) => {
            const origin = req.headers.get('origin')
            if (!origin || allowAllOrigins || corsOrigins.includes(origin)) {
152 changes: 152 additions & 0 deletions hub/src/web/routes/voice.test.ts
@@ -0,0 +1,152 @@
import { describe, test, expect, afterEach } from 'bun:test'
import { Hono } from 'hono'
import type { WebAppEnv } from '../middleware/auth'
import { createVoiceRoutes } from './voice'

function createApp() {
    const app = new Hono<WebAppEnv>()
    app.route('/api', createVoiceRoutes())
    return app
}

describe('GET /api/voice/backend', () => {
    const originalEnv = process.env.VOICE_BACKEND

    afterEach(() => {
        if (originalEnv === undefined) {
            delete process.env.VOICE_BACKEND
        } else {
            process.env.VOICE_BACKEND = originalEnv
        }
    })

    test('returns elevenlabs by default', async () => {
        delete process.env.VOICE_BACKEND
        const app = createApp()
        const res = await app.request('/api/voice/backend')
        expect(res.status).toBe(200)
        const body = await res.json() as { backend: string }
        expect(body.backend).toBe('elevenlabs')
    })

    test('returns gemini-live when configured', async () => {
        process.env.VOICE_BACKEND = 'gemini-live'
        const app = createApp()
        const res = await app.request('/api/voice/backend')
        expect(res.status).toBe(200)
        const body = await res.json() as { backend: string }
        expect(body.backend).toBe('gemini-live')
    })

    test('returns qwen-realtime when configured', async () => {
        process.env.VOICE_BACKEND = 'qwen-realtime'
        const app = createApp()
        const res = await app.request('/api/voice/backend')
        expect(res.status).toBe(200)
        const body = await res.json() as { backend: string }
        expect(body.backend).toBe('qwen-realtime')
    })

    test('falls back to elevenlabs for unknown values', async () => {
        process.env.VOICE_BACKEND = 'unknown-backend'
        const app = createApp()
        const res = await app.request('/api/voice/backend')
        expect(res.status).toBe(200)
        const body = await res.json() as { backend: string }
        expect(body.backend).toBe('elevenlabs')
    })
})

describe('POST /api/voice/gemini-token', () => {
    const origGemini = process.env.GEMINI_API_KEY
    const origGoogle = process.env.GOOGLE_API_KEY

    afterEach(() => {
        if (origGemini === undefined) delete process.env.GEMINI_API_KEY
        else process.env.GEMINI_API_KEY = origGemini
        if (origGoogle === undefined) delete process.env.GOOGLE_API_KEY
        else process.env.GOOGLE_API_KEY = origGoogle
    })

    test('returns 400 when no API key configured', async () => {
        delete process.env.GEMINI_API_KEY
        delete process.env.GOOGLE_API_KEY
        const app = createApp()
        const res = await app.request('/api/voice/gemini-token', { method: 'POST' })
        expect(res.status).toBe(400)
        const body = await res.json() as { allowed: boolean; error: string }
        expect(body.allowed).toBe(false)
        expect(body.error).toContain('not configured')
    })

    test('returns proxied wsUrl when GEMINI_API_KEY is set', async () => {
        process.env.GEMINI_API_KEY = 'test-gemini-key'
        delete process.env.GOOGLE_API_KEY
        const app = createApp()
        const res = await app.request('/api/voice/gemini-token', { method: 'POST' })
        expect(res.status).toBe(200)
        const body = await res.json() as { allowed: boolean; apiKey: string; wsUrl: string }
        expect(body.allowed).toBe(true)
        expect(body.apiKey).toBe('proxied')
        expect(body.wsUrl).toContain('/api/voice/gemini-ws')
    })

    test('falls back to GOOGLE_API_KEY', async () => {
        delete process.env.GEMINI_API_KEY
        process.env.GOOGLE_API_KEY = 'test-google-key'
        const app = createApp()
        const res = await app.request('/api/voice/gemini-token', { method: 'POST' })
        expect(res.status).toBe(200)
        const body = await res.json() as { allowed: boolean; apiKey: string; wsUrl: string }
        expect(body.allowed).toBe(true)
        expect(body.apiKey).toBe('proxied')
        expect(body.wsUrl).toContain('/api/voice/gemini-ws')
    })
})

describe('POST /api/voice/qwen-token', () => {
    const origDash = process.env.DASHSCOPE_API_KEY
    const origQwen = process.env.QWEN_API_KEY

    afterEach(() => {
        if (origDash === undefined) delete process.env.DASHSCOPE_API_KEY
        else process.env.DASHSCOPE_API_KEY = origDash
        if (origQwen === undefined) delete process.env.QWEN_API_KEY
        else process.env.QWEN_API_KEY = origQwen
    })

    test('returns 400 when no API key configured', async () => {
        delete process.env.DASHSCOPE_API_KEY
        delete process.env.QWEN_API_KEY
        const app = createApp()
        const res = await app.request('/api/voice/qwen-token', { method: 'POST' })
        expect(res.status).toBe(400)
        const body = await res.json() as { allowed: boolean; error: string }
        expect(body.allowed).toBe(false)
        expect(body.error).toContain('not configured')
    })

    test('returns wsUrl when DASHSCOPE_API_KEY is set (no raw key exposed)', async () => {
        process.env.DASHSCOPE_API_KEY = 'test-dash-key'
        delete process.env.QWEN_API_KEY
        const app = createApp()
        const res = await app.request('/api/voice/qwen-token', { method: 'POST' })
        expect(res.status).toBe(200)
        const body = await res.json() as { allowed: boolean; wsUrl: string }
        expect(body.allowed).toBe(true)
        expect(body.wsUrl).toContain('/api/voice/qwen-ws')
        expect(body).not.toHaveProperty('apiKey')
    })

    test('falls back to QWEN_API_KEY', async () => {
        delete process.env.DASHSCOPE_API_KEY
        process.env.QWEN_API_KEY = 'test-qwen-key'
        const app = createApp()
        const res = await app.request('/api/voice/qwen-token', { method: 'POST' })
        expect(res.status).toBe(200)
        const body = await res.json() as { allowed: boolean; wsUrl: string }
        expect(body.allowed).toBe(true)
        expect(body.wsUrl).toContain('/api/voice/qwen-ws')
        expect(body).not.toHaveProperty('apiKey')
    })
})
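
Each `describe` block above repeats the same save-and-restore handling of environment variables. One way that pattern could be factored out is sketched below. The `withEnv` helper and the `Env` type are hypothetical and not part of this PR, and a plain object stands in for `process.env` so the sketch stays self-contained.

```typescript
type Env = Record<string, string | undefined>

// Hypothetical helper: applies `overrides` to `env` (undefined means "delete"),
// runs `fn`, then restores the previous values even if `fn` throws.
function withEnv<T>(env: Env, overrides: Env, fn: () => T): T {
    const saved: Env = {}
    for (const key of Object.keys(overrides)) {
        saved[key] = env[key]
        const value = overrides[key]
        if (value === undefined) delete env[key]
        else env[key] = value
    }
    try {
        return fn()
    } finally {
        for (const key of Object.keys(saved)) {
            const previous = saved[key]
            if (previous === undefined) delete env[key]
            else env[key] = previous
        }
    }
}

// Usage with a plain object standing in for process.env.
const fakeEnv: Env = { GEMINI_API_KEY: 'original' }
const seen = withEnv(
    fakeEnv,
    { GEMINI_API_KEY: undefined, GOOGLE_API_KEY: 'test-google-key' },
    () => fakeEnv.GEMINI_API_KEY ?? fakeEnv.GOOGLE_API_KEY
)
```

This version is synchronous: with an `async` callback the `finally` block would restore the environment before the returned promise settles, so the asynchronous tests above would need a variant that awaits `fn()` first.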
63 changes: 62 additions & 1 deletion hub/src/web/routes/voice.ts
@@ -4,8 +4,10 @@ import type { WebAppEnv } from '../middleware/auth'
import {
    ELEVENLABS_API_BASE,
    VOICE_AGENT_NAME,
-    buildVoiceAgentConfig
+    buildVoiceAgentConfig,
+    DEFAULT_VOICE_BACKEND
} from '@hapi/protocol/voice'
+import type { VoiceBackendType } from '@hapi/protocol/voice'

const tokenRequestSchema = z.object({
    customAgentId: z.string().optional(),
@@ -116,6 +118,65 @@ async function getOrCreateAgentId(apiKey: string): Promise<string | null> {
export function createVoiceRoutes(): Hono<WebAppEnv> {
    const app = new Hono<WebAppEnv>()

    // Return the configured voice backend type
    app.get('/voice/backend', (c) => {
        const raw = process.env.VOICE_BACKEND
        const backend: VoiceBackendType =
            raw === 'gemini-live' ? 'gemini-live'
            : raw === 'qwen-realtime' ? 'qwen-realtime'
            : DEFAULT_VOICE_BACKEND
        return c.json({ backend })
    })

    // Check Gemini availability for Gemini Live voice sessions.
    // The actual API key is never sent to the browser: the response carries a dummy
    // value and the key stays server-side in the WS proxy.
    app.post('/voice/gemini-token', async (c) => {
        const apiKey = process.env.GEMINI_API_KEY || process.env.GOOGLE_API_KEY
        if (!apiKey) {
            return c.json({
                allowed: false,
                error: 'Gemini API key not configured (set GEMINI_API_KEY or GOOGLE_API_KEY)'
            }, 400)
        }

        // Use server-side WS proxy to avoid region restrictions.
        // The proxy at /api/voice/gemini-ws handles the API key server-side.
        // Derive wsUrl from the request origin so remote browsers connect back to the hub,
        // not to localhost. HAPI_PUBLIC_URL overrides when set (e.g. behind a reverse proxy).
        const requestOrigin = new URL(c.req.url).origin
        const publicUrl = process.env.HAPI_PUBLIC_URL || requestOrigin
        const wsProxyUrl = publicUrl.replace(/^http/, 'ws') + '/api/voice/gemini-ws'

        return c.json({
            allowed: true,
            apiKey: 'proxied', // Dummy value; the real key is handled server-side
            wsUrl: wsProxyUrl, // Always proxy; env WS URLs are upstream-only (server-side)
            baseUrl: process.env.GEMINI_API_BASE || undefined
        })
    })

    // Check Qwen (DashScope) availability for Qwen Realtime voice sessions.
    // The actual API key is never sent to the browser; it stays server-side in the WS proxy.
    app.post('/voice/qwen-token', async (c) => {
        const apiKey = process.env.DASHSCOPE_API_KEY || process.env.QWEN_API_KEY
        if (!apiKey) {
            return c.json({
                allowed: false,
                error: 'DashScope API key not configured (set DASHSCOPE_API_KEY or QWEN_API_KEY)'
            }, 400)
        }

        const requestOrigin = new URL(c.req.url).origin
        const publicUrl = process.env.HAPI_PUBLIC_URL || requestOrigin
        const wsProxyUrl = publicUrl.replace(/^http/, 'ws') + '/api/voice/qwen-ws'

        return c.json({
            allowed: true,
            wsUrl: wsProxyUrl // Always proxy; env WS URLs are upstream-only (server-side)
        })
    })

    // Get ElevenLabs ConvAI conversation token
    app.post('/voice/token', async (c) => {
        const json = await c.req.json().catch(() => null)
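
The `wsUrl` derivation used by both token routes in this hunk can be tried in isolation. The sketch below is illustrative only: `toWsProxyUrl` is a hypothetical helper name, and the example hosts and port are made up. It reproduces the same `replace(/^http/, 'ws')` expression as the routes.

```typescript
// Hypothetical helper mirroring the derivation in the token routes:
// swap the http(s) scheme for ws(s) and append the proxy path.
function toWsProxyUrl(publicUrl: string, proxyPath: string): string {
    return publicUrl.replace(/^http/, 'ws') + proxyPath
}

// https becomes wss because only the leading "http" is replaced.
const secure = toWsProxyUrl('https://hub.example.com', '/api/voice/gemini-ws')
// secure === 'wss://hub.example.com/api/voice/gemini-ws'

const local = toWsProxyUrl('http://localhost:3006', '/api/voice/qwen-ws')
// local === 'ws://localhost:3006/api/voice/qwen-ws'

// A trailing slash on the configured public URL produces a double slash.
const trailing = toWsProxyUrl('https://hub.example.com/', '/api/voice/gemini-ws')
// trailing === 'wss://hub.example.com//api/voice/gemini-ws'
```

A URL origin never ends in a slash, so the double-slash case could only arise from a `HAPI_PUBLIC_URL` value configured with one; whether that needs guarding depends on how the override is expected to be written.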