diff --git a/.claude/plan/hapi-web-loading-fix.md b/.claude/plan/hapi-web-loading-fix.md
new file mode 100644
index 0000000000..9cf5da8939
--- /dev/null
+++ b/.claude/plan/hapi-web-loading-fix.md
@@ -0,0 +1,95 @@
+# 📋 Implementation Plan: Fix Hapi Web Loading Failure + Voice Backend
+
+## Diagnosis
+
+### Root-Cause Analysis
+
+| Problem | Root cause | Severity |
+|------|------|--------|
+| Web version fails to load | Hub process was never restarted, so it runs with stale env vars + the Service Worker serves cached old assets | Critical |
+| Changing the voice option breaks things | Hub does not hot-reload `~/.hapi/env` after edits; a restart is required | Critical |
+| Is the database split by version? | **There is only one database**, `~/.hapi/hapi.db`, with no dev/prod separation — ruled out | ✅ Ruled out |
+
+### Key Evidence
+
+1. **Hub process**: PID 44317, started **4/3 16:09**
+2. **env file**: last modified **4/5 06:13** (2 days after the Hub started)
+3. **Environment variables out of sync**:
+   - `~/.hapi/env` sets `VOICE_BACKEND=gemini-live`
+   - The running Hub actually returns `{"backend":"qwen-realtime"}` (the Hub process has no `VOICE_BACKEND` in its process.env, so it falls back to `DEFAULT_VOICE_BACKEND = 'qwen-realtime'`)
+4. **Web static files**: all assets return 200; HTML/JS/CSS are reachable
+5. **Database**: single SQLite file `~/.hapi/hapi.db`, schema v6, WAL mode healthy
+
+### Updated User Requirement
+
+The user explicitly wants **Gemini TTS**, so `VOICE_BACKEND` must be set to `gemini-live`.
+
+---
+
+## Task Type
+- [x] Backend (→ Hub restart + env fix)
+- [x] Frontend (→ Service Worker cleanup + verify the Gemini Live component works)
+
+## Technical Approach
+
+**Core fix**: restart the Hub process so it loads the latest `~/.hapi/env` environment variables.
+
+**Secondary fix**: clean stale build artifacts out of `web/dist` so the Service Worker cannot keep serving expired assets.
+
+---
+
+## Implementation Steps
+
+### Step 1: Verify and fix the env configuration
+- File: `/home/ubuntu/.hapi/env`
+- Ensure `VOICE_BACKEND=gemini-live` (the user wants Gemini TTS)
+- Ensure `GEMINI_API_KEY` is configured
+- Expected result: env file ready
+
+### Step 2: Clean the web build artifacts
+- Delete `/home/ubuntu/hapi/web/dist/` and rebuild
+- Command: `cd /home/ubuntu/hapi/web && rm -rf dist && bun run build`
+- Expected result: a clean `web/dist/` directory
+
+### Step 3: Restart the Hub process
+- Stop the current Hub (PID 44317)
+- Start the Hub again so it reads the latest env
+- Command: `hapi runner restart`, or manual kill + start
+- Expected result: Hub running with the new env
+
+### Step 4: Verify the fix
+- Call `GET /api/voice/backend` and confirm it returns `gemini-live`
+- Open `https://ccg.aimo3d.org/` and confirm the page loads normally
+- Test the Gemini Live voice feature
+- Expected result: web loads normally + voice backend is Gemini
+
+### Step 5: (Optional) Client-side Service Worker cleanup
+- If the user's browser still shows stale content:
+  - Clear the browser's Service Worker cache
+  - Or force-refresh (Ctrl+Shift+R)
+- `sw.ts` already uses `skipWaiting + clientsClaim`, so it should update automatically after the rebuild
+
+---
+
+## Key Files
+
+| File | Action | Notes |
+|------|------|------|
+| `~/.hapi/env` | Verify | VOICE_BACKEND=gemini-live |
+| `web/dist/` | Rebuild | Remove stale build artifacts |
+| Hub process (PID 44317) | Restart | Load the latest env |
+| `shared/src/voice.ts:272` | No change | DEFAULT_VOICE_BACKEND is only a fallback |
+| `hub/src/web/routes/voice.ts:122-128` | No change | Logic is correct; the env just has to take effect |
+| `~/.hapi/hapi.db` | None | The only database; no changes needed |
+
+## Risks and Mitigations
+
+| Risk | Mitigation |
+|------|----------|
+| Restarting the Hub interrupts active Claude sessions | Sessions can be resumed with `--resume` |
+| The Gemini API key may be invalid/expired | Step 4 verifies the token endpoint |
+| Browser SW cache not updated | skipWaiting mechanism + manual cache-clear instructions |
+
+## SESSION_ID (for /ccg:execute)
+- CODEX_SESSION: N/A (diagnostic task; not invoked)
+- GEMINI_SESSION: N/A (diagnostic task; not invoked)
diff --git a/.claude/team-plan/pluggable-voice-backend.md b/.claude/team-plan/pluggable-voice-backend.md
new file mode 100644
index 0000000000..52e203ab83
--- /dev/null
+++ b/.claude/team-plan/pluggable-voice-backend.md
@@ -0,0 +1,394 @@
+# Team Plan: Pluggable Voice Backend (ElevenLabs + Gemini Live)
+
+## Overview
+
+Refactor the Hapi voice assistant into a pluggable architecture (Strategy Pattern) supporting ElevenLabs ConvAI (default) and Gemini Live API backends, switchable via the `VOICE_BACKEND` env var. Minimize upstream file changes to reduce git pull conflicts.
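+
+As a quick orientation, here is a minimal sketch of the Strategy seam this plan builds on (editor's illustration — `makeElevenLabsSession`/`makeGeminiLiveSession` are hypothetical stand-ins for the concrete implementations; `fetchVoiceBackend` is the hub discovery call introduced in Tasks 2/3):
+
+```ts
+import type { ApiClient } from '@/api/client'
+
+// Mirrors the existing `VoiceSession` interface — the Strategy seam.
+interface VoiceSession {
+  startSession(config: { initialContext?: string }): Promise<void>
+  endSession(): Promise<void>
+  sendTextMessage(message: string): void
+  sendContextualUpdate(update: string): void
+}
+
+// Hypothetical factories for the two concrete strategies.
+declare const makeElevenLabsSession: (api: ApiClient) => VoiceSession
+declare const makeGeminiLiveSession: (api: ApiClient) => VoiceSession
+
+// Backend discovery happens at runtime via the hub, not at Vite build time.
+async function selectVoiceSession(api: ApiClient): Promise<VoiceSession> {
+  const { backend } = await api.fetchVoiceBackend() // GET /api/voice/backend
+  return backend === 'gemini-live' ? makeGeminiLiveSession(api) : makeElevenLabsSession(api)
+}
+```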
+ +## Codex Analysis Summary + +- Existing `VoiceSession` interface + `registerVoiceSession()` is already a Strategy injection point +- `VOICE_BACKEND` should be resolved at runtime via hub API (not Vite env), since web frontend has no runtime env mechanism +- `sendContextualUpdate` has no Gemini Live equivalent; must approximate via `send_realtime_input` for incremental updates, `send_client_content` for initial context seeding +- Ephemeral tokens use `v1alpha` endpoint; regular API key uses `v1beta` — hub must handle this divergence +- Tool calling in Gemini Live requires synchronous `sendToolResponse`; existing `processPermissionRequest` involves async network calls — keep responses short +- Hidden coupling: `VoiceSessionConfig.language` is typed as `ElevenLabsLanguage` (types.ts:1) +- Settings page language list is ElevenLabs-specific (functions named `getElevenLabsSupportedLanguages`) + +## Gemini Analysis Summary + +- Proposed transparent proxy component pattern: `RealtimeVoiceSession` becomes a switcher +- Audio pipeline: capture via `getUserMedia` + `AudioWorkletNode` for 16kHz downsampling → PCM16 → base64 → WebSocket; playback via `AudioContext(24000)` with scheduled buffer queue +- Tool adapter needed: `getFunctionDeclarations()` maps existing client tools to Gemini format, `handleToolCall()` bridges execution +- Client VAD + server VAD hybrid for barge-in: clear playback queue immediately on interruption +- Settings page needs conditional rendering based on active backend +- No changes needed to `SessionChat.tsx`, `ComposerButtons.tsx`, `HappyThread.tsx` — they only consume abstract `useVoice()` status + +## Functional Review Findings (v2) + +### Critical Gaps Fixed +- C1: AudioWorklet processor file was missing → added to Task 4 +- C2: Token expiry/reconnect not handled → added to Task 2 + Task 6 +- C3: Session switching routes tool calls to wrong session → auto-stop on session switch +- C4: Voice component lifecycle / unmount cleanup → unified dispose() path in Task 6 + +### High Gaps Fixed +- H1: Mobile AudioContext blocked by autoplay → AudioContext created in user gesture handler +- H2: GEMINI_API_KEY missing behavior undefined → explicit error contract +- H3: Tool calling multi-call/timeout → serial execution + per-call timeout +- H4: React Strict Mode double-mount → useEffect cleanup +- H5: Voice button available before backend loads → voiceReady state gating + +### Medium Gaps (addressed in Task 8/9) +- M1: Bundle size → React.lazy() dynamic import +- M2: Settings page not adapted → conditional rendering +- M3: No tests → added Task 8 +- M4: No docs update → added Task 9 + +## Technical Decisions + +| Decision | Choice | Rationale | +|----------|--------|-----------| +| Backend discovery | Hub runtime API (`GET /voice/backend`) | Web has no runtime env; avoids Vite rebuild to switch | +| Wrapper location | New `VoiceBackendSession.tsx` | Original `RealtimeVoiceSession.tsx` untouched = zero upstream conflict | +| Audio processing | Separate `gemini/` subdirectory | Isolate complexity; testable independently | +| Tool bridge | Adapter in `gemini/toolAdapter.ts` | Reuse existing `realtimeClientTools` without modification | +| Language type | Keep `ElevenLabsLanguage` for now | Gemini ignores language pref initially; refactor later to avoid upstream diff | +| Token flow | Hub creates ephemeral token for both backends | Never expose long-lived API keys to browser | +| Session switch | Auto-stop voice on session change | Prevents tool calls routing to wrong session | +| 
Gemini code loading | React.lazy() dynamic import | Zero bundle impact when using ElevenLabs | +| AudioContext creation | Synchronous in user gesture handler | Required for iOS/Android autoplay policy | + +## Task List + +### Task 1: Shared Voice Config Extension +- **Type**: Backend (shared) +- **File scope**: + - `shared/src/voice.ts` (modify — append new exports) +- **Dependencies**: None +- **Implementation steps**: + 1. Add `VoiceBackendType = 'elevenlabs' | 'gemini-live'` and `DEFAULT_VOICE_BACKEND = 'elevenlabs'` + 2. Add `GEMINI_LIVE_MODEL = 'gemini-3.1-flash-live-preview'` constant + 3. Extract `VOICE_TOOL_DEFINITIONS` from existing `VOICE_TOOLS` — neutral format, single source of truth + 4. Add `buildGeminiLiveFunctionDeclarations()` — converts `VOICE_TOOL_DEFINITIONS` to Gemini `{ name, description, parameters }` schema format + 5. Add `buildGeminiLiveConfig()` — returns `{ model, systemInstruction: VOICE_SYSTEM_PROMPT, tools: [{ functionDeclarations }], responseModalities: ['AUDIO'] }` for `ai.live.connect()` + 6. Keep `buildVoiceAgentConfig()` untouched for ElevenLabs +- **Acceptance**: Both config builders produce valid configs; existing ElevenLabs flow unaffected; `VOICE_TOOL_DEFINITIONS` is the single source for both backends + +### Task 2: Hub Backend Discovery + Token Route +- **Type**: Backend (hub) +- **File scope**: + - `hub/src/web/routes/voice.ts` (modify — add routes, refactor handler) + - `hub/package.json` (modify — add `@google/genai`) +- **Dependencies**: Task 1 +- **Implementation steps**: + 1. Add `resolveVoiceBackend()`: reads `VOICE_BACKEND` env, validates against `VoiceBackendType`, defaults to `elevenlabs` + 2. Add `GET /voice/backend` route: + - Success: `{ allowed: true, backend: VoiceBackendType }` + - Failure (missing key): `{ allowed: false, backend: VoiceBackendType, code: 'missing_elevenlabs_api_key' | 'missing_gemini_api_key', error: string }` + - Validates that the required API key exists for the configured backend + 3. Add `issueGeminiLiveToken()`: + - Read `GEMINI_API_KEY ?? GOOGLE_API_KEY`; if missing, return `{ allowed: false, code: 'missing_gemini_api_key' }` + - Use `@google/genai` SDK to create ephemeral token + - Return `{ allowed: true, backend: 'gemini-live', token, model: GEMINI_LIVE_MODEL, apiVersion: 'v1alpha', expiresAt: number }` + - Never cache Gemini tokens (they expire in ~60s) + 4. Refactor `POST /voice/token` handler: + - Branch on `resolveVoiceBackend()` — `elevenlabs` uses existing logic unchanged, `gemini-live` calls `issueGeminiLiveToken()` + - Discriminated union response type + 5. Error contract: all failure responses use `{ allowed: false, backend, code, error }` shape with appropriate HTTP status codes + 6. Add `@google/genai` to `hub/package.json` +- **Acceptance**: `GET /voice/backend` returns correct backend + allowed status; `POST /voice/token` returns valid token with `expiresAt` for Gemini; missing API key returns structured error; ElevenLabs path unchanged + +### Task 3: Web API Types + Client Functions +- **Type**: Frontend (web) +- **File scope**: + - `web/src/api/voice.ts` (modify — add types and fetch functions) + - `web/src/api/client.ts` (modify — add fetchVoiceBackend method) +- **Dependencies**: Task 2 +- **Implementation steps**: + 1. Add `VoiceBackendResponse` type: + ```ts + | { allowed: true; backend: VoiceBackendType } + | { allowed: false; backend: VoiceBackendType; code: string; error: string } + ``` + 2. 
Extend `VoiceTokenResponse` as discriminated union: + ```ts + | { allowed: true; backend: 'elevenlabs'; token: string; agentId: string } + | { allowed: true; backend: 'gemini-live'; token: string; model: string; apiVersion: string; expiresAt: number } + | { allowed: false; backend: string; code: string; error: string } + ``` + 3. Add `fetchVoiceBackend(api)` function with module-level cache (cache only successful responses; invalidate on error) + 4. Add `fetchVoiceBackend()` method to `ApiClient` class + 5. Update `fetchVoiceToken()` to handle union response +- **Acceptance**: Type-safe API calls for both backends; cached backend discovery; failed responses not cached + +### Task 4: Gemini Audio Pipeline +- **Type**: Frontend (web) +- **File scope** (all new files): + - `web/src/realtime/gemini/pcmUtils.ts` + - `web/src/realtime/gemini/pcm-recorder.worklet.ts` + - `web/src/realtime/gemini/audioRecorder.ts` + - `web/src/realtime/gemini/audioPlayer.ts` +- **Dependencies**: None (can parallel with Task 1-3) +- **Implementation steps**: + 1. `pcmUtils.ts`: Pure utility functions: + - `float32ToPcm16(samples: Float32Array): ArrayBuffer` + - `pcm16ToFloat32(buffer: ArrayBuffer): Float32Array` + - `arrayBufferToBase64(buffer: ArrayBuffer): string` + - `base64ToArrayBuffer(base64: string): ArrayBuffer` + 2. `pcm-recorder.worklet.ts`: AudioWorklet processor: + - Extends `AudioWorkletProcessor` + - `process()` method: accumulate Float32 samples into chunks (e.g., 4096 samples), post to main thread via `port.postMessage()` + - Register as `'pcm-recorder-processor'` + - Must be loadable via Vite: `import workletUrl from './pcm-recorder.worklet.ts?url'` + 3. `audioRecorder.ts`: class `GeminiAudioRecorder`: + - `start(onChunk: (base64Pcm: string) => void)`: + - `getUserMedia({ audio: { sampleRate: 16000, channelCount: 1 } })` + - Create `AudioContext({ sampleRate: 16000 })` + - `audioContext.audioWorklet.addModule(workletUrl)` + - Connect MediaStreamSource → AudioWorkletNode + - Worklet messages → `float32ToPcm16()` → `arrayBufferToBase64()` → `onChunk()` + - `stop()`: stop all tracks, disconnect nodes, close AudioContext + - `setMuted(muted: boolean)`: toggle `MediaStreamTrack.enabled` + - `dispose()`: idempotent full cleanup, safe to call multiple times + - Listen for `MediaStreamTrack.onended` (device unplugged) → invoke error callback + - **Fallback**: if `audioWorklet.addModule()` fails, fall back to `ScriptProcessorNode` (deprecated but wider support) + 4. 
`audioPlayer.ts`: class `GeminiAudioPlayer`: + - `constructor(audioContext?: AudioContext)`: use provided AudioContext or create new at 24kHz; maintain playback queue with scheduled end times + - `enqueue(base64Pcm: string)`: decode → create `AudioBufferSourceNode` → schedule at `max(audioContext.currentTime, lastEndTime)` → update `lastEndTime` + - `clearQueue()`: stop all scheduled sources immediately (for barge-in); reset `lastEndTime` + - `isPlaying(): boolean`: check if audio is currently being output + - `dispose()`: stop all, close AudioContext if we own it + - Handle Chrome tab backgrounding: detect `audioContext.state === 'suspended'` → attempt `resume()` → if blocked, notify via callback +- **Acceptance**: Recorder produces 16kHz PCM16 base64 chunks; Player plays 24kHz PCM16 smoothly without clicks; clearQueue stops immediately; device unplug detected; fallback for no-AudioWorklet browsers + +### Task 5: Gemini Tool Adapter +- **Type**: Frontend (web) +- **File scope** (new file): + - `web/src/realtime/gemini/toolAdapter.ts` +- **Dependencies**: Task 1 (for VOICE_TOOL_DEFINITIONS) +- **Implementation steps**: + 1. `getGeminiFunctionDeclarations()`: import `VOICE_TOOL_DEFINITIONS` from shared (single source of truth), map to Gemini schema format — no separate declaration, no schema drift risk + 2. `handleGeminiToolCalls(functionCalls, clientTools)`: + - Process calls **serially** (one at a time, in order) + - For each call: lookup function name in `realtimeClientTools`, execute with args, collect result + - **Preserve call IDs**: each `FunctionResponse` must include the matching `id` from the `FunctionCall` + - **Per-call timeout**: wrap each execution in a 30s timeout; return `'error (timeout)'` on expiry + - **Error isolation**: tool failure returns error string as response, never throws, never crashes session + - Return `FunctionResponse[]` array + 3. `validateToolArgs(name: string, args: unknown): boolean`: basic validation that required params exist +- **Acceptance**: Function declarations derived from single source; tool calls route correctly; call IDs preserved in responses; timeout works; errors don't crash session + +### Task 6: GeminiLiveVoiceSession Implementation +- **Type**: Frontend (web) +- **File scope** (new file): + - `web/src/realtime/GeminiLiveVoiceSession.tsx` + - `web/package.json` (modify — add `@google/genai`) +- **Dependencies**: Task 3, Task 4, Task 5 +- **Implementation steps**: + 1. 
Create `GeminiLiveVoiceSessionImpl` class implementing `VoiceSession` interface: + - **`startSession(config)`**: + - Fetch token from hub via `fetchVoiceToken(api)` + - Build config via `buildGeminiLiveConfig()` from shared + - Call `ai.live.connect({ model, config, callbacks })` with ephemeral token + - Start audio recorder → pipe chunks to live session via `sendRealtimeInput()` + - Seed initial context via `session.sendClientContent()` (one-time) + - Set status 'connected' + - **`endSession()`**: call `dispose()` (see below) + - **`sendTextMessage(message)`**: send as realtime text input to live session + - **`sendContextualUpdate(update)`**: send as realtime text input with `[CONTEXT UPDATE] ` prefix + - **`dispose(reason?: string)`**: single idempotent teardown path: + - Stop recorder (releases mic) + - Clear + dispose player + - Close live session WebSocket + - Reset all internal state + - Safe to call from any failure branch, unmount, session switch, or error + - **Reconnect logic**: + - On WebSocket close/error: if `reason !== 'user-initiated'`, attempt reconnect + - Fetch fresh token from hub (old one expired) + - Recreate live session with new token + - Reseed context via `sendClientContent()` + - Max 3 reconnect attempts with exponential backoff (1s, 3s, 9s) + - After 3 failures: set status 'error', show error in VoiceErrorBanner + 2. Create `GeminiLiveVoiceSession` React component: + - Props: same as `RealtimeVoiceSessionProps` + - **On mount**: instantiate impl, register via `registerVoiceSession()`, register session store + - **useEffect cleanup**: call `dispose('unmount')` — handles React Strict Mode double-mount correctly + - Handle `micMuted` prop: delegate to `recorder.setMuted()` — if recorder not yet started, store as pending state applied on recorder start + - Wire live session callbacks: + - `onopen` → status 'connected' + - `onclose` → attempt reconnect or status 'disconnected' + - `onerror` → status 'error' with message + - `onmessage`: dispatch by type: + - Audio data → `player.enqueue(base64)` + - Tool call → `toolAdapter.handleGeminiToolCalls()` → `session.sendToolResponse()` + - Text → log/ignore (voice session doesn't render text) + - **Barge-in**: when server signals user is speaking (or audio input detected while player active) → `player.clearQueue()` + - **AudioContext creation**: create AudioContext **synchronously in startSession**, which is called from user click handler → satisfies mobile autoplay policy + - Share AudioContext between recorder and player where sample rates allow (otherwise separate contexts) + - Render nothing (same as ElevenLabs version) + 3. Add `@google/genai` to `web/package.json` +- **Acceptance**: Full voice conversation works; tool calls execute correctly with preserved IDs; mic mute works (including pending state); barge-in clears playback; reconnect works on token expiry/WebSocket drop; dispose is idempotent; no resource leaks on unmount; works on mobile (AudioContext in gesture) + +### Task 7: Voice Backend Switcher + Integration +- **Type**: Frontend (web) +- **File scope**: + - `web/src/realtime/VoiceBackendSession.tsx` (new) + - `web/src/realtime/index.ts` (modify — add export) + - `web/src/components/SessionChat.tsx` (modify — change import + JSX, add auto-stop) + - `web/src/lib/voice-context.tsx` (modify — add voiceReady state) +- **Dependencies**: Task 6 +- **Implementation steps**: + 1. 
Create `VoiceBackendSession.tsx`:
+     - Props: same as `RealtimeVoiceSessionProps` + `api: ApiClient`
+     - On mount: call `fetchVoiceBackend(api)` (cached), store result in state
+     - Render:
+       - Loading (no backend yet): return null
+       - `backend === 'gemini-live'` → `React.lazy(() => import('./GeminiLiveVoiceSession'))` wrapped in `<Suspense>`
+       - Default → `<RealtimeVoiceSession ...props />`
+       - `allowed === false` → return null (voice not available)
+  2. Update `web/src/lib/voice-context.tsx`:
+     - Add `voiceReady: boolean` to context (default false)
+     - Set `voiceReady = true` after backend discovery completes with `allowed: true`
+     - Expose `voiceReady` in `useVoice()` return
+     - Voice button disabled until `voiceReady === true`
+  3. Update `web/src/components/SessionChat.tsx`:
+     - Change import: `RealtimeVoiceSession` → `VoiceBackendSession`
+     - Change JSX: `<RealtimeVoiceSession ...props />` → `<VoiceBackendSession api={api} ...props />`
diff --git a/hub/src/web/routes/voice.test.ts b/hub/src/web/routes/voice.test.ts
new file mode 100644
index 0000000000..f2eb2444ba
--- /dev/null
+++ b/hub/src/web/routes/voice.test.ts
@@ -0,0 +1,152 @@
+import { describe, test, expect, afterEach } from 'bun:test'
+import { Hono } from 'hono'
+import type { WebAppEnv } from '../middleware/auth'
+import { createVoiceRoutes } from './voice'
+
+function createApp() {
+  const app = new Hono<WebAppEnv>()
+  app.route('/api', createVoiceRoutes())
+  return app
+}
+
+describe('GET /api/voice/backend', () => {
+  const originalEnv = process.env.VOICE_BACKEND
+
+  afterEach(() => {
+    if (originalEnv === undefined) {
+      delete process.env.VOICE_BACKEND
+    } else {
+      process.env.VOICE_BACKEND = originalEnv
+    }
+  })
+
+  test('returns elevenlabs by default', async () => {
+    delete process.env.VOICE_BACKEND
+    const app = createApp()
+    const res = await app.request('/api/voice/backend')
+    expect(res.status).toBe(200)
+    const body = await res.json() as { backend: string }
+    expect(body.backend).toBe('elevenlabs')
+  })
+
+  test('returns gemini-live when configured', async () => {
+    process.env.VOICE_BACKEND = 'gemini-live'
+    const app = createApp()
+    const res = await app.request('/api/voice/backend')
+    expect(res.status).toBe(200)
+    const body = await res.json() as { backend: string }
+    expect(body.backend).toBe('gemini-live')
+  })
+
+  test('returns qwen-realtime when configured', async () => {
+    process.env.VOICE_BACKEND = 'qwen-realtime'
+    const app = createApp()
+    const res = await app.request('/api/voice/backend')
+    expect(res.status).toBe(200)
+    const body = await res.json() as { backend: string }
+    expect(body.backend).toBe('qwen-realtime')
+  })
+
+  test('falls back to elevenlabs for unknown values', async () => {
+    process.env.VOICE_BACKEND = 'unknown-backend'
+    const app = createApp()
+    const res = await app.request('/api/voice/backend')
+    expect(res.status).toBe(200)
+    const body = await res.json() as { backend: string }
+    expect(body.backend).toBe('elevenlabs')
+  })
+})
+
+describe('POST /api/voice/gemini-token', () => {
+  const origGemini = process.env.GEMINI_API_KEY
+  const origGoogle = process.env.GOOGLE_API_KEY
+
+  afterEach(() => {
+    if (origGemini === undefined) delete process.env.GEMINI_API_KEY
+    else process.env.GEMINI_API_KEY = origGemini
+    if (origGoogle === undefined) delete process.env.GOOGLE_API_KEY
+    else process.env.GOOGLE_API_KEY = origGoogle
+  })
+
+  test('returns 400 when no API key configured', async () => {
+    delete process.env.GEMINI_API_KEY
+    delete process.env.GOOGLE_API_KEY
+    const app = createApp()
+    const res = await
app.request('/api/voice/gemini-token', { method: 'POST' }) + expect(res.status).toBe(400) + const body = await res.json() as { allowed: boolean; error: string } + expect(body.allowed).toBe(false) + expect(body.error).toContain('not configured') + }) + + test('returns proxied wsUrl when GEMINI_API_KEY is set', async () => { + process.env.GEMINI_API_KEY = 'test-gemini-key' + delete process.env.GOOGLE_API_KEY + const app = createApp() + const res = await app.request('/api/voice/gemini-token', { method: 'POST' }) + expect(res.status).toBe(200) + const body = await res.json() as { allowed: boolean; apiKey: string; wsUrl: string } + expect(body.allowed).toBe(true) + expect(body.apiKey).toBe('proxied') + expect(body.wsUrl).toContain('/api/voice/gemini-ws') + }) + + test('falls back to GOOGLE_API_KEY', async () => { + delete process.env.GEMINI_API_KEY + process.env.GOOGLE_API_KEY = 'test-google-key' + const app = createApp() + const res = await app.request('/api/voice/gemini-token', { method: 'POST' }) + expect(res.status).toBe(200) + const body = await res.json() as { allowed: boolean; apiKey: string; wsUrl: string } + expect(body.allowed).toBe(true) + expect(body.apiKey).toBe('proxied') + expect(body.wsUrl).toContain('/api/voice/gemini-ws') + }) +}) + +describe('POST /api/voice/qwen-token', () => { + const origDash = process.env.DASHSCOPE_API_KEY + const origQwen = process.env.QWEN_API_KEY + + afterEach(() => { + if (origDash === undefined) delete process.env.DASHSCOPE_API_KEY + else process.env.DASHSCOPE_API_KEY = origDash + if (origQwen === undefined) delete process.env.QWEN_API_KEY + else process.env.QWEN_API_KEY = origQwen + }) + + test('returns 400 when no API key configured', async () => { + delete process.env.DASHSCOPE_API_KEY + delete process.env.QWEN_API_KEY + const app = createApp() + const res = await app.request('/api/voice/qwen-token', { method: 'POST' }) + expect(res.status).toBe(400) + const body = await res.json() as { allowed: boolean; error: string } + expect(body.allowed).toBe(false) + expect(body.error).toContain('not configured') + }) + + test('returns wsUrl when DASHSCOPE_API_KEY is set (no raw key exposed)', async () => { + process.env.DASHSCOPE_API_KEY = 'test-dash-key' + delete process.env.QWEN_API_KEY + const app = createApp() + const res = await app.request('/api/voice/qwen-token', { method: 'POST' }) + expect(res.status).toBe(200) + const body = await res.json() as { allowed: boolean; wsUrl: string } + expect(body.allowed).toBe(true) + expect(body.wsUrl).toContain('/api/voice/qwen-ws') + expect(body).not.toHaveProperty('apiKey') + }) + + test('falls back to QWEN_API_KEY', async () => { + delete process.env.DASHSCOPE_API_KEY + process.env.QWEN_API_KEY = 'test-qwen-key' + const app = createApp() + const res = await app.request('/api/voice/qwen-token', { method: 'POST' }) + expect(res.status).toBe(200) + const body = await res.json() as { allowed: boolean; wsUrl: string } + expect(body.allowed).toBe(true) + expect(body.wsUrl).toContain('/api/voice/qwen-ws') + expect(body).not.toHaveProperty('apiKey') + }) +}) diff --git a/hub/src/web/routes/voice.ts b/hub/src/web/routes/voice.ts index 1a55f83639..a0863fa45e 100644 --- a/hub/src/web/routes/voice.ts +++ b/hub/src/web/routes/voice.ts @@ -4,8 +4,10 @@ import type { WebAppEnv } from '../middleware/auth' import { ELEVENLABS_API_BASE, VOICE_AGENT_NAME, - buildVoiceAgentConfig + buildVoiceAgentConfig, + DEFAULT_VOICE_BACKEND } from '@hapi/protocol/voice' +import type { VoiceBackendType } from '@hapi/protocol/voice' const 
tokenRequestSchema = z.object({
   customAgentId: z.string().optional(),
@@ -116,6 +118,65 @@ async function getOrCreateAgentId(apiKey: string): Promise<string> {
 
 export function createVoiceRoutes(): Hono<WebAppEnv> {
   const app = new Hono<WebAppEnv>()
+
+  // Return the configured voice backend type
+  app.get('/voice/backend', (c) => {
+    const raw = process.env.VOICE_BACKEND
+    const backend: VoiceBackendType =
+      raw === 'gemini-live' ? 'gemini-live'
+      : raw === 'qwen-realtime' ? 'qwen-realtime'
+      : DEFAULT_VOICE_BACKEND
+    return c.json({ backend })
+  })
+
+  // Hand out Gemini connection info for Gemini Live voice sessions.
+  // Rather than minting ephemeral tokens, the hub keeps the real API key
+  // server-side behind a WS proxy; no key material ever reaches the browser.
+  app.post('/voice/gemini-token', async (c) => {
+    const apiKey = process.env.GEMINI_API_KEY || process.env.GOOGLE_API_KEY
+    if (!apiKey) {
+      return c.json({
+        allowed: false,
+        error: 'Gemini API key not configured (set GEMINI_API_KEY or GOOGLE_API_KEY)'
+      }, 400)
+    }
+
+    // Use server-side WS proxy to avoid region restrictions.
+    // The proxy at /api/voice/gemini-ws handles the API key server-side.
+    // Derive wsUrl from the request origin so remote browsers connect back to the hub,
+    // not to localhost. HAPI_PUBLIC_URL overrides when set (e.g. behind a reverse proxy).
+    const requestOrigin = new URL(c.req.url).origin
+    const publicUrl = process.env.HAPI_PUBLIC_URL || requestOrigin
+    const wsProxyUrl = publicUrl.replace(/^http/, 'ws') + '/api/voice/gemini-ws'
+
+    return c.json({
+      allowed: true,
+      apiKey: 'proxied', // Dummy — key is handled server-side
+      wsUrl: wsProxyUrl, // Always proxy — env WS URLs are upstream-only (server-side)
+      baseUrl: process.env.GEMINI_API_BASE || undefined
+    })
+  })
+
+  // Check Qwen (DashScope) availability for Qwen Realtime voice sessions
+  // The actual API key is never sent to the browser — it stays server-side in the WS proxy.
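+  // Response contract (illustrative): HTTP 400 with { allowed: false, error } when no
+  // key is configured; otherwise { allowed: true, wsUrl: 'wss://<hub-host>/api/voice/qwen-ws' }.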
+  app.post('/voice/qwen-token', async (c) => {
+    const apiKey = process.env.DASHSCOPE_API_KEY || process.env.QWEN_API_KEY
+    if (!apiKey) {
+      return c.json({
+        allowed: false,
+        error: 'DashScope API key not configured (set DASHSCOPE_API_KEY or QWEN_API_KEY)'
+      }, 400)
+    }
+
+    const requestOrigin = new URL(c.req.url).origin
+    const publicUrl = process.env.HAPI_PUBLIC_URL || requestOrigin
+    const wsProxyUrl = publicUrl.replace(/^http/, 'ws') + '/api/voice/qwen-ws'
+
+    return c.json({
+      allowed: true,
+      wsUrl: wsProxyUrl // Always proxy — env WS URLs are upstream-only (server-side)
+    })
+  })
+
   // Get ElevenLabs ConvAI conversation token
   app.post('/voice/token', async (c) => {
     const json = await c.req.json().catch(() => null)
diff --git a/hub/src/web/server.ts b/hub/src/web/server.ts
index b4dbf4eb5e..3272902be6 100644
--- a/hub/src/web/server.ts
+++ b/hub/src/web/server.ts
@@ -21,8 +21,122 @@ import { createPushRoutes } from './routes/push'
 import { createVoiceRoutes } from './routes/voice'
 import type { SSEManager } from '../sse/sseManager'
 import type { VisibilityTracker } from '../visibility/visibilityTracker'
-import type { Server as BunServer } from 'bun'
+import type { Server as BunServer, ServerWebSocket } from 'bun'
 import type { Server as SocketEngine } from '@socket.io/bun-engine'
+import { jwtVerify } from 'jose'
+
+// Gemini Live WebSocket proxy — relays browser WS to Google, bypassing region restrictions
+function createGeminiProxyWebSocketHandler() {
+  const GEMINI_WS_BASE = 'wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent'
+  const upstreamMap = new WeakMap<ServerWebSocket<unknown>, WebSocket>()
+  const pendingMap = new WeakMap<ServerWebSocket<unknown>, Array<string | ArrayBuffer | Uint8Array>>()
+
+  return {
+    open(clientWs: ServerWebSocket<unknown>) {
+      const data = clientWs.data as { _geminiProxy: boolean; apiKey: string }
+      const upstreamUrl = `${process.env.GEMINI_LIVE_WS_URL || GEMINI_WS_BASE}?key=${encodeURIComponent(data.apiKey)}`
+      const pending: Array<string | ArrayBuffer | Uint8Array> = []
+      pendingMap.set(clientWs, pending)
+
+      const upstream = new WebSocket(upstreamUrl)
+      upstreamMap.set(clientWs, upstream)
+
+      upstream.onopen = () => {
+        // Flush any messages queued while upstream was connecting (e.g. setup frame)
+        for (const queued of pending.splice(0)) {
+          upstream.send(typeof queued === 'string' ? queued : queued)
+        }
+        pendingMap.delete(clientWs)
+      }
+      upstream.onmessage = (event) => {
+        try {
+          if (clientWs.readyState === 1) {
+            clientWs.send(typeof event.data === 'string' ? event.data : new Uint8Array(event.data as ArrayBuffer))
+          }
+        } catch { /* client gone */ }
+      }
+      upstream.onerror = () => {
+        pendingMap.delete(clientWs)
+        try { clientWs.close(1011, 'Upstream error') } catch { /* */ }
+      }
+      upstream.onclose = (event) => {
+        pendingMap.delete(clientWs)
+        try { clientWs.close(event.code, event.reason) } catch { /* */ }
+        upstreamMap.delete(clientWs)
+      }
+    },
+    message(clientWs: ServerWebSocket<unknown>, message: string | ArrayBuffer | Uint8Array) {
+      const upstream = upstreamMap.get(clientWs)
+      if (upstream?.readyState === WebSocket.OPEN) {
+        upstream.send(typeof message === 'string' ? message : message)
+      } else if (upstream?.readyState === WebSocket.CONNECTING) {
+        // Queue messages until upstream opens (critical for the setup frame)
+        const pending = pendingMap.get(clientWs)
+        if (pending) pending.push(message)
+      }
+    },
+    close(clientWs: ServerWebSocket<unknown>, code: number, reason: string) {
+      const upstream = upstreamMap.get(clientWs)
+      pendingMap.delete(clientWs)
+      if (upstream) {
+        try { upstream.close(code, reason) } catch { /* */ }
+        upstreamMap.delete(clientWs)
+      }
+    }
+  }
+}
+
+// Qwen Realtime WebSocket proxy — bridges browser (no custom headers) to DashScope (requires Authorization header)
+function createQwenProxyWebSocketHandler() {
+  const QWEN_WS_BASE = 'wss://dashscope.aliyuncs.com/api-ws/v1/realtime'
+  // Map browser WS → upstream WS
+  const upstreamMap = new WeakMap<ServerWebSocket<unknown>, WebSocket>()
+
+  return {
+    open(clientWs: ServerWebSocket<unknown>) {
+      const data = clientWs.data as { apiKey: string; model: string }
+      const upstreamUrl = `${process.env.QWEN_REALTIME_WS_URL || QWEN_WS_BASE}?model=${encodeURIComponent(data.model)}`
+
+      const upstream = new WebSocket(upstreamUrl, {
+        headers: { 'Authorization': `Bearer ${data.apiKey}` }
+      } as unknown as string[])
+
+      upstreamMap.set(clientWs, upstream)
+
+      upstream.onopen = () => {
+        // Connection ready — upstream will send session.created
+      }
+      upstream.onmessage = (event) => {
+        // Forward upstream → client
+        try {
+          if (clientWs.readyState === 1) {
+            clientWs.send(typeof event.data === 'string' ? event.data : new Uint8Array(event.data as ArrayBuffer))
+          }
+        } catch { /* client gone */ }
+      }
+      upstream.onerror = () => {
+        try { clientWs.close(1011, 'Upstream error') } catch { /* */ }
+      }
+      upstream.onclose = (event) => {
+        try { clientWs.close(event.code, event.reason) } catch { /* */ }
+        upstreamMap.delete(clientWs)
+      }
+    },
+    message(clientWs: ServerWebSocket<unknown>, message: string | ArrayBuffer | Uint8Array) {
+      const upstream = upstreamMap.get(clientWs)
+      if (upstream?.readyState === WebSocket.OPEN) {
+        upstream.send(typeof message === 'string' ? message : message)
+      }
+    },
+    close(clientWs: ServerWebSocket<unknown>, code: number, reason: string) {
+      const upstream = upstreamMap.get(clientWs)
+      if (upstream) {
+        try { upstream.close(code, reason) } catch { /* */ }
+        upstreamMap.delete(clientWs)
+      }
+    }
+  }
+}
 import type { WebSocketData } from '@socket.io/bun-engine'
 import { loadEmbeddedAssetMap, type EmbeddedWebAsset } from './embeddedAssets'
 import { isBunCompiled } from '../utils/bunCompiled'
@@ -230,16 +344,98 @@ export async function startWebServer(options: {
 
   const socketHandler = options.socketEngine.handler()
 
-  const server = Bun.serve({
+  // Wrap the socket.io websocket handler to also route the Gemini/Qwen voice proxies
+  const originalWsHandler = socketHandler.websocket
+  const geminiProxyHandler = createGeminiProxyWebSocketHandler()
+  const qwenProxyHandler = createQwenProxyWebSocketHandler()
+
+  // eslint-disable-next-line @typescript-eslint/no-explicit-any
+  const server = (Bun.serve as any)({
     hostname: configuration.listenHost,
     port: configuration.listenPort,
     idleTimeout: Math.max(30, socketHandler.idleTimeout),
     maxRequestBodySize: Math.max(socketHandler.maxRequestBodySize, 68 * 1024 * 1024),
-    websocket: socketHandler.websocket,
-    fetch: (req, server) => {
+    websocket: {
+      ...originalWsHandler,
+      open(ws: unknown) {
+        const wsAny = ws as ServerWebSocket<{ _qwenProxy?: boolean; _geminiProxy?: boolean }>
+        if (wsAny.data?._geminiProxy) {
+          geminiProxyHandler.open(wsAny)
+        } else if (wsAny.data?._qwenProxy) {
+          qwenProxyHandler.open(wsAny)
+        } else {
+          originalWsHandler.open?.(ws as never)
+        }
+      },
+      message(ws: unknown, message: unknown) {
+        const wsAny = ws as ServerWebSocket<{ _qwenProxy?: boolean; _geminiProxy?: boolean }>
+        if (wsAny.data?._geminiProxy) {
+          geminiProxyHandler.message(wsAny, message as string)
+        } else if (wsAny.data?._qwenProxy) {
+          qwenProxyHandler.message(wsAny, message as string)
+        } else {
+          originalWsHandler.message?.(ws as never, message as never)
+        }
+      },
+      close(ws: unknown, code: number, reason: string) {
+        const wsAny = ws as ServerWebSocket<{ _qwenProxy?: boolean; _geminiProxy?: boolean }>
+        if (wsAny.data?._geminiProxy) {
+          geminiProxyHandler.close(wsAny, code, reason)
+        } else if (wsAny.data?._qwenProxy) {
+          qwenProxyHandler.close(wsAny, code, reason)
+        } else {
+          originalWsHandler.close?.(ws as never, code as never, reason as never)
+        }
+      }
+    },
+    fetch: async (req: Request, server: { upgrade: (req: Request, opts?: unknown) => boolean }) => {
       const url = new URL(req.url)
       if (url.pathname.startsWith('/socket.io/')) {
-        return socketHandler.fetch(req, server)
+        return socketHandler.fetch(req, server as never)
+      }
+
+      // Voice WebSocket proxies — require JWT auth via query param
+      // (browser WebSocket API cannot set custom headers)
+      if (url.pathname === '/api/voice/gemini-ws' || url.pathname === '/api/voice/qwen-ws') {
+        const token = url.searchParams.get('token')
+        if (!token) {
+          return new Response('Missing authorization token', { status: 401 })
+        }
+        try {
+          await jwtVerify(token, options.jwtSecret, { algorithms: ['HS256'] })
+        } catch {
+          return new Response('Invalid token', { status: 401 })
+        }
+      }
+
+      // Gemini Live WebSocket proxy
+      if (url.pathname === '/api/voice/gemini-ws') {
+        const apiKey = process.env.GEMINI_API_KEY || process.env.GOOGLE_API_KEY
+        if (!apiKey) {
+          return new Response('Gemini API key not configured', { status: 400 })
+        }
+        const upgraded = (server as unknown as { upgrade: (req: Request, opts: unknown) => boolean }).upgrade(req, {
+          data: { _geminiProxy: true, apiKey }
+        })
+
if (!upgraded) { + return new Response('WebSocket upgrade failed', { status: 500 }) + } + return undefined as unknown as Response + } + // Qwen Realtime WebSocket proxy + if (url.pathname === '/api/voice/qwen-ws') { + const apiKey = process.env.DASHSCOPE_API_KEY || process.env.QWEN_API_KEY + const model = url.searchParams.get('model') || 'qwen3.5-omni-plus-realtime' + if (!apiKey) { + return new Response('DashScope API key not configured', { status: 400 }) + } + const upgraded = (server as unknown as { upgrade: (req: Request, opts: unknown) => boolean }).upgrade(req, { + data: { _qwenProxy: true, apiKey, model } + }) + if (!upgraded) { + return new Response('WebSocket upgrade failed', { status: 500 }) + } + return undefined as unknown as Response } return app.fetch(req) } diff --git a/shared/src/voice.ts b/shared/src/voice.ts index 6751f0eba4..2c2124f876 100644 --- a/shared/src/voice.ts +++ b/shared/src/voice.ts @@ -8,7 +8,11 @@ export const ELEVENLABS_API_BASE = 'https://api.elevenlabs.io/v1' export const VOICE_AGENT_NAME = 'Hapi Voice Assistant' -export const VOICE_SYSTEM_PROMPT = `# Identity +export const VOICE_SYSTEM_PROMPT = `# CRITICAL RULE - Tool Usage + +You MUST call the messageCodingAgent tool for ANY request related to coding, files, development, debugging, or tasks for the agent. Do NOT respond verbally to these requests — call the tool FIRST, then briefly confirm. This is your most important behavior. + +# Identity You are Hapi Voice Assistant. You bridge voice communication between users and their AI coding agents in the Hapi ecosystem. @@ -136,9 +140,28 @@ For builds, tests, or large file operations: - Treat garbled input as phonetic hints and ask for clarification - Correct yourself immediately if you realize you made an error - Keep conversations forward-moving with fresh insights -- Assume a technical software developer audience` +- Assume a technical software developer audience + +# First Interaction + +When the user speaks to you for the first time, begin your response with a brief greeting before addressing their request. If their first message is a coding request, greet briefly AND call the tool — do both.` + +/** + * Additional language block appended to VOICE_SYSTEM_PROMPT for Gemini/Qwen + * backends (which don't have a separate language field like ElevenLabs). + */ +export const VOICE_CHINESE_LANGUAGE_BLOCK = ` + +# Language -export const VOICE_FIRST_MESSAGE = "Hey! Hapi here." +IMPORTANT: Always respond in Chinese (Mandarin). Use natural spoken Chinese. +- Greet users in Chinese +- Summarize technical content in Chinese +- Use English only for proper nouns, tool names, and code identifiers +- Keep the same warm, concise conversational style in Chinese` + +/** ElevenLabs first message — language controlled by ElevenLabs language field */ +export const VOICE_FIRST_MESSAGE = "Hey! Hapi here — what can I help you with?" 
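+
+// Illustration: backends without a separate language field (Gemini Live, Qwen) append the
+// block to the system prompt when building instructions, e.g.:
+//   const instructions = VOICE_SYSTEM_PROMPT + VOICE_CHINESE_LANGUAGE_BLOCK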
export const VOICE_TOOLS = [ { @@ -255,3 +278,78 @@ export function buildVoiceAgentConfig(): VoiceAgentConfig { } } } + +export type VoiceBackendType = 'elevenlabs' | 'gemini-live' | 'qwen-realtime' + +export const QWEN_REALTIME_MODEL = 'qwen3-omni-flash-realtime' +export const QWEN_REALTIME_VOICE = 'Mia' + +export const DEFAULT_VOICE_BACKEND: VoiceBackendType = 'elevenlabs' + +export const GEMINI_LIVE_MODEL = 'gemini-2.5-flash-native-audio-latest' + +export interface VoiceToolDefinition { + name: string + description: string + parameters: { + type: 'object' + required: string[] + properties: Record + } +} + +type VoiceToolSource = Pick<(typeof VOICE_TOOLS)[number], 'name' | 'description' | 'parameters'> + +function cloneVoiceToolDefinition(tool: VoiceToolSource): VoiceToolDefinition { + const properties: VoiceToolDefinition['parameters']['properties'] = {} + + for (const [key, value] of Object.entries(tool.parameters.properties)) { + properties[key] = { + type: value.type, + description: value.description + } + } + + return { + name: tool.name, + description: tool.description, + parameters: { + type: 'object', + required: [...tool.parameters.required], + properties + } + } +} + +export const VOICE_TOOL_DEFINITIONS: VoiceToolDefinition[] = VOICE_TOOLS.map(cloneVoiceToolDefinition) + +export type GeminiLiveFunctionDeclaration = VoiceToolDefinition + +export interface GeminiLiveConfig { + model: string + systemInstruction: string + tools: Array<{ + functionDeclarations: GeminiLiveFunctionDeclaration[] + }> + responseModalities: ['AUDIO'] +} + +export function buildGeminiLiveFunctionDeclarations(): GeminiLiveFunctionDeclaration[] { + return VOICE_TOOLS.map(cloneVoiceToolDefinition) +} + +export function buildGeminiLiveConfig(): GeminiLiveConfig { + return { + model: GEMINI_LIVE_MODEL, + systemInstruction: VOICE_SYSTEM_PROMPT + VOICE_CHINESE_LANGUAGE_BLOCK, + tools: [ + { + functionDeclarations: buildGeminiLiveFunctionDeclarations() + } + ], + responseModalities: ['AUDIO'] + } +} diff --git a/web/src/api/client.ts b/web/src/api/client.ts index 7f1083c8af..ff76fe89a9 100644 --- a/web/src/api/client.ts +++ b/web/src/api/client.ts @@ -443,4 +443,37 @@ export class ApiClient { body: JSON.stringify(options || {}) }) } + + /** Return the current auth token (for WebSocket query-param auth). */ + getAuthToken(): string | null { + return this.getToken ? 
this.getToken() : this.token + } + + async fetchVoiceBackend(): Promise<{ backend: string }> { + return await this.request('/api/voice/backend') + } + + async fetchQwenToken(): Promise<{ + allowed: boolean + wsUrl?: string + error?: string + }> { + return await this.request('/api/voice/qwen-token', { + method: 'POST', + body: JSON.stringify({}) + }) + } + + async fetchGeminiToken(): Promise<{ + allowed: boolean + apiKey?: string + wsUrl?: string + baseUrl?: string + error?: string + }> { + return await this.request('/api/voice/gemini-token', { + method: 'POST', + body: JSON.stringify({}) + }) + } } diff --git a/web/src/api/voice.ts b/web/src/api/voice.ts index 66cee443f1..3105c4ab95 100644 --- a/web/src/api/voice.ts +++ b/web/src/api/voice.ts @@ -15,6 +15,7 @@ import { VOICE_AGENT_NAME, buildVoiceAgentConfig } from '@hapi/protocol/voice' +import type { VoiceBackendType } from '@hapi/protocol/voice' export interface VoiceTokenResponse { allowed: boolean @@ -160,3 +161,66 @@ export async function createOrUpdateHapiAgent(apiKey: string): Promise { + try { + return await api.fetchQwenToken() + } catch (error) { + return { + allowed: false, + error: error instanceof Error ? error.message : 'Network error' + } + } +} + +export interface VoiceBackendResponse { + backend: VoiceBackendType +} + +export interface GeminiTokenResponse { + allowed: boolean + apiKey?: string + wsUrl?: string + baseUrl?: string + error?: string +} + +/** + * Discover which voice backend the hub is configured to use. + */ +export async function fetchVoiceBackend(api: ApiClient): Promise { + try { + const result = await api.fetchVoiceBackend() + const backend = result.backend === 'gemini-live' ? 'gemini-live' + : result.backend === 'qwen-realtime' ? 'qwen-realtime' + : 'elevenlabs' + return { backend } as VoiceBackendResponse + } catch { + return { backend: 'elevenlabs' } + } +} + +/** + * Fetch a Gemini API key from the hub for Gemini Live voice sessions. + */ +export async function fetchGeminiToken(api: ApiClient): Promise { + try { + return await api.fetchGeminiToken() + } catch (error) { + return { + allowed: false, + error: error instanceof Error ? error.message : 'Network error' + } + } +} diff --git a/web/src/components/AssistantChat/HappyComposer.tsx b/web/src/components/AssistantChat/HappyComposer.tsx index 6d0e20d3d5..3168dfe03e 100644 --- a/web/src/components/AssistantChat/HappyComposer.tsx +++ b/web/src/components/AssistantChat/HappyComposer.tsx @@ -303,29 +303,29 @@ export function HappyComposer(props: { return } - // Shift+Enter inserts a newline (standard behavior) - if (key === 'Enter' && e.shiftKey) { - return // let default textarea behavior handle newline - } - // Enter with suggestions visible: select the suggestion - if (key === 'Enter' && suggestions.length > 0) { + if (key === 'Enter' && suggestions.length > 0 && !e.ctrlKey && !e.metaKey) { e.preventDefault() const indexToSelect = selectedIndex >= 0 ? 
selectedIndex : 0 handleSuggestionSelect(indexToSelect) return } - // Only plain Enter (no modifiers) sends; other modifier combos are ignored - if (key === 'Enter') { + // Ctrl+Enter (Windows/Linux) or Cmd+Enter (Mac) sends the message + if (key === 'Enter' && (e.ctrlKey || e.metaKey)) { e.preventDefault() - if (!e.ctrlKey && !e.altKey && !e.metaKey && canSend) { + if (canSend) { api.composer().send() setShowContinueHint(false) } return } + // Plain Enter inserts a newline (default textarea behavior) + if (key === 'Enter') { + return + } + if (suggestions.length > 0) { if (key === 'ArrowUp') { e.preventDefault() diff --git a/web/src/components/SessionChat.tsx b/web/src/components/SessionChat.tsx index 2a60c62b29..40489d809a 100644 --- a/web/src/components/SessionChat.tsx +++ b/web/src/components/SessionChat.tsx @@ -27,7 +27,7 @@ import { TeamPanel } from '@/components/TeamPanel' import { usePlatform } from '@/hooks/usePlatform' import { useSessionActions } from '@/hooks/mutations/useSessionActions' import { useVoiceOptional } from '@/lib/voice-context' -import { RealtimeVoiceSession, registerSessionStore, registerVoiceHooksStore, voiceHooks } from '@/realtime' +import { VoiceBackendSession, registerSessionStore, registerVoiceHooksStore, voiceHooks } from '@/realtime' import { isRemoteTerminalSupported } from '@/utils/terminalSupport' export function SessionChat(props: { @@ -80,6 +80,7 @@ export function SessionChat(props: { // Voice assistant integration const voice = useVoiceOptional() + const [voiceBackendReady, setVoiceBackendReady] = useState(false) // Register session store for voice client tools useEffect(() => { @@ -423,18 +424,19 @@ export function SessionChat(props: { autocompleteSuggestions={props.autocompleteSuggestions} voiceStatus={voice?.status} voiceMicMuted={voice?.micMuted} - onVoiceToggle={voice ? handleVoiceToggle : undefined} - onVoiceMicToggle={voice ? handleVoiceMicToggle : undefined} + onVoiceToggle={voice && voiceBackendReady ? handleVoiceToggle : undefined} + onVoiceMicToggle={voice && voiceBackendReady ? 
handleVoiceMicToggle : undefined} /> - {/* Voice session component - renders nothing but initializes ElevenLabs */} + {/* Voice session component - renders nothing but initializes voice backend */} {voice && ( - )} diff --git a/web/src/realtime/GeminiLiveVoiceSession.tsx b/web/src/realtime/GeminiLiveVoiceSession.tsx new file mode 100644 index 0000000000..a0461e092f --- /dev/null +++ b/web/src/realtime/GeminiLiveVoiceSession.tsx @@ -0,0 +1,409 @@ +import { useEffect, useRef, useCallback } from 'react' +import { registerVoiceSession, resetRealtimeSessionState } from './RealtimeSession' +import { registerSessionStore } from './realtimeClientTools' +import { fetchGeminiToken } from '@/api/voice' +import { GeminiAudioRecorder } from './gemini/audioRecorder' +import { GeminiAudioPlayer } from './gemini/audioPlayer' +import { handleGeminiFunctionCalls } from './gemini/toolAdapter' +import { buildGeminiLiveConfig } from '@hapi/protocol/voice' +import type { VoiceSession, VoiceSessionConfig, StatusCallback } from './types' +import type { ApiClient } from '@/api/client' +import type { Session } from '@/types/api' +import type { GeminiFunctionCall } from './gemini/toolAdapter' + +const DEBUG = import.meta.env.DEV + +// Default Gemini Live WebSocket API endpoint (Google direct) +const DEFAULT_GEMINI_LIVE_WS_BASE = 'wss://generativelanguage.googleapis.com/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent' + +interface GeminiLiveState { + ws: WebSocket | null + recorder: GeminiAudioRecorder | null + player: GeminiAudioPlayer | null + playbackContext: AudioContext | null + statusCallback: StatusCallback | null + apiKey: string | null + wsBaseUrl: string | null + modelSpeaking: boolean + micMuted: boolean +} + +const state: GeminiLiveState = { + ws: null, + recorder: null, + player: null, + playbackContext: null, + statusCallback: null, + apiKey: null, + wsBaseUrl: null, + modelSpeaking: false, + micMuted: false +} + +function cleanup() { + if (state.recorder) { + state.recorder.dispose() + state.recorder = null + } + if (state.player) { + state.player.dispose() + state.player = null + } + if (state.playbackContext && state.playbackContext.state !== 'closed') { + void state.playbackContext.close() + } + state.playbackContext = null + if (state.ws) { + if (state.ws.readyState === WebSocket.OPEN || state.ws.readyState === WebSocket.CONNECTING) { + state.ws.close() + } + state.ws = null + } +} + +class GeminiLiveVoiceSessionImpl implements VoiceSession { + private api: ApiClient + + constructor(api: ApiClient) { + this.api = api + } + + async startSession(config: VoiceSessionConfig): Promise { + cleanup() + state.statusCallback?.('connecting') + + // Create playback AudioContext immediately while still inside the user + // gesture (click/tap). Mobile browsers require this for autoplay policy. + // Store in state so cleanup() can close it on failure or stop. + state.playbackContext = new AudioContext({ sampleRate: 24000 }) + await state.playbackContext.resume() + + // Get API key from hub + console.log('[GeminiLive] Fetching token...') + const tokenResp = await fetchGeminiToken(this.api) + console.log('[GeminiLive] Token response:', { allowed: tokenResp.allowed, hasKey: !!tokenResp.apiKey, error: tokenResp.error }) + if (!tokenResp.allowed || !tokenResp.apiKey) { + const msg = tokenResp.error ?? 
'Gemini API key not available' + console.error('[GeminiLive] Token failed:', msg) + state.statusCallback?.('error', msg) + throw new Error(msg) + } + state.apiKey = tokenResp.apiKey + state.wsBaseUrl = tokenResp.wsUrl || null + + // Request microphone + console.log('[GeminiLive] Requesting microphone...') + let permissionStream: MediaStream | null = null + try { + permissionStream = await navigator.mediaDevices.getUserMedia({ audio: true }) + console.log('[GeminiLive] Microphone granted') + } catch (error) { + console.error('[GeminiLive] Microphone denied:', error) + state.statusCallback?.('error', 'Microphone permission denied') + throw error + } finally { + permissionStream?.getTracks().forEach((t) => t.stop()) + } + + // Connect WebSocket — use proxy URL if provided (avoids region restrictions) + const wsBase = state.wsBaseUrl || DEFAULT_GEMINI_LIVE_WS_BASE + const isProxy = !!state.wsBaseUrl + const authToken = this.api.getAuthToken() || '' + const wsUrl = isProxy + ? `${wsBase}${wsBase.includes('?') ? '&' : '?'}token=${encodeURIComponent(authToken)}` + : `${wsBase}?key=${encodeURIComponent(state.apiKey)}` + console.log('[GeminiLive] Connecting WebSocket to:', wsBase, isProxy ? '(proxied)' : '(direct)') + const ws = new WebSocket(wsUrl) + state.ws = ws + + return new Promise((resolve, reject) => { + let setupDone = false + + ws.onopen = () => { + if (DEBUG) console.log('[GeminiLive] WebSocket connected, sending setup') + + const liveConfig = buildGeminiLiveConfig() + const setupMessage = { + setup: { + model: `models/${liveConfig.model}`, + generationConfig: { + responseModalities: ['AUDIO'], + speechConfig: { + voiceConfig: { + prebuiltVoiceConfig: { voiceName: 'Aoede' } + } + } + }, + systemInstruction: { + parts: [{ text: liveConfig.systemInstruction }] + }, + tools: liveConfig.tools.map((t) => ({ + functionDeclarations: t.functionDeclarations.map((fd) => ({ + name: fd.name, + description: fd.description, + parameters: fd.parameters + })) + })) + } + } + + ws.send(JSON.stringify(setupMessage)) + } + + ws.onmessage = async (event) => { + let data: Record + try { + if (event.data instanceof Blob) { + const text = await event.data.text() + data = JSON.parse(text) as Record + } else { + data = JSON.parse(event.data as string) as Record + } + } catch { + if (DEBUG) console.warn('[GeminiLive] Failed to parse message') + return + } + + // Log all message types for debugging + const msgKeys = Object.keys(data).filter(k => k !== 'serverContent' || !('modelTurn' in (data.serverContent as Record || {}))) + if (!data.serverContent) { + console.log('[GeminiLive] Message:', msgKeys.join(', '), JSON.stringify(data).slice(0, 200)) + } + + // Setup complete + if (data.setupComplete && !setupDone) { + setupDone = true + if (DEBUG) console.log('[GeminiLive] Setup complete') + state.statusCallback?.('connected') + + // Start audio capture + startAudioCapture(state.playbackContext!) 
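+          // Editor's note: from here the recorder streams base64 PCM16 chunks into
+          // sendAudioChunk (defined below), which wraps each one roughly as:
+          //   { realtimeInput: { mediaChunks: [{ mimeType: 'audio/pcm;rate=16000', data }] } }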
+ + // Send initial context if available (no clientContent greeting — it breaks tool calls) + if (config.initialContext) { + sendClientContent(`[Context] ${config.initialContext}`) + } + + resolve() + return + } + + // Server content (audio / text / turn complete) + const serverContent = data.serverContent as { + modelTurn?: { parts?: Array<{ inlineData?: { data: string; mimeType: string }; text?: string }> } + turnComplete?: boolean + } | undefined + + if (serverContent) { + if (serverContent.modelTurn?.parts) { + // Model is generating — mute mic to prevent barge-in from noise + if (!state.modelSpeaking) { + state.modelSpeaking = true + state.recorder?.setMuted(true) + } + for (const part of serverContent.modelTurn.parts) { + if (part.inlineData?.data) { + state.player?.enqueue(part.inlineData.data) + } + if (part.text) { + console.log('[GeminiLive] Text:', part.text) + } + } + } + if (serverContent.turnComplete) { + console.log('[GeminiLive] Turn complete') + // Model done — unmute mic for next user turn + state.modelSpeaking = false + state.recorder?.setMuted(false) + } + } + + // Tool calls + const toolCall = data.toolCall as { + functionCalls?: Array<{ name: string; args: Record; id: string }> + } | undefined + + if (toolCall?.functionCalls && toolCall.functionCalls.length > 0) { + console.log('[GeminiLive] Tool calls:', toolCall.functionCalls.map((c) => c.name)) + + const responses = await handleGeminiFunctionCalls( + toolCall.functionCalls as GeminiFunctionCall[] + ) + + // Send tool responses back + if (state.ws?.readyState === WebSocket.OPEN) { + state.ws.send(JSON.stringify({ + toolResponse: { + functionResponses: responses.map((r) => ({ + id: r.id, + name: r.name, + response: r.response + })) + } + })) + } + } + } + + ws.onerror = (event) => { + console.error('[GeminiLive] WebSocket error:', event) + if (!setupDone) { + state.statusCallback?.('error', 'WebSocket connection failed') + reject(new Error('WebSocket connection failed')) + } + } + + ws.onclose = (event) => { + if (DEBUG) console.log('[GeminiLive] WebSocket closed:', event.code, event.reason) + cleanup() + resetRealtimeSessionState() + if (!setupDone) { + const message = event.reason || 'WebSocket closed before setup completed' + state.statusCallback?.('error', message) + reject(new Error(message)) + return + } + state.statusCallback?.('disconnected') + } + }) + } + + async endSession(): Promise { + cleanup() + resetRealtimeSessionState() + state.statusCallback?.('disconnected') + } + + sendTextMessage(message: string): void { + sendClientContent(message) + } + + sendContextualUpdate(update: string): void { + // Send as a system-like context message + sendClientContent(`[System Context Update] ${update}`) + } +} + +function sendClientContent(text: string): void { + if (!state.ws || state.ws.readyState !== WebSocket.OPEN) return + state.ws.send(JSON.stringify({ + clientContent: { + turns: [{ role: 'user', parts: [{ text }] }], + turnComplete: true + } + })) +} + +function sendAudioChunk(base64Pcm: string): void { + if (!state.ws || state.ws.readyState !== WebSocket.OPEN) return + // Don't send audio while model is speaking + if (state.modelSpeaking) return + state.ws.send(JSON.stringify({ + realtimeInput: { + mediaChunks: [{ + mimeType: 'audio/pcm;rate=16000', + data: base64Pcm + }] + } + })) +} + +function startAudioCapture(playbackContext: AudioContext): void { + state.player = new GeminiAudioPlayer(playbackContext) + state.recorder = new GeminiAudioRecorder() + + state.recorder.start( + (pcm16Chunk) => 
sendAudioChunk(pcm16Chunk), + (error) => { + console.error('[GeminiLive] Audio capture error:', error) + state.statusCallback?.('error', 'Microphone error') + } + ) + + // Apply initial mute state — the React effect may have run before the recorder existed + if (state.micMuted) { + state.recorder.setMuted(true) + } +} + +// --- React component --- + +export interface GeminiLiveVoiceSessionProps { + api: ApiClient + micMuted?: boolean + onStatusChange?: StatusCallback + onRegistered?: () => void + getSession?: (sessionId: string) => Session | null + sendMessage?: (sessionId: string, message: string) => void + approvePermission?: (sessionId: string, requestId: string) => Promise + denyPermission?: (sessionId: string, requestId: string) => Promise +} + +export function GeminiLiveVoiceSession({ + api, + micMuted = false, + onStatusChange, + onRegistered, + getSession, + sendMessage, + approvePermission, + denyPermission +}: GeminiLiveVoiceSessionProps) { + const hasRegistered = useRef(false) + + // Store status callback + useEffect(() => { + state.statusCallback = onStatusChange || null + return () => { state.statusCallback = null } + }, [onStatusChange]) + + // Register session store for client tools + useEffect(() => { + if (getSession && sendMessage && approvePermission && denyPermission) { + registerSessionStore({ + getSession: (sessionId: string) => + getSession(sessionId) as { agentState?: { requests?: Record } } | null, + sendMessage, + approvePermission, + denyPermission + }) + } + }, [getSession, sendMessage, approvePermission, denyPermission]) + + // Register voice session once + useEffect(() => { + if (!hasRegistered.current) { + try { + registerVoiceSession(new GeminiLiveVoiceSessionImpl(api)) + hasRegistered.current = true + onRegistered?.() + } catch (error) { + console.error('[GeminiLive] Failed to register voice session:', error) + } + } + }, [api]) // eslint-disable-line react-hooks/exhaustive-deps + + // Sync mic mute state — also persist to module state so startAudioCapture can apply it + useEffect(() => { + state.micMuted = micMuted + if (state.recorder) { + state.recorder.setMuted(micMuted) + } + }, [micMuted]) + + // Handle barge-in: clear audio queue when user starts speaking + const handleBargeIn = useCallback(() => { + if (state.player?.isPlaying()) { + state.player.clearQueue() + } + }, []) + + // Cleanup on unmount + useEffect(() => { + return () => { + cleanup() + } + }, []) + + return null +} diff --git a/web/src/realtime/QwenVoiceSession.tsx b/web/src/realtime/QwenVoiceSession.tsx new file mode 100644 index 0000000000..f0624cc2da --- /dev/null +++ b/web/src/realtime/QwenVoiceSession.tsx @@ -0,0 +1,417 @@ +import { useEffect, useRef, useCallback } from 'react' +import { registerVoiceSession, resetRealtimeSessionState } from './RealtimeSession' +import { registerSessionStore } from './realtimeClientTools' +import { fetchQwenToken } from '@/api/voice' +import { GeminiAudioRecorder } from './gemini/audioRecorder' +import { GeminiAudioPlayer } from './gemini/audioPlayer' +import { realtimeClientTools } from './realtimeClientTools' +import { + QWEN_REALTIME_MODEL, + QWEN_REALTIME_VOICE, + VOICE_SYSTEM_PROMPT, + VOICE_CHINESE_LANGUAGE_BLOCK, + VOICE_TOOL_DEFINITIONS +} from '@hapi/protocol/voice' +import type { VoiceSession, VoiceSessionConfig, StatusCallback } from './types' +import type { ApiClient } from '@/api/client' +import type { Session } from '@/types/api' + +const DEBUG = import.meta.env.DEV + +// Qwen WebSocket connects via Hub proxy (browser can't set 
diff --git a/web/src/realtime/QwenVoiceSession.tsx b/web/src/realtime/QwenVoiceSession.tsx
new file mode 100644
index 0000000000..f0624cc2da
--- /dev/null
+++ b/web/src/realtime/QwenVoiceSession.tsx
@@ -0,0 +1,417 @@
+import { useEffect, useRef, useCallback } from 'react'
+import { registerVoiceSession, resetRealtimeSessionState } from './RealtimeSession'
+import { registerSessionStore } from './realtimeClientTools'
+import { fetchQwenToken } from '@/api/voice'
+import { GeminiAudioRecorder } from './gemini/audioRecorder'
+import { GeminiAudioPlayer } from './gemini/audioPlayer'
+import { realtimeClientTools } from './realtimeClientTools'
+import {
+  QWEN_REALTIME_MODEL,
+  QWEN_REALTIME_VOICE,
+  VOICE_SYSTEM_PROMPT,
+  VOICE_CHINESE_LANGUAGE_BLOCK,
+  VOICE_TOOL_DEFINITIONS
+} from '@hapi/protocol/voice'
+import type { VoiceSession, VoiceSessionConfig, StatusCallback } from './types'
+import type { ApiClient } from '@/api/client'
+import type { Session } from '@/types/api'
+
+const DEBUG = import.meta.env.DEV
+
+// Qwen WebSocket connects via Hub proxy (browser can't set Authorization header)
+
+interface QwenState {
+  ws: WebSocket | null
+  recorder: GeminiAudioRecorder | null
+  player: GeminiAudioPlayer | null
+  playbackContext: AudioContext | null
+  statusCallback: StatusCallback | null
+  apiKey: string | null
+  wsBaseUrl: string | null
+  micMuted: boolean
+}
+
+const state: QwenState = {
+  ws: null,
+  recorder: null,
+  player: null,
+  playbackContext: null,
+  statusCallback: null,
+  apiKey: null,
+  wsBaseUrl: null,
+  micMuted: false
+}
+
+let eventCounter = 0
+function nextEventId(): string {
+  return `evt_${++eventCounter}`
+}
+
+function cleanup() {
+  if (state.recorder) {
+    state.recorder.dispose()
+    state.recorder = null
+  }
+  if (state.player) {
+    state.player.dispose()
+    state.player = null
+  }
+  if (state.playbackContext && state.playbackContext.state !== 'closed') {
+    void state.playbackContext.close()
+  }
+  state.playbackContext = null
+  if (state.ws) {
+    if (state.ws.readyState === WebSocket.OPEN || state.ws.readyState === WebSocket.CONNECTING) {
+      state.ws.close()
+    }
+    state.ws = null
+  }
+}
+
+function sendEvent(type: string, payload?: Record<string, unknown>): void {
+  if (!state.ws || state.ws.readyState !== WebSocket.OPEN) return
+  state.ws.send(JSON.stringify({
+    event_id: nextEventId(),
+    type,
+    ...payload
+  }))
+}
+
+class QwenVoiceSessionImpl implements VoiceSession {
+  private api: ApiClient
+
+  constructor(api: ApiClient) {
+    this.api = api
+  }
+
+  async startSession(config: VoiceSessionConfig): Promise<void> {
+    cleanup()
+    state.statusCallback?.('connecting')
+
+    // Create playback AudioContext immediately while still inside the user
+    // gesture (click/tap). Mobile browsers require this for autoplay policy.
+    // Store in state so cleanup() can close it on failure or stop.
+    state.playbackContext = new AudioContext({ sampleRate: 24000 })
+    await state.playbackContext.resume()
+
+    // Check Qwen availability (hub no longer sends the raw API key)
+    const tokenResp = await fetchQwenToken(this.api)
+    if (!tokenResp.allowed) {
+      const msg = tokenResp.error ?? 'DashScope API key not available'
+      state.statusCallback?.('error', msg)
+      throw new Error(msg)
+    }
+    state.apiKey = null // key stays server-side
+    state.wsBaseUrl = tokenResp.wsUrl || null
+
+    // Request microphone
+    let permissionStream: MediaStream | null = null
+    try {
+      permissionStream = await navigator.mediaDevices.getUserMedia({ audio: true })
+    } catch (error) {
+      state.statusCallback?.('error', 'Microphone permission denied')
+      throw error
+    } finally {
+      permissionStream?.getTracks().forEach((t) => t.stop())
+    }
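+    // Note: the stream is stopped right away; this getUserMedia call is only a
+    // permission pre-flight so the recorder can open its own 16 kHz stream
+    // later without a second browser prompt (assuming the same device/origin).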
+
+    // Connect via Hub WebSocket proxy (DashScope requires Authorization header,
+    // which browser WebSocket API doesn't support)
+    const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:'
+    const defaultProxyUrl = `${protocol}//${window.location.host}/api/voice/qwen-ws`
+    const proxyUrl = state.wsBaseUrl || defaultProxyUrl
+    const model = QWEN_REALTIME_MODEL
+    const authToken = this.api.getAuthToken() || ''
+    const separator = proxyUrl.includes('?') ? '&' : '?'
+    const wsUrl = `${proxyUrl}${separator}model=${encodeURIComponent(model)}&token=${encodeURIComponent(authToken)}`
+    const ws = new WebSocket(wsUrl)
+    state.ws = ws
+
+    return new Promise<void>((resolve, reject) => {
+      let sessionReady = false
+
+      ws.onopen = () => {
+        if (DEBUG) console.log('[Qwen] WebSocket connected')
+      }
+
+      ws.onmessage = async (event) => {
+        let data: Record<string, unknown>
+        try {
+          data = JSON.parse(event.data as string) as Record<string, unknown>
+        } catch {
+          if (DEBUG) console.warn('[Qwen] Failed to parse message')
+          return
+        }
+
+        const eventType = data.type as string
+
+        // Session created - send configuration
+        if (eventType === 'session.created' && !sessionReady) {
+          if (DEBUG) console.log('[Qwen] Session created')
+
+          // Build tools config
+          const tools = VOICE_TOOL_DEFINITIONS.map((td) => ({
+            type: 'function' as const,
+            name: td.name,
+            description: td.description,
+            parameters: td.parameters
+          }))
+
+          // Send session.update with full configuration
+          const basePrompt = VOICE_SYSTEM_PROMPT + VOICE_CHINESE_LANGUAGE_BLOCK
+          const instructions = config.initialContext
+            ? `${basePrompt}\n\n[Current Context]\n${config.initialContext}`
+            : basePrompt
+
+          sendEvent('session.update', {
+            session: {
+              modalities: ['text', 'audio'],
+              voice: QWEN_REALTIME_VOICE,
+              input_audio_format: 'pcm',
+              output_audio_format: 'pcm',
+              instructions,
+              temperature: 0.7,
+              turn_detection: {
+                type: 'server_vad',
+                threshold: 0.5,
+                silence_duration_ms: 800,
+                prefix_padding_ms: 300
+              },
+              tools,
+              tool_choice: 'auto'
+            }
+          })
+          return
+        }
+
+        // Session updated - ready to go
+        if (eventType === 'session.updated') {
+          sessionReady = true
+          if (DEBUG) console.log('[Qwen] Session configured')
+          state.statusCallback?.('connected')
+          startAudioCapture(state.playbackContext!)
+          resolve()
+          return
+        }
+
+        // Audio output streaming
+        if (eventType === 'response.audio.delta') {
+          const delta = data.delta as string
+          if (delta) {
+            state.player?.enqueue(delta)
+          }
+          return
+        }
+
+        // Text transcript (for debug)
+        if (eventType === 'response.audio_transcript.delta' && DEBUG) {
+          console.log('[Qwen] Transcript:', data.delta)
+          return
+        }
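+
+        // Tool-call round trip, as handled below: once the server finishes
+        // streaming arguments we run the matching client tool, return its
+        // result via conversation.item.create (type 'function_call_output'),
+        // then issue response.create so the model resumes the spoken turn.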
+
+        // Function call complete
+        if (eventType === 'response.function_call_arguments.done') {
+          const callId = data.call_id as string
+          const fnName = data.name as string
+          const argsStr = data.arguments as string
+
+          if (DEBUG) console.log('[Qwen] Tool call:', fnName, argsStr)
+
+          let args: Record<string, unknown> = {}
+          try { args = JSON.parse(argsStr) } catch { /* empty */ }
+
+          // Execute the tool
+          const handler = fnName === 'messageCodingAgent'
+            ? realtimeClientTools.messageCodingAgent
+            : fnName === 'processPermissionRequest'
+              ? realtimeClientTools.processPermissionRequest
+              : null
+
+          const result = handler
+            ? await handler(args)
+            : `error (unknown tool: ${fnName})`
+
+          // Send function result back
+          sendEvent('conversation.item.create', {
+            item: {
+              type: 'function_call_output',
+              call_id: callId,
+              output: typeof result === 'string' ? result : JSON.stringify(result)
+            }
+          })
+          // Trigger model to continue
+          sendEvent('response.create')
+          return
+        }
+
+        // VAD: user started speaking - barge-in
+        if (eventType === 'input_audio_buffer.speech_started') {
+          if (state.player?.isPlaying()) {
+            state.player.clearQueue()
+          }
+          return
+        }
+
+        // Response done
+        if (eventType === 'response.done' && DEBUG) {
+          const resp = data.response as Record<string, unknown> | undefined
+          const usage = resp?.usage as Record<string, unknown> | undefined
+          if (usage) console.log('[Qwen] Usage:', usage)
+          return
+        }
+
+        // Error
+        if (eventType === 'error') {
+          const err = data.error as { message?: string } | undefined
+          const message = err?.message || 'Realtime session setup failed'
+          console.error('[Qwen] Server error:', message)
+          state.statusCallback?.('error', message)
+          if (!sessionReady) {
+            reject(new Error(message))
+            ws.close()
+          }
+          return
+        }
+      }
+
+      ws.onerror = (event) => {
+        console.error('[Qwen] WebSocket error:', event)
+        if (!sessionReady) {
+          state.statusCallback?.('error', 'WebSocket connection failed')
+          reject(new Error('WebSocket connection failed'))
+        }
+      }
+
+      ws.onclose = (event) => {
+        if (DEBUG) console.log('[Qwen] WebSocket closed:', event.code, event.reason)
+        cleanup()
+        resetRealtimeSessionState()
+        if (!sessionReady) {
+          const message = event.reason || 'WebSocket closed before setup completed'
+          state.statusCallback?.('error', message)
+          reject(new Error(message))
+          return
+        }
+        state.statusCallback?.('disconnected')
+      }
+    })
+  }
+
+  async endSession(): Promise<void> {
+    cleanup()
+    resetRealtimeSessionState()
+    state.statusCallback?.('disconnected')
+  }
+
+  sendTextMessage(message: string): void {
+    // Send text as a user message via conversation.item.create
+    sendEvent('conversation.item.create', {
+      item: {
+        type: 'message',
+        role: 'user',
+        content: [{ type: 'input_text', text: message }]
+      }
+    })
+    sendEvent('response.create')
+  }
+
+  sendContextualUpdate(update: string): void {
+    // Send context as a system-like user message
+    sendEvent('conversation.item.create', {
+      item: {
+        type: 'message',
+        role: 'user',
+        content: [{ type: 'input_text', text: `[System Context Update] ${update}` }]
+      }
+    })
+  }
+}
+
+function startAudioCapture(playbackContext: AudioContext): void {
+  state.player = new GeminiAudioPlayer(playbackContext)
+  state.recorder = new GeminiAudioRecorder()
+
+  state.recorder.start(
+    (base64Pcm) => {
+      sendEvent('input_audio_buffer.append', { audio: base64Pcm })
+    },
+    (error) => {
+      console.error('[Qwen] Audio capture error:', error)
+      state.statusCallback?.('error', 'Microphone error')
+    }
+  )
+
+  // Apply initial mute state — the React effect may have run before the recorder existed
+  if (state.micMuted) {
+    state.recorder.setMuted(true)
+  }
+}
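+
+// Design note: the module-level `state` object makes this backend a singleton
+// by construction (at most one live Qwen session per page), which matches the
+// single-active-voice-session model assumed by the register/reset helpers
+// imported above.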
+
+// --- React component ---
+
+export interface QwenVoiceSessionProps {
+  api: ApiClient
+  micMuted?: boolean
+  onStatusChange?: StatusCallback
+  onRegistered?: () => void
+  getSession?: (sessionId: string) => Session | null
+  sendMessage?: (sessionId: string, message: string) => void
+  approvePermission?: (sessionId: string, requestId: string) => Promise<void>
+  denyPermission?: (sessionId: string, requestId: string) => Promise<void>
+}
+
+export function QwenVoiceSession({
+  api,
+  micMuted = false,
+  onStatusChange,
+  onRegistered,
+  getSession,
+  sendMessage,
+  approvePermission,
+  denyPermission
+}: QwenVoiceSessionProps) {
+  const hasRegistered = useRef(false)
+
+  useEffect(() => {
+    state.statusCallback = onStatusChange || null
+    return () => { state.statusCallback = null }
+  }, [onStatusChange])
+
+  useEffect(() => {
+    if (getSession && sendMessage && approvePermission && denyPermission) {
+      registerSessionStore({
+        getSession: (sessionId: string) =>
+          getSession(sessionId) as { agentState?: { requests?: Record<string, unknown> } } | null,
+        sendMessage,
+        approvePermission,
+        denyPermission
+      })
+    }
+  }, [getSession, sendMessage, approvePermission, denyPermission])
+
+  useEffect(() => {
+    if (!hasRegistered.current) {
+      try {
+        registerVoiceSession(new QwenVoiceSessionImpl(api))
+        hasRegistered.current = true
+        onRegistered?.()
+      } catch (error) {
+        console.error('[Qwen] Failed to register voice session:', error)
+      }
+    }
+  }, [api]) // eslint-disable-line react-hooks/exhaustive-deps
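+
+  // The ref guard keeps registration idempotent under React 18 Strict Mode,
+  // where effects run twice in development; only the first run registers.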
+
+  // Sync mic mute state — also persist to module state so startAudioCapture can apply it
+  useEffect(() => {
+    state.micMuted = micMuted
+    if (state.recorder) {
+      state.recorder.setMuted(micMuted)
+    }
+  }, [micMuted])
+
+  useEffect(() => {
+    return () => { cleanup() }
+  }, [])
+
+  return null
+}
diff --git a/web/src/realtime/RealtimeVoiceSession.tsx b/web/src/realtime/RealtimeVoiceSession.tsx
index fff9b7b44b..7bfac5953a 100644
--- a/web/src/realtime/RealtimeVoiceSession.tsx
+++ b/web/src/realtime/RealtimeVoiceSession.tsx
@@ -126,6 +126,7 @@ export interface RealtimeVoiceSessionProps {
   api: ApiClient
   micMuted?: boolean
   onStatusChange?: StatusCallback
+  onRegistered?: () => void
   getSession?: (sessionId: string) => Session | null
   sendMessage?: (sessionId: string, message: string) => void
   approvePermission?: (sessionId: string, requestId: string) => Promise<void>
@@ -136,6 +137,7 @@ export function RealtimeVoiceSession({
   api,
   micMuted: micMutedProp = false,
   onStatusChange,
+  onRegistered,
   getSession,
   sendMessage,
   approvePermission,
@@ -231,6 +233,7 @@ export function RealtimeVoiceSession({
       try {
         registerVoiceSession(new RealtimeVoiceSessionImpl(api))
         hasRegistered.current = true
+        onRegistered?.()
       } catch (error) {
         console.error('[Voice] Failed to register voice session:', error)
       }
diff --git a/web/src/realtime/VoiceBackendSession.tsx b/web/src/realtime/VoiceBackendSession.tsx
new file mode 100644
index 0000000000..b990d07bd7
--- /dev/null
+++ b/web/src/realtime/VoiceBackendSession.tsx
@@ -0,0 +1,63 @@
+import { lazy, Suspense, useCallback, useEffect, useState } from 'react'
+import { RealtimeVoiceSession } from './RealtimeVoiceSession'
+import type { RealtimeVoiceSessionProps } from './RealtimeVoiceSession'
+import { fetchVoiceBackend } from '@/api/voice'
+import type { ApiClient } from '@/api/client'
+import type { VoiceBackendType } from '@hapi/protocol/voice'
+
+// Lazy-load alternative backends to avoid bundling when using ElevenLabs
+const GeminiLiveVoiceSession = lazy(() =>
+  import('./GeminiLiveVoiceSession').then((m) => ({ default: m.GeminiLiveVoiceSession }))
+)
+const QwenVoiceSession = lazy(() =>
+  import('./QwenVoiceSession').then((m) => ({ default: m.QwenVoiceSession }))
+)
+
+export type VoiceBackendSessionProps = RealtimeVoiceSessionProps & {
+  api: ApiClient
+  onReadyChange?: (ready: boolean) => void
+}
+
+/**
+ * Dynamically selects the voice session component based on the hub's configured backend.
+ * Queries GET /voice/backend once on mount and renders the appropriate component.
+ * Only signals readiness after the selected backend has mounted and registered its session.
+ */
+export function VoiceBackendSession(props: VoiceBackendSessionProps) {
+  const [backend, setBackend] = useState<VoiceBackendType | null>(null)
+
+  useEffect(() => {
+    let cancelled = false
+    fetchVoiceBackend(props.api).then((resp) => {
+      if (!cancelled) setBackend(resp.backend)
+    })
+    return () => {
+      cancelled = true
+      props.onReadyChange?.(false)
+    }
+  }, [props.api]) // eslint-disable-line react-hooks/exhaustive-deps
+
+  const handleRegistered = useCallback(() => {
+    props.onReadyChange?.(true)
+  }, [props.onReadyChange])
+
+  if (!backend) return null
+
+  if (backend === 'gemini-live') {
+    return (
+      <Suspense fallback={null}>
+        <GeminiLiveVoiceSession {...props} onRegistered={handleRegistered} />
+      </Suspense>
+    )
+  }
+
+  if (backend === 'qwen-realtime') {
+    return (
+      <Suspense fallback={null}>
+        <QwenVoiceSession {...props} onRegistered={handleRegistered} />
+      </Suspense>
+    )
+  }
+
+  return <RealtimeVoiceSession {...props} onRegistered={handleRegistered} />
+}
diff --git a/web/src/realtime/gemini/audioPlayer.ts b/web/src/realtime/gemini/audioPlayer.ts
new file mode 100644
index 0000000000..23d1d341e4
--- /dev/null
+++ b/web/src/realtime/gemini/audioPlayer.ts
@@ -0,0 +1,75 @@
+import { base64ToArrayBuffer, pcm16ToFloat32 } from './pcmUtils';
+
+export class GeminiAudioPlayer {
+  private audioContext: AudioContext;
+  private ownsContext: boolean;
+  private lastEndTime: number = 0;
+  private activeSources: AudioBufferSourceNode[] = [];
+
+  constructor(audioContext?: AudioContext) {
+    if (audioContext) {
+      this.audioContext = audioContext;
+      this.ownsContext = false;
+    } else {
+      this.audioContext = new AudioContext({ sampleRate: 24000 });
+      this.ownsContext = true;
+    }
+    this.lastEndTime = this.audioContext.currentTime;
+  }
+
+  enqueue(base64Pcm: string): void {
+    if (this.audioContext.state === 'suspended') {
+      this.audioContext.resume();
+    }
+
+    const arrayBuffer = base64ToArrayBuffer(base64Pcm);
+    const float32Data = pcm16ToFloat32(arrayBuffer);
+
+    if (float32Data.length === 0) return;
+
+    const audioBuffer = this.audioContext.createBuffer(1, float32Data.length, 24000);
+    audioBuffer.copyToChannel(new Float32Array(float32Data), 0);
+
+    const source = this.audioContext.createBufferSource();
+    source.buffer = audioBuffer;
+    source.connect(this.audioContext.destination);
+
+    const startTime = Math.max(this.audioContext.currentTime, this.lastEndTime);
+
+    source.onended = () => {
+      const index = this.activeSources.indexOf(source);
+      if (index > -1) {
+        this.activeSources.splice(index, 1);
+      }
+    };
+
+    source.start(startTime);
+    this.activeSources.push(source);
+
+    this.lastEndTime = startTime + audioBuffer.duration;
+  }
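+
+  // Scheduling note: `lastEndTime` is a running clock on the AudioContext
+  // timeline, so chunks play gapless back-to-back. For example, a 4800-sample
+  // chunk at 24 kHz lasts 4800 / 24000 = 0.2 s, so the next chunk is scheduled
+  // to start exactly 200 ms after this one.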
+
+  clearQueue(): void {
+    this.activeSources.forEach(source => {
+      try {
+        source.stop();
+      } catch (e) {
+        // Ignore if already stopped
+      }
+      source.disconnect();
+    });
+    this.activeSources = [];
+    this.lastEndTime = this.audioContext.currentTime;
+  }
+
+  isPlaying(): boolean {
+    return this.lastEndTime > this.audioContext.currentTime;
+  }
+
+  dispose(): void {
+    this.clearQueue();
+    if (this.ownsContext && this.audioContext.state !== 'closed') {
+      this.audioContext.close();
+    }
+  }
+}
diff --git a/web/src/realtime/gemini/audioRecorder.ts b/web/src/realtime/gemini/audioRecorder.ts
new file mode 100644
index 0000000000..98813212a0
--- /dev/null
+++ b/web/src/realtime/gemini/audioRecorder.ts
@@ -0,0 +1,139 @@
+import { float32ToPcm16, arrayBufferToBase64 } from './pcmUtils';
+
+// Inline worklet source to avoid Vite bundling issues with ?url imports.
+// AudioWorklet.addModule() requires a URL to valid JS, so we create a Blob URL.
+const WORKLET_SOURCE = `
+class PcmRecorderProcessor extends AudioWorkletProcessor {
+  constructor() {
+    super();
+    this.buffer = new Float32Array(4096);
+    this.idx = 0;
+  }
+  process(inputs) {
+    const input = inputs[0];
+    if (input && input.length > 0) {
+      const channel = input[0];
+      for (let i = 0; i < channel.length; i++) {
+        this.buffer[this.idx++] = channel[i];
+        if (this.idx >= 4096) {
+          this.port.postMessage({ samples: this.buffer.slice() });
+          this.idx = 0;
+        }
+      }
+    }
+    return true;
+  }
+}
+registerProcessor('pcm-recorder-processor', PcmRecorderProcessor);
+`;
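+
+// Buffering math: at the 16 kHz capture rate used below, a full 4096-sample
+// buffer represents 4096 / 16000 = 0.256 s, so the worklet posts one PCM
+// message (and the client sends one WebSocket frame) roughly every 256 ms.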
+
+function createWorkletUrl(): string {
+  const blob = new Blob([WORKLET_SOURCE], { type: 'application/javascript' });
+  return URL.createObjectURL(blob);
+}
+
+export class GeminiAudioRecorder {
+  private audioContext: AudioContext | null = null;
+  private mediaStream: MediaStream | null = null;
+  private sourceNode: MediaStreamAudioSourceNode | null = null;
+  private workletNode: AudioWorkletNode | null = null;
+  private scriptNode: ScriptProcessorNode | null = null;
+
+  async start(onChunk: (base64Pcm: string) => void, onError?: (error: Error) => void): Promise<void> {
+    try {
+      this.mediaStream = await navigator.mediaDevices.getUserMedia({
+        audio: { sampleRate: 16000, channelCount: 1 }
+      });
+
+      this.mediaStream.getTracks().forEach((track) => {
+        track.onended = () => {
+          if (onError) onError(new Error('Microphone disconnected'));
+        };
+      });
+
+      this.audioContext = new AudioContext({ sampleRate: 16000 });
+      if (this.audioContext.state === 'suspended') {
+        await this.audioContext.resume();
+      }
+
+      this.sourceNode = this.audioContext.createMediaStreamSource(this.mediaStream);
+
+      try {
+        const workletUrl = createWorkletUrl();
+        await this.audioContext.audioWorklet.addModule(workletUrl);
+        URL.revokeObjectURL(workletUrl);
+
+        this.workletNode = new AudioWorkletNode(this.audioContext, 'pcm-recorder-processor');
+        this.workletNode.port.onmessage = (event) => {
+          const pcm16 = float32ToPcm16(event.data.samples);
+          const base64 = arrayBufferToBase64(pcm16);
+          onChunk(base64);
+        };
+        // Connect source → worklet → silent sink → destination.
+        // The downstream connection is required so the audio graph pulls
+        // frames through the worklet node and port.onmessage fires.
+        const sink = this.audioContext.createGain();
+        sink.gain.value = 0;
+        this.sourceNode.connect(this.workletNode);
+        this.workletNode.connect(sink);
+        sink.connect(this.audioContext.destination);
+      } catch (e) {
+        console.warn('[GeminiLive] AudioWorklet failed, falling back to ScriptProcessorNode', e);
+        this.scriptNode = this.audioContext.createScriptProcessor(4096, 1, 1);
+        this.scriptNode.onaudioprocess = (event) => {
+          const inputData = event.inputBuffer.getChannelData(0);
+          const pcm16 = float32ToPcm16(new Float32Array(inputData));
+          const base64 = arrayBufferToBase64(pcm16);
+          onChunk(base64);
+        };
+        this.sourceNode.connect(this.scriptNode);
+        this.scriptNode.connect(this.audioContext.destination);
+      }
+    } catch (e) {
+      if (onError) onError(e instanceof Error ? e : new Error(String(e)));
+      throw e;
+    }
+  }
+
+  stop(): void {
+    if (this.mediaStream) {
+      this.mediaStream.getTracks().forEach(track => {
+        track.onended = null;
+        track.stop();
+      });
+      this.mediaStream = null;
+    }
+
+    if (this.scriptNode) {
+      this.scriptNode.disconnect();
+      this.scriptNode = null;
+    }
+
+    if (this.workletNode) {
+      this.workletNode.disconnect();
+      this.workletNode = null;
+    }
+
+    if (this.sourceNode) {
+      this.sourceNode.disconnect();
+      this.sourceNode = null;
+    }
+
+    if (this.audioContext) {
+      this.audioContext.close();
+      this.audioContext = null;
+    }
+  }
+
+  setMuted(muted: boolean): void {
+    if (this.mediaStream) {
+      this.mediaStream.getAudioTracks().forEach(track => {
+        track.enabled = !muted;
+      });
+    }
+  }
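+
+  // Muting via track.enabled keeps the capture graph running: a disabled
+  // track produces silent frames, so the worklet stays alive and unmuting
+  // resumes instantly without rebuilding the audio graph.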
+
+  dispose(): void {
+    this.stop();
+  }
+}
diff --git a/web/src/realtime/gemini/pcm-recorder.worklet.ts b/web/src/realtime/gemini/pcm-recorder.worklet.ts
new file mode 100644
index 0000000000..404f65445b
--- /dev/null
+++ b/web/src/realtime/gemini/pcm-recorder.worklet.ts
@@ -0,0 +1,35 @@
+// AudioWorklet processor runs in a separate scope with its own globals.
+// These declarations satisfy TypeScript without pulling in DOM lib types.
+declare class AudioWorkletProcessor {
+  readonly port: MessagePort
+  constructor()
+}
+declare function registerProcessor(name: string, ctor: new () => AudioWorkletProcessor): void
+
+class PcmRecorderProcessor extends AudioWorkletProcessor {
+  private buffer: Float32Array;
+  private bufferSize = 4096;
+  private bufferIndex = 0;
+
+  constructor() {
+    super();
+    this.buffer = new Float32Array(this.bufferSize);
+  }
+
+  process(inputs: Float32Array[][]): boolean {
+    const input = inputs[0];
+    if (input && input.length > 0) {
+      const channel = input[0];
+      for (let i = 0; i < channel.length; i++) {
+        this.buffer[this.bufferIndex++] = channel[i];
+        if (this.bufferIndex >= this.bufferSize) {
+          this.port.postMessage({ samples: this.buffer.slice() });
+          this.bufferIndex = 0;
+        }
+      }
+    }
+    return true;
+  }
+}
+
+registerProcessor('pcm-recorder-processor', PcmRecorderProcessor);
diff --git a/web/src/realtime/gemini/pcmUtils.test.ts b/web/src/realtime/gemini/pcmUtils.test.ts
new file mode 100644
index 0000000000..2e0be05c3f
--- /dev/null
+++ b/web/src/realtime/gemini/pcmUtils.test.ts
@@ -0,0 +1,60 @@
+import { describe, test, expect } from 'bun:test'
+import {
+  float32ToPcm16,
+  pcm16ToFloat32,
+  arrayBufferToBase64,
+  base64ToArrayBuffer
+} from './pcmUtils'
+
+describe('pcmUtils', () => {
+  describe('float32ToPcm16 / pcm16ToFloat32 round-trip', () => {
+    test('preserves signal within quantization error', () => {
+      const input = new Float32Array([0, 0.5, -0.5, 1.0, -1.0])
+      const pcm16 = float32ToPcm16(input)
+      const output = pcm16ToFloat32(pcm16)
+
+      expect(output.length).toBe(input.length)
+      for (let i = 0; i < input.length; i++) {
+        expect(Math.abs(output[i] - input[i])).toBeLessThan(0.001)
+      }
+    })
+
+    test('clamps values outside [-1, 1]', () => {
+      const input = new Float32Array([2.0, -2.0])
+      const pcm16 = float32ToPcm16(input)
+      const output = pcm16ToFloat32(pcm16)
+
+      expect(Math.abs(output[0] - 1.0)).toBeLessThan(0.001)
+      expect(Math.abs(output[1] - (-1.0))).toBeLessThan(0.001)
+    })
+
+    test('handles empty input', () => {
+      const input = new Float32Array(0)
+      const pcm16 = float32ToPcm16(input)
+      expect(pcm16.byteLength).toBe(0)
+      const output = pcm16ToFloat32(pcm16)
+      expect(output.length).toBe(0)
+    })
+  })
+
+  describe('arrayBufferToBase64 / base64ToArrayBuffer round-trip', () => {
+    test('preserves binary data', () => {
+      const original = new Uint8Array([0, 1, 127, 128, 255])
+      const base64 = arrayBufferToBase64(original.buffer)
+      const restored = new Uint8Array(base64ToArrayBuffer(base64))
+
+      expect(restored.length).toBe(original.length)
+      for (let i = 0; i < original.length; i++) {
+        expect(restored[i]).toBe(original[i])
+      }
+    })
+
+    test('handles empty buffer', () => {
+      const empty = new ArrayBuffer(0)
+      const base64 = arrayBufferToBase64(empty)
+      expect(base64).toBe('')
+      const restored = base64ToArrayBuffer(base64)
+      expect(restored.byteLength).toBe(0)
+    })
+  })
+})
diff --git a/web/src/realtime/gemini/pcmUtils.ts b/web/src/realtime/gemini/pcmUtils.ts
new file mode 100644
index 0000000000..67e2928fc0
--- /dev/null
+++ b/web/src/realtime/gemini/pcmUtils.ts
@@ -0,0 +1,39 @@
+export function float32ToPcm16(samples: Float32Array): ArrayBuffer {
+  const buffer = new ArrayBuffer(samples.length * 2);
+  const view = new DataView(buffer);
+  for (let i = 0; i < samples.length; i++) {
+    let s = Math.max(-1, Math.min(1, samples[i]));
+    s = s < 0 ? s * 0x8000 : s * 0x7FFF;
+    view.setInt16(i * 2, s, true);
+  }
+  return buffer;
+}
+
+export function pcm16ToFloat32(buffer: ArrayBuffer): Float32Array {
+  const int16Array = new Int16Array(buffer);
+  const float32Array = new Float32Array(int16Array.length);
+  for (let i = 0; i < int16Array.length; i++) {
+    const s = int16Array[i];
+    float32Array[i] = s < 0 ? s / 0x8000 : s / 0x7FFF;
+  }
+  return float32Array;
+}
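+
+// Scaling note: negatives scale by 0x8000 (32768) and positives by 0x7FFF
+// (32767) because int16 is asymmetric. Example: 0.5 encodes to
+// truncate(0.5 * 32767) = 16383, which decodes back to 16383 / 32767
+// ≈ 0.49997, inside the 0.001 tolerance the tests above assert.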
+
+export function arrayBufferToBase64(buffer: ArrayBuffer): string {
+  let binary = '';
+  const bytes = new Uint8Array(buffer);
+  const len = bytes.byteLength;
+  for (let i = 0; i < len; i++) {
+    binary += String.fromCharCode(bytes[i]);
+  }
+  return btoa(binary);
+}
+
+export function base64ToArrayBuffer(base64: string): ArrayBuffer {
+  const binary = atob(base64);
+  const bytes = new Uint8Array(binary.length);
+  for (let i = 0; i < binary.length; i++) {
+    bytes[i] = binary.charCodeAt(i);
+  }
+  return bytes.buffer;
+}
diff --git a/web/src/realtime/gemini/toolAdapter.test.ts b/web/src/realtime/gemini/toolAdapter.test.ts
new file mode 100644
index 0000000000..5d98d6d4d0
--- /dev/null
+++ b/web/src/realtime/gemini/toolAdapter.test.ts
@@ -0,0 +1,28 @@
+import { describe, test, expect } from 'bun:test'
+import { handleGeminiFunctionCall, handleGeminiFunctionCalls } from './toolAdapter'
+import type { GeminiFunctionCall } from './toolAdapter'
+
+describe('toolAdapter', () => {
+  test('returns error for unknown tool', async () => {
+    const call: GeminiFunctionCall = {
+      name: 'unknownTool',
+      args: {},
+      id: 'call-1'
+    }
+    const resp = await handleGeminiFunctionCall(call)
+    expect(resp.name).toBe('unknownTool')
+    expect(resp.id).toBe('call-1')
+    expect(resp.response.result).toContain('unknown tool')
+  })
+
+  test('handles multiple calls sequentially, preserving order', async () => {
+    const calls: GeminiFunctionCall[] = [
+      { name: 'unknownA', args: {}, id: 'a' },
+      { name: 'unknownB', args: {}, id: 'b' }
+    ]
+    const responses = await handleGeminiFunctionCalls(calls)
+    expect(responses.length).toBe(2)
+    expect(responses[0].id).toBe('a')
+    expect(responses[1].id).toBe('b')
+  })
+})
diff --git a/web/src/realtime/gemini/toolAdapter.ts b/web/src/realtime/gemini/toolAdapter.ts
new file mode 100644
index 0000000000..dbf4dee9c9
--- /dev/null
+++ b/web/src/realtime/gemini/toolAdapter.ts
@@ -0,0 +1,76 @@
+import { realtimeClientTools } from '../realtimeClientTools'
+
+/**
+ * Gemini Live API function call from server.
+ * Matches the `toolCall` shape in a BidiGenerateContent serverMessage.
+ */
+export interface GeminiFunctionCall {
+  name: string
+  args: Record<string, unknown>
+  id: string
+}
+
+/**
+ * Response sent back to Gemini Live via `toolResponse`.
+ */
+export interface GeminiFunctionResponse {
+  name: string
+  id: string
+  response: { result: string }
+}
+
+type ClientToolHandler = (parameters: unknown) => Promise<string>
+
+const toolHandlers: Record<string, ClientToolHandler | undefined> = {
+  messageCodingAgent: realtimeClientTools.messageCodingAgent,
+  processPermissionRequest: realtimeClientTools.processPermissionRequest
+}
+
+/**
+ * Execute a Gemini Live function call using the existing client tool handlers.
+ * Returns a GeminiFunctionResponse ready to send back over the WebSocket.
+ */
+export async function handleGeminiFunctionCall(
+  call: GeminiFunctionCall
+): Promise<GeminiFunctionResponse> {
+  const handler = toolHandlers[call.name]
+
+  if (!handler) {
+    return {
+      name: call.name,
+      id: call.id,
+      response: { result: `error (unknown tool: ${call.name})` }
+    }
+  }
+
+  try {
+    const result = await handler(call.args)
+    return {
+      name: call.name,
+      id: call.id,
+      response: { result }
+    }
+  } catch (error) {
+    const message = error instanceof Error ? error.message : 'unknown error'
+    return {
+      name: call.name,
+      id: call.id,
+      response: { result: `error (${message})` }
+    }
+  }
+}
+
+/**
+ * Process multiple function calls sequentially to avoid racing on shared
+ * session state (e.g. processPermissionRequest resolving the same pending
+ * request twice when calls run in parallel).
+ */
+export async function handleGeminiFunctionCalls(
+  calls: GeminiFunctionCall[]
+): Promise<GeminiFunctionResponse[]> {
+  const responses: GeminiFunctionResponse[] = []
+  for (const call of calls) {
+    responses.push(await handleGeminiFunctionCall(call))
+  }
+  return responses
+}
diff --git a/web/src/realtime/index.ts b/web/src/realtime/index.ts
index a7fa2fbe99..1e080123b5 100644
--- a/web/src/realtime/index.ts
+++ b/web/src/realtime/index.ts
@@ -15,8 +15,11 @@ export {
 // Client tools
 export { realtimeClientTools, registerSessionStore } from './realtimeClientTools'
 
-// Voice session component
+// Voice session components
 export { RealtimeVoiceSession, type RealtimeVoiceSessionProps } from './RealtimeVoiceSession'
+export { GeminiLiveVoiceSession, type GeminiLiveVoiceSessionProps } from './GeminiLiveVoiceSession'
+export { QwenVoiceSession, type QwenVoiceSessionProps } from './QwenVoiceSession'
+export { VoiceBackendSession, type VoiceBackendSessionProps } from './VoiceBackendSession'
 
 // Voice hooks
 export { voiceHooks, registerVoiceHooksStore } from './hooks/voiceHooks'
diff --git a/web/src/sw.ts b/web/src/sw.ts
index ebe55dc0a7..62be9dd29d 100644
--- a/web/src/sw.ts
+++ b/web/src/sw.ts
@@ -21,6 +21,9 @@ type PushPayload = {
   }
 }
 
+// Let the new service worker wait until all tabs close before activating.
+// Immediate skipWaiting + clientsClaim can break lazy-loaded chunks (e.g. voice)
+// when the old app shell requests hashes that the new precache no longer serves.
 precacheAndRoute(self.__WB_MANIFEST)
 
 registerRoute(
diff --git a/web/tsconfig.json b/web/tsconfig.json
index 8b0682a4bb..de7bcdca50 100644
--- a/web/tsconfig.json
+++ b/web/tsconfig.json
@@ -11,5 +11,6 @@
       "@/*": ["./src/*"]
     }
   },
-  "include": ["src"]
+  "include": ["src"],
+  "exclude": ["src/**/*.test.ts", "src/**/*.test.tsx", "src/**/*.spec.ts", "src/**/*.spec.tsx"]
 }