A fully voice-enabled interface for the Claude Desktop app. Speak to Claude and hear responses spoken aloud — no admin rights required.
Magyar verzió / Hungarian version: README_HU.md
spoken-mcp turns Claude Desktop into a voice assistant. You speak into your microphone, your speech is transcribed and sent to Claude, and Claude's response is read aloud to you — all automatically.
The system consists of two components that work alongside the Claude Desktop app:
- MCP TTS Server (`tts_server.py`): An MCP server that Claude calls via its `speak` tool. It sends Claude's response text to ElevenLabs Text-to-Speech and plays the audio on your speakers.
- STT Companion (`stt_companion.py`): A background app with a system tray icon. It listens to your microphone, transcribes your speech via ElevenLabs Speech-to-Text (Scribe v1), and automatically pastes the transcription into Claude Desktop's input field.
This project prioritizes response quality and voice quality over speed. Claude's full text response is generated first, then converted to high-quality audio (MP3, 44.1 kHz). This means there is a noticeable delay between your question and hearing the spoken response — typically a few seconds depending on response length. This is a deliberate design choice: we chose natural-sounding, high-fidelity voice output over lower-latency but lower-quality alternatives.
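To make the "full text first, then one high-quality MP3" flow concrete, here is a minimal sketch of how a single text-to-speech request could be assembled. It is not taken from `tts_server.py`: the function name `build_tts_request` is hypothetical, and the endpoint path, the `xi-api-key` header and the `mp3_44100_128` output format are assumptions based on the public ElevenLabs REST API that should be checked against its reference. The request is only built, never sent.

```python
import json
import urllib.request

# Assumed endpoint; verify against the ElevenLabs API reference before use.
API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_tts_request(text, api_key, voice_id, model_id="eleven_v3",
                      language_code="en", output_format="mp3_44100_128"):
    """Build (but do not send) one request for a complete response text."""
    # The whole response goes into a single request, which is why audio
    # only starts after Claude has finished generating the text.
    url = f"{API_BASE}/{voice_id}?output_format={output_format}"
    payload = {"text": text, "model_id": model_id, "language_code": language_code}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_tts_request("Hello from Claude.", "paste-your-api-key-here",
                            "paste-your-voice-id-here")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would return the audio bytes, which a player such as `pygame.mixer` could then play.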
```
┌─────────────────────────────────────────────────────┐
│                 Claude Desktop App                  │
│                                                     │
│  [User types or pastes from STT] → Claude responds  │
│                          ↓                          │
│                  tool call: speak                   │
│                          ↓                          │
│                ┌───────────────────┐                │
│                │  MCP TTS Server   │                │
│                │   (ElevenLabs)    │                │
│                └───────────────────┘                │
└─────────────────────────────────────────────────────┘
                           ↑ auto-paste
┌─────────────────────────────────────────────────────┐
│  STT Companion (background process)                 │
│                                                     │
│  Hotkey / VAD → record mic                          │
│  Speech detected → ElevenLabs STT → paste           │
│                                                     │
│  System tray: grey = idle, green = listening,       │
│               red = recording                       │
└─────────────────────────────────────────────────────┘
```
The STT Companion supports three modes (set `"mode"` in `config.json`):
| Mode | Config value | How it works |
|---|---|---|
| Push-to-talk | `"push_to_talk"` | Hold the hotkey to record, release to transcribe |
| VAD toggle | `"vad"` | Press the hotkey once to start listening. VAD detects speech and silence, sends automatically, then stops listening |
| VAD always-on | `"vad_always"` | Continuously listens for speech. Use a configurable mute key (e.g., F4) to toggle the microphone on/off |
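The VAD modes boil down to a small state machine: wait for a run of speech frames, record until a long enough silence, then drop recordings that are too short. The sketch below illustrates that logic on a list of per-frame speech flags. It is not the companion's actual code: `VadSettings`, `segment` and the 30 ms frame length are assumptions for illustration, and the flags stand in for whatever a detector such as `webrtcvad` reports.

```python
from dataclasses import dataclass

FRAME_SECONDS = 0.03  # assumed 30 ms frames; the project's frame size may differ


@dataclass(frozen=True)
class VadSettings:
    silence_timeout: float = 3.0  # seconds of silence that end a recording
    speech_threshold: int = 3     # consecutive speech frames needed to start
    min_duration: float = 0.5     # recordings shorter than this are dropped


def segment(frames, settings=VadSettings()):
    """Return (start, end) frame index pairs for utterances worth transcribing.

    `frames` is an iterable of booleans, True where speech was detected.
    """
    utterances = []
    speech_run = 0   # consecutive speech frames seen while idle
    silence_run = 0  # consecutive silent frames seen while recording
    start = None     # frame index where the current recording began
    silence_limit = round(settings.silence_timeout / FRAME_SECONDS)

    for i, is_speech in enumerate(frames):
        if start is None:
            speech_run = speech_run + 1 if is_speech else 0
            if speech_run >= settings.speech_threshold:
                start = i - speech_run + 1
                silence_run = 0
        else:
            silence_run = 0 if is_speech else silence_run + 1
            if silence_run >= silence_limit:
                end = i - silence_run + 1  # exclusive end, trailing silence trimmed
                if (end - start) * FRAME_SECONDS >= settings.min_duration:
                    utterances.append((start, end))
                start, speech_run = None, 0
    return utterances


print(segment([False] * 5 + [True] * 40 + [False] * 100))
```

This version keeps listening after each utterance, which is closest to `vad_always`; a `vad` toggle variant would stop after the first returned segment.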
| Color | Meaning |
|---|---|
| Grey | Idle — not listening |
| Green | Listening — waiting for speech (VAD active) |
| Red | Recording — speech detected, capturing audio |
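As a rough illustration of how the three tray colors relate to the companion's state, the sketch below maps two flags to a state and an icon file. The names `TrayState`, `tray_state` and the two flags are hypothetical; only the icon file names come from the project tree in this README.

```python
from enum import Enum


class TrayState(Enum):
    IDLE = "grey"
    LISTENING = "green"
    RECORDING = "red"


# Icon file names as listed in the project tree.
ICON_FILES = {
    TrayState.IDLE: "icons/mic_idle.png",
    TrayState.LISTENING: "icons/mic_listening.png",
    TrayState.RECORDING: "icons/mic_active.png",
}


def tray_state(listening, speech_detected):
    """Pick the tray state from two flags (hypothetical names)."""
    if not listening:
        return TrayState.IDLE
    return TrayState.RECORDING if speech_detected else TrayState.LISTENING


print(ICON_FILES[tray_state(listening=True, speech_detected=False)])
```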
Right-click the tray icon for a menu showing the current mode, key bindings, and an exit option.
Before you start, make sure you have:
- Windows 10 or 11
- Python 3.11 or newer — Download from python.org. During installation, check "Add Python to PATH".
- Claude Desktop app — Download from claude.ai/download
- An ElevenLabs account with API access — Sign up at elevenlabs.io. A paid plan is recommended — the free tier has limited characters per month, and the STT (Scribe) API may require a paid subscription. You'll need:
  - Your API key (found in Profile → API Keys)
  - A Voice ID for TTS (found in Voices → click a voice → Voice ID)
Option A: Using Git

```
git clone https://github.com/leszini/spoken-mcp.git
cd spoken-mcp
```

Option B: Manual download

Download the ZIP from GitHub and extract it to a folder (e.g., `C:\Users\YourName\Desktop\spoken-mcp`).
Open a terminal (Command Prompt or PowerShell) in the project folder and run:

```
pip install -r requirements.txt
```

Note: If `webrtcvad` fails to install, try `pip install webrtcvad-wheels`, which provides pre-built Windows binaries.
- In the project folder, find `config.example.json`
- Copy it and rename the copy to `config.json`
- Open `config.json` in any text editor (Notepad works fine)
- Fill in your settings:
```json
{
  "elevenlabs_api_key": "paste-your-api-key-here",
  "tts": {
    "voice_id": "paste-your-voice-id-here",
    "model_id": "eleven_v3",
    "language_code": "en"
  },
  "stt": {
    "model_id": "scribe_v1",
    "language_code": "en"
  },
  "audio": {
    "sample_rate": 16000,
    "channels": 1
  },
  "hotkey": "caps lock",
  "mute_key": "f4",
  "mode": "vad_always",
  "vad": {
    "aggressiveness": 2,
    "silence_timeout": 3.0,
    "speech_threshold": 3,
    "volume_threshold": 500,
    "min_duration": 0.5
  },
  "auto_paste": true,
  "auto_enter": false
}
```

Where to find your ElevenLabs credentials:
- API key: Log in to elevenlabs.io → click your profile icon (top-right) → Profile + API key → copy the key
- Voice ID: Go to Voices → pick a voice you like → click on it → copy the Voice ID from the URL or the details panel
Key settings to customize:
- `language_code`: Change `"en"` to your language (e.g., `"hu"` for Hungarian, `"de"` for German, `"es"` for Spanish). Set it in both the `tts` and `stt` sections.
- `mode`: Choose your preferred input mode (see Input Modes above).
- `mute_key`: The key to mute/unmute the microphone in `vad_always` mode. Supported keys: `f1`–`f12`, `caps lock`, `scroll lock`, `pause`, `insert`, `num lock`.
- `auto_enter`: Set to `true` if you want transcriptions to be sent automatically (hands-free). Set to `false` to review before sending.
- `vad.volume_threshold`: Minimum audio volume to be considered speech. Increase it if background noise triggers false transcriptions. Set to `0` to disable (use the mute key instead).
Important: `config.json` contains your API key. It is listed in `.gitignore`, so Git will not commit it or push it to GitHub. Do not share the file by other means.
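Before starting the companion, it can help to check `config.json` for leftover placeholders or an invalid mode. The sketch below is one way to do that with the standard library only. The helper names `lookup`, `check_config` and `check_config_file` are hypothetical and not part of the project; the valid mode values and placeholder strings are taken from this README.

```python
import json
from pathlib import Path

VALID_MODES = {"push_to_talk", "vad", "vad_always"}
REQUIRED = ("elevenlabs_api_key", "tts.voice_id")
PLACEHOLDER_PREFIX = "paste-your-"  # matches the placeholder values in the template


def lookup(config, dotted_key):
    """Resolve a dotted key such as 'tts.voice_id' in a nested dict."""
    value = config
    for part in dotted_key.split("."):
        value = value.get(part) if isinstance(value, dict) else None
    return value


def check_config(config):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for key in REQUIRED:
        value = lookup(config, key)
        if not value or str(value).startswith(PLACEHOLDER_PREFIX):
            problems.append(f"{key} is missing or still a placeholder")
    if config.get("mode") not in VALID_MODES:
        problems.append(f"mode must be one of {sorted(VALID_MODES)}")
    if lookup(config, "tts.language_code") != lookup(config, "stt.language_code"):
        problems.append("tts.language_code and stt.language_code differ")
    return problems


def check_config_file(path="config.json"):
    """Load a config file from disk and run the same checks on it."""
    return check_config(json.loads(Path(path).read_text(encoding="utf-8")))
```

Running `check_config_file()` from the project folder returns an empty list when the file passes these checks.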
This tells Claude Desktop about the speak tool so Claude can use it.
- Open Claude Desktop
- Go to Settings (gear icon) → Developer → Edit Config
- This opens the file `claude_desktop_config.json`. Add the following inside it:

```json
{
  "mcpServers": {
    "spoken-mcp": {
      "command": "python",
      "args": ["C:\\full\\path\\to\\spoken-mcp\\tts_server.py"]
    }
  }
}
```

Important: Replace `C:\\full\\path\\to\\spoken-mcp\\tts_server.py` with the actual path to the file on your computer. Use double backslashes (`\\`) in the path.

For example, if you put the project on your Desktop:

```json
"args": ["C:\\Users\\YourName\\Desktop\\spoken-mcp\\tts_server.py"]
```

Tip: If the `python` command doesn't work, use the full Python path instead, e.g., `"command": "C:\\Python312\\python.exe"`. You can find yours by running `where python` in a terminal.
- Save the file and restart Claude Desktop completely (close and reopen it).
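If escaping backslashes by hand is error-prone, the entry can also be generated, since `json.dumps` doubles each backslash automatically. This is a small sketch rather than a project tool: `mcp_server_entry` is a hypothetical helper, and the example folder is the same placeholder path used above.

```python
import json
from pathlib import PureWindowsPath


def mcp_server_entry(project_dir, python_command="python"):
    """Build the mcpServers fragment for claude_desktop_config.json."""
    script = PureWindowsPath(project_dir) / "tts_server.py"
    return {
        "mcpServers": {
            "spoken-mcp": {"command": python_command, "args": [str(script)]}
        }
    }


entry = mcp_server_entry(r"C:\Users\YourName\Desktop\spoken-mcp")
print(json.dumps(entry, indent=2))  # backslashes appear doubled in the JSON text
```

If the config file already has an `mcpServers` section, add the `spoken-mcp` key to it instead of replacing the whole section.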
In a terminal, run:

```
python stt_companion.py
```

A system tray icon will appear near your clock (you may need to click the ^ arrow to see it). The companion is now running in the background.
- Open Claude Desktop and start a conversation
- Speak into your microphone — your speech will be transcribed and pasted into the input field
- Claude will respond in text AND read the response aloud
You can create a desktop shortcut that launches the companion with a single double-click — no console window, with start/stop/restart buttons.
- Right-click on your Desktop → New → Text Document
- Name it `Spoken MCP.vbs` (make sure the extension is `.vbs`, not `.vbs.txt`; you may need to enable "Show file extensions" in Windows Explorer)
- Right-click the file → Edit (or open with Notepad)
- Paste the following content, replacing the paths with your actual Python and script locations:
```vbscript
Set WshShell = CreateObject("WScript.Shell")
Set fso = CreateObject("Scripting.FileSystemObject")

pythonExe = "C:\Python312\python.exe"
scriptPath = "C:\Users\YourName\Desktop\spoken-mcp\stt_companion.py"

' Check if companion is already running
Set objWMI = GetObject("winmgmts:\\.\root\cimv2")
Set processes = objWMI.ExecQuery("SELECT * FROM Win32_Process WHERE CommandLine LIKE '%stt_companion.py%' AND Name = 'python.exe'")
alreadyRunning = (processes.Count > 0)

If alreadyRunning Then
    msg = "Spoken MCP is running!" & vbCrLf & vbCrLf & _
          "STOP = Abort button" & vbCrLf & _
          "RESTART = Retry button" & vbCrLf & _
          "KEEP RUNNING = Ignore button"
    result = MsgBox(msg, vbAbortRetryIgnore + vbQuestion, "Spoken MCP")
    If result = vbAbort Then
        ' STOP
        For Each proc In processes
            proc.Terminate()
        Next
        MsgBox "Spoken MCP stopped.", vbInformation, "Spoken MCP"
    ElseIf result = vbRetry Then
        ' RESTART
        For Each proc In processes
            proc.Terminate()
        Next
        WScript.Sleep 1000
        WshShell.Run """" & pythonExe & """ """ & scriptPath & """", 0, False
        MsgBox "Spoken MCP restarted!", vbInformation, "Spoken MCP"
    End If
Else
    WshShell.Run """" & pythonExe & """ """ & scriptPath & """", 0, False
    MsgBox "Spoken MCP started!" & vbCrLf & vbCrLf & _
           "Look for the icon in the system tray (near the clock)." & vbCrLf & _
           "Double-click this shortcut again to stop or restart.", _
           vbInformation, "Spoken MCP"
End If
```

- Save and close. Now double-click `Spoken MCP.vbs` to launch!
How the shortcut works:
- First launch: Starts the companion in the background (no console window) and shows a confirmation
- If already running: Shows a dialog with three buttons:
  - Abort = Stop the companion
  - Retry = Restart the companion
  - Ignore = Keep running (do nothing)
| Key | Description | Default |
|---|---|---|
| `elevenlabs_api_key` | Your ElevenLabs API key | (required) |
| `tts.voice_id` | ElevenLabs voice ID for speech output | (required) |
| `tts.model_id` | TTS model | `eleven_v3` |
| `tts.language_code` | Language for TTS output | `hu` |
| `stt.model_id` | STT model | `scribe_v1` |
| `stt.language_code` | Language hint for transcription | `hu` |
| `audio.sample_rate` | Microphone sample rate in Hz | `16000` |
| `audio.channels` | Audio channels (1 = mono) | `1` |
| `hotkey` | Key for push-to-talk or VAD toggle | `caps lock` |
| `mute_key` | Key to toggle mic mute in `vad_always` mode | `f4` |
| `mode` | Input mode: `push_to_talk`, `vad`, or `vad_always` | `vad_always` |
| `vad.aggressiveness` | WebRTC VAD sensitivity: 0 (least) to 3 (most aggressive) | `2` |
| `vad.silence_timeout` | Seconds of silence before ending a recording | `3.0` |
| `vad.speech_threshold` | Minimum consecutive speech frames before recording starts | `3` |
| `vad.volume_threshold` | Minimum volume level to count as speech (0 = disabled) | `500` |
| `vad.min_duration` | Minimum recording duration in seconds (filters out coughs, breaths) | `0.5` |
| `auto_paste` | Automatically paste transcription into the active window | `true` |
| `auto_enter` | Automatically press Enter after pasting (hands-free mode) | `false` |
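The table does not say how "volume level" is measured. One plausible reading, given 16-bit microphone audio, is the root-mean-square (RMS) amplitude of each frame, and the sketch below shows that interpretation. Treat both the RMS choice and the helper names `rms_volume` and `loud_enough` as assumptions, not as the companion's documented behavior.

```python
import array
import math


def rms_volume(pcm_bytes):
    """Root-mean-square amplitude of 16-bit PCM audio in native byte order."""
    samples = array.array("h", pcm_bytes)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def loud_enough(pcm_bytes, volume_threshold=500):
    """A threshold of 0 disables the check, matching the table above."""
    return volume_threshold == 0 or rms_volume(pcm_bytes) >= volume_threshold


# One 30 ms frame at 16 kHz mono is 480 samples.
quiet_frame = array.array("h", [100, -100] * 240).tobytes()
loud_frame = array.array("h", [2000, -2000] * 240).tobytes()
print(loud_enough(quiet_frame), loud_enough(loud_frame))
```

On this scale, 16-bit samples range from -32768 to 32767, so a threshold of 500 sits well below normal speech close to the microphone and above a quiet room's noise floor in many setups; the right value depends on your hardware.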
```
spoken-mcp/
├── README.md             # This file (English)
├── README_HU.md          # Hungarian documentation
├── LICENSE               # MIT License
├── config.json           # Your config with API key (gitignored)
├── config.example.json   # Template config without secrets
├── requirements.txt      # Python dependencies
├── tts_server.py         # MCP TTS server (Claude calls this)
├── stt_companion.py      # STT companion app with system tray
├── icons/
│   ├── mic_idle.png      # Tray icon: idle (grey)
│   ├── mic_listening.png # Tray icon: listening (green)
│   └── mic_active.png    # Tray icon: recording (red)
└── .gitignore
```
| Component | Technology |
|---|---|
| MCP server | mcp SDK, stdio transport |
| Text-to-Speech | ElevenLabs API (eleven_v3), MP3 44.1 kHz |
| Speech-to-Text | ElevenLabs API (Scribe v1) |
| Audio playback | pygame.mixer (single init, no crackling) |
| Microphone input | sounddevice |
| Voice Activity Detection | webrtcvad |
| Keyboard hotkeys | pynput (no admin rights needed) |
| System tray | pystray + Pillow |
| Clipboard / paste | pyperclip + Win32 keybd_event |
| Problem | Solution |
|---|---|
| `speak` tool not available in Claude | Make sure the `tts_server.py` path is correct in `claude_desktop_config.json` and restart Claude Desktop |
| No audio playback | Check your speakers/headphones. Try running `python -c "import pygame; pygame.mixer.init(); print('OK')"` |
| STT not transcribing | Check your microphone is working and not muted in Windows Sound settings |
| VAD picks up background noise | Increase `vad.volume_threshold` or use the mute key to temporarily disable the mic |
| Transcription contains noise descriptions like "(music)" | This is filtered automatically. If it still happens, increase `vad.aggressiveness` to 3 |
| Multiple instances running | The companion has built-in single-instance protection. Kill all `python.exe` processes and restart |
| `webrtcvad` install fails | Use `pip install webrtcvad-wheels` instead |
| `keyboard` library needs admin | This project uses `pynput` instead, so no admin rights are needed |
| Claude's voice is transcribed as your input (feedback loop) | This is handled automatically via a lock file. If it still happens, use headphones or press the mute key while Claude is speaking |
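The lock-file approach to the feedback loop can be pictured as follows: the TTS side creates a file while audio is playing, and the STT side discards microphone input while that file exists. The sketch below only illustrates the idea. The lock file name, the staleness timeout and the function names `speaking` and `tts_is_speaking` are hypothetical, since the README does not document the real ones.

```python
import tempfile
import time
from contextlib import contextmanager
from pathlib import Path

# Hypothetical location; the project's real lock file is not documented here.
LOCK_PATH = Path(tempfile.gettempdir()) / "spoken_mcp_speaking.lock"
STALE_AFTER = 120.0  # seconds; assumed cutoff for a lock left behind by a crash


@contextmanager
def speaking(lock_path=LOCK_PATH):
    """TTS side: hold the lock file for the duration of playback."""
    lock_path.write_text(str(time.time()))
    try:
        yield
    finally:
        lock_path.unlink(missing_ok=True)


def tts_is_speaking(lock_path=LOCK_PATH):
    """STT side: True while a fresh lock exists, so captured audio can be dropped."""
    try:
        age = time.time() - lock_path.stat().st_mtime
    except FileNotFoundError:
        return False
    return age < STALE_AFTER
```

In use, the playback call would sit inside `with speaking():`, and the companion would skip transcription whenever `tts_is_speaking()` returns `True`.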
MIT — see LICENSE for details.