spoken-mcp

A fully voice-enabled interface for the Claude Desktop app. Speak to Claude and hear responses spoken aloud — no admin rights required.

Magyar verzió / Hungarian version: README_HU.md

What does it do?

spoken-mcp turns Claude Desktop into a voice assistant. You speak into your microphone, your speech is transcribed and sent to Claude, and Claude's response is read aloud to you — all automatically.

The system consists of two components that work alongside the Claude Desktop app:

MCP TTS Server (tts_server.py) — An MCP server that Claude calls via its speak tool. It sends Claude's response text to ElevenLabs Text-to-Speech and plays the audio on your speakers.
STT Companion (stt_companion.py) — A background app with a system tray icon. It listens to your microphone, transcribes your speech via ElevenLabs Speech-to-Text (Scribe v1), and automatically pastes the transcription into Claude Desktop's input field.

A note on latency

This project prioritizes response quality and voice quality over speed. Claude's full text response is generated first, then converted to high-quality audio (MP3, 44.1 kHz). This means there is a noticeable delay between your question and hearing the spoken response — typically a few seconds depending on response length. This is a deliberate design choice: we chose natural-sounding, high-fidelity voice output over lower-latency but lower-quality alternatives.

Architecture

┌─────────────────────────────────────────────────────┐
│                 Claude Desktop App                   │
│                                                      │
│   [User types or pastes from STT] → Claude responds  │
│                                          ↓           │
│                                    tool call: speak  │
│                                          ↓           │
│                              ┌───────────────────┐   │
│                              │  MCP TTS Server   │   │
│                              │  (ElevenLabs)     │   │
│                              └───────────────────┘   │
└─────────────────────────────────────────────────────┘
                ↑ auto-paste
┌─────────────────────────────────────────────────────┐
│          STT Companion (background process)          │
│                                                      │
│   Hotkey / VAD → record mic                          │
│   Speech detected → ElevenLabs STT → paste           │
│                                                      │
│   System tray: grey = idle, green = listening,       │
│                red = recording                       │
└─────────────────────────────────────────────────────┘

Input Modes

The STT Companion supports three modes (set "mode" in config.json):

Mode	Config value	How it works
Push-to-talk	`"push_to_talk"`	Hold the hotkey to record, release to transcribe
VAD toggle	`"vad"`	Press the hotkey once to start listening. VAD detects speech and silence, sends automatically, then stops listening
VAD always-on	`"vad_always"`	Continuously listens for speech. Use a configurable mute key (e.g., F4) to toggle the microphone on/off

System tray icon

Color	Meaning
Grey	Idle — not listening
Green	Listening — waiting for speech (VAD active)
Red	Recording — speech detected, capturing audio

Right-click the tray icon for a menu showing the current mode, key bindings, and an exit option.
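The three tray colors map naturally onto a small state machine. The sketch below illustrates the VAD toggle mode only; the event names ("hotkey", "speech_start", "silence_timeout") are hypothetical labels chosen for this example and are not taken from stt_companion.py.

```python
from enum import Enum


class TrayState(Enum):
    IDLE = "grey"        # not listening
    LISTENING = "green"  # VAD active, waiting for speech
    RECORDING = "red"    # speech detected, capturing audio


# Transitions for the "vad" toggle mode. Event names are illustrative only.
TRANSITIONS = {
    (TrayState.IDLE, "hotkey"): TrayState.LISTENING,
    (TrayState.LISTENING, "speech_start"): TrayState.RECORDING,
    (TrayState.RECORDING, "silence_timeout"): TrayState.IDLE,
}


def next_state(state: TrayState, event: str) -> TrayState:
    """Return the state after `event`; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

In vad_always mode the same idea would presumably apply, with the mute key rather than the hotkey moving between idle and listening.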

Prerequisites

Before you start, make sure you have:

Windows 10 or 11
Python 3.11 or newer — Download from python.org. During installation, check "Add Python to PATH".
Claude Desktop app — Download from claude.ai/download
An ElevenLabs account with API access — Sign up at elevenlabs.io. A paid plan is recommended — the free tier has limited characters per month, and the STT (Scribe) API may require a paid subscription. You'll need:
- Your API key (found in Profile → API Keys)
- A Voice ID for TTS (found in Voices → click a voice → Voice ID)

Setup Guide

Step 1: Download the project

Option A — Using Git:

git clone https://github.com/leszini/spoken-mcp.git
cd spoken-mcp

Option B — Manual download: Download the ZIP from GitHub, extract it to a folder (e.g., C:\Users\YourName\Desktop\spoken-mcp).

Step 2: Install Python dependencies

Open a terminal (Command Prompt or PowerShell) in the project folder and run:

pip install -r requirements.txt

Note: If webrtcvad fails to install, try: pip install webrtcvad-wheels — this provides pre-built Windows binaries.
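To confirm the install worked, a short check like the following can list any packages Python cannot find. It is a sketch that assumes requirements.txt installs the libraries named in the Tech Stack section; the module list is written out by hand here, not read from requirements.txt.

```python
import importlib.util

# Import names of the libraries listed under Tech Stack.
# Pillow is imported as "PIL". Assumed to match requirements.txt.
REQUIRED_MODULES = [
    "mcp", "pygame", "sounddevice", "webrtcvad",
    "pynput", "pystray", "PIL", "pyperclip",
]


def missing_modules(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```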

Step 3: Create your configuration file

In the project folder, find config.example.json
Copy it and rename the copy to config.json
Open config.json in any text editor (Notepad works fine)
Fill in your settings:

{
  "elevenlabs_api_key": "paste-your-api-key-here",
  "tts": {
    "voice_id": "paste-your-voice-id-here",
    "model_id": "eleven_v3",
    "language_code": "en"
  },
  "stt": {
    "model_id": "scribe_v1",
    "language_code": "en"
  },
  "audio": {
    "sample_rate": 16000,
    "channels": 1
  },
  "hotkey": "caps lock",
  "mute_key": "f4",
  "mode": "vad_always",
  "vad": {
    "aggressiveness": 2,
    "silence_timeout": 3.0,
    "speech_threshold": 3,
    "volume_threshold": 500,
    "min_duration": 0.5
  },
  "auto_paste": true,
  "auto_enter": false
}

Where to find your ElevenLabs credentials:

API key: Log in to elevenlabs.io → click your profile icon (top-right) → Profile + API key → copy the key
Voice ID: Go to Voices → pick a voice you like → click on it → copy the Voice ID from the URL or the details panel

Key settings to customize:

language_code — Change "en" to your language (e.g., "hu" for Hungarian, "de" for German, "es" for Spanish). Set it in both tts and stt sections.
mode — Choose your preferred input mode (see Input Modes above)
mute_key — The key to mute/unmute the microphone in vad_always mode. Supported keys: f1–f12, caps lock, scroll lock, pause, insert, num lock
auto_enter — Set to true if you want transcriptions to be sent automatically (hands-free). Set to false to review before sending.
vad.volume_threshold — Minimum audio volume to be considered speech. Increase if background noise triggers false transcriptions. Set to 0 to disable (use the mute key instead).

Important: config.json contains your API key. It is listed in .gitignore, so Git will not commit it or push it to GitHub by default. Do not share the file or paste its contents anywhere public.
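A quick sanity check of config.json before starting the companion can catch the most common mistakes. The sketch below is not part of the project; the function names are made up for this example, and the checks only cover rules stated in this README (required keys, the three valid modes, matching language codes).

```python
import json
from pathlib import Path

VALID_MODES = {"push_to_talk", "vad", "vad_always"}


def validate_config(cfg: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not cfg.get("elevenlabs_api_key"):
        problems.append("elevenlabs_api_key is missing")
    if not cfg.get("tts", {}).get("voice_id"):
        problems.append("tts.voice_id is missing")
    if cfg.get("mode") not in VALID_MODES:
        problems.append(f"mode must be one of {sorted(VALID_MODES)}")
    tts_lang = cfg.get("tts", {}).get("language_code")
    stt_lang = cfg.get("stt", {}).get("language_code")
    if tts_lang != stt_lang:
        problems.append("tts.language_code and stt.language_code differ")
    return problems


def load_config(path="config.json") -> dict:
    """Read config.json as UTF-8 and parse it."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Usage: `print(validate_config(load_config()))` in the project folder prints `[]` when none of these checks fail.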

Step 4: Register the MCP server in Claude Desktop

This tells Claude Desktop about the speak tool so Claude can use it.

Open Claude Desktop
Go to Settings (gear icon) → Developer → Edit Config
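This opens the file claude_desktop_config.json. Add the following to it. If the file already contains an "mcpServers" object, add the "spoken-mcp" entry inside that existing object instead of creating a second one: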

{
  "mcpServers": {
    "spoken-mcp": {
      "command": "python",
      "args": ["C:\\full\\path\\to\\spoken-mcp\\tts_server.py"]
    }
  }
}

Important: Replace C:\\full\\path\\to\\spoken-mcp\\tts_server.py with the actual path to the file on your computer. Use double backslashes (\\) in the path, because a single backslash is an escape character in JSON.

For example, if you put the project on your Desktop:

"args": ["C:\\Users\\YourName\\Desktop\\spoken-mcp\\tts_server.py"]

Tip: If the python command doesn't work, use the full Python path instead, e.g., "command": "C:\\Python312\\python.exe". You can find yours by running where python in a terminal.

Save the file and restart Claude Desktop completely (close and reopen it).
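If escaping the path by hand is error-prone, Python can produce the JSON fragment with the backslashes already doubled. This is a convenience sketch, not part of the project; the helper name mcp_entry is made up, and the path is the same placeholder used above.

```python
import json
from pathlib import PureWindowsPath


def mcp_entry(script_path: str, python_cmd: str = "python") -> dict:
    """Build the "mcpServers" fragment for claude_desktop_config.json."""
    return {
        "mcpServers": {
            "spoken-mcp": {
                "command": python_cmd,
                # PureWindowsPath normalizes forward slashes to backslashes.
                "args": [str(PureWindowsPath(script_path))],
            }
        }
    }


# Placeholder path; replace it with the real location of tts_server.py.
entry = mcp_entry(r"C:\Users\YourName\Desktop\spoken-mcp\tts_server.py")
# json.dumps writes each backslash as \\ in the output.
print(json.dumps(entry, indent=2))
```

The printed text can be pasted into claude_desktop_config.json, or merged into an existing "mcpServers" object.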

Step 5: Start the STT Companion

In a terminal, run:

python stt_companion.py

A system tray icon will appear near your clock (you may need to click the ^ arrow to see it). The companion is now running in the background.

Step 6: Test it!

Open Claude Desktop and start a conversation
Speak into your microphone. Your speech is transcribed and pasted into the input field; press Enter to send it (with "auto_enter": true it is sent automatically)
Claude will respond in text and read the response aloud

Optional: Create a desktop shortcut

You can create a desktop shortcut that launches the companion with a double-click, without a console window. Double-clicking it again while the companion is running opens a dialog for stopping or restarting it.

Right-click on your Desktop → New → Text Document
Name it Spoken MCP.vbs (make sure the extension is .vbs, not .vbs.txt — you may need to enable "Show file extensions" in Windows Explorer)
Right-click the file → Edit (or open with Notepad)
Paste the following content, replacing the paths with your actual Python and script locations:

Set WshShell = CreateObject("WScript.Shell")
Set fso = CreateObject("Scripting.FileSystemObject")

pythonExe = "C:\Python312\python.exe"
scriptPath = "C:\Users\YourName\Desktop\spoken-mcp\stt_companion.py"

' Check if companion is already running
Set objWMI = GetObject("winmgmts:\\.\root\cimv2")
Set processes = objWMI.ExecQuery("SELECT * FROM Win32_Process WHERE CommandLine LIKE '%stt_companion.py%' AND Name = 'python.exe'")

alreadyRunning = (processes.Count > 0)

If alreadyRunning Then
    msg = "Spoken MCP is running!" & vbCrLf & vbCrLf & _
          "STOP = Abort button" & vbCrLf & _
          "RESTART = Retry button" & vbCrLf & _
          "KEEP RUNNING = Ignore button"
    result = MsgBox(msg, vbAbortRetryIgnore + vbQuestion, "Spoken MCP")

    If result = vbAbort Then
        ' STOP
        For Each proc In processes
            proc.Terminate()
        Next
        MsgBox "Spoken MCP stopped.", vbInformation, "Spoken MCP"
    ElseIf result = vbRetry Then
        ' RESTART
        For Each proc In processes
            proc.Terminate()
        Next
        WScript.Sleep 1000
        WshShell.Run """" & pythonExe & """ """ & scriptPath & """", 0, False
        MsgBox "Spoken MCP restarted!", vbInformation, "Spoken MCP"
    End If
Else
    WshShell.Run """" & pythonExe & """ """ & scriptPath & """", 0, False
    MsgBox "Spoken MCP started!" & vbCrLf & vbCrLf & _
           "Look for the icon in the system tray (near the clock)." & vbCrLf & _
           "Double-click this shortcut again to stop or restart.", _
           vbInformation, "Spoken MCP"
End If

Save and close. Now double-click Spoken MCP.vbs to launch!

How the shortcut works:

First launch: Starts the companion in the background (no console window) and shows a confirmation
If already running: Shows a dialog with three buttons:
- Abort = Stop the companion
- Retry = Restart the companion
- Ignore = Keep running (do nothing)

Configuration Reference

Key	Description	Default
`elevenlabs_api_key`	Your ElevenLabs API key	(required)
`tts.voice_id`	ElevenLabs voice ID for speech output	(required)
`tts.model_id`	TTS model	`eleven_v3`
`tts.language_code`	Language for TTS output	`hu`
`stt.model_id`	STT model	`scribe_v1`
`stt.language_code`	Language hint for transcription	`hu`
`audio.sample_rate`	Microphone sample rate in Hz	`16000`
`audio.channels`	Audio channels (1 = mono)	`1`
`hotkey`	Key for push-to-talk or VAD toggle	`caps lock`
`mute_key`	Key to toggle mic mute in `vad_always` mode	`f4`
`mode`	Input mode: `push_to_talk`, `vad`, or `vad_always`	`vad_always`
`vad.aggressiveness`	WebRTC VAD sensitivity: 0 (least) to 3 (most aggressive)	`2`
`vad.silence_timeout`	Seconds of silence before ending a recording	`3.0`
`vad.speech_threshold`	Minimum consecutive speech frames before recording starts	`3`
`vad.volume_threshold`	Minimum volume level to count as speech (0 = disabled)	`500`
`vad.min_duration`	Minimum recording duration in seconds (filters out coughs, breaths)	`0.5`
`auto_paste`	Automatically paste transcription into the active window	`true`
`auto_enter`	Automatically press Enter after pasting (hands-free mode)	`false`
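The sketch below shows one way the vad.* settings could interact. It is an illustration built from the descriptions in the table, not the logic in stt_companion.py. It assumes each audio frame has already been reduced to an (is_speech, volume) pair, where is_speech might come from something like `webrtcvad.Vad(aggressiveness).is_speech(frame, sample_rate)` and volume from a peak or RMS measurement. The frame length of 30 ms is also an assumption.

```python
def segment_frames(frames, frame_ms=30, speech_threshold=3,
                   silence_timeout=3.0, volume_threshold=500,
                   min_duration=0.5):
    """Return (start, end) frame-index pairs for each accepted recording.

    `frames` is a sequence of (is_speech, volume) tuples, one per audio frame.
    `end` is exclusive. A recording still open at the end is not returned.
    """
    silence_limit = int(silence_timeout * 1000 / frame_ms)
    min_frames = int(min_duration * 1000 / frame_ms)
    segments = []
    run = 0        # consecutive voiced frames seen while idle
    start = None   # index where the current recording began
    silent = 0     # consecutive unvoiced frames while recording
    for i, (is_speech, volume) in enumerate(frames):
        # A threshold of 0 disables the volume gate, since volume >= 0 always.
        voiced = is_speech and volume >= volume_threshold
        if start is None:
            run = run + 1 if voiced else 0
            if run >= speech_threshold:
                start = i - run + 1   # include the frames that triggered it
                silent = 0
        else:
            silent = 0 if voiced else silent + 1
            if silent >= silence_limit:
                end = i - silent + 1  # drop the trailing silence
                if end - start >= min_frames:
                    segments.append((start, end))
                start, run = None, 0
    return segments
```

With the defaults, a recording starts after 3 voiced frames in a row, ends after 100 silent frames (3.0 s at 30 ms each), and is discarded if it covers fewer than 16 frames (about 0.5 s).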

File Structure

spoken-mcp/
├── README.md              — This file (English)
├── README_HU.md           — Hungarian documentation
├── LICENSE                 — MIT License
├── config.json            — Your config with API key (gitignored)
├── config.example.json    — Template config without secrets
├── requirements.txt       — Python dependencies
├── tts_server.py          — MCP TTS server (Claude calls this)
├── stt_companion.py       — STT companion app with system tray
├── icons/
│   ├── mic_idle.png       — Tray icon: idle (grey)
│   ├── mic_listening.png  — Tray icon: listening (green)
│   └── mic_active.png     — Tray icon: recording (red)
└── .gitignore

Tech Stack

Component	Technology
MCP server	`mcp` SDK, stdio transport
Text-to-Speech	ElevenLabs API (eleven_v3), MP3 44.1 kHz
Speech-to-Text	ElevenLabs API (Scribe v1)
Audio playback	`pygame.mixer` (single init, no crackling)
Microphone input	`sounddevice`
Voice Activity Detection	`webrtcvad`
Keyboard hotkeys	`pynput` (no admin rights needed)
System tray	`pystray` + `Pillow`
Clipboard / paste	`pyperclip` + Win32 `keybd_event`
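The default audio settings (16000 Hz, mono) fit what webrtcvad accepts: 16-bit mono PCM at 8000, 16000, 32000, or 48000 Hz, in frames of 10, 20, or 30 ms. The helper below is a hypothetical example, not project code, that computes how many bytes one such frame holds. It can be useful when choosing a block size for microphone capture.

```python
def vad_frame_bytes(sample_rate: int = 16000, frame_ms: int = 30,
                    sample_width: int = 2) -> int:
    """Bytes in one mono VAD frame (sample_width=2 means 16-bit PCM)."""
    if sample_rate not in (8000, 16000, 32000, 48000):
        raise ValueError("webrtcvad supports 8000, 16000, 32000 or 48000 Hz")
    if frame_ms not in (10, 20, 30):
        raise ValueError("webrtcvad frames must be 10, 20 or 30 ms long")
    samples = sample_rate * frame_ms // 1000
    return samples * sample_width
```

At the defaults this gives 480 samples, or 960 bytes, per 30 ms frame.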

Troubleshooting

Problem	Solution
`speak` tool not available in Claude	Make sure the path to `tts_server.py` is correct in `claude_desktop_config.json`, then restart Claude Desktop
No audio playback	Check your speakers/headphones. Try running `python -c "import pygame; pygame.mixer.init(); print('OK')"`
STT not transcribing	Check that your microphone is working and not muted in Windows Sound settings
VAD picks up background noise	Increase `vad.volume_threshold` or use the mute key to temporarily disable the mic
Transcription contains noise descriptions like "(music)"	This is filtered automatically. If it still happens, increase `vad.aggressiveness` to 3
Multiple instances running	The companion has built-in single-instance protection. If duplicates still appear, end the leftover `python.exe` processes (for example in Task Manager) and start the companion again
`webrtcvad` install fails	Use `pip install webrtcvad-wheels` instead
`keyboard` library needs admin	This project uses `pynput` instead — no admin rights needed
Claude's voice is transcribed as your input (feedback loop)	This is handled automatically via a lock file. If it still happens, use headphones or press the mute key while Claude is speaking
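The lock-file approach mentioned in the last row can be pictured as follows. This is a minimal sketch of the general technique, not the project's implementation: the file name, its location, and both function names are hypothetical.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path

# Hypothetical lock-file location; the project's actual path may differ.
LOCK_PATH = Path(tempfile.gettempdir()) / "spoken_mcp_speaking.lock"


@contextmanager
def speaking(lock_path: Path = LOCK_PATH):
    """TTS side: create the lock file while audio plays, remove it afterwards."""
    lock_path.write_text(str(os.getpid()))
    try:
        yield
    finally:
        lock_path.unlink(missing_ok=True)


def tts_is_speaking(lock_path: Path = LOCK_PATH) -> bool:
    """Companion side: skip microphone input while the lock file exists."""
    return lock_path.exists()
```

The TTS side would wrap playback in `with speaking(): ...`, and the companion would drop any audio frames captured while `tts_is_speaking()` returns True. Headphones remain the more robust fix, because a lock file cannot cover room echo that continues after playback ends.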

License

MIT — see LICENSE for details.
