voice-audition

The casting director for your AI voice agent. 697 voices across 9 TTS providers (645 enriched with LLM-generated descriptions and traits), semantic search via Moss, use-case auditions with AI scoring, cost comparison, web UI, and MCP server for Claude.

Install

pip install voice-audition
pip install "voice-audition[enrich]"    # LLM-based voice enrichment (Gemini, OpenAI, local MLX)
pip install "voice-audition[mcp]"       # MCP server for Claude Desktop
pip install "voice-audition[acoustic]"  # acoustic feature analysis (librosa, Parselmouth)
pip install "voice-audition[clap]"      # CLAP embeddings for audio similarity search

Setup

voice-audition setup

This creates a .env file in your working directory, prints the MCP config for Claude Desktop, and verifies the installation. All API keys are optional; you only need keys for the providers you want to sync. Moss credentials enable semantic search; without them, search falls back to keyword matching.

If developing from source:

git clone https://github.com/mnvsk97/voice-audition.git
cd voice-audition
pip install -e ".[mcp,enrich]"
cp .env.example .env   # edit to add your API keys

Quick start

# Sync voices from providers
voice-audition sync

# Search
voice-audition search "warm female voice for healthcare"

# Analyze top options (no audio generated)
voice-audition analyze "voice for fertility clinic"

# Run a full audition with AI scoring
voice-audition audition "fertility clinic for anxious IVF patients" --gender female

# Compare costs at scale
voice-audition costs 100000

Web UI

A lightweight web interface for browsing the voice catalog, filtering by attributes, and generating TTS audio samples.

# Terminal 1: Start the Hono API server
cd server && npm install && npm run dev

# Terminal 2: Start the Vite frontend
cd frontend && npm install && npm run dev

Open http://localhost:5173. The UI provides:

Voice list with search and filters (provider, gender, age, texture, pitch, use case, enrichment status)
Voice detail pages with trait scores, tags, and descriptions
TTS generation — type text and hear any voice (requires provider API keys in .env)

Commands

Command	What it does
`voice-audition setup`	Generate .env, print MCP config, verify installation
`voice-audition sync [providers...]`	Sync voices from TTS provider APIs
`voice-audition enrich [providers...] [--status]`	Enrich voices with LLM-generated descriptions and traits
`voice-audition pipeline [--providers ...]`	Run full pipeline: sync → enrich → rebuild index
`voice-audition index [--force]`	Build or rebuild the Moss semantic search index
`voice-audition search <query> [--top-k N]`	Semantic search (falls back to keyword search without Moss)
`voice-audition analyze <brief>`	Recommend best/budget/safest voices without generating audio
`voice-audition audition <brief> [--mode ai|human]`	Run a use-case audition with audio generation and scoring
`voice-audition costs <minutes>`	Compare API vs self-hosted costs at a monthly volume
`voice-audition enrich-acoustic [providers...]`	Extract acoustic features (pitch, speech rate, HNR)
`voice-audition embed [providers...]`	Generate CLAP embeddings for audio similarity
`voice-audition search-audio <path>`	Find voices similar to an audio clip
`voice-audition stats`	Catalog statistics
`voice-audition runs [--last N]`	Recent pipeline run history
`voice-audition monitor`	Check provider reliability via status pages
`voice-audition mcp`	Start the MCP server
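
For scripting, the commands in the table can be driven from Python with the standard library. The following is a minimal sketch, assuming the `voice-audition` executable is on your PATH. The helper names `build_argv` and `run_cli` are hypothetical and not part of the package, and only the `--top-k` flag documented above is used.

```python
import subprocess
from typing import Optional, Sequence


def build_argv(command: str, *positional: str, top_k: Optional[int] = None) -> list[str]:
    """Build an argument list for the voice-audition CLI (hypothetical helper)."""
    argv = ["voice-audition", command, *positional]
    if top_k is not None:
        # --top-k is documented for the `search` command only.
        argv += ["--top-k", str(top_k)]
    return argv


def run_cli(argv: Sequence[str]) -> str:
    """Run the CLI and return its stdout; raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `run_cli(build_argv("search", "warm female voice for healthcare", top_k=5))` returns whatever the search command prints. Passing arguments as a list avoids shell quoting problems with multi-word briefs.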

MCP server

Run voice-audition setup to get the config snippet, or add this to your Claude Desktop config manually:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "voice-audition": {
      "command": "voice-audition",
      "args": ["mcp"]
    }
  }
}
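
If the config file already lists other MCP servers, the entry has to be merged rather than pasted over the file. The sketch below shows one way to do that with the standard library; `add_mcp_server` is a hypothetical helper, not something the package ships, and it assumes the file (if present) contains a JSON object.

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path) -> dict:
    """Merge the voice-audition entry into a Claude Desktop config file."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        config = json.loads(text) if text.strip() else {}
    # Keep any servers that are already configured.
    servers = config.setdefault("mcpServers", {})
    servers["voice-audition"] = {"command": "voice-audition", "args": ["mcp"]}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

On macOS this could be called as `add_mcp_server(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json")`; on Linux, substitute the `~/.config/Claude` path listed above.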

Set MOSS_PROJECT_ID and MOSS_PROJECT_KEY in your .env (in the directory where you run Claude Desktop) for semantic search. Without them, keyword search is used as fallback.
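
To check which search mode your environment is likely to get, the fallback rule can be expressed in a few lines. This is only an illustration of the rule described above; how the package itself detects the credentials is not documented here, and `pick_search_mode` is a hypothetical name.

```python
import os
from typing import Mapping, Optional


def pick_search_mode(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "semantic" when both Moss credentials are set, else "keyword"."""
    env = os.environ if env is None else env
    # Empty strings count as missing, as they would in a blank .env entry.
    has_moss = bool(env.get("MOSS_PROJECT_ID")) and bool(env.get("MOSS_PROJECT_KEY"))
    return "semantic" if has_moss else "keyword"
```

Calling `pick_search_mode()` with no arguments inspects the current process environment, so it reflects a .env file only if that file has already been loaded.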

Tool	What it does
`search_voices`	Semantic search with optional acoustic filters
`analyze_voices`	Recommend best voices without generating audio
`filter_voices`	Filter by gender, provider, accent, age group, use case
`get_voice`	Full details for a specific voice
`get_catalog_stats`	Catalog overview with enrichment coverage
`run_voice_audition`	Full audition with audio generation and AI scoring
`calculate_voice_costs`	API vs self-hosted cost comparison
`find_similar_voices`	Find acoustically similar voices via embeddings
`get_acoustic_profile`	Measured acoustic features for a voice

Voice catalog

697 voices across 9 providers in a checked-in SQLite database (645 enriched):

Type	Providers
Commercial	ElevenLabs, OpenAI, Deepgram, Rime
Syncable	Cartesia, PlayHT, Azure, Google (API keys required)
Open source	Kokoro, Piper, Orpheus, Chatterbox, Fish Speech

Diff-based sync: detects added, removed, and changed voices
Enrichment data preserved across re-syncs
Deprecated voices filtered from search/audition
Weekly pricing change detection via page hash diff
Rich failure metadata with error classification (transient/auth/unsupported/validation)
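
The diff-based sync and enrichment preservation described above can be pictured with a small sketch. It is not the project's implementation: the dictionary layout, the `enrichment` field name, and the function names are assumptions chosen for illustration, and removed voices are simply reported rather than handled in any particular way.

```python
from dataclasses import dataclass, field


@dataclass
class SyncDiff:
    added: list[str] = field(default_factory=list)
    removed: list[str] = field(default_factory=list)
    changed: list[str] = field(default_factory=list)


def diff_catalog(stored: dict[str, dict], fetched: dict[str, dict]) -> SyncDiff:
    """Compare stored voices with freshly fetched ones, keyed by voice ID."""
    diff = SyncDiff()
    for voice_id, voice in fetched.items():
        if voice_id not in stored:
            diff.added.append(voice_id)
            continue
        # Ignore enrichment when comparing: provider APIs do not return it.
        old = {k: v for k, v in stored[voice_id].items() if k != "enrichment"}
        if old != voice:
            diff.changed.append(voice_id)
    diff.removed = [vid for vid in stored if vid not in fetched]
    return diff


def merge_preserving_enrichment(stored: dict[str, dict], fetched: dict[str, dict]) -> dict[str, dict]:
    """Return fetched voices with any existing enrichment carried over."""
    merged = {}
    for voice_id, voice in fetched.items():
        entry = dict(voice)
        previous = stored.get(voice_id, {})
        if "enrichment" in previous:
            entry["enrichment"] = previous["enrichment"]
        merged[voice_id] = entry
    return merged
```

Keying both sides by voice ID keeps the comparison linear in catalog size, and excluding enrichment from the comparison prevents every enriched voice from showing up as changed on each re-sync.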

Enrichment

The enrichment pipeline generates audio samples, sends them to an LLM for classification, and fills in descriptions, traits, and tags. Configure your LLM provider in enrichment/enrichment.yaml:

enrichment:
  provider: gemini  # gemini | openai | anthropic | ollama | mlx

Supported: Gemini, OpenAI, Anthropic, Bedrock, Ollama (Qwen2-Audio), local MLX.

TTS generators for enrichment audio: Rime, ElevenLabs, Deepgram, OpenAI. Open-source models (Kokoro, Piper) are supported with local setup.

voice-audition enrich rime --limit 10    # enrich 10 Rime voices
voice-audition enrich --status           # show enrichment progress
voice-audition enrich --retry            # retry failed voices (up to 3 attempts)
voice-audition pipeline                  # sync + enrich + rebuild index

Audition profiles

6 built-in use-case profiles with domain-specific scoring criteria:

Profile	Criteria
Healthcare	patient comfort, trust, empathy, clarity, pacing, sensitivity
Sales	energy, rapport, persuasiveness, confidence, resilience, likability
Support	patience, clarity, helpfulness, professionalism, warmth, resolution focus
Finance	authority, precision, trustworthiness, calm, professionalism, compliance
Meditation	calm, spaciousness, grounding, non-intrusive, breath quality, presence
Education	clarity, patience, encouragement, structure, warmth, confidence
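
To make the idea of per-profile criteria concrete, here is a sketch of how scores on one profile's criteria might be combined and used to rank candidates. The unweighted mean, the numeric scale, and the voice names are assumptions for illustration; the README does not specify how the tool aggregates scores.

```python
from statistics import mean

# Criteria copied from the Healthcare row of the table above.
HEALTHCARE_CRITERIA = (
    "patient comfort", "trust", "empathy", "clarity", "pacing", "sensitivity",
)


def overall_score(scores: dict[str, float], criteria: tuple[str, ...]) -> float:
    """Average per-criterion scores; raises if a criterion is missing."""
    missing = [c for c in criteria if c not in scores]
    if missing:
        raise ValueError(f"missing scores for: {', '.join(missing)}")
    return mean(scores[c] for c in criteria)


def rank_voices(
    results: dict[str, dict[str, float]], criteria: tuple[str, ...]
) -> list[tuple[str, float]]:
    """Sort voices by overall score, best first."""
    ranked = [(voice, overall_score(s, criteria)) for voice, s in results.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

Rejecting incomplete score sets, rather than averaging whatever is present, keeps a voice from ranking well only because a weak criterion was never scored.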

Development

git clone https://github.com/mnvsk97/voice-audition.git
cd voice-audition
pip install -e ".[mcp,enrich]"
cp .env.example .env   # edit to add your API keys
python -m pytest tests/

License

MIT
