Skip to content

mnvsk97/voice-audition

Repository files navigation

voice-audition

PyPI License: MIT Python 3.11+

The casting director for your AI voice agent. 697 voices across 9 TTS providers (645 enriched with LLM-generated descriptions and traits), semantic search via Moss, use-case auditions with AI scoring, cost comparison, web UI, and MCP server for Claude.

Install

pip install voice-audition
pip install voice-audition[enrich]   # LLM-based voice enrichment (Gemini, OpenAI, local MLX)
pip install voice-audition[mcp]      # MCP server for Claude Desktop
pip install voice-audition[acoustic] # acoustic feature analysis (librosa, Parselmouth)
pip install voice-audition[clap]     # CLAP embeddings for audio similarity search

Setup

voice-audition setup

This creates a .env in your working directory and prints the MCP config for Claude Desktop. All API keys are optional — you only need keys for the providers you want to sync. Moss credentials enable semantic search; without them, keyword search is used as fallback.

If developing from source:

git clone https://github.com/mnvsk97/voice-audition.git
cd voice-audition
pip install -e ".[mcp,enrich]"
cp .env.example .env   # edit to add your API keys

Quick start

# Sync voices from providers
voice-audition sync

# Search
voice-audition search "warm female voice for healthcare"

# Analyze top options (no audio generated)
voice-audition analyze "voice for fertility clinic"

# Run a full audition with AI scoring
voice-audition audition "fertility clinic for anxious IVF patients" --gender female

# Compare costs at scale
voice-audition costs 100000

Web UI

A lightweight web interface for browsing the voice catalog, filtering by attributes, and generating TTS audio samples.

# Terminal 1: Start the Hono API server
cd server && npm install && npm run dev

# Terminal 2: Start the Vite frontend
cd frontend && npm install && npm run dev

Open http://localhost:5173. The UI provides:

  • Voice list with search and filters (provider, gender, age, texture, pitch, use case, enrichment status)
  • Voice detail pages with trait scores, tags, and descriptions
  • TTS generation — type text and hear any voice (requires provider API keys in .env)

Commands

Command What it does
voice-audition setup Generate .env, print MCP config, verify installation
voice-audition sync [providers...] Sync voices from TTS provider APIs
voice-audition enrich [providers...] [--status] Enrich voices with LLM-generated descriptions and traits
voice-audition pipeline [--providers ...] Run full pipeline: sync → enrich → rebuild index
voice-audition index [--force] Build or rebuild the Moss semantic search index
voice-audition search <query> [--top-k N] Semantic search (falls back to keyword search without Moss)
voice-audition analyze <brief> Recommend best/budget/safest voices without generating audio
voice-audition audition <brief> [--mode ai|human] Run a use-case audition with audio generation and scoring
voice-audition costs <minutes> Compare API vs self-hosted costs at a monthly volume
voice-audition enrich-acoustic [providers...] Extract acoustic features (pitch, speech rate, HNR)
voice-audition embed [providers...] Generate CLAP embeddings for audio similarity
voice-audition search-audio <path> Find voices similar to an audio clip
voice-audition stats Catalog statistics
voice-audition runs [--last N] Recent pipeline run history
voice-audition monitor Check provider reliability via status pages
voice-audition mcp Start the MCP server

MCP server

Run voice-audition setup to get the config snippet, or add this to your Claude Desktop config manually:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Linux: ~/.config/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "voice-audition": {
      "command": "voice-audition",
      "args": ["mcp"]
    }
  }
}

Set MOSS_PROJECT_ID and MOSS_PROJECT_KEY in your .env (in the directory where you run Claude Desktop) for semantic search. Without them, keyword search is used as fallback.

Tool What it does
search_voices Semantic search with optional acoustic filters
analyze_voices Recommend best voices without generating audio
filter_voices Filter by gender, provider, accent, age group, use case
get_voice Full details for a specific voice
get_catalog_stats Catalog overview with enrichment coverage
run_voice_audition Full audition with audio generation and AI scoring
calculate_voice_costs API vs self-hosted cost comparison
find_similar_voices Find acoustically similar voices via embeddings
get_acoustic_profile Measured acoustic features for a voice

Voice catalog

697 voices across 9 providers in a checked-in SQLite database (645 enriched):

Type Providers
Commercial ElevenLabs, OpenAI, Deepgram, Rime
Syncable Cartesia, PlayHT, Azure, Google (API keys required)
Open source Kokoro, Piper, Orpheus, Chatterbox, Fish Speech
  • Diff-based sync: detects added, removed, and changed voices
  • Enrichment data preserved across re-syncs
  • Deprecated voices filtered from search/audition
  • Weekly pricing change detection via page hash diff
  • Rich failure metadata with error classification (transient/auth/unsupported/validation)

Enrichment

The enrichment pipeline generates audio samples, sends them to an LLM for classification, and fills in descriptions, traits, and tags. Configure your LLM provider in enrichment/enrichment.yaml:

enrichment:
  provider: gemini  # gemini | openai | anthropic | ollama | mlx

Supported: Gemini, OpenAI, Anthropic, Bedrock, Ollama (Qwen2-Audio), local MLX.

TTS generators for enrichment audio: Rime, ElevenLabs, Deepgram, OpenAI. Open-source models (Kokoro, Piper) supported with local setup.

voice-audition enrich rime --limit 10    # enrich 10 Rime voices
voice-audition enrich --status           # show enrichment progress
voice-audition enrich --retry            # retry failed voices (up to 3 attempts)
voice-audition pipeline                  # sync + enrich + rebuild index

Audition profiles

6 built-in use-case profiles with domain-specific scoring criteria:

Profile Criteria
Healthcare patient comfort, trust, empathy, clarity, pacing, sensitivity
Sales energy, rapport, persuasiveness, confidence, resilience, likability
Support patience, clarity, helpfulness, professionalism, warmth, resolution focus
Finance authority, precision, trustworthiness, calm, professionalism, compliance
Meditation calm, spaciousness, grounding, non-intrusive, breath quality, presence
Education clarity, patience, encouragement, structure, warmth, confidence

Development

git clone https://github.com/mnvsk97/voice-audition.git
cd voice-audition
pip install -e ".[mcp,enrich]"
cp .env.example .env   # edit to add your API keys
python -m pytest tests/

License

MIT

About

VoiceAudition — the voice casting director for AI agents. Cross-provider TTS voice discovery and recommendation.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors