Add TTS endpoint (/v1/audio/speech) to local server by j0nl1 · Pull Request #471 · argmaxinc/argmax-oss-swift

j0nl1 · 2026-05-08T16:02:31Z

Summary

Adds a text-to-speech endpoint to the local server (argmax-cli serve), enabling the same server to handle both STT and TTS via OpenAI-compatible APIs.

POST /v1/audio/transcriptions — STT (existing)
POST /v1/audio/speech — TTS (new)

Changes

Import TTSKit in ServeCLI.swift
Load Qwen3-TTS 0.6B model alongside WhisperKit on server startup
Add POST /v1/audio/speech as a manual Vapor route (same pattern as /health)
OpenAI voice name mapping (alloy → ryan, nova → serena, echo → aiden, shimmer → vivian, onyx → eric, fable → dylan) with passthrough for native Qwen3 voice names
10 languages supported via short code or full name (es/spanish, en/english, fr/french, etc.)
WAV response encoder (24kHz mono 16-bit PCM)
Updated endpoint listing on root / route
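
The voice mapping and language handling listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the names `openAIVoiceMap`, `resolveVoice`, `supportedLanguageCodes`, `languageNameToCode`, and `normalizeLanguage` are hypothetical, normalizing to the short code is an assumption, and only the three full language names spelled out in this description are included.

```swift
// Hypothetical helpers; identifiers are illustrative, not taken from ServeCLI.swift.

// OpenAI voice name -> Qwen3 voice name, as listed in this PR description.
let openAIVoiceMap: [String: String] = [
    "alloy": "ryan",
    "nova": "serena",
    "echo": "aiden",
    "shimmer": "vivian",
    "onyx": "eric",
    "fable": "dylan",
]

/// Returns the mapped Qwen3 voice for an OpenAI voice name.
/// Any other name (for example a native Qwen3 voice) passes through unchanged.
func resolveVoice(_ requested: String) -> String {
    return openAIVoiceMap[requested.lowercased()] ?? requested
}

// The ten short codes named in the commit message.
let supportedLanguageCodes: Set<String> = [
    "es", "en", "fr", "de", "pt", "it", "ja", "ko", "zh", "ru",
]

// Only the full names this description states explicitly; the remaining
// seven would presumably follow the same pattern (assumption).
let languageNameToCode: [String: String] = [
    "spanish": "es",
    "english": "en",
    "french": "fr",
]

/// Accepts a short code or a full name and returns the short code,
/// or nil when the language is not recognized.
func normalizeLanguage(_ raw: String) -> String? {
    let key = raw.lowercased()
    if supportedLanguageCodes.contains(key) {
        return key
    }
    return languageNameToCode[key]
}
```

Keeping both tables as plain dictionaries makes it cheap to add voices or language names later without touching the route handler.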

Motivation

The local server currently only handles transcription. Adding TTS makes it a complete voice server, useful for smart home assistants, accessibility tools, or any application needing both STT and TTS from a single local endpoint on Apple Silicon.

Real-world usage

I built a Home Assistant custom integration that consumes this endpoint: ha-argmax-tts. It registers as a TTS provider in Home Assistant, allowing voice assistants to use argmax for local speech synthesis with no cloud APIs needed. The integration supports all 10 languages, configurable voice/model selection via the UI, and connection validation via /health.

Request format

POST /v1/audio/speech
{
  "input": "Hello world",
  "voice": "nova",
  "language": "en",
  "model": "qwen3-tts-0.6b"
}
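
For illustration, the JSON body above could be decoded on the Swift side with a `Codable` type along these lines. The type name `SpeechRequest`, the helper `decodeSpeechRequest`, and the choice to treat every field except `input` as optional are assumptions made for this sketch, not details confirmed by the PR.

```swift
import Foundation

// Hypothetical request model; field names follow the JSON body shown above.
// Treating voice, language, and model as optional is an assumption.
struct SpeechRequest: Codable {
    let input: String
    let voice: String?
    let language: String?
    let model: String?
}

// The example body from this description, used here as sample input.
let exampleBody = """
{
  "input": "Hello world",
  "voice": "nova",
  "language": "en",
  "model": "qwen3-tts-0.6b"
}
"""

/// Decodes a JSON string into a SpeechRequest, throwing on malformed input
/// or when the required "input" field is missing.
func decodeSpeechRequest(_ json: String) throws -> SpeechRequest {
    return try JSONDecoder().decode(SpeechRequest.self, from: Data(json.utf8))
}
```

With optional fields, a client that sends only `input` would still decode, and the server could fall back to its own defaults.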

Returns audio/wav (24kHz mono 16-bit PCM).
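
As a rough sketch of what a WAV encoder for this response format involves, the function below wraps 16-bit PCM samples in the standard 44-byte RIFF/WAVE header. The name `encodeWAV` and the `[Int16]` input are assumptions for illustration; the PR's encoder may take a different sample type or build the header differently.

```swift
/// Wraps signed 16-bit PCM samples in a minimal 44-byte RIFF/WAVE header.
/// Defaults match the response format described here: 24 kHz, mono.
/// Hypothetical helper, not the PR's implementation.
func encodeWAV(samples: [Int16], sampleRate: UInt32 = 24_000, channels: UInt16 = 1) -> [UInt8] {
    let bitsPerSample: UInt16 = 16
    let blockAlign = channels * (bitsPerSample / 8)   // bytes per sample frame
    let byteRate = sampleRate * UInt32(blockAlign)    // bytes per second
    let dataSize = UInt32(samples.count * 2)          // 2 bytes per Int16 sample

    var bytes = [UInt8]()
    bytes.reserveCapacity(44 + Int(dataSize))

    // Appends an integer in little-endian byte order, as WAV requires.
    func append<T: FixedWidthInteger>(_ value: T) {
        withUnsafeBytes(of: value.littleEndian) { bytes.append(contentsOf: $0) }
    }

    bytes.append(contentsOf: Array("RIFF".utf8))
    append(UInt32(36) + dataSize)                     // file size minus the first 8 bytes
    bytes.append(contentsOf: Array("WAVE".utf8))
    bytes.append(contentsOf: Array("fmt ".utf8))
    append(UInt32(16))                                // fmt chunk size for PCM
    append(UInt16(1))                                 // audio format 1 = linear PCM
    append(channels)
    append(sampleRate)
    append(byteRate)
    append(blockAlign)
    append(bitsPerSample)
    bytes.append(contentsOf: Array("data".utf8))
    append(dataSize)
    for sample in samples {
        append(sample)
    }
    return bytes
}
```

The resulting byte array can be wrapped in `Data` and returned with an `audio/wav` content type.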

Testing

Tested on Mac Studio (Apple Silicon) with Qwen3-TTS 0.6B model. TTS generation completes in ~300-500ms for short sentences.

- Import TTSKit and load Qwen3-TTS 0.6B model on server startup
- New POST /v1/audio/speech endpoint compatible with OpenAI TTS API
- Voice mapping: OpenAI names (alloy, echo, nova...) to Qwen3 voices
- 10 language support: es, en, fr, de, pt, it, ja, ko, zh, ru
- WAV encoder (24kHz mono 16-bit PCM)

j0nl1 wants to merge 1 commit into argmaxinc:main from j0nl1:main
