Feat/supertonic tts engine #1067
Open
tantara wants to merge 3 commits into
Refactor the Kokoro-only TTS command into a pluggable engine system and add Supertonic 3 as a second backend. Unlike Kokoro (which shells out to Python via kokoro-onnx), Supertonic runs the full 4-model pipeline in-process through onnxruntime-node -- no Python, no new dependencies.
- engine.ts: TtsEngine interface + lazy getEngine() registry (kokoro|supertonic)
- engines/kokoro.ts: adapter over the existing kokoro-onnx path (unchanged behavior)
- engines/supertonic/manager.ts: auto-downloads models + voice styles from huggingface.co/Supertone/supertonic-3 to ~/.cache/hyperframes/tts/supertonic/
- engines/supertonic/runtime.ts: inference pipeline ported from upstream helper.js
- engines/supertonic/index.ts: SupertonicEngine implementation
- commands/tts.ts: add --engine and --steps flags; route voices/langs per engine
44.1 kHz Supertonic output is compatible with the producer audio mixer, which resamples all inputs to 48 kHz.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
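To make the "interface + lazy registry" idea concrete, here is a minimal sketch. Only the names TtsEngine and getEngine() come from the commit message; the request/result shapes, the stub engines, and the 24 kHz figure assigned to Kokoro are assumptions for illustration, not the PR's actual code.

```typescript
// Hypothetical shapes: the commit names TtsEngine and getEngine(),
// but does not show their members.
type EngineName = "kokoro" | "supertonic";

interface SynthesisRequest {
  text: string;
  voice: string;
  lang: string;
}

interface SynthesisResult {
  samples: Float32Array;
  sampleRate: number;
}

interface TtsEngine {
  readonly name: EngineName;
  synthesize(req: SynthesisRequest): Promise<SynthesisResult>;
}

// Stand-in engine that returns silence; a real adapter would run a model.
function makeStubEngine(name: EngineName, sampleRate: number): TtsEngine {
  return {
    name,
    synthesize: async (req: SynthesisRequest): Promise<SynthesisResult> => ({
      samples: new Float32Array(req.text.length),
      sampleRate,
    }),
  };
}

// Factories run only on first use, so choosing one engine never loads the other.
// In a real registry these would likely be dynamic import() calls.
const factories: Record<EngineName, () => Promise<TtsEngine>> = {
  kokoro: async () => makeStubEngine("kokoro", 24000), // assumed rate
  supertonic: async () => makeStubEngine("supertonic", 44100),
};

const engineCache = new Map<EngineName, Promise<TtsEngine>>();

function isEngineName(name: string): name is EngineName {
  return name === "kokoro" || name === "supertonic";
}

function getEngine(name: string): Promise<TtsEngine> {
  if (!isEngineName(name)) {
    return Promise.reject(new Error(`Unknown TTS engine: ${name}`));
  }
  let engine = engineCache.get(name);
  if (!engine) {
    engine = factories[name]();
    engineCache.set(name, engine); // cache the promise so concurrent callers share it
  }
  return engine;
}
```

Caching the promise rather than the resolved engine means two concurrent getEngine("supertonic") calls trigger a single load.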
Add engine-selection guidance to the media skill now that two TTS engines exist. Rule of thumb: English or Chinese -> Kokoro; everything else -> Supertonic.
- Kokoro is the only engine with Chinese (zh); Supertonic has no Chinese.
- Supertonic covers 31 languages and needs no Python/espeak-ng, so it is the preferred path for Korean, German, Russian, Arabic, and the other non-English languages Kokoro cannot synthesize.
- Document --engine/--steps usage, Supertonic voices (F1-F5, M1-M5), the full 31-code --lang list, and per-engine requirements.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
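The rule of thumb above can be written as a small function. This is a sketch only: pickEngine is a hypothetical helper, and it assumes language codes in the two-letter style of the "zh" shown above, optionally followed by a region suffix.

```typescript
type EngineName = "kokoro" | "supertonic";

// Hypothetical helper: English or Chinese -> Kokoro, everything else -> Supertonic.
function pickEngine(lang: string): EngineName {
  // Reduce tags such as "en-US" or "zh_CN" to their primary subtag.
  const primary = lang.trim().toLowerCase().split(/[-_]/)[0];
  return primary === "en" || primary === "zh" ? "kokoro" : "supertonic";
}
```

For example, pickEngine("zh") selects Kokoro because Supertonic has no Chinese, while pickEngine("ko") selects Supertonic.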
Follow 303/307/308 in addition to 301/302, resolve relative Location headers against the request URL, and cap redirects at 10 to avoid infinite loops. Improves reliability of model downloads that bounce through CDN redirects (e.g. Hugging Face for the Supertonic TTS assets).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
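The redirect rules described in this commit can be sketched as one decision function. The name nextRedirectUrl and its signature are hypothetical; the PR's downloader may structure this differently. The sketch relies only on the standard URL constructor to resolve relative Location values.

```typescript
const REDIRECT_STATUSES = new Set<number>([301, 302, 303, 307, 308]);
const MAX_REDIRECTS = 10;

// Hypothetical helper: returns the URL to request next, or null if the
// response is not a redirect. Throws once the redirect cap is reached.
function nextRedirectUrl(
  requestUrl: string,
  status: number,
  location: string | undefined,
  redirectsSoFar: number,
): string | null {
  if (!REDIRECT_STATUSES.has(status) || !location) {
    return null;
  }
  if (redirectsSoFar >= MAX_REDIRECTS) {
    throw new Error(`Too many redirects (limit ${MAX_REDIRECTS})`);
  }
  // new URL(relative, base) resolves "/path" or "../path" against the request
  // URL and leaves absolute Location values unchanged.
  return new URL(location, requestUrl).toString();
}
```

A download loop would call this after each response, incrementing redirectsSoFar until it returns null.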
What
Adds Supertonic 3 as a second --engine, refactors the TTS command into an engine abstraction, adds engine-routing guidance to the media skill, and hardens model downloads, with usage examples.
Why
Kokoro is limited to 8 languages and requires Python + espeak-ng, whereas Supertonic covers 31 languages in-process. The routing is deliberate: English or Chinese → Kokoro, everything else → Supertonic (Chinese is Kokoro-only).
How
Per-file breakdown plus the three notable decisions: zero new dependencies, the 24 kHz↔44.1 kHz sample-rate handling (the producer resamples to 48 kHz), and abstraction over replacement.
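To illustrate why differing engine sample rates are harmless once everything is resampled to 48 kHz, here is a rough sketch of linear-interpolation resampling. It is not the producer mixer's actual resampler, which this PR does not show; resampleLinear is a hypothetical helper, and production mixers typically use higher-quality filters.

```typescript
// Hypothetical helper: resample mono audio by linear interpolation.
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number,
): Float32Array {
  if (input.length === 0 || fromRate === toRate) {
    return Float32Array.from(input);
  }
  const outLength = Math.round((input.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  const last = input.length - 1;
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.min(Math.floor(pos), last);
    const right = Math.min(left + 1, last);
    const frac = pos - left;
    // Blend the two neighbouring input samples.
    out[i] = (input[left] ?? 0) * (1 - frac) + (input[right] ?? 0) * frac;
  }
  return out;
}
```

One second of 44.1 kHz audio (44100 samples) and one second of 24 kHz audio (24000 samples) both come out as 48000 samples, so the mixer sees a uniform rate regardless of which engine produced the clip.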
Examples
[kokoro:english]
scope-persona-intro-english-voiceover-under10mb.mp4
[supertonic:korean]
scope-persona-intro-korean-supertonic-under10mb.mp4
Test plan
How was this tested?