Feat/supertonic tts engine #1067

Open
tantara wants to merge 3 commits into heygen-com:main from tantara:feat/supertonic-tts-engine

tantara commented May 24, 2026

What

Adds Supertonic 3 as a second --engine option, refactors the TTS command behind an engine abstraction, adds engine-routing guidance to the media skill, and hardens model downloads. Usage examples are included below.

Why

Kokoro is limited to 8 languages and depends on Python and espeak-ng, whereas Supertonic covers 31 languages and runs in-process. The routing is deliberate: English or Chinese goes to Kokoro and everything else goes to Supertonic, because Kokoro is the only engine that supports Chinese.

How

A per-file breakdown, plus three notable decisions: zero new dependencies; sample-rate handling for 24 kHz (Kokoro) and 44.1 kHz (Supertonic) output, with the producer resampling to 48 kHz; and abstraction over replacement, so Kokoro remains available as an engine.

Examples

[kokoro:english]

scope-persona-intro-english-voiceover-under10mb.mp4

[supertonic:korean]

scope-persona-intro-korean-supertonic-under10mb.mp4

Test plan

How was this tested?

  • Unit tests added/updated
  • Manual testing performed
  • Documentation updated (if applicable)

tantara and others added 3 commits May 24, 2026 13:57
Refactor the Kokoro-only TTS command into a pluggable engine system and add
Supertonic 3 as a second backend. Unlike Kokoro (which shells out to Python via
kokoro-onnx), Supertonic runs the full 4-model pipeline in-process through
onnxruntime-node -- no Python, no new dependencies.

- engine.ts: TtsEngine interface + lazy getEngine() registry (kokoro|supertonic)
- engines/kokoro.ts: adapter over the existing kokoro-onnx path (unchanged behavior)
- engines/supertonic/manager.ts: auto-downloads models + voice styles from
  huggingface.co/Supertone/supertonic-3 to ~/.cache/hyperframes/tts/supertonic/
- engines/supertonic/runtime.ts: inference pipeline ported from upstream helper.js
- engines/supertonic/index.ts: SupertonicEngine implementation
- commands/tts.ts: add --engine and --steps flags; route voices/langs per engine
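
To illustrate the engine.ts bullet above, here is a minimal sketch of an engine interface with a lazily populated registry. Only the names TtsEngine and getEngine() and the two engine keys come from this PR; the request/result shapes, the stub engines, and the cache are assumptions made for the example and may not match the real signatures.

```typescript
// Hypothetical shapes: the PR names TtsEngine and getEngine(), not their signatures.
type EngineName = "kokoro" | "supertonic";

interface SynthesisRequest {
  text: string;
  voice: string;
  lang: string;
}

interface SynthesisResult {
  samples: Float32Array; // mono PCM in [-1, 1]
  sampleRate: number; // Hz
}

interface TtsEngine {
  readonly name: EngineName;
  synthesize(req: SynthesisRequest): Promise<SynthesisResult>;
}

// Stub standing in for the real backends; it returns 0.1 s of silence.
function makeStubEngine(name: EngineName, sampleRate: number): TtsEngine {
  return {
    name,
    async synthesize(_req: SynthesisRequest): Promise<SynthesisResult> {
      return { samples: new Float32Array(Math.floor(sampleRate / 10)), sampleRate };
    },
  };
}

// Factories are cheap to register; an engine is only constructed on first use,
// so selecting one engine never loads the other.
const factories: Record<EngineName, () => TtsEngine> = {
  kokoro: () => makeStubEngine("kokoro", 24_000),
  supertonic: () => makeStubEngine("supertonic", 44_100),
};

const engineCache = new Map<EngineName, TtsEngine>();

function isEngineName(value: string): value is EngineName {
  return value === "kokoro" || value === "supertonic";
}

function getEngine(name: string): TtsEngine {
  if (!isEngineName(name)) {
    throw new Error(`Unknown TTS engine "${name}" (expected kokoro|supertonic)`);
  }
  let engine = engineCache.get(name);
  if (!engine) {
    engine = factories[name]();
    engineCache.set(name, engine);
  }
  return engine;
}
```

In this sketch, the command layer would call getEngine() with the --engine value, so an unknown value fails before any synthesis work starts.
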

44.1 kHz Supertonic output is compatible with the producer audio mixer, which
resamples all inputs to 48 kHz.
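
The sample-rate arithmetic can be sketched as follows. This is not the producer mixer's implementation, which this PR does not show; it is a simple linear-interpolation resampler (resampleLinear is a hypothetical name) that only illustrates why both 24 kHz and 44.1 kHz inputs can be brought to a common 48 kHz.

```typescript
// Linear-interpolation resampler, for illustration only. A production mixer
// would normally use a band-limited filter instead.
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number,
): Float32Array {
  if (fromRate === toRate || input.length === 0) return input.slice();

  const outLength = Math.round((input.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample

  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.min(Math.floor(pos), input.length - 1);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    // Blend the two neighbouring input samples.
    out[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return out;
}

// One second of audio at each engine's rate maps to 48,000 samples.
const fromSupertonic = resampleLinear(new Float32Array(44_100), 44_100, 48_000);
const fromKokoro = resampleLinear(new Float32Array(24_000), 24_000, 48_000);
console.log(fromSupertonic.length, fromKokoro.length);
```
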

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Add engine-selection guidance to the media skill now that two TTS engines
exist. Rule of thumb: English or Chinese -> Kokoro; everything else ->
Supertonic.

- Kokoro is the only engine with Chinese (zh); Supertonic has no Chinese.
- Supertonic covers 31 languages and needs no Python/espeak-ng, so it is the
  preferred path for Korean, German, Russian, Arabic, and the other non-English
  languages Kokoro cannot synthesize.
- Document --engine/--steps usage, Supertonic voices (F1-F5, M1-M5), the full
  31-code --lang list, and per-engine requirements.
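
The rule of thumb above can be written as a small function. In the PR this is guidance in the media skill rather than code, so pickEngine is a hypothetical helper. The code zh appears in the commit message; en, ko, and de are assumed to follow the same two-letter convention.

```typescript
type RoutedEngine = "kokoro" | "supertonic";

// English or Chinese -> Kokoro; everything else -> Supertonic.
// Region suffixes such as "en-US" or "zh_CN" are reduced to the primary code.
function pickEngine(lang: string): RoutedEngine {
  const primary = lang.trim().toLowerCase().split(/[-_]/)[0];
  return primary === "en" || primary === "zh" ? "kokoro" : "supertonic";
}

console.log(pickEngine("zh")); // Chinese is Kokoro-only
console.log(pickEngine("ko")); // Korean is routed to Supertonic
```

The sketch encodes only the default. An explicit --engine flag would still take precedence over it.
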

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Follow 303/307/308 in addition to 301/302, resolve relative Location headers
against the request URL, and cap redirects at 10 to avoid infinite loops.
Improves reliability of model downloads that bounce through CDN redirects
(e.g. Hugging Face for the Supertonic TTS assets).
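
A minimal sketch of the redirect rules described above, assuming a transport that reports the status code and Location header for each request. The names nextHop, resolveFinalUrl, and Requester are hypothetical; the actual download helper in this PR may be structured differently.

```typescript
const REDIRECT_STATUSES = new Set([301, 302, 303, 307, 308]);
const MAX_REDIRECTS = 10;

// Decide what to do with one response. Returns null when the response is
// final, or the absolute URL to request next when it is a redirect.
function nextHop(
  requestUrl: string,
  status: number,
  location: string | undefined,
  followed: number,
): string | null {
  if (!REDIRECT_STATUSES.has(status)) return null;
  if (followed >= MAX_REDIRECTS) {
    throw new Error(`Too many redirects (limit ${MAX_REDIRECTS})`);
  }
  if (!location) {
    throw new Error(`Redirect ${status} from ${requestUrl} has no Location header`);
  }
  // Relative Location values are resolved against the URL that was requested.
  return new URL(location, requestUrl).toString();
}

interface HopResponse {
  status: number;
  location?: string;
}

// Hypothetical transport, injected so the loop can be exercised without a network.
type Requester = (url: string) => Promise<HopResponse>;

async function resolveFinalUrl(startUrl: string, request: Requester): Promise<string> {
  let current = startUrl;
  for (let followed = 0; ; followed++) {
    const res = await request(current);
    const next = nextHop(current, res.status, res.location, followed);
    if (next === null) return current;
    current = next;
  }
}
```

Keeping the per-response decision in a pure function makes the cap and the relative-URL handling easy to test in isolation.
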

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>