Unofficial PyTorch implementation of VALL-E: zero-shot text-to-speech and voice cloning using neural codec language models. Train models and synthesize speech from text using a single reference audio clip.
python machine-learning voice speech tts speech-synthesis transformer autoregressive nar zero-shot voice-synthesis deep-learning voice-cloning vall-e text-to-text neural-codec pytorch encodec
Updated Jan 30, 2026 - Python