This Python script (vocaltoust.py) generates a UST (UTAU Sequence Text) file from a vocal audio file (wav/mp3), suitable for use with DiffSinger in OpenUTAU. It transcribes the audio using Whisper, optionally aligns lyrics with Gentle, extracts pitch using Librosa, detects voiced regions, and creates a UST file with rests, notes, velocities, and flags tuned for DiffSinger.
- Whisper Transcription: Uses OpenAI Whisper for audio transcription with word-level timestamps.
- Optional Gentle Alignment: If lyrics are provided (local file or Genius URL), uses Gentle for forced alignment; falls back to Whisper segments.
- Pitch Extraction: Uses Librosa pyin for pitch detection with NaN interpolation and smoothing.
- Voice Activity Detection: Filters micro rests and detects voiced regions using RMS-based VAD.
- UST Generation: Creates UST files with rests, notes, velocities, and DiffSinger-optimized flags.
- Duration Matching: Ensures the UST is at least as long as the audio by extending the last note.
- Flexible Input: Supports local lyrics files or Genius URLs for lyrics.
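
As an illustration of the RMS-based voice activity detection and micro-rest filtering mentioned above, here is a minimal sketch in plain Python. The function names (`frame_rms`, `voiced_regions`) and the default `threshold` and `min_rest` values are assumptions made for this example, not taken from vocaltoust.py, which may compute these differently.

```python
import math

def frame_rms(samples, frame_len, hop_len):
    """Return the RMS energy of each frame of a mono signal (list of floats)."""
    if not samples:
        return []
    rms = []
    for start in range(0, max(len(samples) - frame_len + 1, 1), hop_len):
        frame = samples[start:start + frame_len]
        rms.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return rms

def voiced_regions(rms, hop_seconds, threshold=0.02, min_rest=0.1):
    """Convert per-frame RMS values into (start, end) voiced regions in seconds.

    Silent gaps shorter than min_rest ("micro rests") are merged into the
    neighbouring voiced regions. threshold and min_rest are assumed values.
    """
    regions = []
    start = None
    for i, value in enumerate(rms):
        if value >= threshold and start is None:
            start = i  # a voiced region begins
        elif value < threshold and start is not None:
            regions.append((start * hop_seconds, i * hop_seconds))
            start = None  # the voiced region ended
    if start is not None:
        regions.append((start * hop_seconds, len(rms) * hop_seconds))

    merged = []
    for region in regions:
        if merged and region[0] - merged[-1][1] < min_rest:
            # The rest before this region is too short: absorb it.
            merged[-1] = (merged[-1][0], region[1])
        else:
            merged.append(region)
    return merged

# Synthetic input: half a second of signal followed by half a second of silence.
sr = 16000
signal = [0.5] * (sr // 2) + [0.0] * (sr // 2)
print(voiced_regions(frame_rms(signal, 1024, 512), hop_seconds=512 / sr))
```

Segments from the transcription step could then be clipped to these regions so that notes are only placed where the singer is actually audible.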
- Python 3.7+
- Libraries:
  - openai-whisper
  - librosa
  - numpy
  - pandas
  - gentle (optional, for forced alignment)
  - requests
  - beautifulsoup4
First, create a new virtual environment:
python -m venv venv
Then activate it (Windows):
venv\Scripts\activate
Install the required libraries using pip:
pip install openai-whisper librosa numpy pandas gentle requests beautifulsoup4
Run the script from the command line with the following options:
python vocaltoust.py --audio <audio_file> [--lyrics <lyrics_source>] [--no-gentle] [--model <whisper_model>] [--out <output_ust>]
- --audio <audio_file>: Path to the vocal audio file (wav/mp3). Required.
- --lyrics <lyrics_source>: Path to a local lyrics.txt file or a Genius URL for forced alignment. Optional.
- --no-gentle: Disable Gentle forced alignment and use Whisper only. Optional.
- --model <whisper_model>: Whisper model name (tiny, base, small, medium, large). Default: base. Optional.
- --out <output_ust>: Output UST file path. Defaults to <audio_file>.ust. Optional.
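
For readers who want to see how such a command line might be declared, the following is a hedged sketch using the standard library's `argparse`. It mirrors the documented options only; the helper names `build_parser` and `resolve_output` are hypothetical, and the way the default output name is derived (swapping the audio extension for `.ust`) is an assumption about what "<audio_file>.ust" means.

```python
import argparse
from pathlib import Path

def build_parser():
    """Declare the options documented above (a sketch, not the script's own code)."""
    parser = argparse.ArgumentParser(
        description="Generate a UST file from a vocal audio file.")
    parser.add_argument("--audio", required=True,
                        help="Path to the vocal audio file (wav/mp3).")
    parser.add_argument("--lyrics", default=None,
                        help="Local lyrics file or Genius URL (optional).")
    parser.add_argument("--no-gentle", action="store_true",
                        help="Skip Gentle forced alignment and use Whisper only.")
    parser.add_argument("--model", default="base",
                        help="Whisper model name (tiny, base, small, medium, large).")
    parser.add_argument("--out", default=None,
                        help="Output UST path (defaults to a name based on the audio file).")
    return parser

def resolve_output(args):
    """Pick the output path. Assumption: vocals.wav becomes vocals.ust."""
    if args.out:
        return args.out
    return str(Path(args.audio).with_suffix(".ust"))

args = build_parser().parse_args(["--audio", "vocals.wav", "--model", "small"])
print(args.model, args.no_gentle, resolve_output(args))
```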
- Basic usage with audio file:
  python vocaltoust.py --audio vocals.wav
- With local lyrics:
  python vocaltoust.py --audio vocals.wav --lyrics lyrics.txt
- With Genius URL:
  python vocaltoust.py --audio vocals.wav --lyrics "https://genius.com/Artist-Song-lyrics"
- Custom output and model:
  python vocaltoust.py --audio vocals.wav --lyrics lyrics.txt --model small --out output.ust
- Disable Gentle:
  python vocaltoust.py --audio vocals.wav --lyrics lyrics.txt --no-gentle
- I also recommend keeping each conversion to about 30 seconds, or at most around a minute, of a song; anything longer tends to become desynced. If you split a song, make sure not to cut in the middle of a word, since Whisper does not handle that well.
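
One way to follow the splitting advice above is to place cuts only inside pauses between words. The sketch below is a hypothetical helper, not part of vocaltoust.py: `choose_split_points`, `max_chunk`, and `min_gap` are illustrative names, and the word timings are assumed to come from an earlier rough transcription or from manual marking. It is a greedy approach, so chunks come out roughly, not exactly, `max_chunk` seconds long.

```python
def choose_split_points(words, max_chunk=60.0, min_gap=0.2):
    """Pick split times (seconds) that fall in the middle of pauses.

    words: list of (start, end) times in seconds, sorted by start.
    A pause qualifies as a cut candidate only if it lasts at least min_gap.
    """
    splits = []
    chunk_start = 0.0
    candidate = None  # midpoint of the most recent usable pause in this chunk
    for (_, prev_end), (next_start, next_end) in zip(words, words[1:]):
        if next_start - prev_end >= min_gap:
            candidate = (prev_end + next_start) / 2.0
        # If adding the next word would exceed the target length, cut at the
        # last usable pause. With no usable pause, the chunk simply runs longer.
        if next_end - chunk_start > max_chunk and candidate is not None:
            splits.append(candidate)
            chunk_start = candidate
            candidate = None
    return splits

words = [(0.0, 10.0), (10.5, 25.0), (25.1, 40.0), (41.0, 55.0), (56.0, 70.0)]
print(choose_split_points(words, max_chunk=30.0))
```

The resulting times could then be used to cut the audio in any editor before running the script once per chunk.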
- The script generates a .ust file (default: <audio_file>.ust) compatible with OpenUTAU and DiffSinger.
- Open the UST in OpenUTAU to render with DiffSinger.
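
To give a feel for the output, here is a minimal sketch of how a UST file is laid out and how note lengths relate to time. UST lengths are measured in ticks (480 per quarter note), and rests are usually written with the lyric `R`. The helper names, the fixed tempo, and the empty `Flags` value are assumptions for illustration; the flags the script actually writes for DiffSinger are not reproduced here. The last helper shows one way the "extend the last note" behaviour could work.

```python
TICKS_PER_QUARTER = 480  # UST note lengths are expressed in ticks

def seconds_to_ticks(seconds, tempo):
    """Convert a duration in seconds to ticks at the given tempo (BPM)."""
    return int(round(seconds * tempo / 60.0 * TICKS_PER_QUARTER))

def extend_to_duration(notes, audio_seconds, tempo):
    """Lengthen the last note so the sequence is at least as long as the audio."""
    total = sum(note["length"] for note in notes)
    target = seconds_to_ticks(audio_seconds, tempo)
    if notes and total < target:
        notes[-1]["length"] += target - total
    return notes

def build_ust(notes, tempo=120.0, flags=""):
    """Render notes (dicts with length, lyric, note_num) as UST text."""
    lines = ["[#VERSION]", "UST Version1.2", "[#SETTING]",
             f"Tempo={tempo:.2f}", "Tracks=1"]
    for index, note in enumerate(notes):
        lines += [
            f"[#{index:04d}]",
            f"Length={note['length']}",
            f"Lyric={note['lyric']}",      # "R" marks a rest
            f"NoteNum={note['note_num']}",  # MIDI note number
            f"Velocity={note.get('velocity', 100)}",
            f"Flags={flags}",               # placeholder, not the script's flags
        ]
    lines.append("[#TRACKEND]")
    return "\n".join(lines) + "\n"

notes = [
    {"length": seconds_to_ticks(0.5, 120.0), "lyric": "R", "note_num": 60},
    {"length": seconds_to_ticks(0.5, 120.0), "lyric": "la", "note_num": 64},
]
print(build_ust(extend_to_duration(notes, audio_seconds=2.0, tempo=120.0)))
```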
- Transcription: Transcribes audio using Whisper with word timestamps.
- Alignment: If lyrics provided, attempts Gentle forced alignment; otherwise uses Whisper segments.
- Filtering: Applies VAD to filter segments to voiced regions.
- Pitch Extraction: Extracts pitch using Librosa pyin with smoothing.
- UST Creation: Generates UST with notes, rests, pitch, and flags.
- Validation: Scales and extends UST to match audio duration.
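
The pitch step in the list above can be sketched as follows. Librosa's pyin reports NaN for unvoiced frames, so the track is first filled in, then smoothed, then converted to a MIDI note number per segment. This is a plain-Python illustration under assumptions: the function names, the median filter, and the window size are choices made for this example, and the script's own smoothing may differ.

```python
import math
from statistics import median

def interpolate_nans(f0):
    """Fill NaN frames linearly; leading and trailing gaps take the nearest value."""
    valid = [i for i, value in enumerate(f0) if not math.isnan(value)]
    out = list(f0)
    if not valid:
        return out  # nothing voiced, nothing to interpolate from
    first, last = valid[0], valid[-1]
    for i in range(first):
        out[i] = f0[first]
    for i in range(last + 1, len(out)):
        out[i] = f0[last]
    for a, b in zip(valid, valid[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            out[i] = f0[a] + t * (f0[b] - f0[a])
    return out

def median_smooth(values, window=5):
    """Moving median; the window shrinks at the edges."""
    half = window // 2
    return [median(values[max(0, i - half):i + half + 1])
            for i in range(len(values))]

def hz_to_midi(freq):
    """Nearest MIDI note number for a frequency in Hz (A4 = 440 Hz = 69)."""
    return int(round(69 + 12 * math.log2(freq / 440.0)))

def note_for_segment(f0, start_frame, end_frame):
    """Pick one note number for a segment from its median smoothed pitch."""
    smoothed = median_smooth(interpolate_nans(f0))
    return hz_to_midi(median(smoothed[start_frame:end_frame]))

nan = float("nan")
track = [nan, 220.0, nan, nan, 250.0, nan]
print(interpolate_nans(track))
print(note_for_segment(track, 0, len(track)))
```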
- Audio File Not Found: Ensure the path to --audio is correct and the file exists.
- Library Errors: Install all required libraries.
- Gentle Alignment Fails: Falls back to Whisper; check Gentle installation.
- UST Issues: Verify in OpenUTAU; ensure audio is vocal-only.
- Path Issues: Use absolute paths if relative paths fail.
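
For the path-related items above, a quick check like the following can confirm what file a relative path actually points to before running the script. It is a small sketch using `pathlib`; `resolve_audio_path` is a hypothetical helper name, not something the script provides.

```python
from pathlib import Path

def resolve_audio_path(raw_path):
    """Expand and resolve a path to absolute form, failing early if it is missing."""
    path = Path(raw_path).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"Audio file not found: {path}")
    return path

try:
    print(resolve_audio_path("vocals.wav"))
except FileNotFoundError as error:
    # The message shows the absolute path that was actually checked.
    print(error)
```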
This script is provided as-is for educational and personal use. Editing and forking for personal use is allowed.
To run the experimental test, use:
python vocaltoust.py --audio "C:\Users...\Downloads\More_ethical_singing_for_Ai_Vtubers--Lyricgen\More_ethical_singing_for_Ai_Vtubers--Lyricgen\teststuff\vocals.wav" --lyrics "C:\Users...\Downloads\More_ethical_singing_for_Ai_Vtubers--Lyricgen\More_ethical_singing_for_Ai_Vtubers--Lyricgen\teststuff\lyrics.txt" --output "output.ust"
You may have to change ... to your own user path, something like C:\Users\YOUR USER NAME\Downloads\More_ethical_singing_for_Ai_Vtubers--Openutau\teststuff. Most likely you would just need to replace ... with your actual computer username, for example C:\Users\musiclover1092\Downloads\More_ethical_singing_for_Ai_Vtubers--Openutau\teststuff.