Vocal to UST Generator

This Python script (vocaltoust.py) generates a UST (UTAU Sequence Text) file from a vocal audio file (wav/mp3), suitable for use with DiffSinger in OpenUTAU. It transcribes the audio using Whisper, optionally aligns lyrics with Gentle, extracts pitch using Librosa, detects voiced regions, and creates a UST file with rests, notes, velocities, and flags tuned for DiffSinger.

Features

Whisper Transcription: Uses OpenAI Whisper for audio transcription with word-level timestamps.
Optional Gentle Alignment: If lyrics are provided (local file or Genius URL), uses Gentle for forced alignment; falls back to Whisper segments.
Pitch Extraction: Uses Librosa pyin for pitch detection with NaN interpolation and smoothing.
Voice Activity Detection: Filters micro rests and detects voiced regions using RMS-based VAD.
UST Generation: Creates UST files with rests, notes, velocities, and DiffSinger-optimized flags.
Duration Matching: Ensures the UST is at least as long as the audio by extending the last note.
Flexible Input: Supports local lyrics files or Genius URLs for lyrics.
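The pitch clean-up mentioned above (NaN interpolation and smoothing) can be illustrated with a small sketch. This is not the script's actual code: the helper names `interpolate_nans`, `median_smooth`, and `hz_to_midi` are hypothetical, and the sketch assumes the pitch tracker marks unvoiced frames as NaN (which is how Librosa's pyin reports them by default) and that a median filter is an acceptable smoother.

```python
import math

def interpolate_nans(f0):
    """Fill NaN gaps linearly; gaps at the edges copy the nearest voiced value."""
    out = list(f0)
    known = [i for i, v in enumerate(out) if not math.isnan(v)]
    if not known:
        return out  # no voiced frames at all: leave the track untouched
    for i in range(known[0]):
        out[i] = out[known[0]]
    for i in range(known[-1] + 1, len(out)):
        out[i] = out[known[-1]]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            t = (i - a) / (b - a)
            out[i] = out[a] + t * (out[b] - out[a])
    return out

def median_smooth(values, width=5):
    """Median filter; the window shrinks near the edges."""
    half = width // 2
    smoothed = []
    for i in range(len(values)):
        window = sorted(values[max(0, i - half): i + half + 1])
        smoothed.append(window[len(window) // 2])
    return smoothed

def hz_to_midi(freq_hz):
    """Convert a frequency in Hz to the nearest MIDI note number (A4 = 440 Hz = 69)."""
    return int(round(69 + 12 * math.log2(freq_hz / 440.0)))

# Example: a short pitch track with unvoiced (NaN) frames
track = [float("nan"), 440.0, float("nan"), float("nan"), 470.0, float("nan")]
filled = interpolate_nans(track)
notes = [hz_to_midi(f) for f in median_smooth(filled, width=3)]
```

The MIDI note numbers produced this way are what a UST note's pitch field expects, so a per-note pitch can be taken as, for example, the median of the smoothed frames inside that note.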

Requirements

Python 3.7+
Libraries:
- openai-whisper
- librosa
- numpy
- pandas
- gentle (optional, for forced alignment)
- requests
- beautifulsoup4

First, create a new virtual environment with python -m venv venv, then activate it with venv\Scripts\activate (on Windows).

Install the required libraries using pip:

pip install openai-whisper librosa numpy pandas gentle requests beautifulsoup4

Usage

Run the script from the command line with the following options:

python vocaltoust.py --audio <audio_file> [--lyrics <lyrics_source>] [--no-gentle] [--model <whisper_model>] [--out <output_ust>]

Options

--audio <audio_file>: Path to the vocal audio file (wav/mp3). Required.
--lyrics <lyrics_source>: Path to a local lyrics.txt file or a Genius URL for forced alignment. Optional.
--no-gentle: Disable Gentle forced alignment and use Whisper only. Optional.
--model <whisper_model>: Whisper model name (tiny, base, small, medium, large). Default: base. Optional.
--out <output_ust>: Output UST file path. Defaults to <audio_file>.ust. Optional.
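For readers who want to see how these options fit together, here is a minimal argparse sketch that mirrors the list above. It is an illustration, not the script's real parser. The `resolve_output` helper is hypothetical, and it assumes the default output replaces the audio extension with .ust (for example vocals.wav becomes vocals.ust).

```python
import argparse
from pathlib import Path

def build_parser():
    parser = argparse.ArgumentParser(
        description="Generate a UST file from a vocal audio file.")
    parser.add_argument("--audio", required=True,
                        help="Path to the vocal audio file (wav/mp3)")
    parser.add_argument("--lyrics", default=None,
                        help="Local lyrics file or Genius URL")
    parser.add_argument("--no-gentle", action="store_true",
                        help="Skip Gentle and use Whisper timings only")
    parser.add_argument("--model", default="base",
                        choices=["tiny", "base", "small", "medium", "large"],
                        help="Whisper model name")
    parser.add_argument("--out", default=None,
                        help="Output UST file path")
    return parser

def resolve_output(args):
    # Assumption: the default output swaps the audio extension for .ust
    return Path(args.out) if args.out else Path(args.audio).with_suffix(".ust")

# Parse an explicit argument list instead of sys.argv for demonstration
args = build_parser().parse_args(["--audio", "vocals.wav"])
output_path = resolve_output(args)
```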

Examples

Basic usage with audio file:
```
python vocaltoust.py --audio vocals.wav
```

With local lyrics:

python vocaltoust.py --audio vocals.wav --lyrics lyrics.txt

With Genius URL:

python vocaltoust.py --audio vocals.wav --lyrics "https://genius.com/Artist-Song-lyrics"

Custom output and model:

python vocaltoust.py --audio vocals.wav --lyrics lyrics.txt --model small --out output.ust

Disable Gentle:

python vocaltoust.py --audio vocals.wav --lyrics lyrics.txt --no-gentle

I also recommend keeping each conversion to around 30 seconds to a minute of the song; anything longer tends to become desynced. If you split the audio, make sure not to cut in the middle of a word, since Whisper does not handle that well.

Output

The script generates a .ust file (default: <audio_file>.ust) compatible with OpenUTAU and DiffSinger.
Open the UST in OpenUTAU to render with DiffSinger.
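To give a rough idea of what ends up in the file, the sketch below builds a minimal UST from a list of timed notes. It is a simplified illustration and not the script's own writer: the `build_ust` function, the fixed tempo of 120, and the reduced set of fields (no velocities or flags) are assumptions made here. It relies on the usual UST conventions of 480 ticks per quarter note and `R` as the lyric for a rest.

```python
TICKS_PER_QUARTER = 480  # UST note lengths are counted in ticks

def seconds_to_ticks(seconds, tempo_bpm):
    return int(round(seconds * tempo_bpm / 60.0 * TICKS_PER_QUARTER))

def build_ust(notes, tempo_bpm=120.0, audio_duration=None):
    """notes: list of (start_sec, end_sec, lyric, midi_note), sorted and non-overlapping."""
    entries = []  # (length_ticks, lyric, note_num)
    cursor = 0.0
    for start, end, lyric, midi in notes:
        if start > cursor:
            # Fill the silence before this note with a rest
            entries.append((seconds_to_ticks(start - cursor, tempo_bpm), "R", 60))
        entries.append((seconds_to_ticks(end - start, tempo_bpm), lyric, midi))
        cursor = end
    if audio_duration is not None and entries and audio_duration > cursor:
        # Extend the last note so the UST is at least as long as the audio
        length, lyric, midi = entries[-1]
        extra = seconds_to_ticks(audio_duration - cursor, tempo_bpm)
        entries[-1] = (length + extra, lyric, midi)
    lines = ["[#VERSION]", "UST Version1.2", "[#SETTING]",
             f"Tempo={tempo_bpm:.2f}", "Tracks=1", "Mode2=True"]
    for index, (length, lyric, midi) in enumerate(entries):
        lines += [f"[#{index:04d}]", f"Length={length}",
                  f"Lyric={lyric}", f"NoteNum={midi}"]
    lines.append("[#TRACKEND]")
    return "\n".join(lines) + "\n"

ust_text = build_ust([(0.5, 1.0, "la", 60), (1.0, 1.5, "la", 62)],
                     tempo_bpm=120.0, audio_duration=2.0)
```

Rounding each note separately can accumulate small timing drift over long files, which is one reason short conversions are easier to keep in sync.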

How It Works

Transcription: Transcribes audio using Whisper with word timestamps.
Alignment: If lyrics provided, attempts Gentle forced alignment; otherwise uses Whisper segments.
Filtering: Applies VAD to filter segments to voiced regions.
Pitch Extraction: Extracts pitch using Librosa pyin with smoothing.
UST Creation: Generates UST with notes, rests, pitch, and flags.
Validation: Scales and extends UST to match audio duration.
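The filtering step can be pictured with a small RMS-based voice activity sketch. This is an assumption-laden illustration rather than the script's implementation: the function names, the threshold of 0.02, the frame sizes, and the 0.1 second micro-rest limit are all hypothetical, and samples are assumed to be floats normalized to the range -1 to 1.

```python
import math

def frame_rms(samples, frame_length=1024, hop_length=512):
    """RMS energy of each frame of a mono signal."""
    if not samples:
        return []
    values = []
    for start in range(0, max(len(samples) - frame_length, 0) + 1, hop_length):
        frame = samples[start:start + frame_length]
        values.append(math.sqrt(sum(x * x for x in frame) / len(frame)))
    return values

def voiced_regions(samples, sample_rate, threshold=0.02, min_gap=0.1,
                   frame_length=1024, hop_length=512):
    """Return (start_sec, end_sec) regions whose RMS is at or above the threshold."""
    rms = frame_rms(samples, frame_length, hop_length)
    regions = []
    start = None
    for i, value in enumerate(rms):
        t = i * hop_length / sample_rate
        if value >= threshold and start is None:
            start = t
        elif value < threshold and start is not None:
            regions.append([start, t])
            start = None
    if start is not None:
        regions.append([start, len(samples) / sample_rate])
    # Merge regions separated by rests shorter than min_gap (micro rests)
    merged = []
    for region in regions:
        if merged and region[0] - merged[-1][1] < min_gap:
            merged[-1][1] = region[1]
        else:
            merged.append(region)
    return [tuple(r) for r in merged]

# Example: 0.5 s silence, 1 s tone, 0.5 s silence at a toy sample rate of 1000 Hz
sr = 1000
signal = [0.5 * math.sin(2 * math.pi * 100 * n / sr) if 500 <= n < 1500 else 0.0
          for n in range(2000)]
regions = voiced_regions(signal, sr, frame_length=100, hop_length=50)
```

Transcribed words or segments that fall outside the returned regions can then be dropped or turned into rests.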

Troubleshooting

Audio File Not Found: Ensure the path to --audio is correct and the file exists.
Library Errors: Install all required libraries.
Gentle Alignment Fails: Falls back to Whisper; check Gentle installation.
UST Issues: Verify in OpenUTAU; ensure audio is vocal-only.
Path Issues: Use absolute paths if relative paths fail.

License

This script is provided as-is for educational and personal use. Editing and forking for personal use is allowed.

To try the experimental run with the test files, use: python vocaltoust.py --audio "C:\Users...\Downloads\More_ethical_singing_for_Ai_Vtubers--Lyricgen\More_ethical_singing_for_Ai_Vtubers--Lyricgen\teststuff\vocals.wav" --lyrics "C:\Users...\Downloads\More_ethical_singing_for_Ai_Vtubers--Lyricgen\More_ethical_singing_for_Ai_Vtubers--Lyricgen\teststuff\lyrics.txt" --out "output.ust"

You may have to change ... to your own user path, something like C:\Users\YOUR USER NAME\Downloads\More_ethical_singing_for_Ai_Vtubers--Openutau\teststuff. You would probably just need to replace ... with your actual computer username, for example C:\Users\musiclover1092\Downloads\More_ethical_singing_for_Ai_Vtubers--Openutau\teststuff.
