Releases: HAKORADev/VODER

Major Update v2 - SE Mode, SFX, Script Directives

07 Apr 18:36

04/08/2026

  • Status: Stable, all features work, still developing
  • Major Update — Two New Modes, Script Directives, SFX Integration, and Speech Enhancement

Added

  • SE (Speech Enhancement) Mode - Audio quality improvement using UniSE model
  • SFX (Sound Effects Generation) Mode - Custom sound effects from text using TangoFlux
  • Script Directives - Per-line control over timing, volume, and duration
  • SFX Character in Dialogue - Embed sound effects directly in dialogue scripts
  • Enhanced Dialogue Assembly - Complete rewrite of dialogue generation pipeline
  • Cross-use Feature - Mix generated and cloned voices in the same dialogue
  • Music Volume Level Control - Fine-grained control over background music volume

Changed

  • Expanded Processing Modes - VODER now has 9 processing modes (up from 7)

Enhancement - Music Chunking and Duration Fix

07 Apr 18:36

04/03/2026

  • Status: Stable, all features work, still developing

Added

  • Background Music Chunking for Long Dialogues - Generate multiple music chunks for dialogues longer than 250 seconds

Fixed

  • Music Generation Minimum Duration - Enforced 10-second minimum for music generation to match ACE-Step model requirements
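The two limits in this release can be illustrated with a small planning helper. This is only a sketch under assumptions: the function name `plan_music_chunks`, the even split across chunks, and the use of 250 seconds as a per-chunk ceiling are illustrative guesses, not VODER's actual code. The notes above state only that dialogues longer than 250 seconds get multiple music chunks and that music generation is held to a 10-second minimum.

```python
import math

# Values taken from the release notes; how VODER applies them internally is assumed.
MAX_CHUNK_SECONDS = 250.0
MIN_MUSIC_SECONDS = 10.0


def plan_music_chunks(dialogue_seconds,
                      max_chunk=MAX_CHUNK_SECONDS,
                      min_chunk=MIN_MUSIC_SECONDS):
    """Return a list of music chunk durations (seconds) covering a dialogue.

    Hypothetical helper: splits the dialogue evenly into the fewest chunks
    that each fit under max_chunk, and raises a too-short request to min_chunk.
    """
    if dialogue_seconds <= 0:
        return []
    # Fewest chunks such that no chunk exceeds max_chunk.
    count = max(1, math.ceil(dialogue_seconds / max_chunk))
    base = dialogue_seconds / count
    # The minimum only matters for a single short chunk; longer dialogues
    # always produce chunks well above it.
    return [max(base, min_chunk)] * count
```

Under these assumptions, a 600-second dialogue is planned as three 200-second chunks, and a 4-second dialogue is raised to a single 10-second request.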

Major Update v1 - STT, Diarization, OCR, YouTube

07 Apr 18:36

04/02/2026

  • Status: Stable, all features work, still developing

Added

  • STT (Speech-to-Text) Standalone Mode - Transcribe audio, video, image, or YouTube URL to text
  • Speaker Diarization (Pyannote Integration) - Identify and separate speakers in audio
  • Image Text Extraction (EasyOCR) - Extract text from images
  • YouTube and Video Platform Download (yt-dlp) - Download audio from YouTube, Bilibili, TikTok
  • Automatic Voice Clip Extraction - Extract individual speaker voice clips automatically

Changed

  • Centralized Model Management System - Complete overhaul of model storage and caching

Major Update - MSTS and Memory Offloading

07 Apr 18:36
dc65859

02/24/2026

  • Status: Stable, all features work, still developing

Added

  • MSTS (Music-STS) in STS mode - STS now supports musical inputs via the Seed-VC v1 model
  • TTS+VC dialogue voice cloning stability - Voice characteristics extracted once per character
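Extracting voice characteristics once per character amounts to caching the result of an expensive step. A minimal sketch of that idea follows. The class `VoiceProfileCache` and the `extract_fn` callback are hypothetical names introduced for illustration; the release notes do not describe how VODER stores the extracted characteristics.

```python
class VoiceProfileCache:
    """Hypothetical cache: run the extractor once per character, then reuse."""

    def __init__(self, extract_fn):
        # extract_fn stands in for whatever computes voice characteristics
        # from a reference clip; it is an assumption, not a VODER API.
        self._extract = extract_fn
        self._profiles = {}

    def get(self, character, reference_audio):
        # Only the first request for a character triggers extraction.
        if character not in self._profiles:
            self._profiles[character] = self._extract(reference_audio)
        return self._profiles[character]
```

Every later line spoken by the same character then reuses the stored profile, which keeps the cloned voice consistent across the dialogue.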

Optimized

  • Memory offloading after processing - Models explicitly unloaded from memory/VRAM
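Explicit unloading of this kind is commonly done in PyTorch-based tools by dropping the last reference to the model, running the garbage collector, and then releasing cached GPU memory. The sketch below assumes that pattern. The helper `release_model` and the dictionary that holds loaded models are hypothetical, and the release notes do not say which calls VODER uses.

```python
import gc


def release_model(loaded_models, name):
    """Drop a model from a registry and free memory where possible.

    Hypothetical helper. Returns True if the CUDA cache was cleared,
    False if torch or a CUDA device is unavailable.
    """
    # Remove the registry's reference so the model can be collected.
    loaded_models.pop(name, None)
    gc.collect()
    try:
        import torch  # optional: only needed for the VRAM step
    except ImportError:
        return False
    if torch.cuda.is_available():
        # Return cached, unused GPU blocks to the driver.
        torch.cuda.empty_cache()
        return True
    return False
```

The helper only frees memory if no other reference to the model survives elsewhere, so callers should not keep their own handle after calling it.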

Major Update - Dialogue Support

07 Apr 18:36
829bcea

02/12/2026

  • Status: Stable, all features work, under aggressive testing, still developing

Added

  • Full dialogue support in CLI - Both interactive and one-liner modes now support multi-speaker scripts
  • Optional background music for dialogue scripts - Available in TTS and TTS+VC modes
  • Row-based dialogue editor in GUI - Replaced free-text script box with per-row Character/Dialogue fields

Fixed

  • Memory optimisation for TTM+VC

Seed-VC v2 Fix

07 Apr 18:36
644f0fd

02/10/2026

  • Status: Stable, all features work, under aggressive testing, still developing

Fixed

  • Seed-VC v2 tensor mismatch error that caused both STS and TTM+VC to fail. STS now works correctly; TTM+VC will receive further optimisations.

Initial Release - Unstable Development Build

07 Apr 18:36
0ae9945

02/09/2026

  • Status: unstable, untested, under development

First public release of VODER.