I made this portable one-click Woosh installer for Windows systems with an Nvidia GTX 10XX, GTX 16XX, RTX Quadro, RTX 20XX, 30XX, 40XX, or 50XX GPU. It creates Launch Woosh, Videos, and Sounds desktop shortcuts.
Click here to jump to Install: Installation
This repository provides inference code and open weights for the sound effect generative models developed at Sony AI. The current public release includes four models addressing the text-to-audio (T2A) and video-to-audio (V2A) tasks:
- Audio encoder/decoder (Woosh-AE): High-quality latent encoder/decoder providing latents for generative modeling and decoding audio from generated latents.
- Text conditioning (Woosh-CLAP): Multimodal text-audio alignment model providing token latents for diffusion model conditioning.
- T2A Generation (Woosh-Flow and Woosh-DFlow): Original and distilled LDMs generating audio unconditionally or from a given text prompt.
- V2A Generation (Woosh-VFlow): Multimodal LDM generating audio from a video sequence with optional text prompts.
For compatibility, GTX 10XX through RTX 30XX GPUs get torch 2.6.0+cu126 installed. RTX 40XX and 50XX GPUs get torch 2.8.0+cu128.
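The GPU-to-torch mapping above can be sketched in Python. This is only an illustration, not the installer's actual logic: `pick_torch_build` is a hypothetical helper, and treating 16XX and 20XX cards as part of the "GTX 10XX to RTX 30XX" group is an assumption. Names it cannot classify (for example Quadro cards) return `None` rather than a guess.

```python
import re
from typing import Optional

# Torch builds quoted in the text above.
LEGACY_BUILD = "2.6.0+cu126"  # GTX 10XX through RTX 30XX
MODERN_BUILD = "2.8.0+cu128"  # RTX 40XX and 50XX


def pick_torch_build(gpu_name: str) -> Optional[str]:
    """Return the torch build for a GeForce model name, or None if unknown.

    Hypothetical helper: the series grouping (16XX and 20XX counted as
    legacy) is an assumption, not taken from the installer scripts.
    """
    match = re.search(r"\b(?:GTX|RTX)\s*(\d{3,4})\b", gpu_name, re.IGNORECASE)
    if match is None:
        return None
    series = int(match.group(1)) // 100  # 1080 -> 10, 4090 -> 40
    if series in (10, 16, 20, 30):
        return LEGACY_BUILD
    if series in (40, 50):
        return MODERN_BUILD
    return None  # e.g. Quadro RTX 6000: not covered by the text above


if __name__ == "__main__":
    for name in ("NVIDIA GeForce GTX 1660 SUPER", "NVIDIA GeForce RTX 5070 Ti"):
        print(name, "->", pick_torch_build(name))
```

If the helper returns `None`, pick the installer manually based on the GPU list at the top of this page.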
- Make sure you have Git installed, as it is needed to update Woosh. If you don't have it, download the Git Standalone Installer (Git Standalone Installer Download) and click on Git for Windows/x64 Setup. To install Git, double-click Git.exe and keep clicking Next until it is installed; you don't need to change anything.
- Make sure your Nvidia graphics drivers are up to date. If they are not, or if you're not sure, download the latest Nvidia graphics drivers here: Nvidia Drivers
- Make sure that you have NVIDIA's CUDA Toolkit version 12.8 (or newer) installed on your system.
- Make sure you have FFMPEG Shared downloaded and on PATH. Download: ffmpeg-release-full-shared.7z
- Once your Nvidia GPU drivers are up to date and Git is installed, download Woosh Windows Nvidia 1 Click Install from here: Woosh Windows Nvidia 1 Click Install, or from the Releases section at the top right of this page.
- After downloading, extract the Woosh Windows Nvidia 1 Click Install ZIP file to a location of your choice.
- Then open the Woosh Windows Nvidia 1 Click Install main folder; the files used in the next steps are in its root.
- Then, depending on your Nvidia GPU, double-click either Install_Woosh_GTX_10XX_to_RTX_30XX.bat or Install_Woosh_RTX_40XX_&_50XX.bat to start the installation. It installs everything and downloads the models via Hugging Face. After the installation finishes, slowly scroll back up to the top of the output to make sure everything installed correctly.
- To launch Woosh, double-click Launch_Woosh.bat.
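Before running the installer, the prerequisites listed above can be checked with a short Python sketch. It only looks for executables on PATH, which is an approximation: it assumes `nvidia-smi` being present means the Nvidia driver is installed and `nvcc` being present means the CUDA Toolkit is installed, and it does not verify versions. `find_tools` and `missing_tools` are hypothetical helper names, not part of Woosh.

```python
import shutil
from typing import Callable, Dict, List, Optional, Sequence

# Executables the steps above rely on. Using nvidia-smi and nvcc as
# stand-ins for "driver installed" and "CUDA Toolkit installed" is an
# assumption; neither check confirms the version.
REQUIRED_TOOLS = ("git", "ffmpeg", "nvidia-smi", "nvcc")


def find_tools(
    names: Sequence[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if not on PATH."""
    return {name: which(name) for name in names}


def missing_tools(found: Dict[str, Optional[str]]) -> List[str]:
    """Return the names of tools that were not found, in lookup order."""
    return [name for name, path in found.items() if path is None]


if __name__ == "__main__":
    found = find_tools()
    for name, path in found.items():
        print(f"{name:12} {path or 'NOT FOUND on PATH'}")
    missing = missing_tools(found)
    if missing:
        print("Missing:", ", ".join(missing))
```

The `which` parameter exists only so the lookup can be swapped out for testing; by default it uses `shutil.which` from the standard library.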
If you have problems after a successful installation, please report them on the official Sony AI Woosh GitHub: Sony AI - Woosh. Thank you.
Woosh models can be served via our API server. Check the API folder for usage details.
For details about model architecture, training and evaluation, please check our tech report available on arxiv.org.
@misc{hadjeres2026,
title={Woosh: A Sound Effects Foundation Model},
author={Gaetan Hadjeres and Marc Ferras and Khaled Koutini and Benno Weck and Alexandre Bittar and Thomas Hummel and Zineb Lahrichi and Hakim Missoum and Joan Serrà and Yuki Mitsufuji},
year={2026},
eprint={2604.01929},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2604.01929},
}
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
- The majority of the code in this repository is released under the MIT license. The video-to-audio Woosh-VFlow and Woosh-DVFlow models use adapted code from MM-AUDIO and MotionFormer. The code for these models is made available under Apache v2 license terms.
- The open weights in the releases page are released under the CC-BY-NC license.
- The test audio and video samples in the releases page contain their individual license terms in the corresponding download file.