DON'T CLONE THIS REPO. IT WON'T WORK, BECAUSE EVERYTHING DEPENDS ON THE BUNDLED PYTHON_EMBEDED 3.12.10!

I made this Woosh Portable 1 Click Install for Windows PCs with an Nvidia GTX 10XX, 16XX, RTX Quadro, 20XX, 30XX, 40XX, or 50XX GPU. It also creates Launch Woosh, Videos, and Sounds desktop shortcuts.

Click here to jump to Install 👉 Installation 👈

Woosh - Sound Effect Generative Models

This repository provides inference code and open weights for the sound effect generative models developed at Sony AI. The current public release includes four models addressing the text-to-audio (T2A) and video-to-audio (V2A) tasks:

Audio encoder/decoder (Woosh-AE): High-quality latent encoder/decoder providing latents for generative modeling and decoding audio from generated latents.
Text conditioning (Woosh-CLAP): Multimodal text-audio alignment model providing token latents for diffusion model conditioning.
T2A Generation (Woosh-Flow and Woosh-DFlow): Original and distilled LDMs generating audio unconditionally or from a given text prompt.
V2A Generation (Woosh-VFlow): Multimodal LDM generating audio from a video sequence with optional text prompts.

📦 Installation

Nvidia GTX 10XX, 16XX, RTX Quadro, 20XX, 30XX, 40XX, 50XX

GTX 10XX-RTX 30XX will have torch 2.6.0+cu126 installed for compatibility. RTX 40XX & 50XX will have torch 2.8.0+cu128.

Text-to-audio requires 6 GB of VRAM.

Video-to-audio requires 8 GB of VRAM or more.
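
The GPU-to-torch mapping and the VRAM minimums above can be summarized as a small lookup. This is only an illustrative sketch, not part of the installer: the helper names `torch_build_for` and `meets_vram` and the task keys are hypothetical, and the series detection is a simple string match on the GPU's marketing name. Quadro cards are deliberately left unmapped because their model numbers do not follow the GeForce series scheme.

```python
import re

# Torch builds as listed in this README.
LEGACY_BUILD = "2.6.0+cu126"   # GTX 10XX through RTX 30XX
MODERN_BUILD = "2.8.0+cu128"   # RTX 40XX and 50XX

# Minimum VRAM in GB per task, as listed in this README.
# The task keys are names chosen for this sketch.
MIN_VRAM_GB = {"text-to-audio": 6, "video-to-audio": 8}


def torch_build_for(gpu_name: str) -> str:
    """Return the torch build this README lists for a GeForce GPU name (hypothetical helper)."""
    name = gpu_name.upper()
    if "QUADRO" in name:
        # Quadro model numbers (e.g. "RTX 4000") are not GeForce series numbers.
        raise ValueError("Quadro cards are not mapped in this sketch")
    # Capture the first two digits of a four-digit model number, e.g. "40" in "RTX 4090".
    match = re.search(r"\b(?:GTX|RTX)\s*(\d{2})\d{2}\b", name)
    if match is None:
        raise ValueError(f"Cannot infer GPU series from {gpu_name!r}")
    series = int(match.group(1))
    return MODERN_BUILD if series >= 40 else LEGACY_BUILD


def meets_vram(task: str, vram_gb: float) -> bool:
    """Check a card's VRAM against the minimum listed for the task (hypothetical helper)."""
    return vram_gb >= MIN_VRAM_GB[task]


if __name__ == "__main__":
    print(torch_build_for("NVIDIA GeForce RTX 4090"))
    print(meets_vram("video-to-audio", 6))
```

In practice the two installer batch files make this choice for you; the sketch only makes the rule explicit.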

Make sure Git is installed, as it is needed to update Woosh. If it is not, download the Git Standalone Installer and choose Git for Windows/x64 Setup. 👉 Git Standalone Installer Download 👈 To install Git, double-click Git.exe and keep clicking Next until it is installed; you don't need to change anything.
Make sure your Nvidia graphics drivers are up to date. If they are not, or if you're not sure, download the latest Nvidia graphics drivers here: 👉 Nvidia Drivers 👈
Make sure that you have NVIDIA's CUDA Toolkit version 12.8 (or newer) installed on your system.
Make sure you have the FFmpeg shared build downloaded and on your PATH. Download 👉 ffmpeg-release-full-shared.7z 👈
Once your Nvidia GPU drivers are up to date and Git is installed, download Woosh Windows Nvidia 1 Click Install from here 👉 Woosh Windows Nvidia 1 Click Install 👈 or from the Releases section at the top right of this page.
After downloading, extract the Woosh Windows Nvidia 1 Click Install ZIP file to a location of your choice.
Then open the Woosh Windows Nvidia 1 Click Install main folder; the installer files are in its root.

Then, depending on your Nvidia GPU, double-click either Install_Woosh_GTX_10XX_to_RTX_30XX.bat or Install_Woosh_RTX_40XX_&_50XX.bat to start the installation. It installs everything and downloads the models from Hugging Face. When the installation finishes, scroll slowly back up through the console output to make sure everything installed correctly.
To launch Woosh, double-click Launch_Woosh.bat.
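
Before running the installer, the prerequisites from the steps above can be checked with a short script. This is a minimal sketch under some assumptions: the helper `missing_prerequisites` is hypothetical and not shipped with this package, it only checks that the executables are on PATH (not their versions), and it assumes `nvidia-smi` (installed with the Nvidia driver) and `nvcc` (installed with the CUDA Toolkit) are reachable from PATH on your system.

```python
import shutil

# Executables the installation steps expect, mapped to a readable label.
# Assumption: nvidia-smi comes with the Nvidia driver, nvcc with the CUDA Toolkit.
REQUIRED_TOOLS = {
    "git": "Git",
    "ffmpeg": "FFmpeg",
    "nvidia-smi": "Nvidia driver",
    "nvcc": "CUDA Toolkit",
}


def missing_prerequisites(which=shutil.which) -> list[str]:
    """Return labels of tools not found on PATH (hypothetical helper).

    `which` is injectable so the check can be exercised without touching the real PATH.
    """
    return [label for exe, label in REQUIRED_TOOLS.items() if which(exe) is None]


if __name__ == "__main__":
    missing = missing_prerequisites()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All prerequisites were found on PATH.")
```

A tool reported as missing may still be installed but not on PATH, which matters mainly for FFmpeg since the steps above require it to be on PATH.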

Troubleshooting

If you have problems after a successful installation, please report them on the official Sony AI Woosh GitHub: Sony AI - Woosh. Thank you.

If this worked for you, please give it a Star ⭐. Thank you.

API server

Woosh models can be served via our API server. Check the API folder for usage details.

Citation

For details about model architecture, training and evaluation, please check our tech report available on arxiv.org.

@misc{hadjeres2026,
      title={Woosh: A Sound Effects Foundation Model},
      author={Gaetan Hadjeres and Marc Ferras and Khaled Koutini and Benno Weck and Alexandre Bittar and Thomas Hummel and Zineb Lahrichi and Hakim Missoum and Joan Serrà and Yuki Mitsufuji},
      year={2026},
      eprint={2604.01929},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2604.01929},
}

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

The majority of the code in this repository is released under the MIT license. The video-to-audio Woosh-VFlow and Woosh-DVFlow models use adapted code from MM-AUDIO and MotionFormer. The code for these models is made available under Apache v2 license terms.
The open weights in the releases page are released under the CC-BY-NC license.
The test audio and video samples in the releases page contain their individual license terms in the corresponding download file.
