I tried quantizing the Delay model; it works with only 9 GB of VRAM (model + tokenizer)

I quantized the Qwen3 backbone with GPTQ and converted the audio tokenizer to FP16:

https://huggingface.co/blazingbhavneek/MOSS-TTS-GPTQ

https://huggingface.co/blazingbhavneek/MOSS-Audio-Tokenizer-FP16
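For anyone wondering how lower-precision weights shrink the footprint, here is a rough back-of-envelope sketch. It only counts raw weight storage; the parameter count and the 4-bit width below are placeholder assumptions for illustration, not numbers taken from the model cards, and real usage also includes GPTQ scales/zero-points, activations, and the KV cache.

```python
def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB."""
    bytes_total = num_params * bits_per_weight / 8
    return bytes_total / 2**30


# Hypothetical parameter count, for illustration only (not from the model card).
HYPOTHETICAL_PARAMS = 8_000_000_000

# Bit widths are assumptions too; the post does not state the GPTQ bit width.
for label, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
    gib = weight_memory_gib(HYPOTHETICAL_PARAMS, bits)
    print(f"{label}: ~{gib:.1f} GiB for weights")
```

Plug in the actual parameter counts of the backbone and the tokenizer to get an estimate for this setup; the measured total will be higher than the weight-only figure.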

My repo with the quantization scripts (credit to Claude too :)

https://github.com/blazingbhavneek/MOSS-TTS

@gaoyang07