18 changes: 18 additions & 0 deletions README.md
@@ -107,6 +107,13 @@
# centos
sudo yum install sox sox-devel
```

### llama-cpp-python Backend (optional)
For CPU/low-VRAM inference using GGUF quantized models:
```sh
pip install llama-cpp-python
```
Download a quantized GGUF model from [Ferraronp/CosyVoice3-qwen2.5-0.5b-speech-gguf](https://huggingface.co/Ferraronp/CosyVoice3-qwen2.5-0.5b-speech-gguf).
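
The download step could be scripted roughly as follows. This is a minimal sketch, not part of the project: it assumes the `huggingface_hub` package is installed, the `local_dir` default is a hypothetical location, and the names of the `.gguf` files in the repository are not assumed (everything matching `*.gguf` is fetched).

```python
from pathlib import Path

# Repository id taken from the link above.
REPO_ID = "Ferraronp/CosyVoice3-qwen2.5-0.5b-speech-gguf"


def find_gguf_files(directory):
    """Return every .gguf file found under `directory`, sorted by path."""
    return sorted(Path(directory).rglob("*.gguf"))


def download_gguf(local_dir="pretrained_models/gguf"):
    """Fetch only the .gguf files from the repository (requires network access).

    `local_dir` is a hypothetical default; pick any directory you like.
    """
    # Imported lazily so find_gguf_files() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id=REPO_ID, local_dir=local_dir, allow_patterns=["*.gguf"])
    return find_gguf_files(local_dir)
```

Calling `download_gguf()` returns the local paths of the downloaded files; one of them can then be passed as the GGUF model path when initializing the model.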

### Model download

@@ -151,6 +158,17 @@ Follow the code in `example.py` for detailed usage of each model.
python example.py
```

#### llama.cpp Usage
Replace the default model initialization with:
```python
cosyvoice = AutoModel(
    model_dir='pretrained_models/Fun-CosyVoice3-0.5B',
    load_llama_cpp=True,
    gguf_model_path='/path/to/model.gguf',
)
```
All existing inference methods (`inference_zero_shot`, etc.) work unchanged.
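
Because a wrong `gguf_model_path` may only surface as an error deep inside model loading, it can help to validate the path first. The helper names below (`llama_cpp_kwargs`, `load_cosyvoice`) are hypothetical, and the `AutoModel` import path is an assumption based on the repository layout rather than something this section states; only the `load_llama_cpp` and `gguf_model_path` arguments come from the snippet above.

```python
from pathlib import Path


def llama_cpp_kwargs(gguf_model_path):
    """Build the llama.cpp keyword arguments shown above, failing early on a bad path."""
    path = Path(gguf_model_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"GGUF model not found: {path}")
    if path.suffix.lower() != ".gguf":
        raise ValueError(f"expected a .gguf file, got: {path.name}")
    return {"load_llama_cpp": True, "gguf_model_path": str(path)}


def load_cosyvoice(model_dir, gguf_model_path):
    """Create the model with the llama.cpp backend enabled."""
    # Assumption: AutoModel lives at this import path; adjust to match example.py.
    from cosyvoice.cli.cosyvoice import AutoModel

    return AutoModel(model_dir=model_dir, **llama_cpp_kwargs(gguf_model_path))
```

With this in place, `load_cosyvoice('pretrained_models/Fun-CosyVoice3-0.5B', '/path/to/model.gguf')` is equivalent to the initialization above, except that a missing or misnamed file is reported before any model weights are loaded.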

#### vLLM Usage
CosyVoice2/3 now support **vLLM 0.11.x+ (V1 engine)** and **vLLM 0.9.0 (legacy)**.
Older vLLM versions (<0.9.0) do not support CosyVoice inference, and versions in between (e.g., 0.10.x) are untested.
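
The version ranges above can be encoded as a small check. This is an illustrative sketch only: the function names and the returned labels are hypothetical, and treating every release between 0.9.0 and 0.11.0 (for example 0.9.1) as untested is one reading of "versions in between".

```python
import re
from importlib import metadata


def classify_vllm_version(version):
    """Map a vLLM version string onto the support tiers described above."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component counts as 0; pre-release/local suffixes are ignored.
    parsed = tuple(int(part or 0) for part in match.groups())
    if parsed >= (0, 11, 0):
        return "v1"           # 0.11.x and newer: V1 engine
    if parsed == (0, 9, 0):
        return "legacy"       # exactly 0.9.0: legacy path
    if parsed < (0, 9, 0):
        return "unsupported"  # older than 0.9.0
    return "untested"         # anything between 0.9.0 and 0.11.0


def installed_vllm_status():
    """Classify the installed vLLM, or return None if vLLM is not installed."""
    try:
        return classify_vllm_version(metadata.version("vllm"))
    except metadata.PackageNotFoundError:
        return None
```

Running `installed_vllm_status()` before enabling vLLM gives a quick indication of which code path, if any, the installed version falls under.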