feat/llama cpp inference #58

Merged
xiami2019 merged 3 commits into main from feat/llama_cpp_inference
Mar 5, 2026

@cms42
Collaborator

@cms42 cms42 commented Mar 4, 2026

  • llama.cpp inference support for MOSS-TTS-Delay-8B
  • ONNX/TRT inference support for MOSS Audio Tokenizer

@xiami2019
Member

Ran this through a Codex review, haha.

I went through the main changes in this PR. Bottom line first: there are 3 issues that I think should be addressed before merging.

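[P1] Regression in the default install path: torch/torchaudio/transformers were moved from required to optional dependencies, but the docs still tell users to run pip install -e . As a result, the existing main-flow scripts fail at runtime because of missing packages (e.g. import torch / import transformers). Refs: /tmp/moss-tts-pr58/pyproject.toml:13, /tmp/moss-tts-pr58/README.md:149, /tmp/moss-tts-pr58/clis/moss_tts_app.py:11
[P2] No config validation for audio_backend=torch: audio_model_name_or_path defaults to an empty string, but validate() does not check this field. The failure therefore only surfaces later, in _build_audio_tokenizer(), when AutoModel.from_pretrained("") crashes. Suggest failing fast up front. Refs: /tmp/moss-tts-pr58/moss_tts_delay/llama_cpp/pipeline.py:92, /tmp/moss-tts-pr58/moss_tts_delay/llama_cpp/pipeline.py:167, /tmp/moss-tts-pr58/moss_tts_delay/llama_cpp/pipeline.py:276
[P3] use_gpu_audio has no effect on the torch audio backend: the config exposes the switch, but only the ONNX branch uses it. The torch branch never moves the model to the GPU based on it. This can easily cause a performance regression where GPU support appears to exist but inference actually runs on the CPU. Refs: /tmp/moss-tts-pr58/moss_tts_delay/llama_cpp/pipeline.py:103, /tmp/moss-tts-pr58/moss_tts_delay/llama_cpp/pipeline.py:261, /tmp/moss-tts-pr58/moss_tts_delay/llama_cpp/pipeline.py:274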
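A minimal Python sketch of the kind of fail-fast handling suggested in [P1] to [P3] follows. It assumes a dataclass-style config. AudioBackendConfig and resolve_audio_device are hypothetical names. Only the field names audio_backend, audio_model_name_or_path and use_gpu_audio, and the backend value "torch", are taken from the review above. This is not the PR's actual implementation.

```python
import importlib.util
from dataclasses import dataclass


@dataclass
class AudioBackendConfig:
    """Hypothetical stand-in for the pipeline's audio settings.

    Field names mirror those quoted in the review; the class itself and
    the "onnx" default are assumptions for illustration only.
    """

    audio_backend: str = "onnx"          # assumed default
    audio_model_name_or_path: str = ""   # empty default, as described in [P2]
    use_gpu_audio: bool = False

    def validate(self) -> None:
        """Fail fast on torch-backend misconfiguration before loading any model."""
        if self.audio_backend != "torch":
            return
        # [P2] Reject the empty default here instead of letting
        # AutoModel.from_pretrained("") fail much later.
        if not self.audio_model_name_or_path.strip():
            raise ValueError(
                "audio_backend='torch' requires audio_model_name_or_path to be set"
            )
        # [P1] torch is now an optional dependency, so raise a clear message
        # if it is not installed instead of a bare ModuleNotFoundError later.
        if importlib.util.find_spec("torch") is None:
            raise ImportError(
                "audio_backend='torch' needs PyTorch; install the project's "
                "optional PyTorch dependencies"
            )


def resolve_audio_device(cfg: AudioBackendConfig, cuda_available: bool) -> str:
    """[P3] Honor use_gpu_audio for the torch backend as well.

    cuda_available is passed in (e.g. the result of torch.cuda.is_available())
    so this helper can run without torch installed.
    """
    if cfg.use_gpu_audio and cuda_available:
        return "cuda"
    return "cpu"
```

In a torch branch, the returned device string would then be passed to the loaded model's .to(device) call. Checking the path before the torch import means a pure config mistake is reported the same way whether or not PyTorch is installed.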
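Additional notes:

This pass was mainly a code and config review. I did not run end-to-end inference with real weights because no GGUF/ONNX/TRT weights are available locally.
The overall direction of this PR is good, and the llama.cpp backend architecture is complete. However, the 3 points above directly affect usability and the installation experience.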

…ntime' for PyTorch dependencies and enhance error handling in the audio backend configuration for the LlamaCppPipeline.
@cms42
Collaborator Author

cms42 commented Mar 4, 2026

Got it! For [P1], I updated both the Chinese and English READMEs. [P2] and [P3] are changed as suggested.

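I had been focusing on performance and functional correctness, and honestly didn't pay much attention to installation, docs, or corner cases. I'll have Opus do another review pass focused on those areas 🫡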

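End-to-end inference with the weights does run. We have tested it with large-scale tasks and diverse inputs, so Codex can rest assured (just kidding).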

@xiami2019 xiami2019 merged commit 1fdb374 into main Mar 5, 2026