2026/06/11/Speech-Dialog-Data-Synthesis-Quality-Gates/ #1
Replies: 1 comment
Folks, if you find this useful, leave a comment below and give it a like.
2026/06/11/Speech-Dialog-Data-Synthesis-Quality-Gates/
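Synthesizing dialog data with a large language model is easy; the hard part is getting that data into the training and evaluation loop of a speech system. A usable speech dialog sample is more than a few turns of fluent-looking text. It must also preserve roles, turns, timing, channels, entity slots, ASR noise, spoken-language phenomena, and quality flags. If these constraints live only in the prompt, the data drifts as the scale grows: formats become unstable, labels become inconsistent, colloquial style is overdone, and entity boundaries blur. In the end, the trained model learns not interaction ability but the random habits of one particular generator.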
https://alanfangblog.com/2026/06/11/Speech-Dialog-Data-Synthesis-Quality-Gates/
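
One way to move those constraints out of the prompt is to check every generated sample in code before it enters the dataset. The block below is only a minimal sketch of such a gate, not the method from the linked post: the `Turn` fields, the `ALLOWED_ROLES` set, and the three rules (known role, valid timing per channel, slot value present in the text) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field

# Assumed role inventory; a real dataset would define its own.
ALLOWED_ROLES = {"user", "agent"}


@dataclass
class Turn:
    role: str
    channel: int
    start: float  # seconds from the start of the dialog
    end: float
    text: str
    slots: dict = field(default_factory=dict)  # slot name -> surface value


def check_sample(turns):
    """Return a list of violations; an empty list means the sample passes."""
    problems = []
    last_end = {}  # latest end time seen so far, per channel
    for i, t in enumerate(turns):
        if t.role not in ALLOWED_ROLES:
            problems.append(f"turn {i}: unknown role {t.role!r}")
        if t.end <= t.start:
            problems.append(f"turn {i}: non-positive duration")
        if t.start < last_end.get(t.channel, 0.0):
            problems.append(f"turn {i}: overlaps earlier turn on channel {t.channel}")
        last_end[t.channel] = max(last_end.get(t.channel, 0.0), t.end)
        # Every labeled slot value must appear verbatim in the turn text,
        # which catches blurred or hallucinated entity boundaries.
        for name, value in t.slots.items():
            if value not in t.text:
                problems.append(f"turn {i}: slot {name!r} value not found in text")
    return problems
```

A sample would be kept only when `check_sample` returns an empty list. Rejected samples can be logged together with their violation messages, so drift in the generator shows up as a rising rejection rate rather than as silent label noise.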