Wespeaker: A Research and Production oriented Speaker Embedding Learning Toolkit

Paper Title

Wespeaker: A Research and Production oriented Speaker Embedding Learning Toolkit

Authors

Wang, Hongji; Liang, Chengdong; Wang, Shuai; Chen, Zhengyang; Zhang, Binbin; Xiang, Xu; Deng, Yanlei; Qian, Yanmin

Abstract

Speaker modeling is essential for many related tasks, such as speaker recognition and speaker diarization. The dominant modeling approach is fixed-dimensional vector representation, i.e., speaker embedding. This paper introduces a research and production oriented speaker embedding learning toolkit, Wespeaker. Wespeaker contains the implementation of scalable data management, state-of-the-art speaker embedding models, loss functions, and scoring back-ends, with highly competitive results achieved by structured recipes which were adopted in the winning systems in several speaker verification challenges. The application to other downstream tasks such as speaker diarization is also exhibited in the related recipe. Moreover, CPU- and GPU-compatible deployment code is integrated for production-oriented development. The toolkit is publicly available at https://github.com/wenet-e2e/wespeaker.
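As a rough illustration of how a fixed-dimensional speaker embedding can be used for verification, the sketch below compares two embeddings with cosine similarity, one common kind of scoring back-end. It is not taken from Wespeaker: the function names `cosine_score` and `same_speaker`, the toy 3-dimensional vectors, and the 0.5 decision threshold are all hypothetical. Real embeddings come from a trained model and usually have hundreds of dimensions.

```python
import math
from typing import Sequence


def cosine_score(emb_a: Sequence[float], emb_b: Sequence[float]) -> float:
    """Cosine similarity between two fixed-dimensional speaker embeddings."""
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(x * x for x in emb_a))
    norm_b = math.sqrt(sum(y * y for y in emb_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(emb_a: Sequence[float], emb_b: Sequence[float],
                 threshold: float = 0.5) -> bool:
    # Hypothetical threshold: in practice it is tuned on development data.
    return cosine_score(emb_a, emb_b) >= threshold


# Toy usage with made-up vectors standing in for model outputs.
enroll = [0.2, 0.9, 0.1]
test = [0.25, 0.85, 0.05]
imposter = [0.9, -0.1, 0.4]
print(same_speaker(enroll, test))      # True  (score is about 0.996)
print(same_speaker(enroll, imposter))  # False (score is about 0.142)
```

Cosine scoring needs no extra training, which is one reason it is a popular baseline; other back-ends (for example, score normalization or PLDA) may be layered on top in a full system.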
