Generating Multilingual Voices Using Speaker Space Translation Based on Bilingual Speaker Data

Paper Title

Generating Multilingual Voices Using Speaker Space Translation Based on Bilingual Speaker Data

Authors

Soumi Maiti, Erik Marchi, Alistair Conkie

Abstract

We present progress towards bilingual Text-to-Speech which is able to transform a monolingual voice to speak a second language while preserving speaker voice quality. We demonstrate that a bilingual speaker embedding space contains a separate distribution for each language and that a simple transform in speaker space generated by the speaker embedding can be used to control the degree of accent of a synthetic voice in a language. The same transform can be applied even to monolingual speakers. In our experiments speaker data from an English-Spanish (Mexican) bilingual speaker was used, and the goal was to enable English speakers to speak Spanish and Spanish speakers to speak English. We found that the simple transform was sufficient to convert a voice from one language to the other with a high degree of naturalness. In one case the transformed voice outperformed a native language voice in listening tests. Experiments further indicated that the transform preserved many of the characteristics of the original voice. The degree of accent present can be controlled and naturalness is relatively consistent across a range of accent values.
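
The abstract describes a simple transform in speaker space that moves a voice toward a second language, with a controllable degree of accent. One way such a transform could look is a translation along the offset between the two language distributions of the bilingual speaker. The sketch below is an illustration under stated assumptions, not the authors' implementation: the use of centroid differences, the scaling factor `alpha`, the function names, and the toy 3-dimensional embeddings are all hypothetical.

```python
from statistics import fmean
from typing import List, Sequence

Vector = List[float]


def centroid(embeddings: Sequence[Sequence[float]]) -> Vector:
    """Per-dimension mean of a set of equal-length embedding vectors."""
    return [fmean(dim) for dim in zip(*embeddings)]


def translation_vector(source_lang: Sequence[Sequence[float]],
                       target_lang: Sequence[Sequence[float]]) -> Vector:
    """Offset from the source-language centroid to the target-language centroid.

    Assumption: the 'simple transform' is modeled here as a plain translation
    between the two language clusters of one bilingual speaker.
    """
    src = centroid(source_lang)
    tgt = centroid(target_lang)
    return [t - s for s, t in zip(src, tgt)]


def translate_speaker(embedding: Sequence[float],
                      delta: Sequence[float],
                      alpha: float = 1.0) -> Vector:
    """Shift a speaker embedding by alpha * delta.

    alpha is a hypothetical accent-control knob: 0.0 leaves the voice
    unchanged, 1.0 applies the full language offset.
    """
    return [e + alpha * d for e, d in zip(embedding, delta)]


# Toy embeddings of one bilingual speaker, grouped by spoken language.
bilingual_en = [[0.0, 1.0, 2.0], [2.0, 1.0, 0.0]]
bilingual_es = [[1.0, 3.0, 2.0], [3.0, 3.0, 0.0]]

# Offset learned from the bilingual speaker, then applied to a
# monolingual English speaker who never recorded Spanish.
delta = translation_vector(bilingual_en, bilingual_es)
monolingual_en = [0.5, 0.5, 0.5]
half = translate_speaker(monolingual_en, delta, alpha=0.5)
full = translate_speaker(monolingual_en, delta, alpha=1.0)
```

In this sketch, intermediate values of `alpha` correspond to intermediate positions between the two language clusters, which is one plausible reading of how the degree of accent could be varied. The shifted embedding would then condition the synthesizer in place of the original one.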
