Paper Title

Unsupervised neural adaptation model based on optimal transport for spoken language identification

Paper Authors

Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai

Paper Abstract

Due to the mismatch of statistical distributions of acoustic speech between training and testing sets, the performance of spoken language identification (SLID) can be drastically degraded. In this paper, we propose an unsupervised neural adaptation model to deal with the distribution mismatch problem for SLID. In our model, we explicitly formulate the adaptation to reduce the distribution discrepancy at both the feature and classifier levels between the training and testing data sets. Moreover, inspired by the strong ability of optimal transport (OT) to measure distribution discrepancy, a Wasserstein distance metric is designed in the adaptation loss. By minimizing the classification loss on the training data set together with the adaptation loss on both the training and testing data sets, the statistical distribution difference between the training and testing domains is reduced. We carried out SLID experiments on the oriental language recognition (OLR) challenge data corpus, where the training and testing data sets were collected under different conditions. Our results showed that significant improvements were achieved on the cross-domain test tasks.
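The abstract describes a joint objective: a supervised classification loss on the labeled training (source) domain plus a Wasserstein-distance adaptation loss that pulls the unlabeled testing (target) feature distribution toward the training one. As a rough illustration only (not the authors' implementation), below is a minimal NumPy sketch of an entropic-regularized (Sinkhorn) approximation of such an adaptation loss; the function names, the squared-Euclidean ground cost, the uniform mini-batch marginals, and the weighting factor `lam` are illustrative assumptions.

```python
import numpy as np

def sinkhorn_wasserstein(xs, xt, epsilon=0.1, n_iters=200):
    """Entropic-regularized Wasserstein distance between two feature batches.

    xs: (n_s, d) training-domain features; xt: (n_t, d) testing-domain features.
    Returns a scalar transport cost usable as an adaptation loss term.
    """
    n_s, n_t = xs.shape[0], xt.shape[0]
    # Pairwise squared-Euclidean ground cost, rescaled for numerical stability.
    cost = ((xs[:, None, :] - xt[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / max(cost.max(), 1e-12)
    # Uniform marginals over the two mini-batches.
    a = np.full(n_s, 1.0 / n_s)
    b = np.full(n_t, 1.0 / n_t)
    # Sinkhorn fixed-point iterations on the Gibbs kernel.
    K = np.exp(-cost / epsilon)
    u = np.ones(n_s)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    transport_plan = u[:, None] * K * v[None, :]
    return float((transport_plan * cost).sum())

def total_loss(cls_loss, xs_feat, xt_feat, lam=0.1):
    # Classification loss on the training domain plus the OT-based
    # adaptation loss between training and testing features.
    return cls_loss + lam * sinkhorn_wasserstein(xs_feat, xt_feat)
```

In a neural setup, `xs_feat` and `xt_feat` would be the encoder outputs for a training and a testing mini-batch, and the combined loss would be backpropagated through the feature extractor so that the two domains are aligned while classification accuracy on the training domain is preserved.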
