Spectral Clustering-aware Learning of Embeddings for Speaker Diarisation

Paper title

Spectral Clustering-aware Learning of Embeddings for Speaker Diarisation

Authors

Evonne P. C. Lee, Guangzhi Sun, Chao Zhang, Philip C. Woodland

Abstract

In speaker diarisation, speaker embedding extraction models often suffer from a mismatch between their training loss functions and the speaker clustering method. In this paper, we propose spectral clustering-aware learning of embeddings (SCALE) to address this mismatch. Specifically, besides an angular prototypical (AP) loss, SCALE uses a novel affinity matrix loss which directly minimises the error between the affinity matrix estimated from speaker embeddings and the reference. SCALE also includes p-percentile thresholding and Gaussian blur as two important hyper-parameters for spectral clustering in training. Experiments on the AMI dataset showed that speaker embeddings obtained with SCALE achieved over 50% relative speaker error rate reductions using oracle segmentation, and over 30% relative diarisation error rate reductions using automatic segmentation, when compared to a strong baseline with AP-loss-based speaker embeddings.
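
The idea of an affinity matrix loss can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: cosine similarity as the affinity measure, a row-wise percentile cut-off, a mean squared error against a 0/1 reference built from speaker labels, and all function names. Gaussian blur is omitted for brevity. This is not the authors' implementation.

```python
import math


def cosine_affinity(embeddings):
    """Pairwise cosine similarity between embeddings (assumed affinity measure)."""
    norms = [math.sqrt(sum(x * x for x in e)) for e in embeddings]
    n = len(embeddings)
    return [
        [
            sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
            / (norms[i] * norms[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


def percentile_threshold(affinity, p):
    """Zero every entry below its row's p-th percentile (assumed row-wise variant)."""
    thresholded = []
    for row in affinity:
        ordered = sorted(row)
        # Index of the cut-off value within the sorted row.
        k = min(len(ordered) - 1, int(p / 100.0 * len(ordered)))
        cutoff = ordered[k]
        thresholded.append([v if v >= cutoff else 0.0 for v in row])
    return thresholded


def reference_affinity(labels):
    """Reference matrix: 1.0 where two segments share a speaker label, else 0.0."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


def affinity_matrix_loss(estimated, reference):
    """Mean squared error between estimated and reference affinities (assumed error measure)."""
    n = len(reference)
    total = sum(
        (estimated[i][j] - reference[i][j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    return total / (n * n)


# Toy example: four segment embeddings, two per speaker.
embeddings = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 3.0]]
estimated = percentile_threshold(cosine_affinity(embeddings), p=50)

loss_correct = affinity_matrix_loss(estimated, reference_affinity([0, 0, 1, 1]))
loss_wrong = affinity_matrix_loss(estimated, reference_affinity([0, 1, 0, 1]))
```

In this toy case the loss is lower when the reference labels agree with the grouping implied by the embeddings, which is the behaviour a clustering-aware training objective is meant to reward. In actual training the loss would need to be differentiable with respect to the embeddings, which this plain-Python sketch does not address.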
