Paper Title

End-to-End Label Uncertainty Modeling in Speech Emotion Recognition using Bayesian Neural Networks and Label Distribution Learning

Paper Authors

Navin Raj Prabhu, Nale Lehmann-Willenbrock, Timo Gerkmann

Paper Abstract

To train machine learning algorithms to predict emotional expressions in terms of arousal and valence, annotated datasets are needed. However, as different people perceive others' emotional expressions differently, their annotations are subjective. To account for this, annotations are typically collected from multiple annotators and averaged to obtain ground-truth labels. However, when exclusively trained on this averaged ground-truth, the model is agnostic to the inherent subjectivity in emotional expressions. In this work, we therefore propose an end-to-end Bayesian neural network capable of being trained on a distribution of annotations to also capture the subjectivity-based label uncertainty. Instead of a Gaussian, we model the annotation distribution using Student's t-distribution, which also accounts for the number of annotations available. We derive the corresponding Kullback-Leibler divergence loss and use it to train an estimator for the annotation distribution, from which the mean and uncertainty can be inferred. We validate the proposed method using two in-the-wild datasets. We show that the proposed t-distribution based approach achieves state-of-the-art uncertainty modeling results in speech emotion recognition, and also consistent results in cross-corpora evaluations. Furthermore, analyses reveal that the advantage of a t-distribution over a Gaussian grows with increasing inter-annotator correlation and a decreasing number of annotations available.
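The abstract describes training against a Kullback-Leibler divergence between a target Student's t annotation distribution and the model's predicted distribution. The paper derives its own loss, which is not reproduced here; as a rough illustration of the quantity involved, the sketch below estimates the KL divergence between two Student's t distributions by Monte Carlo sampling. The function name, parameter values, and the MC approximation are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from scipy.stats import t as student_t

def mc_kl_student_t(df_p, loc_p, scale_p, df_q, loc_q, scale_q,
                    n=100_000, seed=0):
    """Monte Carlo estimate of KL(p || q) for two Student's t distributions.

    Here p plays the role of the target annotation distribution (its degrees
    of freedom tied to the number of annotations) and q the model's predicted
    distribution. Illustrative sketch only; the paper uses its own derived
    loss, not this sampling estimate.
    """
    rng = np.random.default_rng(seed)
    # Draw samples from p, then average the log-density ratio log p(x) - log q(x).
    x = student_t.rvs(df_p, loc=loc_p, scale=scale_p, size=n, random_state=rng)
    return np.mean(student_t.logpdf(x, df_p, loc=loc_p, scale=scale_p)
                   - student_t.logpdf(x, df_q, loc=loc_q, scale=scale_q))

# Hypothetical example: 5 annotations -> df = 5 - 1 = 4 for the target;
# the predicted distribution is slightly off in mean and scale.
kl = mc_kl_student_t(df_p=4, loc_p=0.30, scale_p=0.10,
                     df_q=4, loc_q=0.25, scale_q=0.15)
```

With identical parameters for p and q the estimate is exactly zero, and it grows as the predicted mean or scale drifts from the target, which is the behavior a divergence-based training loss relies on.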
