Knowledge Transfer for On-Device Speech Emotion Recognition with Neural Structured Learning

Paper Title

Knowledge Transfer for On-Device Speech Emotion Recognition with Neural Structured Learning

Authors

Yi Chang, Zhao Ren, Thanh Tam Nguyen, Kun Qian, Björn W. Schuller

Abstract

Speech emotion recognition (SER) has been a popular research topic in human-computer interaction (HCI). As edge devices proliferate, applying SER on them is promising for a large number of HCI applications. Although deep learning has been investigated to improve SER performance by training complex models, the memory space and computational capability of edge devices constrain the deployment of such models. We propose a neural structured learning (NSL) framework based on building synthesized graphs. An SER model is trained on a source dataset and used to build graphs on a target dataset. A relatively lightweight model is then trained with the speech samples and the graphs together as input. Our experiments demonstrate that training a lightweight SER model on the target dataset with both speech samples and graphs not only produces small SER models, but also improves performance compared with models trained on speech samples only and with models using classic transfer learning strategies.
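The pipeline in the abstract (a source-trained model builds graphs on the target data, and a lightweight model is then trained on samples plus graphs) can be illustrated with a minimal sketch. The abstract does not say how the graphs are synthesized or how they enter training, so everything below is an assumption made for illustration: graphs are built as k-nearest-neighbour links over cosine similarity of embeddings taken from the source-trained model, and the graph is used as a regularization term that penalizes the lightweight model when neighbouring samples receive different outputs. The names `build_knn_graph`, `graph_regularizer`, `teacher_embeddings`, and `student_outputs` are hypothetical and do not come from the paper.

```python
import numpy as np


def build_knn_graph(embeddings, k=2):
    """Link each sample to its k most cosine-similar samples.

    Assumes every embedding has non-zero norm.
    Returns {sample_index: [(neighbour_index, weight), ...]}.
    """
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own neighbour
    graph = {}
    for i in range(sim.shape[0]):
        nbrs = np.argsort(sim[i])[::-1][:k]  # indices of the k largest similarities
        # Negative similarities are clipped to 0 so weights never reward disagreement.
        graph[i] = [(int(j), max(float(sim[i, j]), 0.0)) for j in nbrs]
    return graph


def graph_regularizer(outputs, graph):
    """Average weighted squared distance between outputs of linked samples."""
    total, count = 0.0, 0
    for i, nbrs in graph.items():
        for j, w in nbrs:
            total += w * float(np.sum((outputs[i] - outputs[j]) ** 2))
            count += 1
    return total / max(count, 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins: embeddings from the source-trained model on target samples,
    # and outputs of the lightweight model on the same samples.
    teacher_embeddings = rng.normal(size=(6, 8))
    student_outputs = rng.normal(size=(6, 4))
    graph = build_knn_graph(teacher_embeddings, k=2)
    penalty = graph_regularizer(student_outputs, graph)
    print(graph[0], penalty)
```

In a training loop under these assumptions, the penalty would be added to the usual classification loss with a tunable weight, so the lightweight model is pushed toward consistent predictions on samples the larger model considers similar.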
