Paper Title

SG-VAD: Stochastic Gates Based Speech Activity Detection

Paper Authors

Jonathan Svirsky, Ofir Lindenbaum

Abstract

We propose a novel voice activity detection (VAD) model in a low-resource environment. Our key idea is to model VAD as a denoising task, and construct a network that is designed to identify nuisance features for a speech classification task. We train the model to simultaneously identify irrelevant features while predicting the type of speech event. Our model contains only 7.8K parameters, outperforms the previously proposed methods on the AVA-Speech evaluation set, and provides comparative results on the HAVIC dataset. We present its architecture, experimental results, and ablation study on the model's components. We publish the code and the models here https://www.github.com/jsvir/vad.
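The abstract does not spell out how a stochastic gate works, so the following is a minimal, generic sketch of the idea named in the title, not the authors' implementation. It assumes the common Gaussian relaxation of a Bernoulli gate, in which a gate value is obtained by clipping `mu + sigma * eps` to [0, 1] and the sparsity penalty is the expected number of open gates. All function names, parameter values, and the choice of pure Python are illustrative assumptions; the released repository may differ in every detail.

```python
import math
import random


def sample_gate(mu, sigma, rng):
    """Sample one relaxed gate value: clip(mu + sigma * eps, 0, 1), eps ~ N(0, 1)."""
    eps = rng.gauss(0.0, 1.0)
    return min(1.0, max(0.0, mu + sigma * eps))


def expected_open_gates(mus, sigma):
    """Sparsity penalty: sum of P(gate > 0) = Phi(mu / sigma) over all gates."""
    return sum(0.5 * (1.0 + math.erf(m / (sigma * math.sqrt(2.0)))) for m in mus)


def apply_gates(features, mus, sigma=0.5, rng=None):
    """Multiply each feature by its sampled gate (hypothetical helper)."""
    if rng is None:
        rng = random.Random(0)
    gates = [sample_gate(m, sigma, rng) for m in mus]
    gated = [g * x for g, x in zip(gates, features)]
    return gated, gates


if __name__ == "__main__":
    # A large positive mu keeps a feature, a large negative mu suppresses it.
    gated, gates = apply_gates([1.0, 2.0, 3.0], [5.0, -5.0, 0.5])
    print(gates, gated, expected_open_gates([5.0, -5.0, 0.5], 0.5))
```

In a trained model the `mu` values would be learned (or predicted from the input) jointly with the classifier, with the penalty pushing gates on nuisance features toward zero.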
