Real Time Speech Enhancement in the Waveform Domain

Paper title

Real Time Speech Enhancement in the Waveform Domain

Authors

Defossez, Alexandre; Synnaeve, Gabriel; Adi, Yossi

Abstract

We present a causal speech enhancement model that works on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we suggest a set of data augmentation techniques applied directly to the raw waveform, which further improve model performance and its generalization abilities. We perform evaluations on several standard benchmarks, using both objective metrics and human judgements. The proposed model matches the state-of-the-art performance of both causal and non-causal methods while working directly on the raw waveform.
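The abstract says the model is optimized in both the time and frequency domains with multiple loss functions, but it does not name them. As a rough illustration only, the sketch below assumes an L1 distance on the waveform combined with two magnitude-spectrogram terms (spectral convergence and log-magnitude L1) at a single STFT resolution, all weighted equally. The function names `stft_magnitude` and `enhancement_loss`, the FFT size, the hop length, and the weighting are hypothetical choices, not values taken from the paper.

```python
import numpy as np


def stft_magnitude(x, n_fft=512, hop=128):
    """Magnitude spectrogram of a 1-D signal, using a Hann window."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # One row per frame, one column per non-negative frequency bin.
    return np.abs(np.fft.rfft(frames, axis=1))


def enhancement_loss(estimate, clean, n_fft=512, hop=128, eps=1e-8):
    """Assumed combination: waveform L1 plus two spectral terms, weight 1 each."""
    # Time-domain term: mean absolute error between the two waveforms.
    time_l1 = np.mean(np.abs(estimate - clean))

    est_mag = stft_magnitude(estimate, n_fft, hop)
    ref_mag = stft_magnitude(clean, n_fft, hop)

    # Spectral convergence: relative Frobenius-norm error of the magnitudes.
    spectral_convergence = np.linalg.norm(ref_mag - est_mag) / (
        np.linalg.norm(ref_mag) + eps
    )
    # Log-magnitude term: mean absolute error between log spectrograms.
    log_mag_l1 = np.mean(np.abs(np.log(ref_mag + eps) - np.log(est_mag + eps)))

    return time_l1 + spectral_convergence + log_mag_l1


# Toy usage: a 1-second 440 Hz tone at 16 kHz, with and without added noise.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noisy = clean + 0.1 * rng.standard_normal(clean.shape)

loss_noisy = enhancement_loss(noisy, clean)
loss_clean = enhancement_loss(clean, clean)
```

NumPy is used here only to make the arithmetic explicit. Training a network against such a loss would require an automatic-differentiation framework, and a practical setup would likely evaluate the spectral terms at several STFT resolutions.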
