Paper title
Complex ratio masking for singing voice separation
Authors
Abstract
Music source separation is important for applications such as karaoke and remixing. Much previous research focuses on estimating the short-time Fourier transform (STFT) magnitude while discarding phase information. We observe that, for singing voice separation, incorporating phase can considerably improve separation quality. This paper proposes a complex ratio masking method for voice and accompaniment separation. The proposed method employs DenseUNet with self-attention to estimate the real and imaginary components of the STFT for each sound source. A simple ensemble technique is introduced to further improve separation performance. Evaluation results demonstrate that the proposed method outperforms recent state-of-the-art models for both separated voice and accompaniment.
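To illustrate the idea of complex ratio masking mentioned in the abstract, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the standard definition of an ideal complex ratio mask (the source STFT divided by the mixture STFT) and uses random complex arrays as stand-ins for real spectrograms. The function names `ideal_complex_ratio_mask`, `apply_complex_mask`, and `apply_magnitude_mask` are hypothetical and chosen for illustration only. In the proposed method the real and imaginary parts would be predicted by a network; here an oracle mask is used to show why retaining phase helps compared with a magnitude-only mask that reuses the mixture phase.

```python
import numpy as np


def ideal_complex_ratio_mask(source_stft, mixture_stft, eps=1e-8):
    """Oracle complex mask M = S / Y, computed as S * conj(Y) / |Y|^2."""
    denom = np.abs(mixture_stft) ** 2 + eps  # eps avoids division by zero
    return source_stft * np.conj(mixture_stft) / denom


def apply_complex_mask(mask_real, mask_imag, mixture_stft):
    """Complex multiplication M * Y written out in real/imaginary parts."""
    y_real, y_imag = mixture_stft.real, mixture_stft.imag
    est_real = mask_real * y_real - mask_imag * y_imag
    est_imag = mask_real * y_imag + mask_imag * y_real
    return est_real + 1j * est_imag


def apply_magnitude_mask(source_stft, mixture_stft, eps=1e-8):
    """Oracle magnitude ratio mask; the estimate keeps the mixture phase."""
    ratio = np.abs(source_stft) / (np.abs(mixture_stft) + eps)
    return ratio * mixture_stft


# Random complex arrays standing in for STFTs (frequency bins x frames).
rng = np.random.default_rng(0)
shape = (513, 100)
voice = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
accompaniment = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mixture = voice + accompaniment

mask = ideal_complex_ratio_mask(voice, mixture)
voice_complex = apply_complex_mask(mask.real, mask.imag, mixture)
voice_magnitude = apply_magnitude_mask(voice, mixture)

# Mean squared error of each estimate against the true voice STFT.
err_complex = np.mean(np.abs(voice_complex - voice) ** 2)
err_magnitude = np.mean(np.abs(voice_magnitude - voice) ** 2)
print("complex mask error:  ", err_complex)
print("magnitude mask error:", err_magnitude)
```

With oracle masks, the complex mask recovers the source almost exactly, while the magnitude mask leaves a residual error caused by the mismatched phase. A learned model would only approximate either mask, so the gap in practice depends on estimation quality.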