Title
Blank Collapse: Compressing CTC emission for the faster decoding
Authors
Abstract
The Connectionist Temporal Classification (CTC) model is a very efficient method for modeling sequences, especially speech data. To use a CTC model for an Automatic Speech Recognition (ASR) task, beam search decoding with an external language model such as an n-gram LM is necessary to obtain reasonable results. In this paper, we deeply analyze the blank label in CTC beam search and propose a very simple method to reduce the amount of computation, resulting in faster beam search decoding. With this method, we achieve up to 78% faster decoding than ordinary beam search decoding with very little loss of accuracy on the LibriSpeech datasets. We show that this method is effective not only practically, through experiments, but also theoretically, through mathematical reasoning. We also observe that the reduction is more pronounced when the model's accuracy is higher.
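The core idea described in the abstract, compressing the CTC emission by dropping frames dominated by the blank label before running beam search, can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: the `threshold` value and the choice to keep one boundary frame of each blank run (so repeated-label separation is preserved) are assumptions for the sake of the example.

```python
import numpy as np

def blank_collapse(emission, blank_id=0, threshold=0.99):
    """Drop frames whose blank probability exceeds `threshold`.

    A frame inside a run of high-blank frames is removed; the last
    frame of each run (the one bordering a non-blank region) is kept
    so CTC's blank-as-separator behavior is not lost entirely.
    `emission` is a (T, V) array of per-frame label probabilities;
    the 0.99 threshold here is an illustrative choice, not a value
    taken from the paper.
    """
    is_blanky = emission[:, blank_id] >= threshold
    T = emission.shape[0]
    keep = []
    for t in range(T):
        if not is_blanky[t]:
            keep.append(t)                      # informative frame
        elif t + 1 < T and not is_blanky[t + 1]:
            keep.append(t)                      # boundary of a blank run
    return emission[keep]

# A toy emission: three confident-blank frames, two non-blank
# frames, then a trailing blank frame.
emission = np.array([
    [0.999, 0.0005, 0.0005],
    [0.999, 0.0005, 0.0005],
    [0.999, 0.0005, 0.0005],
    [0.100, 0.800, 0.100],
    [0.100, 0.100, 0.800],
    [0.999, 0.0005, 0.0005],
])
compressed = blank_collapse(emission)
```

Beam search then runs over `compressed` instead of `emission`; since its per-frame cost is unchanged, shortening the sequence translates directly into faster decoding.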