Paper Title
Quantization of Acoustic Model Parameters in Automatic Speech Recognition Framework
Paper Authors
Paper Abstract
State-of-the-art hybrid automatic speech recognition (ASR) systems exploit deep neural network (DNN) based acoustic models (AMs) trained with the Lattice-Free Maximum Mutual Information (LF-MMI) criterion and n-gram language models. The AMs typically have millions of parameters and require significant parameter reduction to operate on embedded devices. This paper studies the impact of parameter quantization on overall word recognition performance. The following approaches are presented: (i) an AM trained in the Kaldi framework with the conventional factorized TDNN (TDNN-F) architecture, (ii) the TDNN AM built in Kaldi loaded into the PyTorch toolkit using a C++ wrapper for post-training quantization, (iii) quantization-aware training in PyTorch for the Kaldi TDNN model, and (iv) quantization-aware training in Kaldi. Results obtained on the standard LibriSpeech setup provide an interesting overview of recognition accuracy with respect to the applied quantization scheme.
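
As a rough illustration of the post-training quantization path in approach (ii), the sketch below applies PyTorch's built-in dynamic quantization to a small stand-in network. The TinyTDNN module and its dimensions are hypothetical placeholders introduced here for illustration only; the paper's actual TDNN-F acoustic model would be loaded from Kaldi through the C++ wrapper rather than defined in Python.

    import torch
    import torch.nn as nn

    # Hypothetical stand-in for a TDNN-style acoustic model (not the paper's TDNN-F).
    class TinyTDNN(nn.Module):
        def __init__(self, in_dim=40, hidden=256, out_dim=3000):
            super().__init__()
            self.layers = nn.Sequential(
                nn.Linear(in_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, out_dim),
            )

        def forward(self, x):
            return self.layers(x)

    model = TinyTDNN().eval()

    # Post-training dynamic quantization: Linear weights are stored in int8,
    # activations are quantized on the fly at inference time.
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

    features = torch.randn(1, 40)          # one frame of acoustic features
    log_likelihoods = quantized(features)  # runs with int8 weights

Quantization-aware training (approaches (iii) and (iv)) differs in that fake-quantization is inserted during training so the model learns to compensate for the reduced precision, rather than quantizing a fully trained model afterwards.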