Paper Title

Towards Green ASR: Lossless 4-bit Quantization of a Hybrid TDNN System on the 300-hr Switchboard Corpus

Paper Authors

Junhao Xu, Shoukang Hu, Xunying Liu, Helen Meng

Paper Abstract

State-of-the-art automatic speech recognition (ASR) systems are becoming increasingly complex and expensive for practical applications. This paper presents the development of a high-performance and low-footprint 4-bit quantized LF-MMI trained factored time delay neural network (TDNN) based ASR system on the 300-hr Switchboard corpus. A key feature of the overall system design is to account for the fine-grained, varying performance sensitivity of different model components to quantization errors. To this end, a set of neural architectural compression and mixed precision quantization approaches were used to facilitate hidden-layer-level auto-configuration of the optimal factored TDNN weight matrix subspace dimensionality and quantization bit-widths. The proposed techniques were also used to produce 2-bit mixed precision quantized Transformer language models. Experiments conducted on the Switchboard data suggest that the proposed neural architectural compression and mixed precision quantization techniques consistently outperform the uniform precision quantized baseline systems of comparable bit-widths in terms of word error rate (WER). An overall "lossless" compression ratio of 13.6 was obtained over the baseline full precision system including both the TDNN and Transformer components while incurring no statistically significant WER increase.
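
The abstract combines two complementary ideas: factoring each TDNN weight matrix into a low-rank product whose subspace dimensionality is chosen per hidden layer, and mixed precision quantization in which layers that are more sensitive to quantization error retain wider bit-widths. The sketch below is a minimal NumPy illustration of how these two knobs interact; the factorize/quantize helpers, layer shapes, rank, candidate bit-widths, and error budget are all illustrative assumptions, not the paper's actual auto-configuration procedure.

```python
# Minimal NumPy sketch of the two ideas named in the abstract:
#   (1) factoring a TDNN weight matrix into a low-rank product whose
#       "subspace dimensionality" (the rank) is a per-layer hyper-parameter;
#   (2) mixed precision quantization, where each layer keeps the narrowest
#       bit-width that stays within an error budget.
# All shapes, ranks, bit-width candidates and thresholds are assumptions
# made for illustration only.
import numpy as np


def factorize(W: np.ndarray, rank: int):
    """Approximate W (out x in) as A @ B via a truncated SVD of the given rank."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :rank] * s[:rank], Vt[:rank, :]


def quantize(W: np.ndarray, n_bits: int) -> np.ndarray:
    """Symmetric uniform quantization of a weight matrix to n_bits (then de-quantized)."""
    q_max = 2 ** (n_bits - 1) - 1
    scale = np.abs(W).max() / q_max
    return np.clip(np.round(W / scale), -q_max - 1, q_max) * scale


rng = np.random.default_rng(0)
# Two hypothetical hidden layers with different weight statistics, so that their
# sensitivity to low-bit quantization differs.
layers = {
    "tdnn_layer_1": rng.normal(0.0, 0.05, size=(512, 512)),
    "tdnn_layer_2": 0.05 * rng.standard_t(df=3, size=(512, 512)),  # heavier tails
}

probe = rng.normal(size=(512, 16))          # probe input to measure output distortion

for name, W in layers.items():
    A, B = factorize(W, rank=128)           # assumed per-layer subspace dimensionality
    ref = A @ (B @ probe)                   # full precision factored output
    for bits in (2, 4, 8, 16):              # candidate bit-widths, narrowest first
        out = quantize(A, bits) @ (quantize(B, bits) @ probe)
        rel_err = np.linalg.norm(ref - out) / np.linalg.norm(ref)
        if rel_err < 0.1:                   # assumed per-layer error budget
            break
    print(f"{name}: rank=128, selected bit-width={bits}, relative error={rel_err:.3f}")
```

Under this kind of scheme a layer with heavier-tailed weights tends to need a wider bit-width to stay within the same error budget, which is the intuition behind assigning precision per layer rather than uniformly.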
