Paper Title

Exploiting Weight Redundancy in CNNs: Beyond Pruning and Quantization

Authors

Yuan Wen, David Gregg

Abstract

Pruning and quantization are proven methods for improving the performance and storage efficiency of convolutional neural networks (CNNs). Pruning removes near-zero weights in tensors and masks weak connections between neurons in neighbouring layers. Quantization reduces the precision of weights by replacing them with numerically similar values that require less storage. In this paper, we identify another form of redundancy in CNN weight tensors, in the form of repeated patterns of similar values. We observe that pruning and quantization both tend to drastically increase the number of repeated patterns in the weight tensors. We investigate several compression schemes to take advantage of this structure in CNN weight data, including multiple forms of Huffman coding, and other approaches inspired by block sparse matrix formats. We evaluate our approach on several well-known CNNs and find that we can achieve compaction ratios of 1.4x to 3.1x in addition to the savings from pruning and quantization.
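
To make the abstract's core observation concrete, below is a minimal Python sketch (not the authors' implementation): it prunes and uniformly quantizes a toy convolution weight tensor, counts how many distinct 1x3 row patterns remain, and estimates how much a plain Huffman code would shrink the quantized values. The pruning threshold, bit width, tensor shape, and the choice of 1x3 rows as the "pattern" granularity are all illustrative assumptions, and the simple Huffman code-length computation stands in for the multiple Huffman variants and block-sparse-inspired formats the paper actually evaluates.

```python
# Illustrative sketch only: shows how pruning + quantization create repeated
# value patterns in a weight tensor, and how entropy coding (Huffman) can
# exploit them. All parameters below are assumptions, not the paper's setup.
import heapq
from collections import Counter

import numpy as np


def prune(weights, threshold=0.1):
    """Zero out weights whose magnitude falls below the threshold."""
    pruned = weights.copy()
    pruned[np.abs(pruned) < threshold] = 0.0
    return pruned


def quantize(weights, bits=4):
    """Uniformly quantize weights to 2**bits levels over their value range."""
    levels = 2 ** bits
    w_min, w_max = float(weights.min()), float(weights.max())
    step = (w_max - w_min) / (levels - 1)
    return np.round((weights - w_min) / step) * step + w_min


def huffman_code_lengths(symbols):
    """Return the Huffman code length (in bits) for each distinct symbol."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries: (frequency, tie-breaker, {symbol: current code length}).
    heap = [(freq, i, {sym: 0}) for i, (sym, freq) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**left, **right}.items()}
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=(256, 3, 3)).astype(np.float32)  # toy conv filters

    wq = quantize(prune(w), bits=4)

    # Repeated 1x3 row patterns inside the 3x3 filters, before and after.
    rows_before = {r.tobytes() for r in w.reshape(-1, 3)}
    rows_after = {r.tobytes() for r in wq.reshape(-1, 3)}
    print(f"unique 1x3 row patterns: {len(rows_before)} -> {len(rows_after)}")

    # Rough size estimate: Huffman-coded quantized values vs. raw 32-bit floats.
    values = wq.ravel().tolist()
    lengths = huffman_code_lengths(values)
    coded_bits = sum(lengths[v] for v in values)
    print(f"raw bits: {wq.size * 32}, Huffman-coded bits: {coded_bits}")
```

Running the script prints the drop in distinct row patterns after pruning and quantization, plus the estimated Huffman-coded size of the quantized values against the raw 32-bit storage, which is the kind of additional compaction the paper targets on real CNN weights.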
