Paper Title
Dynamic Model Pruning with Feedback
Paper Authors
Paper Abstract
Deep neural networks often have millions of parameters. This can hinder their deployment to low-end devices, not only due to high memory requirements but also because of increased latency at inference. We propose a novel model compression method that generates a sparse trained model without additional overhead: by allowing (i) dynamic allocation of the sparsity pattern and (ii) incorporating a feedback signal to reactivate prematurely pruned weights, we obtain a performant sparse model in one single training pass (retraining is not needed, but can further improve the performance). We evaluate our method on CIFAR-10 and ImageNet, and show that the obtained sparse models can reach the state-of-the-art performance of dense models. Moreover, their performance surpasses that of models generated by all previously proposed pruning schemes.
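To make the two ingredients of the abstract concrete, below is a minimal sketch of the idea as described there: at every step the sparsity pattern is re-derived from the dense weights (dynamic allocation), and the gradient computed on the pruned model is applied back to the dense weights so that prematurely pruned entries can recover (feedback). The toy least-squares problem, the magnitude-based mask, the hyperparameters, and all variable names are illustrative assumptions, not the paper's actual implementation.

```python
import torch

torch.manual_seed(0)

# Toy least-squares problem (assumption for illustration; not from the paper).
X = torch.randn(256, 20)
w_true = torch.zeros(20, 1)
w_true[:5] = torch.randn(5, 1)
y = X @ w_true

w_dense = torch.randn(20, 1) * 0.1   # dense weights, kept throughout training
sparsity, lr = 0.75, 0.05            # assumed target sparsity and step size

def magnitude_mask(w, sparsity):
    # Keep the (1 - sparsity) fraction of entries with the largest magnitude.
    k = max(1, int(round(w.numel() * (1.0 - sparsity))))
    thresh = torch.topk(w.abs().flatten(), k).values.min()
    return (w.abs() >= thresh).float()

for step in range(300):
    # (i) dynamic allocation: the sparsity pattern is recomputed from the
    # current dense weights at every step.
    mask = magnitude_mask(w_dense, sparsity)
    w_sparse = (w_dense * mask).detach().requires_grad_(True)

    # Loss and gradient are evaluated on the *pruned* model.
    loss = ((X @ w_sparse - y) ** 2).mean()
    loss.backward()

    # (ii) feedback: the gradient of the pruned model updates the *dense*
    # weights, so entries that were pruned too early keep evolving and can
    # re-enter the mask in later steps.
    with torch.no_grad():
        w_dense -= lr * w_sparse.grad

print("final loss of the sparse model:", loss.item())
```

In this reading, no separate retraining phase is needed: the sparse model returned at the end of the single training loop is simply the dense weights under the last mask.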