Paper Title
Dynamic Model Pruning with Feedback
Paper Authors
Paper Abstract
Deep neural networks often have millions of parameters. This can hinder their deployment to low-end devices, not only due to high memory requirements but also because of increased latency at inference. We propose a novel model compression method that generates a sparse trained model without additional overhead: by allowing (i) dynamic allocation of the sparsity pattern and (ii) incorporating a feedback signal to reactivate prematurely pruned weights, we obtain a performant sparse model in one single training pass (retraining is not needed, but can further improve the performance). We evaluate our method on CIFAR-10 and ImageNet, and show that the obtained sparse models can reach the state-of-the-art performance of dense models. Moreover, their performance surpasses that of models generated by all previously proposed pruning schemes.
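To make the two ingredients of the abstract concrete, below is a minimal sketch of the idea as described there: at every step the sparsity pattern is re-derived from the dense weights (dynamic allocation), and the gradient computed on the pruned model is applied back to the dense weights so that prematurely pruned entries can recover (feedback). The toy least-squares problem, the magnitude-based mask, the hyperparameters, and all variable names are illustrative assumptions, not the paper's actual implementation.

```python
import torch

torch.manual_seed(0)

# Toy least-squares problem (assumption for illustration; not from the paper).
X = torch.randn(256, 20)
w_true = torch.zeros(20, 1)
w_true[:5] = torch.randn(5, 1)
y = X @ w_true

w_dense = torch.randn(20, 1) * 0.1   # dense weights, kept throughout training
sparsity, lr = 0.75, 0.05            # assumed target sparsity and step size

def magnitude_mask(w, sparsity):
    # Keep the (1 - sparsity) fraction of entries with the largest magnitude.
    k = max(1, int(round(w.numel() * (1.0 - sparsity))))
    thresh = torch.topk(w.abs().flatten(), k).values.min()
    return (w.abs() >= thresh).float()

for step in range(300):
    # (i) dynamic allocation: the sparsity pattern is recomputed from the
    # current dense weights at every step.
    mask = magnitude_mask(w_dense, sparsity)
    w_sparse = (w_dense * mask).detach().requires_grad_(True)

    # Loss and gradient are evaluated on the *pruned* model.
    loss = ((X @ w_sparse - y) ** 2).mean()
    loss.backward()

    # (ii) feedback: the gradient of the pruned model updates the *dense*
    # weights, so entries that were pruned too early keep evolving and can
    # re-enter the mask in later steps.
    with torch.no_grad():
        w_dense -= lr * w_sparse.grad

print("final loss of the sparse model:", loss.item())
```

In this reading, no separate retraining phase is needed: the sparse model returned at the end of the single training loop is simply the dense weights under the last mask.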