Paper Title
Directional Pruning of Deep Neural Networks
Paper Authors
Paper Abstract
In light of the fact that stochastic gradient descent (SGD) often finds a flat minimum valley in the training loss, we propose a novel directional pruning method that searches for a sparse minimizer in or close to that flat region. The proposed pruning method requires neither retraining nor expert knowledge of the sparsity level. To overcome the computational difficulty of estimating the flat directions, we propose a carefully tuned $\ell_1$ proximal gradient algorithm that provably achieves the directional pruning with a small learning rate after sufficient training. Empirically, our solution achieves promising results in the highly sparse regime (92% sparsity) compared with many existing pruning methods on ResNet50 with ImageNet, while using only slightly more wall time and memory than SGD. Using VGG16 and the wide ResNet 28x10 on CIFAR-10 and CIFAR-100, we show that our solution reaches the same minima valley as SGD, and that the minima found by our solution and by SGD do not deviate in directions that affect the training loss. The code that reproduces the results of this paper is available at https://github.com/donlan2710/gRDA-Optimizer/tree/master/directional_pruning.
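For readers unfamiliar with the $\ell_1$ proximal gradient step mentioned in the abstract, the following is a minimal NumPy sketch of a generic soft-thresholding update. It only illustrates how such an update drives small weights exactly to zero (i.e., prunes them); it is not the authors' tuned gRDA algorithm, and the function names (`soft_threshold`, `l1_proximal_gradient_step`) and the penalty parameter `lam` are illustrative assumptions.

```python
import numpy as np

def soft_threshold(w, threshold):
    """Elementwise soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(w) * np.maximum(np.abs(w) - threshold, 0.0)

def l1_proximal_gradient_step(w, grad, lr, lam):
    """One generic l1 proximal gradient step (not the paper's exact update):
    take a gradient step on the smooth loss, then soft-threshold, which sets
    coordinates with small magnitude exactly to zero."""
    return soft_threshold(w - lr * grad, lr * lam)

# Toy usage: the second coordinate is small and gets pruned to exactly zero.
w = np.array([0.8, -0.05, 0.3, -0.6])
grad = np.array([0.1, 0.02, -0.05, 0.2])
print(l1_proximal_gradient_step(w, grad, lr=0.1, lam=1.0))
```

The paper's contribution lies in how this kind of update is tuned so that, with a small learning rate after sufficient training, the resulting sparsification happens along the flat directions of the loss; the sketch above only shows the basic mechanics of an $\ell_1$ proximal step.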