Title
Stochastic gradient descent with random learning rate
Authors
Abstract
We propose to optimize neural networks with a uniformly-distributed random learning rate. The associated stochastic gradient descent algorithm can be approximated by continuous stochastic equations and analyzed within the Fokker-Planck formalism. In the small learning rate regime, the training process is characterized by an effective temperature which depends on the average learning rate, the mini-batch size and the momentum of the optimization algorithm. By comparing the random learning rate protocol with cyclic and constant protocols, we suggest that the random choice is generically the best strategy in the small learning rate regime, yielding better regularization without extra computational cost. We provide supporting evidence through experiments on both shallow, fully-connected and deep, convolutional neural networks for image classification on the MNIST and CIFAR10 datasets.
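To make the random learning rate protocol concrete, the following is a minimal sketch of an SGD training loop in which the learning rate is resampled from a uniform distribution at every optimization step. The sampling interval [0, 2·η̄] (chosen so the mean learning rate is η̄), the use of PyTorch's standard SGD optimizer, and the function name are illustrative assumptions, not the authors' implementation.

```python
import torch

def train_with_random_lr(model, loss_fn, data_loader, eta_bar,
                         momentum=0.9, epochs=1):
    """Sketch of SGD with a uniformly-distributed random learning rate.

    Assumption (not from the abstract): the learning rate is drawn from
    U(0, 2 * eta_bar) at each step, so its average equals eta_bar.
    """
    optimizer = torch.optim.SGD(model.parameters(), lr=eta_bar,
                                momentum=momentum)
    for _ in range(epochs):
        for x, y in data_loader:
            # Resample the learning rate for this step and push it
            # into the optimizer's parameter groups.
            lr = float(torch.empty(1).uniform_(0.0, 2.0 * eta_bar))
            for group in optimizer.param_groups:
                group["lr"] = lr
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
    return model
```

Keeping a single optimizer instance and overwriting only its `lr` field preserves the momentum buffers across steps, which matters here because the abstract's effective temperature depends on the momentum of the optimization algorithm as well as on the average learning rate and mini-batch size.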