论文标题

当非凸度平均时,一类Polyak-lojasiewicz的功能可证明重球超出四边形的二次加速度

Provable Acceleration of Heavy Ball beyond Quadratics for a Class of Polyak-Łojasiewicz Functions when the Non-Convexity is Averaged-Out

论文作者

Wang, Jun-Kun, Lin, Chi-Heng, Wibisono, Andre, Hu, Bin

论文摘要

如今,重球(HB)是非凸优化中最流行的动量方法之一。已经广泛观察到,将重球动态纳入基于梯度的方法可以加速现代机器学习模型的训练过程。但是,建立其加速理论基础的进展显然远远落后于其经验成功。现有可证明的加速结果是二次或近二次功能,因为当前显示HB加速度的技术仅限于固定Hessian的情况。在这项工作中,我们开发了一些新技术,这些新技术有助于表现出二次超越二次的加速度,这是通过分析在两个连续时间点上如何变化的Hessian的变化来实现的,从而影响了收敛速度。基于我们的技术结果,可以通过HB确定一类Polyak-lojasiewicz(PL)优化问题。此外,我们的分析证明了适应性设置动量参数的好处。 (更新:08/29/2023)在附录J中添加了Erratum。这是一个更新版本,可以在上一个版本中解决问题。在这项工作中,HB超出四边形的加速结果需要满足额外的条件,当尺寸是一个或更广泛的Hessian是对角线时,该结果自然而然地存在。我们在附录J中详细介绍了这个问题。

Heavy Ball (HB) nowadays is one of the most popular momentum methods in non-convex optimization. It has been widely observed that incorporating the Heavy Ball dynamic in gradient-based methods accelerates the training process of modern machine learning models. However, the progress on establishing its theoretical foundation of acceleration is apparently far behind its empirical success. Existing provable acceleration results are of the quadratic or close-to-quadratic functions, as the current techniques of showing HB's acceleration are limited to the case when the Hessian is fixed. In this work, we develop some new techniques that help show acceleration beyond quadratics, which is achieved by analyzing how the change of the Hessian at two consecutive time points affects the convergence speed. Based on our technical results, a class of Polyak-Łojasiewicz (PL) optimization problems for which provable acceleration can be achieved via HB is identified. Moreover, our analysis demonstrates a benefit of adaptively setting the momentum parameter. (Update: 08/29/2023) Erratum is added in Appendix J. This is an updated version that fixes an issue in the previous version. An additional condition needs to be satisfied for the acceleration result of HB beyond quadratics in this work, which naturally holds when the dimension is one or, more broadly, when the Hessian is diagonal. We elaborate on the issue in Appendix J.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源