Paper Title
Revisiting SGD with Increasingly Weighted Averaging: Optimization and Generalization Perspectives
Paper Authors
Paper Abstract
Stochastic gradient descent (SGD) has been widely studied in the literature from different angles, and is commonly employed for solving many big data machine learning problems. However, the averaging technique, which combines all iterative solutions into a single solution, is still under-explored. While some increasingly weighted averaging schemes have been considered in the literature, existing works are mostly restricted to strongly convex objective functions and the convergence of optimization error. It remains unclear how these averaging schemes affect the convergence of {\it both optimization error and generalization error} (two equally important components of testing error) for {\bf non-strongly convex objectives, including non-convex problems}. In this paper, we {\it fill the gap} by comprehensively analyzing increasingly weighted averaging on convex, strongly convex, and non-convex objective functions in terms of both optimization error and generalization error. In particular, we analyze a family of increasingly weighted averaging schemes in which the weight of the solution at iteration $t$ is proportional to $t^\alpha$ ($\alpha > 0$). We show how $\alpha$ affects the optimization error and the generalization error, and exhibit the trade-off caused by $\alpha$. Experiments demonstrate this trade-off and the effectiveness of polynomially increasing weighted averaging compared with other averaging schemes on a wide range of problems, including deep learning.
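
For concreteness, the $t^\alpha$-weighted average described above corresponds to $\bar{x}_T = \big(\sum_{t=1}^{T} t^\alpha x_t\big) / \big(\sum_{t=1}^{T} t^\alpha\big)$, where $x_t$ is the SGD iterate at step $t$. The following Python sketch is an illustration only, not the paper's exact algorithm: the step size, the stochastic gradient oracle grad_fn, and the toy objective are hypothetical, and the paper's analysis involves step-size and smoothness assumptions not modeled here. It shows plain SGD maintaining the $t^\alpha$-weighted average incrementally.

import numpy as np

def sgd_with_poly_averaging(grad_fn, x0, lr, alpha, num_steps, rng):
    """Plain SGD that returns the t^alpha-weighted average of its iterates.
    (Illustrative sketch under assumed names; grad_fn(x, rng) returns a
    stochastic gradient at x.)"""
    x = np.asarray(x0, dtype=float).copy()
    x_avg = x.copy()
    weight_sum = 0.0
    for t in range(1, num_steps + 1):
        g = grad_fn(x, rng)               # stochastic gradient estimate at x
        x = x - lr * g                    # standard SGD update
        w = float(t) ** alpha             # weight proportional to t^alpha
        weight_sum += w
        # Incremental weighted average: avg <- avg + (w / W_t) * (x - avg)
        x_avg = x_avg + (w / weight_sum) * (x - x_avg)
    return x_avg

# Toy usage (hypothetical): noisy gradients of f(x) = 0.5 * ||x||^2, minimizer at 0.
rng = np.random.default_rng(0)
noisy_grad = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
x_bar = sgd_with_poly_averaging(noisy_grad, x0=np.ones(5), lr=0.05,
                                alpha=2.0, num_steps=5000, rng=rng)
print(np.linalg.norm(x_bar))  # close to 0, and less noisy than the final iterate

Larger $\alpha$ places more weight on later iterates (approaching the last iterate as $\alpha \to \infty$), while $\alpha = 0$ would recover uniform averaging; this is the knob behind the optimization/generalization trade-off discussed in the abstract.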