Paper Title
The Two Regimes of Deep Network Training
Paper Authors
Paper Abstract
The learning rate schedule has a major impact on the performance of deep learning models. Still, the choice of a schedule is often heuristic. We aim to develop a precise understanding of the effects of different learning rate schedules and of the appropriate way to select them. To this end, we isolate two distinct phases of training: the first, which we refer to as the "large-step" regime, exhibits rather poor performance from an optimization point of view but is the primary contributor to model generalization; the second, "small-step" regime exhibits much more "convex-like" optimization behavior but, used in isolation, produces models that generalize poorly. We find that by treating these regimes separately, and by specializing our training algorithm to each of them, we can significantly simplify learning rate schedules.
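To make the idea of a simplified, two-phase schedule concrete, the following is a minimal sketch of a piecewise-constant schedule with a "large-step" phase followed by a "small-step" phase. The function name, the switch point, and the two learning-rate values are illustrative assumptions, not settings prescribed by the paper.

```python
# Minimal sketch of a two-phase ("large-step" then "small-step") learning-rate
# schedule. The switch step and both learning-rate values are illustrative
# placeholders, not values taken from the paper.

def two_regime_lr(step: int,
                  switch_step: int = 10_000,
                  large_lr: float = 0.1,
                  small_lr: float = 0.001) -> float:
    """Return the learning rate for a given training step.

    Phase 1 ("large-step" regime): a constant, relatively large learning rate.
    Phase 2 ("small-step" regime): a constant, much smaller learning rate.
    """
    return large_lr if step < switch_step else small_lr


if __name__ == "__main__":
    # Inspect the learning rate around the regime switch.
    for step in (0, 9_999, 10_000, 20_000):
        print(f"step {step:6d}: lr = {two_regime_lr(step)}")
```

In most training frameworks, a schedule like this would be queried once per optimizer step to set the current learning rate before applying the update.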