Paper Title

On the Neural Tangent Kernel of Deep Networks with Orthogonal Initialization

Authors

Wei Huang, Weitao Du, Richard Yi Da Xu

Abstract

The prevailing thinking is that orthogonal weights are crucial to enforcing dynamical isometry and speeding up training. The increase in learning speed that results from orthogonal initialization in linear networks has been well-proven. However, while the same is believed to also hold for nonlinear networks when the dynamical isometry condition is satisfied, the training dynamics behind this contention have not been thoroughly explored. In this work, we study the dynamics of ultra-wide networks across a range of architectures, including Fully Connected Networks (FCNs) and Convolutional Neural Networks (CNNs), with orthogonal initialization via the neural tangent kernel (NTK). Through a series of propositions and lemmas, we prove that two NTKs, one corresponding to Gaussian weights and one to orthogonal weights, are equal when the network width is infinite. Furthermore, the NTK of an orthogonally initialized infinite-width network should theoretically remain constant during training. This suggests that, contrary to the prevailing thought, orthogonal initialization cannot speed up training in the NTK (lazy training) regime. To explore under what circumstances orthogonality can accelerate training, we conduct a thorough empirical investigation outside the NTK regime. We find that when the hyper-parameters are set so that the nonlinear activations operate in their linear regime, orthogonal initialization can improve the learning speed at large learning rates or large depths.
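As a rough illustration of the claim that the Gaussian and orthogonal NTKs coincide at large width, the sketch below compares the empirical NTK Θ(x, x') = ⟨∇θ f(x), ∇θ f(x')⟩ of a wide tanh FCN under the two initializations. This is not the authors' code: the helpers `make_fcn` and `empirical_ntk`, the depth and input dimension, and the 1/√fan-in variance matching for the orthogonal gain are assumptions made for illustration. As the width grows, the two kernel values are expected to become close, consistent with the equality of the two limiting NTKs stated above.

```python
# Minimal sketch (assumed setup, not the paper's implementation): compare the
# empirical NTK of a wide tanh FCN under Gaussian vs. orthogonal initialization.
import torch
import torch.nn as nn

def make_fcn(width, depth, orthogonal, in_dim=8):
    """Build a scalar-output FCN; weight scales are chosen (assumption) so that
    Gaussian and orthogonal layers have matching forward variance."""
    layers, d = [], in_dim
    for _ in range(depth):
        lin = nn.Linear(d, width, bias=False)
        if orthogonal:
            # Scaled (semi-)orthogonal weights; gain matches 1/sqrt(fan_in) Gaussian variance.
            nn.init.orthogonal_(lin.weight, gain=(width / d) ** 0.5)
        else:
            nn.init.normal_(lin.weight, std=1.0 / d ** 0.5)
        layers += [lin, nn.Tanh()]
        d = width
    head = nn.Linear(d, 1, bias=False)
    nn.init.normal_(head.weight, std=1.0 / d ** 0.5)  # readout kept Gaussian in both cases
    layers.append(head)
    return nn.Sequential(*layers)

def empirical_ntk(net, x1, x2):
    """Θ(x1, x2) = <∇θ f(x1), ∇θ f(x2)> for a scalar-output network."""
    def grad_vec(x):
        net.zero_grad()
        net(x.unsqueeze(0)).backward()  # scalar output, so implicit grad is fine
        return torch.cat([p.grad.flatten() for p in net.parameters()])
    return torch.dot(grad_vec(x1), grad_vec(x2)).item()

torch.manual_seed(0)
x1, x2 = torch.randn(8), torch.randn(8)
for width in [64, 256, 1024]:
    k_gauss = empirical_ntk(make_fcn(width, depth=3, orthogonal=False), x1, x2)
    k_orth = empirical_ntk(make_fcn(width, depth=3, orthogonal=True), x1, x2)
    print(f"width={width:5d}  Gaussian NTK={k_gauss:8.3f}  Orthogonal NTK={k_orth:8.3f}")
```

At small widths the two empirical kernels fluctuate independently across random draws; the gap is expected to shrink as the width increases, which is the finite-width analogue of the infinite-width equality discussed in the abstract.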
