Paper Title
Bidirectionally Self-Normalizing Neural Networks
Paper Authors
Paper Abstract
The problem of vanishing and exploding gradients has been a long-standing obstacle that hinders the effective training of neural networks. Despite various tricks and techniques employed to alleviate the problem in practice, satisfactory theories or provable solutions are still lacking. In this paper, we address the problem from the perspective of high-dimensional probability theory. We provide a rigorous result showing that, under mild conditions, the vanishing/exploding gradients problem disappears with high probability if the neural network has sufficient width. Our main idea is to constrain both forward and backward signal propagation in a nonlinear neural network through a new class of activation functions, namely Gaussian-Poincaré normalized functions, and orthogonal weight matrices. Experiments on both synthetic and real-world data validate our theory and confirm its effectiveness for very deep neural networks when applied in practice.
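As a rough illustration of the Gaussian-Poincaré normalization idea, the sketch below assumes a normalized activation must satisfy E[ψ(Z)²] = 1 and E[ψ'(Z)²] = 1 for Z ~ N(0, 1), and estimates affine constants (a, b) by Monte Carlo so that ψ(x) = a·φ(x) + b meets both conditions. This is a minimal sketch under that assumed definition, not the paper's exact procedure; the function name `gpn_constants` and the sampling setup are hypothetical.

```python
import numpy as np

def gpn_constants(phi, dphi, n_samples=1_000_000, seed=0):
    """Estimate (a, b) so that psi(x) = a * phi(x) + b satisfies, for Z ~ N(0, 1),
    E[psi(Z)^2] = 1 and E[psi'(Z)^2] = 1 (assumed normalization conditions).
    `phi` is the activation and `dphi` its derivative; Monte Carlo sketch only."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_samples)
    f, g = phi(z), dphi(z)

    # Derivative condition: a^2 * E[phi'(Z)^2] = 1.
    a = 1.0 / np.sqrt(np.mean(g ** 2))

    # Second-moment condition: E[(a*phi(Z) + b)^2] = 1.
    # Expanding gives b^2 + 2*a*E[phi(Z)]*b + (a^2*E[phi(Z)^2] - 1) = 0;
    # the Gaussian Poincare inequality Var[phi(Z)] <= E[phi'(Z)^2]
    # keeps the discriminant non-negative (up to Monte Carlo error).
    m = np.mean(f)
    disc = 1.0 - a ** 2 * np.var(f)
    b = -a * m + np.sqrt(max(disc, 0.0))
    return a, b

# Hypothetical usage: normalize tanh into a Gaussian-Poincare normalized activation.
a, b = gpn_constants(np.tanh, lambda x: 1.0 - np.tanh(x) ** 2)
gpn_tanh = lambda x: a * np.tanh(x) + b
```

To pair such an activation with the orthogonal weight matrices mentioned in the abstract, the weights could, for instance, be initialized with torch.nn.init.orthogonal_ in PyTorch; the specific training setup used in the paper may differ.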