Paper Title
Better Depth-Width Trade-offs for Neural Networks through the lens of Dynamical Systems
Paper Authors
Paper Abstract
The expressivity of neural networks as a function of their depth, width and type of activation units has been an important question in deep learning theory. Recently, depth separation results for ReLU networks were obtained via a new connection with dynamical systems, using a generalized notion of fixed points of a continuous map $f$, called periodic points. In this work, we strengthen the connection with dynamical systems and we improve the existing width lower bounds along several aspects. Our first main result is period-specific width lower bounds that hold under the stronger notion of $L^1$-approximation error, instead of the weaker classification error. Our second contribution is that we provide sharper width lower bounds, still yielding meaningful exponential depth-width separations, in regimes where previous results wouldn't apply. A byproduct of our results is that there exists a universal constant characterizing the depth-width trade-offs, as long as $f$ has odd periods. Technically, our results follow by unveiling a tighter connection between the following three quantities of a given function: its period, its Lipschitz constant and the growth rate of the number of oscillations arising under compositions of the function $f$ with itself.
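As a rough numerical illustration of the oscillation-growth phenomenon the abstract alludes to (not the paper's own construction), the sketch below uses the tent map, a standard example of a continuous map with points of every period, and counts the monotone pieces of its self-compositions. The helper names (tent, compose, count_monotone_pieces) are hypothetical and introduced only for this illustration.

```python
import numpy as np

def tent(x):
    # Tent map T(x) = 1 - |1 - 2x| on [0, 1]; a classical chaotic map
    # with periodic points of every period, including odd ones.
    return 1.0 - np.abs(1.0 - 2.0 * x)

def compose(f, n):
    # Return the n-fold composition f^n.
    def fn(x):
        for _ in range(n):
            x = f(x)
        return x
    return fn

def count_monotone_pieces(f, grid_size=200_000):
    # Numerical proxy for the number of "oscillations" of f on [0, 1]:
    # count maximal monotone pieces via sign changes of finite differences.
    xs = np.linspace(0.0, 1.0, grid_size)
    signs = np.sign(np.diff(f(xs)))
    signs = signs[signs != 0]  # ignore flat steps
    return 1 + int(np.sum(signs[1:] != signs[:-1]))

for n in range(1, 8):
    print(n, count_monotone_pieces(compose(tent, n)))
# The count doubles with each composition (2, 4, 8, 16, ...).
# This exponential growth in the number of oscillations under composition
# is the kind of behavior that shallow, narrow networks cannot reproduce,
# which is what drives the depth-width separations discussed above.
```

The exponential growth rate printed here plays the role of the third quantity in the abstract's trade-off, alongside the period of the map and its Lipschitz constant.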