Paper Title
On the Maximum Mutual Information Capacity of Neural Architectures
Paper Authors
Paper Abstract
We derive closed-form expressions for the maximum mutual information - the largest value of $I(X;Z)$ obtainable via training - for a broad family of neural network architectures. This quantity is essential to several branches of machine learning theory and practice. Quantitatively, we show that the maximum mutual information for all of these families stems from generalizations of a single catch-all formula. Qualitatively, we show that the maximum mutual information of an architecture is most strongly influenced by the width of the smallest layer of the network (the "information bottleneck" in a different sense of the phrase) and by any statistical invariances captured by the architecture.
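As a minimal sketch of why the narrowest layer caps the mutual information (an illustrative counting argument, not the paper's derivation), assume a deterministic network whose narrowest layer has width $w_{\min}$ and whose activations are quantized to $b$ bits per unit, and let $Z_{\min}$ denote that layer's output. For any representation $Z$ computed downstream of $Z_{\min}$, the data-processing inequality and the entropy bound give
$$
I(X;Z) \;\le\; I(X;Z_{\min}) \;\le\; H(Z_{\min}) \;\le\; w_{\min}\, b \ \text{bits}.
$$
This crude bound only reflects the qualitative dependence on the smallest layer's width; the closed-form expressions in the paper make that dependence precise for the specific architecture families considered.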