Paper Title

The Gaussian equivalence of generative models for learning with shallow neural networks

Paper Authors

Sebastian Goldt, Bruno Loureiro, Galen Reeves, Florent Krzakala, Marc Mézard, Lenka Zdeborová

Paper Abstract

Understanding the impact of data structure on the computational tractability of learning is a key challenge for the theory of neural networks. Many theoretical works do not explicitly model training data, or assume that inputs are drawn component-wise independently from some simple probability distribution. Here, we go beyond this simple paradigm by studying the performance of neural networks trained on data drawn from pre-trained generative models. This is possible due to a Gaussian equivalence stating that the key metrics of interest, such as the training and test errors, can be fully captured by an appropriately chosen Gaussian model. We provide three strands of rigorous, analytical and numerical evidence corroborating this equivalence. First, we establish rigorous conditions for the Gaussian equivalence to hold in the case of single-layer generative models, as well as deterministic rates for convergence in distribution. Second, we leverage this equivalence to derive a closed set of equations describing the generalisation performance of two widely studied machine learning problems: two-layer neural networks trained using one-pass stochastic gradient descent, and full-batch pre-learned features or kernel methods. Finally, we perform experiments demonstrating how our theory applies to deep, pre-trained generative models. These results open a viable path to the theoretical study of machine learning models with realistic data.
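To make the central claim of the abstract concrete, here is a minimal numerical sketch (ours, not the authors' code) of the Gaussian equivalence for a single-layer generative model. All names and choices below are illustrative assumptions: the generator nonlinearity `np.tanh`, the dimensions, and the fixed projection direction `w` are not taken from the paper. Data are drawn as x = φ(Az) with a Gaussian latent z, a Gaussian model is matched to the mean and covariance of x, and the statistics of a low-dimensional projection, the kind of quantity that determines training and test error, are compared under the two models.

```python
import numpy as np

# Minimal sketch (illustrative, not the paper's code) of the Gaussian
# equivalence for a single-layer generative model x = phi(A z), z ~ N(0, I_d).
# The "equivalent Gaussian model" draws x_g from a Gaussian with the same mean
# and covariance as x. The equivalence says that low-dimensional projections
# of the data -- the quantities entering training and test errors -- have
# matching statistics under the two models.

rng = np.random.default_rng(0)
d, n, samples = 50, 200, 50_000          # latent dim, data dim, sample count (arbitrary)

A = rng.standard_normal((n, d)) / np.sqrt(d)   # generator weights (assumed Gaussian)
w = rng.standard_normal(n)                      # a fixed projection ("student") direction

z = rng.standard_normal((samples, d))           # latent Gaussian variables
x = np.tanh(z @ A.T)                            # data from the generative model

# Equivalent Gaussian data: match the empirical mean and covariance of x.
mu = x.mean(axis=0)
cov = np.cov(x, rowvar=False)
x_g = rng.multivariate_normal(mu, cov, size=samples)

# Compare statistics of the projection lambda = w . x / sqrt(n) under both models.
for name, data in [("generative", x), ("gaussian", x_g)]:
    lam = data @ w / np.sqrt(n)
    skew = ((lam - lam.mean()) ** 3).mean() / lam.std() ** 3
    print(f"{name:>10}: mean={lam.mean():+.4f}  var={lam.var():.4f}  skew={skew:+.4f}")
```

The first two moments of the projection agree by construction; the point of the sketch is that higher-order statistics (here, the printed skewness) should also be close and near-Gaussian for both models, in line with the equivalence the paper establishes rigorously for single-layer generators.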
