Paper Title
The Influence of Learning Rule on Representation Dynamics in Wide Neural Networks
Paper Authors
Paper Abstract
It is unclear how changing the learning rule of a deep neural network alters its learning dynamics and representations. To gain insight into the relationship between learned features, function approximation, and the learning rule, we analyze infinite-width deep networks trained with gradient descent (GD) and biologically plausible alternatives, including feedback alignment (FA), direct feedback alignment (DFA), and error-modulated Hebbian learning (Hebb), as well as gated linear networks (GLNs). We show that, for each of these learning rules, the evolution of the output function at infinite width is governed by a time-varying effective neural tangent kernel (eNTK). In the lazy training limit, this eNTK is static and does not evolve, while in the rich mean-field regime the kernel's evolution can be determined self-consistently with dynamical mean-field theory (DMFT). This DMFT enables comparisons of the feature and prediction dynamics induced by each of these learning rules. In the lazy limit, we find that DFA and Hebb can learn only through the last-layer features, while full FA can utilize earlier layers with a scale determined by the initial correlation between the feedforward and feedback weight matrices. In the rich regime, DFA and FA utilize a temporally evolving and depth-dependent NTK. Counterintuitively, we find that FA networks trained in the rich regime exhibit more feature learning when initialized with smaller correlation between the forward- and backward-pass weights. GLNs admit a very simple formula for their lazy-limit kernel and preserve conditional Gaussianity of their preactivations under gating functions. Error-modulated Hebb rules show very little task-relevant alignment of their kernels and perform most task-relevant learning in the last layer.
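To make the learning rules being compared concrete, the minimal numpy sketch below contrasts the backward passes of GD (backprop), FA, and DFA in a two-hidden-layer tanh network. Schematically, each rule replaces the true gradient with a pseudo-gradient, and the output evolves roughly as df(x)/dt = Σ_μ K_t(x, x_μ) Δ_μ, where the eNTK K_t contracts the gradient of f with that pseudo-gradient. This is an illustrative sketch, not the paper's code: the network sizes, toy data, learning rate, and helper structure are assumptions chosen for clarity. The only change between rules is which matrix carries the output error back to the hidden layers (the transposed forward weights for GD; fixed random feedback matrices for FA and DFA).

```python
import numpy as np

# Minimal sketch (not the paper's code): backward passes of GD, FA, and DFA
# in a two-hidden-layer tanh network. Sizes and data are illustrative.
rng = np.random.default_rng(0)
n_in, n_hid, n_out, n_samples, lr, steps = 8, 256, 1, 64, 0.1, 300

X = rng.standard_normal((n_samples, n_in))
y = np.sin(2.0 * X[:, :1])                     # toy regression target

def init_weights():
    # 1/sqrt(fan-in) scaling for the forward weights.
    return [rng.standard_normal((m, n)) / np.sqrt(m)
            for m, n in [(n_in, n_hid), (n_hid, n_hid), (n_hid, n_out)]]

# Fixed random feedback matrices. FA propagates the error layer by layer
# through B2 then B1; DFA sends the output error directly to each hidden
# layer through D2 and D1.
B2 = rng.standard_normal((n_out, n_hid)) / np.sqrt(n_hid)
B1 = rng.standard_normal((n_hid, n_hid)) / np.sqrt(n_hid)
D2 = rng.standard_normal((n_out, n_hid)) / np.sqrt(n_hid)
D1 = rng.standard_normal((n_out, n_hid)) / np.sqrt(n_hid)

def train(rule):
    W1, W2, W3 = init_weights()
    for _ in range(steps):
        h1 = np.tanh(X @ W1)                   # first hidden layer
        h2 = np.tanh(h1 @ W2)                  # second hidden layer
        f = h2 @ W3                            # network output
        err = (f - y) / n_samples              # dL/df for 0.5 * mean sq. loss
        dW3 = h2.T @ err                       # last layer: true gradient for every rule
        if rule == "gd":                       # exact backprop
            d2 = (err @ W3.T) * (1 - h2**2)
            d1 = (d2 @ W2.T) * (1 - h1**2)
        elif rule == "fa":                     # fixed random feedback weights
            d2 = (err @ B2) * (1 - h2**2)
            d1 = (d2 @ B1) * (1 - h1**2)
        elif rule == "dfa":                    # output error projected directly
            d2 = (err @ D2) * (1 - h2**2)
            d1 = (err @ D1) * (1 - h1**2)
        W1 -= lr * X.T @ d1
        W2 -= lr * h1.T @ d2
        W3 -= lr * dW3
    h2_final = np.tanh(np.tanh(X @ W1) @ W2)
    return 0.5 * np.mean((h2_final @ W3 - y) ** 2)

for rule in ("gd", "fa", "dfa"):
    print(rule, "final loss:", train(rule))
```

In this sketch, the abstract's notion of forward-backward correlation would correspond to initializing B2 with some overlap with W3.T (e.g., B2 = c * W3.T + noise for a hypothetical mixing coefficient c); the abstract's counterintuitive finding is that smaller c yields more feature learning in the rich regime.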