Paper Title
Neural Differential Equations for Learning to Program Neural Nets Through Continuous Learning Rules
Paper Authors
Paper Abstract
Neural ordinary differential equations (ODEs) have attracted much attention as continuous-time counterparts of deep residual neural networks (NNs), and numerous extensions for recurrent NNs have been proposed. Since the 1980s, ODEs have also been used to derive theoretical results for NN learning rules, e.g., the famous connection between Oja's rule and principal component analysis. Such rules are typically expressed as additive iterative update processes which have straightforward ODE counterparts. Here we introduce a novel combination of learning rules and Neural ODEs to build continuous-time sequence processing nets that learn to manipulate short-term memory in rapidly changing synaptic connections of other nets. This yields continuous-time counterparts of Fast Weight Programmers and linear Transformers. Our novel models outperform the best existing Neural Controlled Differential Equation based models on various time series classification tasks, while also addressing their fundamental scalability limitations. Our code is public.
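As a minimal illustration of the learning-rule-to-ODE correspondence mentioned in the abstract (generic notation for exposition, not the paper's own parameterisation): Oja's rule is an additive iterative update

  w_{t+1} = w_t + \eta_t \left( y_t x_t - y_t^2 w_t \right), \qquad y_t = w_t^\top x_t,

whose straightforward continuous-time counterpart is the ODE

  \frac{dw(t)}{dt} = y(t)\, x(t) - y(t)^2\, w(t),

and the stable equilibria of this flow recover the principal component of the input covariance, which is the classic Oja/PCA connection. Applying the same correspondence to the outer-product fast weight update of linear Transformers,

  W_{t+1} = W_t + v_t k_t^\top \quad \Longrightarrow \quad \frac{dW(t)}{dt} = v(t)\, k(t)^\top,

hints at a continuous-time fast weight memory whose keys k(t) and values v(t) are generated by a slow net; this is only a sketch of the general idea and omits the specific learning rules and architectures used in the paper.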