Paper Title
Orthogonal Over-Parameterized Training
Paper Authors
Paper Abstract
The inductive bias of a neural network is largely determined by the architecture and the training algorithm. To achieve good generalization, how to effectively train a neural network is of great importance. We propose a novel orthogonal over-parameterized training (OPT) framework that can provably minimize the hyperspherical energy which characterizes the diversity of neurons on a hypersphere. By maintaining the minimum hyperspherical energy during training, OPT can greatly improve the empirical generalization. Specifically, OPT fixes the randomly initialized weights of the neurons and learns an orthogonal transformation that applies to these neurons. We consider multiple ways to learn such an orthogonal transformation, including unrolling orthogonalization algorithms, applying orthogonal parameterization, and designing orthogonality-preserving gradient descent. For better scalability, we propose stochastic OPT, which applies the orthogonal transformation stochastically to partial dimensions of the neurons. Interestingly, OPT reveals that learning a proper coordinate system for neurons is crucial to generalization. We provide some insights into why OPT yields better generalization. Extensive experiments validate the superiority of OPT over standard training.
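For context, the hyperspherical energy referred to above is, in the minimum-hyperspherical-energy line of work that OPT builds on, commonly defined as follows; this is a restatement from that literature rather than a formula quoted from this abstract:

\[
E_s(\hat{w}_1, \dots, \hat{w}_N) \;=\; \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N} f_s\!\big(\lVert \hat{w}_i - \hat{w}_j \rVert\big),
\qquad
f_s(z) \;=\;
\begin{cases}
z^{-s}, & s > 0,\\
\log z^{-1}, & s = 0,
\end{cases}
\]

where \(\hat{w}_i = w_i / \lVert w_i \rVert\) is the i-th neuron projected onto the unit hypersphere. Lower energy means the neuron directions are more uniformly spread, i.e., more diverse. OPT keeps this quantity at its initial (provably small) value because an orthogonal transformation preserves norms and pairwise distances between the normalized neurons.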
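To make the mechanism in the abstract concrete, here is a minimal, hypothetical PyTorch sketch of an OPT-style linear layer, assuming the reading w_i = R v_i: each randomly initialized neuron weight v_i is frozen, and a single orthogonal matrix R shared within the layer is the only thing being learned. The class name OPTLinear is made up for illustration, and PyTorch's built-in torch.nn.utils.parametrizations.orthogonal is used as a stand-in for the paper's own options (unrolled orthogonalization, orthogonal parameterization, orthogonality-preserving gradient descent); this is not the authors' implementation.

import torch
import torch.nn as nn

class OPTLinear(nn.Module):
    # Hypothetical OPT-style layer: effective weight rows are w_i = R v_i,
    # with V fixed at its random initialization and R constrained to be orthogonal.
    def __init__(self, in_features, out_features):
        super().__init__()
        # V holds the randomly initialized neurons (one per row); registered as a
        # buffer so it is never updated by the optimizer.
        self.register_buffer("V", torch.randn(out_features, in_features))
        # R is a learnable square matrix kept orthogonal by a parametrization.
        self.R = nn.Linear(in_features, in_features, bias=False)
        nn.utils.parametrizations.orthogonal(self.R, "weight")

    def forward(self, x):
        # Effective weight W = V R^T, so W W^T = V V^T: pairwise neuron angles,
        # and hence the hyperspherical energy, stay exactly at their initial values.
        W = self.V @ self.R.weight.t()
        return x @ W.t()

# Usage sketch: only R's parameters receive gradients.
layer = OPTLinear(128, 64)
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)
out = layer(torch.randn(32, 128))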