Paper Title


Symmetries, flat minima, and the conserved quantities of gradient flow

Paper Authors

Bo Zhao, Iordan Ganev, Robin Walters, Rose Yu, Nima Dehmamy

Paper Abstract


Empirical studies of the loss landscape of deep networks have revealed that many local minima are connected through low-loss valleys. Yet, little is known about the theoretical origin of such valleys. We present a general framework for finding continuous symmetries in the parameter space, which carve out low-loss valleys. Our framework uses equivariances of the activation functions and can be applied to different layer architectures. To generalize this framework to nonlinear neural networks, we introduce a novel set of nonlinear, data-dependent symmetries. These symmetries can transform a trained model such that it performs similarly on new samples, which allows ensemble building that improves robustness under certain adversarial attacks. We then show that conserved quantities associated with linear symmetries can be used to define coordinates along low-loss valleys. The conserved quantities help reveal that using common initialization methods, gradient flow only explores a small part of the global minimum. By relating conserved quantities to convergence rate and sharpness of the minimum, we provide insights on how initialization impacts convergence and generalizability.
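As a concrete illustration of the kind of symmetry and conserved quantity the abstract refers to, consider the well-known rescaling symmetry of ReLU networks: scaling a hidden neuron's incoming weights by a and its outgoing weight by 1/a leaves the network function unchanged, and under gradient flow the per-neuron quantity Q_i = ||W1[i]||^2 - w2[i]^2 is conserved. The sketch below (not code from the paper; the network, data, and step size are illustrative assumptions) checks this numerically with small-step gradient descent.

```python
import numpy as np

# Minimal sketch: two-layer ReLU network f(x) = w2 @ relu(W1 @ x) with squared loss.
# The rescaling (W1[i], w2[i]) -> (a * W1[i], w2[i] / a) is a continuous symmetry,
# and gradient flow conserves Q_i = ||W1[i]||^2 - w2[i]^2 for each hidden neuron.

rng = np.random.default_rng(0)
n_in, n_hidden, n_samples = 5, 4, 32
X = rng.normal(size=(n_samples, n_in))
y = rng.normal(size=n_samples)

W1 = rng.normal(size=(n_hidden, n_in)) * 0.5
w2 = rng.normal(size=n_hidden) * 0.5

def conserved(W1, w2):
    # One conserved quantity per hidden neuron under the rescaling symmetry.
    return (W1 ** 2).sum(axis=1) - w2 ** 2

Q0 = conserved(W1, w2)
lr = 1e-3  # small step size to approximate gradient flow
for _ in range(5000):
    h = np.maximum(X @ W1.T, 0.0)   # hidden ReLU activations, shape (n_samples, n_hidden)
    pred = h @ w2                   # network outputs
    err = pred - y                  # residuals of the squared loss
    grad_w2 = h.T @ err / n_samples
    grad_W1 = ((err[:, None] * w2) * (h > 0)).T @ X / n_samples
    W1 -= lr * grad_W1
    w2 -= lr * grad_w2

# Drift stays near zero: the trajectory is confined to a level set of Q,
# which is how conserved quantities restrict which part of a minimum is reached.
print("max drift of conserved quantities:", np.abs(conserved(W1, w2) - Q0).max())
```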
