Paper Title

A Fast, Well-Founded Approximation to the Empirical Neural Tangent Kernel

Paper Authors

Mohamad Amin Mohamadi, Wonho Bae, Danica J. Sutherland

Paper Abstract

Empirical neural tangent kernels (eNTKs) can provide a good understanding of a given network's representation: they are often far less expensive to compute, and applicable more broadly, than infinite-width NTKs. For networks with $O$ output units (e.g. an $O$-class classifier), however, the eNTK on $N$ inputs is of size $NO \times NO$, taking $O((NO)^2)$ memory and up to $O((NO)^3)$ computation. Most existing applications have therefore used one of a handful of approximations yielding $N \times N$ kernel matrices, saving orders of magnitude of computation, but with limited to no justification. We prove that one such approximation, which we call "sum of logits", converges to the true eNTK at initialization for any network with a wide final "readout" layer. Our experiments demonstrate the quality of this approximation for various uses across a range of settings.
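To make the contrast between the full $NO \times NO$ eNTK and the $N \times N$ "sum of logits" kernel concrete, here is a minimal JAX sketch. It is not the paper's implementation: `init_mlp`, `apply_fn`, the toy layer sizes, and the $1/\sqrt{O}$ scaling on the summed logits are illustrative assumptions about one common pseudo-NTK convention, and the paper's exact normalization may differ.

```python
# Minimal sketch: full eNTK block vs. the "sum of logits" approximation.
# All names and sizes here are illustrative assumptions, not the paper's code.
import jax
import jax.numpy as jnp

def init_mlp(key, sizes):
    """Random parameters for a toy fully-connected network."""
    params = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (d_in, d_out)) / jnp.sqrt(d_in),
                       jnp.zeros(d_out)))
    return params

def apply_fn(params, x):
    """Forward pass producing O logits for a single input x."""
    for w, b in params[:-1]:
        x = jax.nn.relu(x @ w + b)
    w, b = params[-1]
    return x @ w + b

def flat_jacobian(f, params, x):
    """Jacobian of f(params, x) w.r.t. params, flattened to (out_dim, n_params)."""
    jac_tree = jax.jacobian(f)(params, x)
    leaves = jax.tree_util.tree_leaves(jac_tree)
    out_dim = leaves[0].shape[0]
    return jnp.concatenate([leaf.reshape(out_dim, -1) for leaf in leaves], axis=1)

def entk_block(params, x1, x2):
    """Exact eNTK block for one input pair: an O x O matrix,
    so the full kernel on N inputs is NO x NO."""
    j1 = flat_jacobian(apply_fn, params, x1)
    j2 = flat_jacobian(apply_fn, params, x2)
    return j1 @ j2.T

def sum_of_logits_kernel(params, x1, x2):
    """'Sum of logits' approximation: the eNTK of the scalar function
    g(x) = sum_o f_o(x) / sqrt(O), one number per pair (N x N overall)."""
    def g(p, x):
        logits = apply_fn(p, x)
        return jnp.atleast_1d(jnp.sum(logits) / jnp.sqrt(logits.shape[0]))
    g1 = flat_jacobian(g, params, x1)
    g2 = flat_jacobian(g, params, x2)
    return (g1 @ g2.T)[0, 0]

# Usage: per the paper's result, at initialization with a wide final
# "readout" layer, the O x O eNTK block should be close to the scalar
# kernel value times the identity I_O.
key = jax.random.PRNGKey(0)
params = init_mlp(key, [10, 512, 512, 5])            # O = 5 classes
x1 = jax.random.normal(jax.random.PRNGKey(1), (10,))
x2 = jax.random.normal(jax.random.PRNGKey(2), (10,))
print(entk_block(params, x1, x2))                    # roughly k * I_5
print(sum_of_logits_kernel(params, x1, x2) * jnp.eye(5))
```

Storing one scalar per input pair instead of an $O \times O$ block is what reduces the kernel from $NO \times NO$ to $N \times N$, which is the source of the orders-of-magnitude savings the abstract describes.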
