Paper title
How isotropic kernels perform on simple invariants
Paper authors
Paper abstract
We investigate how the training curve of isotropic kernel methods depends on the symmetry of the task to be learned, in several settings. (i) We consider a regression task where the target function is a Gaussian random field that depends only on $d_\parallel$ variables, fewer than the input dimension $d$. We compute the expected test error $\epsilon$, which follows $\epsilon \sim p^{-\beta}$, where $p$ is the size of the training set. We find that $\beta \sim 1/d$ independently of $d_\parallel$, supporting previous findings that the presence of invariants does not resolve the curse of dimensionality for kernel regression. (ii) Next we consider support-vector binary classification and introduce the stripe model, where the data label depends on a single coordinate, $y(\underline{x}) = y(x_1)$, corresponding to parallel decision boundaries separating labels of different signs, and consider that there is no margin at these interfaces. We argue and confirm numerically that for large bandwidth, $\beta = \frac{d-1+\xi}{3d-3+\xi}$, where $\xi \in (0,2)$ is the exponent characterizing the singularity of the kernel at the origin. This estimate improves on the classical bounds obtainable from Rademacher complexity. In this setting there is no curse of dimensionality, since $\beta \rightarrow 1/3$ as $d \rightarrow \infty$. (iii) We confirm these findings for the spherical model, for which $y(\underline{x}) = y(|\underline{x}|)$. (iv) In the stripe model, we show that if the data are compressed along their invariants by some factor $\lambda$ (an operation believed to take place in deep networks), the test error is reduced by a factor $\lambda^{-\frac{2(d-1)}{3d-3+\xi}}$.
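As an illustration of part (ii), the following is a minimal numerical sketch (not the authors' code) of the stripe-model experiment: Gaussian inputs in $d$ dimensions are labeled by the sign of $x_1$, a support-vector machine with a wide Laplace kernel is trained for increasing training-set sizes $p$, and the decay exponent $\beta$ of the test error is estimated from a log-log fit. The data distribution, dimension, bandwidth, kernel choice, and grid of $p$ values are illustrative assumptions; the Laplace kernel is chosen because its singularity at the origin corresponds to $\xi = 1$, so the fitted exponent can be compared with the predicted $\beta = \frac{d-1+\xi}{3d-3+\xi}$.

import numpy as np
from scipy.spatial.distance import cdist
from sklearn.svm import SVC

rng = np.random.default_rng(0)
d = 5                      # input dimension (illustrative)
bandwidth = 10.0           # large-bandwidth regime discussed above
p_values = [64, 128, 256, 512, 1024]
n_test = 2000

def laplace_kernel(a, b):
    # Laplace kernel exp(-|x - x'| / bandwidth); its singularity at the
    # origin corresponds to xi = 1 in beta = (d - 1 + xi) / (3d - 3 + xi).
    return np.exp(-cdist(a, b) / bandwidth)

def sample(n):
    x = rng.standard_normal((n, d))
    y = np.sign(x[:, 0])           # stripe model: the label depends on x_1 only
    return x, y

x_test, y_test = sample(n_test)
errors = []
for p in p_values:
    x_train, y_train = sample(p)
    clf = SVC(kernel=laplace_kernel, C=1e6)   # large C approximates the hard-margin limit
    clf.fit(x_train, y_train)
    errors.append(np.mean(clf.predict(x_test) != y_test))

# Log-log fit of the test error against p gives an estimate of beta.
beta_fit = -np.polyfit(np.log(p_values), np.log(errors), 1)[0]
xi = 1.0
print(f"measured beta ~ {beta_fit:.2f}, predicted {(d - 1 + xi) / (3 * d - 3 + xi):.2f}")

The same scaffold can be adapted to part (iv) by rescaling the invariant coordinates $x_2, \dots, x_d$ by a compression factor $\lambda$ before training and checking how the measured test error shifts.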