Paper Title
Kernel Alignment Risk Estimator: Risk Prediction from Training Data
Authors
Abstract
We study the risk (i.e., the generalization error) of Kernel Ridge Regression (KRR) for a kernel $K$ with ridge $\lambda > 0$ and i.i.d. observations. For this, we introduce two objects: the Signal Capture Threshold (SCT) and the Kernel Alignment Risk Estimator (KARE). The SCT $\vartheta_{K,\lambda}$ is a function of the data distribution: it can be used to identify the components of the data that the KRR predictor captures, and to approximate the (expected) KRR risk. This in turn leads to an approximation of the KRR risk by the KARE $\rho_{K,\lambda}$, an explicit function of the training data that is agnostic to the true data distribution. We phrase the regression problem in a functional setting. The key results then follow from a finite-size analysis of the Stieltjes transform of general Wishart random matrices. Under a natural universality assumption (that the KRR moments depend asymptotically only on the first two moments of the observations), we capture the mean and variance of the KRR predictor. We numerically investigate our findings on the Higgs and MNIST datasets for various classical kernels: the KARE gives an excellent approximation of the risk, thus supporting our universality assumption. Using the KARE, one can compare choices of kernels and hyperparameters directly from the training set. The KARE thus provides a promising data-dependent procedure for selecting kernels that generalize well.
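The abstract describes the KARE as an explicit function of the training data but does not state its formula. The sketch below is only an illustration under a loudly stated assumption: it takes the KARE to have the form $\rho = \frac{\frac{1}{N} y^T (\frac{1}{N} G + \lambda I_N)^{-2} y}{\left(\frac{1}{N} \mathrm{Tr}[(\frac{1}{N} G + \lambda I_N)^{-1}]\right)^2}$, where $G$ is the $N \times N$ kernel Gram matrix on the training inputs and $y$ the vector of training labels. This form, the RBF kernel, the synthetic data, and the helper names `rbf_gram` and `kare` are all assumptions made for this example and are not taken from the abstract.

```python
import numpy as np


def rbf_gram(X, lengthscale=1.0):
    """Gram matrix of the RBF kernel exp(-||x - x'||^2 / (2 * lengthscale^2))."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * lengthscale**2))


def kare(G, y, ridge):
    """Assumed KARE form (not stated in the abstract):
    (1/N) y^T (G/N + ridge*I)^-2 y / ((1/N) Tr[(G/N + ridge*I)^-1])^2
    """
    n = G.shape[0]
    A_inv = np.linalg.inv(G / n + ridge * np.eye(n))
    A_inv_y = A_inv @ y
    # A is symmetric, so y^T A^-2 y equals ||A^-1 y||^2.
    numerator = (A_inv_y @ A_inv_y) / n
    denominator = (np.trace(A_inv) / n) ** 2
    return numerator / denominator


if __name__ == "__main__":
    # Synthetic regression data, used purely for illustration.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

    # Compare kernel hyperparameters using only the training set.
    for lengthscale in (0.5, 1.0, 2.0, 4.0):
        score = kare(rbf_gram(X, lengthscale), y, ridge=1e-2)
        print(f"lengthscale={lengthscale}: KARE={score:.4f}")
```

Under this assumed form, a lower value would suggest a lower predicted risk, so the loop mimics comparing kernels and hyperparameters directly from the training set. No held-out data is used at any point.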