Paper Title

Optimal Rates of Distributed Regression with Imperfect Kernels

Authors

Hongwei Sun, Qiang Wu

Abstract

Distributed machine learning systems have been receiving increasing attention for their efficiency in processing large-scale data, and many distributed frameworks have been proposed for different machine learning tasks. In this paper, we study distributed kernel regression via the divide-and-conquer approach. This approach has been proved to be asymptotically minimax optimal if the kernel is perfectly selected, so that the true regression function lies in the associated reproducing kernel Hilbert space. However, this is usually, if not always, impractical, because the kernel, which can only be selected via prior knowledge or a tuning process, is hardly ever perfect. Instead, it is more common that the kernel is good enough but imperfect, in the sense that the true regression function can be well approximated by, but does not lie exactly in, the kernel space. We show that distributed kernel regression can still achieve the capacity-independent optimal rate in this case. To this end, we first establish a general framework for analyzing distributed regression with response-weighted base algorithms by bounding the error of such algorithms on a single data set, provided that the error bounds account for the impact of the unexplained variance of the response variable. Then we perform a leave-one-out analysis of kernel ridge regression and bias-corrected kernel ridge regression, which, in combination with the aforementioned framework, allows us to derive sharp error bounds and capacity-independent optimal rates for the associated distributed kernel regression algorithms. As a byproduct of this thorough analysis, we also prove that kernel ridge regression can achieve rates faster than $N^{-1}$ (where $N$ is the sample size) in the noise-free setting, which, to the best of our knowledge, is a novel observation in regression learning.
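
The divide-and-conquer scheme the abstract refers to proceeds by partitioning the training data into m disjoint subsets, running kernel ridge regression (KRR) on each subset independently, and averaging the resulting local estimators. Below is a minimal sketch of this procedure in Python; the Gaussian kernel, the regularization parameter `lam`, the number of partitions `m`, and the synthetic data are illustrative assumptions, not the paper's implementation or recommended settings.

```python
# Sketch of divide-and-conquer kernel ridge regression (distributed KRR).
# Illustrative only: Gaussian kernel, lam, m, and the toy data are assumptions.
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian kernel matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2)).
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * sigma**2))

def krr_fit(X, y, lam, sigma=1.0):
    # Local KRR on one subset: solve (K + n*lam*I) alpha = y.
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return X, alpha

def krr_predict(model, X_test, sigma=1.0):
    X_train, alpha = model
    return gaussian_kernel(X_test, X_train, sigma) @ alpha

def distributed_krr(X, y, m, lam, X_test, sigma=1.0):
    # Divide and conquer: split the data into m subsets, fit KRR locally,
    # and average the m local predictions.
    parts = np.array_split(np.random.permutation(len(X)), m)
    preds = [krr_predict(krr_fit(X[idx], y[idx], lam, sigma), X_test, sigma)
             for idx in parts]
    return np.mean(preds, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, (2000, 1))
    y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.standard_normal(2000)
    X_test = np.linspace(-1, 1, 50)[:, None]
    f_hat = distributed_krr(X, y, m=10, lam=1e-3, X_test=X_test)
    print(f_hat[:5])
```

A practical motivation for this scheme is computational: each local solve involves only an (N/m) x (N/m) linear system instead of the full N x N system required by a single KRR, while the averaging step is what the paper's analysis shows can preserve the capacity-independent optimal rate even when the kernel is imperfect.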
