论文标题

在嘈杂制度中学习高维基调的样本复杂性范围

Sample Complexity Bounds for Learning High-dimensional Simplices in Noisy Regimes

论文作者

Saberi, Amir Hossein, Najafi, Amir, Motahari, Seyed Abolfazl, Khalaj, Babak H.

论文摘要

在本文中,我们发现一个样本复杂性是从嘈杂样本中学习单纯形的。假设给出了一个大小$ n $的数据集,其中包括i.i.d.样品是从$ \ mathbb {r}^k $中未知单纯形上的均匀分布中得出的,其中假定样品被任意幅度的多变量添加性高斯噪声损坏。我们证明了一种算法的存在,该算法具有高概率输出的单纯词,其$ \ ell_2 $距离最多是$ \ varepsilon $从True Simplex(对于任何$ \ varepsilon> 0 $)。另外,我们从理论上表明,为了实现这一界限,足以拥有$ n \ ge \ left(k^2/\ varepsilon^2 \ right)e^{ω\ left(k/\ mathrm {snrm {snrm {snr}^2 \ right)} $ samples,其中$ \ mathrm {snrm {snrm {snrm {snr} $ nards for insime for cuity cuity cuity cuity cuity cuity cuity cuity。该结果解决了一个重要的开放问题,只需显示$ \ mathrm {snr} \geΩ\ left(k^{1/2} \ right)$,嘈杂制度的样本复杂性与无噪声案例的样本复杂性具有相同的顺序。我们的证明是\ citep {ashtiani2018yly}中所谓的样品压缩技术的组合,高维几何形状的数学工具和傅立叶分析。特别是,我们提出了一种一般的基于傅立叶的技术,用于从加性高斯噪声中恢复更一般的分布家族,该噪声可以在各种其他相关问题中进一步使用。

In this paper, we find a sample complexity bound for learning a simplex from noisy samples. Assume a dataset of size $n$ is given which includes i.i.d. samples drawn from a uniform distribution over an unknown simplex in $\mathbb{R}^K$, where samples are assumed to be corrupted by a multi-variate additive Gaussian noise of an arbitrary magnitude. We prove the existence of an algorithm that with high probability outputs a simplex having a $\ell_2$ distance of at most $\varepsilon$ from the true simplex (for any $\varepsilon>0$). Also, we theoretically show that in order to achieve this bound, it is sufficient to have $n\ge\left(K^2/\varepsilon^2\right)e^{Ω\left(K/\mathrm{SNR}^2\right)}$ samples, where $\mathrm{SNR}$ stands for the signal-to-noise ratio. This result solves an important open problem and shows as long as $\mathrm{SNR}\geΩ\left(K^{1/2}\right)$, the sample complexity of the noisy regime has the same order to that of the noiseless case. Our proofs are a combination of the so-called sample compression technique in \citep{ashtiani2018nearly}, mathematical tools from high-dimensional geometry, and Fourier analysis. In particular, we have proposed a general Fourier-based technique for recovery of a more general class of distribution families from additive Gaussian noise, which can be further used in a variety of other related problems.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源