在嘈杂制度中学习高维基调的样本复杂性范围

论文标题

在嘈杂制度中学习高维基调的样本复杂性范围

Sample Complexity Bounds for Learning High-dimensional Simplices in Noisy Regimes

论文作者

Saberi, Amir Hossein, Najafi, Amir, Motahari, Seyed Abolfazl, Khalaj, Babak H.

论文摘要

在本文中，我们发现一个样本复杂性是从嘈杂样本中学习单纯形的。假设给出了一个大小$ n $的数据集，其中包括i.i.d.样品是从$ \ mathbb {r}^k $中未知单纯形上的均匀分布中得出的，其中假定样品被任意幅度的多变量添加性高斯噪声损坏。我们证明了一种算法的存在，该算法具有高概率输出的单纯词，其$ \ ell_2 $距离最多是$ \ varepsilon $从True Simplex（对于任何$ \ varepsilon> 0 $）。另外，我们从理论上表明，为了实现这一界限，足以拥有$ n \ ge \ left（k^2/\ varepsilon^2 \ right）e^{ω\ left（k/\ mathrm {snrm {snrm {snr}^2 \ right）} $ samples，其中$ \ mathrm {snrm {snrm {snrm {snr} $ nards for insime for cuity cuity cuity cuity cuity cuity cuity cuity。该结果解决了一个重要的开放问题，只需显示$ \ mathrm {snr} \geΩ\ left（k^{1/2} \ right）$，嘈杂制度的样本复杂性与无噪声案例的样本复杂性具有相同的顺序。我们的证明是\ citep {ashtiani2018yly}中所谓的样品压缩技术的组合，高维几何形状的数学工具和傅立叶分析。特别是，我们提出了一种一般的基于傅立叶的技术，用于从加性高斯噪声中恢复更一般的分布家族，该噪声可以在各种其他相关问题中进一步使用。

In this paper, we find a sample complexity bound for learning a simplex from noisy samples. Assume a dataset of size $n$ is given which includes i.i.d. samples drawn from a uniform distribution over an unknown simplex in $\mathbb{R}^K$, where samples are assumed to be corrupted by a multi-variate additive Gaussian noise of an arbitrary magnitude. We prove the existence of an algorithm that with high probability outputs a simplex having a $\ell_2$ distance of at most $\varepsilon$ from the true simplex (for any $\varepsilon>0$). Also, we theoretically show that in order to achieve this bound, it is sufficient to have $n\ge\left(K^2/\varepsilon^2\right)e^{Ω\left(K/\mathrm{SNR}^2\right)}$ samples, where $\mathrm{SNR}$ stands for the signal-to-noise ratio. This result solves an important open problem and shows as long as $\mathrm{SNR}\geΩ\left(K^{1/2}\right)$, the sample complexity of the noisy regime has the same order to that of the noiseless case. Our proofs are a combination of the so-called sample compression technique in \citep{ashtiani2018nearly}, mathematical tools from high-dimensional geometry, and Fourier analysis. In particular, we have proposed a general Fourier-based technique for recovery of a more general class of distribution families from additive Gaussian noise, which can be further used in a variety of other related problems.

下载PDF全文

下载文献需遵守相关版权规定

论文标题