论文标题
列表可解码的协方差估计
List-Decodable Covariance Estimation
论文作者
论文摘要
我们给出了\ emph {list可解码协方差估计}的第一个多项式时间算法。对于任何$α> 0 $,我们的算法采用输入$ y \ subseteq \ mathbb {r}^d $的大小$ n \ geq d^{\ geq d^{\ mathsf {poly}(poly}(1/α)} $,通过在$(1-α)$ point in I.i.i.i.d.d中获得的对抗性损坏$(1-α)n $。样本$ x $的尺寸$ n $从高斯分布中,未知平均值$μ_*$和协方差$σ_*$。在$ n^{\ mathsf {poly}(1/α)} $时间中,它输出了$ k = k(α)=(1/α)=(1/α)^{\ Mathsf {poly}(1/α)} $ cantifate参数的常数大小列表,具有很高的可能性$ tv(\ Mathcal {n}(μ_*,σ_*),\ Mathcal {n}(\hatμ,\hatς))<1-o_α(1)$。这是距离最强的距离概念,意味着具有独立尺寸误差的参数的乘法光谱和相对Frobenius距离近似。我们的算法更普遍地用于$(1-α)$ - 具有两个自然分析特性的低计分总和证书的任何分布$ d $的损坏:1)一维边际的抗浓度和2)学位2多项式的超额收获率。 在我们工作之前,估计可限制设置的协方差的唯一已知结果是针对Karmarkar,Klivans和Kothari(2019),Raghavendra和Yau(2019和2020)以及Bakshi and Kothari(2020)的特殊情况。这些结果需要超级物质时间,以在基础维度中获得任何子构误差。我们的结果意味着对于列表可解码的线性回归和子空间恢复的第一个多项式时间\ emph {extcrip}算法,尤其允许在多项式时间中获得$ 2^{ - \ Mathsf { - \ Mathsf {poly}} $错误。我们的结果还意味着改进了用于聚类非球体混合物的算法。
We give the first polynomial time algorithm for \emph{list-decodable covariance estimation}. For any $α> 0$, our algorithm takes input a sample $Y \subseteq \mathbb{R}^d$ of size $n\geq d^{\mathsf{poly}(1/α)}$ obtained by adversarially corrupting an $(1-α)n$ points in an i.i.d. sample $X$ of size $n$ from the Gaussian distribution with unknown mean $μ_*$ and covariance $Σ_*$. In $n^{\mathsf{poly}(1/α)}$ time, it outputs a constant-size list of $k = k(α)= (1/α)^{\mathsf{poly}(1/α)}$ candidate parameters that, with high probability, contains a $(\hatμ,\hatΣ)$ such that the total variation distance $TV(\mathcal{N}(μ_*,Σ_*),\mathcal{N}(\hatμ,\hatΣ))<1-O_α(1)$. This is the statistically strongest notion of distance and implies multiplicative spectral and relative Frobenius distance approximation for parameters with dimension independent error. Our algorithm works more generally for $(1-α)$-corruptions of any distribution $D$ that possesses low-degree sum-of-squares certificates of two natural analytic properties: 1) anti-concentration of one-dimensional marginals and 2) hypercontractivity of degree 2 polynomials. Prior to our work, the only known results for estimating covariance in the list-decodable setting were for the special cases of list-decodable linear regression and subspace recovery due to Karmarkar, Klivans, and Kothari (2019), Raghavendra and Yau (2019 and 2020) and Bakshi and Kothari (2020). These results need superpolynomial time for obtaining any subconstant error in the underlying dimension. Our result implies the first polynomial-time \emph{exact} algorithm for list-decodable linear regression and subspace recovery that allows, in particular, to obtain $2^{-\mathsf{poly}(d)}$ error in polynomial-time. Our result also implies an improved algorithm for clustering non-spherical mixtures.