论文标题

可扩展的光谱聚类,具有NyStrom近似:实用和理论方面

Scalable Spectral Clustering with Nystrom Approximation: Practical and Theoretical Aspects

论文作者

Pourkamali-Anaraki, Farhad

论文摘要

光谱聚类技术是信号处理和机器学习的有价值的工具,用于分区复杂的数据集。光谱聚类的有效性源于基于创建相似性图并计算Laplacian矩阵的光谱分解的非线性嵌入。但是,由于高计算成本和内存使用情况,光谱聚类方法无法扩展到大数据集。解决这些问题的一种流行方法是利用Nystrom方法,Nystrom方法是一种基于有效抽样的算法,用于计算大型阳性半明确矩阵的低级别近似值。本文展示了以前流行的基于尼斯特氏菌的光谱聚类的方法有严重的局限性。现有的时间效率方法通过过早地降低与采样点相关的相似性矩阵的等级来忽略关键信息。同样,当前对利用NyStrom近似如何影响光谱嵌入近似值的质量的理解受到限制。为了解决局限性,这项工作提出了一种原则的光谱聚类算法,该算法利用了与采样点相关的相似性矩阵的光谱特性,以调节准确性效率折衷。我们提供了理论上的结果,以减少当前差距,并使用真实和合成数据进行数值实验。经验结果证明了该方法与基于NyStrom方法和其他有效方法的现有光谱聚类技术相比,该方法的功效和效率。这项工作的总体目标是为未来的研究方向提供改进的基准,以加速光谱聚类。

Spectral clustering techniques are valuable tools in signal processing and machine learning for partitioning complex data sets. The effectiveness of spectral clustering stems from constructing a non-linear embedding based on creating a similarity graph and computing the spectral decomposition of the Laplacian matrix. However, spectral clustering methods fail to scale to large data sets because of high computational cost and memory usage. A popular approach for addressing these problems utilizes the Nystrom method, an efficient sampling-based algorithm for computing low-rank approximations to large positive semi-definite matrices. This paper demonstrates how the previously popular approach of Nystrom-based spectral clustering has severe limitations. Existing time-efficient methods ignore critical information by prematurely reducing the rank of the similarity matrix associated with sampled points. Also, current understanding is limited regarding how utilizing the Nystrom approximation will affect the quality of spectral embedding approximations. To address the limitations, this work presents a principled spectral clustering algorithm that exploits spectral properties of the similarity matrix associated with sampled points to regulate accuracy-efficiency trade-offs. We provide theoretical results to reduce the current gap and present numerical experiments with real and synthetic data. Empirical results demonstrate the efficacy and efficiency of the proposed method compared to existing spectral clustering techniques based on the Nystrom method and other efficient methods. The overarching goal of this work is to provide an improved baseline for future research directions to accelerate spectral clustering.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源