呼吸k均值：通过动态k值的上级K-均值解决方案

论文标题

呼吸k均值：通过动态k值的上级K-均值解决方案

Breathing K-Means: Superior K-Means Solutions through Dynamic K-Values

论文作者

Fritzke, Bernd

论文摘要

我们介绍了呼吸K-均值算法，该算法平均可以显着改善广泛熟悉的贪婪K-MEANS ++算法获得的解决方案，这是Scikit-Learn包中K-均值聚类的默认方法。通过新颖的``呼吸''技术实现了改进，该技术可周期性地增加并减少基于局部误差和效用测量的质心数量。我们使用贪婪的K-均值++作为基线进行了实验，将其与呼吸k-均值和其他五种K-均值算法进行了比较。在研究的方法中，只有呼吸k-均值和更好的K-均值++始终超过基线，呼吸k-均值表现出很大的铅。即使将所有其他算法的十种运行的最佳结果与单一呼吸K-均值进行比较，也可以保持这种出色的性能，从而突出了其有效性和速度。我们的发现表明，呼吸K-均值算法的表现优于其他K-均值技术，尤其是具有十种重复的贪婪的K-MEANS ++，在溶液质量和速度方面都占主导地位。这将呼吸k均值（由贪婪的K-Means ++的内置初始化）作为越来越多的替代品，可以单独运行贪婪的K-Means ++。

We introduce the breathing k-means algorithm, which on average significantly improves solutions obtained by the widely-known greedy k-means++ algorithm, the default method for k-means clustering in the scikit-learn package. The improvements are achieved through a novel ``breathing'' technique, that cyclically increases and decreases the number of centroids based on local error and utility measures. We conducted experiments using greedy k-means++ as a baseline, comparing it with breathing k-means and five other k-means algorithms. Among the methods investigated, only breathing k-means and better k-means++ consistently outperformed the baseline, with breathing k-means demonstrating a substantial lead. This superior performance was maintained even when comparing the best result of ten runs for all other algorithms to a single run of breathing k-means, highlighting its effectiveness and speed. Our findings indicate that the breathing k-means algorithm outperforms the other k-means techniques, especially greedy k-means++ with ten repetitions, which it dominates in both solution quality and speed. This positions breathing k-means (with the built-in initialization by a single run of greedy k-means++) as a superior alternative to running greedy k-means++ on its own.

下载PDF全文

下载文献需遵守相关版权规定

论文标题