ck-means, a novel unsupervised learning method that combines fuzzy and crispy clustering methods to extract intersecting data

Title

ck-means, a novel unsupervised learning method that combines fuzzy and crispy clustering methods to extract intersecting data

Authors

Dessureault, Jean-Sébastien; Massicotte, Daniel

Abstract

Clustering data is a popular task in unsupervised machine learning. Most algorithms aim to find the best way to extract consistent clusters of data, but very few of them aim to cluster data that share the same intersection of two or more features. This paper proposes a method to do so. The main idea of this novel method is to first generate fuzzy clusters of data using a Fuzzy C-Means (FCM) algorithm. The second step applies a filter that keeps data whose membership values lie within a range between a minimum and a maximum, emphasizing the border data. A μ parameter defines the amplitude of this range. Finally, a k-means algorithm is applied to the membership values generated by the FCM, so data with similar membership values naturally regroup into new crispy clusters. The algorithm also finds, on its own, the optimal number of clusters for both the FCM and the k-means steps, according to the cluster consistency measured by the Silhouette Index (SI). The result is a list of clusters, each regrouping data that share the same intersection of two or more features. In this way, ck-means extracts highly similar data that do not naturally fall into a single cluster but lie at the intersection of two or more clusters.
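
The three steps described in the abstract (FCM, a μ-controlled membership filter, then k-means on the membership vectors, with the Silhouette Index choosing both cluster counts) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not give the exact filter rule, so the sketch assumes a point counts as border data when at least one of its membership values lies in [0.5 - μ, 0.5 + μ]. It also assumes the fuzzifier m = 2, scores each FCM result by the SI of its argmax (defuzzified) labels, and searches cluster counts from 2 to 6. All function names (`fcm`, `kmeans`, `silhouette`, `best_k`, `ck_means`) are hypothetical.

```python
import numpy as np


def fcm(X, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means. Returns (centers, U), U has shape (n_points, c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero when a point sits on a center
        inv = d ** (-2.0 / (m - 1.0))  # u_ij is proportional to d_ij^(-2/(m-1))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def kmeans(X, k, iters=100, seed=0):
    """Plain Lloyd's k-means. Returns an integer label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dist2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist2.argmin(axis=1)
        # Keep the old center if a cluster ends up empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def silhouette(X, labels):
    """Mean silhouette index; -1 when fewer than two clusters are present."""
    uniq = np.unique(labels)
    if len(uniq) < 2:
        return -1.0
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton clusters contribute 0 by convention
        a = D[i, same].sum() / (same.sum() - 1)  # D[i, i] is 0, so exclude it from the count
        b = min(D[i, labels == l].mean() for l in uniq if l != labels[i])
        s[i] = (b - a) / max(a, b, 1e-12)
    return s.mean()


def best_k(X, ks, cluster_fn):
    """Return the k in ks whose partition has the highest silhouette index."""
    scores = {k: silhouette(X, cluster_fn(X, k)) for k in ks if k < len(X)}
    return max(scores, key=scores.get)


def ck_means(X, mu=0.2, c_range=range(2, 7), k_range=range(2, 7)):
    """Hypothetical ck-means sketch: returns (indices of border points, their crisp labels)."""
    # Step 1: FCM; c is picked by the SI of the argmax (defuzzified) labels.
    c = best_k(X, c_range, lambda Z, n: fcm(Z, n)[1].argmax(axis=1))
    _, U = fcm(X, c)
    # Step 2 (assumed rule): keep points having a membership in [0.5 - mu, 0.5 + mu].
    mask = ((U >= 0.5 - mu) & (U <= 0.5 + mu)).any(axis=1)
    idx, M = np.flatnonzero(mask), U[mask]
    if len(M) < 3:  # too few border points to compare candidate k values
        return idx, np.zeros(len(M), dtype=int)
    # Step 3: k-means on the membership vectors; k is picked by the SI.
    k = best_k(M, k_range, kmeans)
    return idx, kmeans(M, k)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    blobs = [rng.normal(loc, 0.5, size=(50, 2)) for loc in [(0, 0), (6, 0), (3, 5)]]
    bridge = rng.normal((3, 0), 0.2, size=(10, 2))  # points lying between the first two blobs
    X = np.vstack(blobs + [bridge])
    idx, labels = ck_means(X)
    print("border points:", idx)
    print("crisp labels:", labels)
```

On this synthetic data the points placed between the first two blobs receive memberships near 0.4 for both neighboring clusters, so the assumed filter flags them as border data, while points deep inside a blob (membership close to 1 for one cluster) are dropped. The k-means, FCM, and silhouette routines are written by hand only to keep the sketch dependency-free; `sklearn.cluster.KMeans` and `sklearn.metrics.silhouette_score` could replace the last two.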
