Distributed k-means Clustering with Low Transmission Cost

Cited by: 5

Authors:

Naldi, Murilo Coelho [1]

Gabrielli Barreto Campello, Ricardo Jose [2]

Affiliations:

[1] Fed Univ Vicosa UFV, Dept Exact & Technol Sci, Rio Paranaiba, Brazil

[2] Univ Sao Paulo, Inst Math & Comp Sci, Sao Paulo, Brazil

Source:

2013 BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS) | 2013

Funding:

São Paulo Research Foundation (FAPESP), Brazil

Keywords:

clustering; k-means; distributed data sets; low data transfer; efficiency; algorithm

DOI:

10.1109/BRACIS.2013.20

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Handling large amounts of data is one of the challenges for clustering, and it often requires large data sets to be distributed across separate repositories. However, most clustering techniques require the data to be centralized. One of them, k-means, has been named one of the most influential data mining algorithms. Although exact distributed versions of the k-means algorithm have been proposed, the algorithm remains sensitive to the selection of the initial cluster prototypes and requires the number of clusters to be specified in advance. Additionally, distributed versions of clustering algorithms usually require multiple rounds of data transmission. This work tackles the problem of generating an approximate model for distributed clustering, based on k-means, for scenarios where the number of clusters in the distributed data is unknown and data transmission is slow or costly. A collection of algorithms is proposed that combines k-means clustering of each distributed subset of the data with a single round of communication. These algorithms are compared from two perspectives: the theoretical one, through asymptotic complexity analyses; and the experimental one, through a comparative evaluation of experimental results and statistical tests.
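The single-round idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm collection: it assumes each site runs plain k-means locally, sends only its centroids and cluster sizes to a coordinator once, and the coordinator runs a size-weighted k-means over the received prototypes. It also assumes a fixed, known number of global clusters, whereas the paper targets the case where that number is unknown. All names (`weighted_kmeans`, `local_summary`, `combine`) and the toy data are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def weighted_kmeans(points, weights, k, iters=50, seed=0):
    """Lloyd's algorithm with per-point weights.

    Returns (centroids, totals), where totals[j] is the summed weight
    assigned to centroid j in the last assignment step.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    dim = len(points[0])
    totals = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            totals[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        # An empty cluster keeps its previous centroid.
        new = [[s / totals[j] for s in sums[j]] if totals[j] > 0 else centroids[j]
               for j in range(k)]
        if new == centroids:
            break
        centroids = new
    return centroids, totals

def local_summary(data, k_local, seed=0):
    """Runs at a data site: cluster locally, keep only (centroid, size) pairs."""
    cents, counts = weighted_kmeans(data, [1.0] * len(data), k_local, seed=seed)
    return [(c, n) for c, n in zip(cents, counts) if n > 0]

def combine(summaries, k_global, seed=0):
    """Runs at the coordinator: cluster the received prototypes, weighted by size."""
    protos = [c for s in summaries for c, _ in s]
    weights = [n for s in summaries for _, n in s]
    cents, _ = weighted_kmeans(protos, weights, k_global, seed=seed)
    return cents

# Toy data (assumption): two sites, each holding two well-separated groups.
site_a = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5),
          (9.5, 9.5), (9.5, 10.5), (10.5, 9.5), (10.5, 10.5)]
site_b = [(0.5, 0.5), (0.5, 1.5), (1.5, 0.5), (1.5, 1.5),
          (10.5, 10.5), (10.5, 11.5), (11.5, 10.5), (11.5, 11.5)]

# The only communication: each site sends its summary once.
summaries = [local_summary(site_a, 2), local_summary(site_b, 2)]
global_centroids = sorted(combine(summaries, 2))
print(global_centroids)
```

In this sketch each site transmits `k_local` centroid vectors plus their counts instead of its raw points, so the transmission cost depends on the number of local prototypes rather than on the size of the local data set. The result is an approximation of what centralized k-means would produce, not an exact equivalent.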

Pages: 70-75

Number of pages: 6

