Distributed k-means Clustering with Low Transmission Cost

Cited by: 5
Authors
Naldi, Murilo Coelho [1]
Gabrielli Barreto Campello, Ricardo Jose [2]
Affiliations
[1] Fed Univ Vicosa UFV, Dept Exact & Technol Sci, Rio Paranaiba, Brazil
[2] Univ Sao Paulo, Inst Math & Comp Sci, Sao Paulo, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil;
Keywords
clustering; k-means; distributed data sets; low data transfer; EFFICIENCY; ALGORITHM;
DOI
10.1109/BRACIS.2013.20
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Dealing with large amounts of data is one of the challenges for clustering, which causes the need for distribution of large data sets in separate repositories. However, most clustering techniques require the data to be centralized. One of them, the k-means, has been elected one of the most influential data mining algorithms. Although exact distributed versions of the k-means algorithm have been proposed, the algorithm is still sensitive to the selection of the initial cluster prototypes and requires that the number of clusters be specified in advance. Additionally, distributed versions of clustering algorithms usually require multiple rounds of data transmission. This work tackles the problem of generating an approximated model for distributed clustering, based on k-means, for scenarios where the number of clusters of the distributed data is unknown and the data transmission rate is low or costly. A collection of algorithms is proposed to combine k-means clustering for each distributed subset of the data with a single round of communication. These algorithms are compared from two perspectives: the theoretical one, through asymptotic complexity analyses; and the experimental one, through a comparative evaluation of results obtained from experiments and statistical tests.
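The general idea of combining per-site k-means with a single round of communication can be illustrated with a minimal sketch. This is not the authors' collection of algorithms, which the abstract does not specify: it is one common approximation in which each site clusters its own data, sends only its centroids and cluster sizes to a coordinator, and the coordinator runs a size-weighted k-means over the received centroids. All function names (`weighted_kmeans`, `local_summary`, `distributed_kmeans_one_round`) are hypothetical, the farthest-point initialization is a choice made here for determinism, and the number of clusters is fixed by the caller, whereas the paper targets the case where it is unknown.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(points, weights, k, iters=50):
    """Lloyd's algorithm on weighted points (assumes k <= len(points)).

    Returns (centroids, cluster_weights), where cluster_weights holds the
    total weight assigned to each centroid in the last assignment step.
    """
    # Deterministic farthest-point initialization (an assumption of this sketch).
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))

    dim = len(points[0])
    totals = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        # Assignment step: each point adds its weight to its nearest centroid.
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            totals[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        # Update step: weighted mean; an empty cluster keeps its old centroid.
        new = [
            [s / totals[j] for s in sums[j]] if totals[j] > 0 else centroids[j]
            for j in range(k)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids, totals


def local_summary(data, k):
    """Run at each site: cluster local data, keep only centroids and sizes."""
    centroids, counts = weighted_kmeans(data, [1.0] * len(data), k)
    return [(c, n) for c, n in zip(centroids, counts) if n > 0]


def distributed_kmeans_one_round(sites, local_k, global_k):
    """Coordinator: the summaries are the only data that cross the network."""
    received = [pair for data in sites for pair in local_summary(data, local_k)]
    centroids = [c for c, _ in received]
    sizes = [n for _, n in received]
    return weighted_kmeans(centroids, sizes, global_k)
```

Under these assumptions, each site transmits at most `local_k` centroids plus one count per centroid, independent of how many objects it stores, which is what keeps the transmission cost low. The result is an approximation of centralized k-means rather than an exact equivalent.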
Pages: 70-75
Number of pages: 6