Distributed k-means Clustering with Low Transmission Cost

被引:5
|
作者
Naldi, Murilo Coelho [1 ]
Gabrielli Barreto Campello, Ricardo Jose [2 ]
机构
[1] Fed Univ Vicosa UFV, Dept Exact & Technol Sci, Rio Paranaiba, Brazil
[2] Univ Sao Paulo, Inst Math & Comp Sci, Sao Paulo, Brazil
基金
巴西圣保罗研究基金会;
关键词
clustering; k-means; distributed data sets; low data transfer; EFFICIENCY; ALGORITHM;
D O I
10.1109/BRACIS.2013.20
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Dealing with big amounts of data is one of the challenges for clustering, which causes the need for distribution of large data sets in separate repositories. However, most clustering techniques require the data to be centralized. One of them, the k-means, has been elected one of the most influential data mining algorithms. Although exact distributed versions of the k-means algorithm have been proposed, the algorithm is still sensitive to the selection of the initial cluster prototypes and requires that the number of clusters be specified in advance. Additionally, distributed versions of clustering algorithms usually requires multiple rounds of data transmission. This work tackles the problem of generating an approximated model for distributed clustering, based on k-means, for scenarios where the number of clusters of the distributed data is unknown and the data transmission rate is low or costly. A collection of algorithms is proposed to combine k-means clustering for each distributed subset of the data with a single round of communication. These algorithms are compared from two perspectives: the theoretical one, through asymptotic complexity analyses; and the experimental one, through a comparative evaluation of results obtained from experiments and statistical tests.
引用
收藏
页码:70 / 75
页数:6
相关论文
共 50 条
  • [31] k-means clustering of extremes
    Janssen, Anja
    Wan, Phyllis
    ELECTRONIC JOURNAL OF STATISTICS, 2020, 14 (01): : 1211 - 1233
  • [32] K-means clustering on CGRA
    Lopes, Joao D.
    de Sousa, Jose T.
    Neto, Horacio
    Vestias, Mario
    2017 27TH INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS (FPL), 2017,
  • [33] Online k-means Clustering
    Cohen-Addad, Vincent
    Guedj, Benjamin
    Kanade, Varun
    Rom, Guy
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [34] Clustering of Image Data Using K-Means and Fuzzy K-Means
    Rahmani, Md. Khalid Imam
    Pal, Naina
    Arora, Kamiya
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2014, 5 (07) : 160 - 163
  • [35] Deep k-Means: Jointly clustering with k-Means and learning representations
    Fard, Maziar Moradi
    Thonet, Thibaut
    Gaussier, Eric
    PATTERN RECOGNITION LETTERS, 2020, 138 : 185 - 192
  • [36] PSO Aided k-Means Clustering: Introducing Connectivity in k-Means
    Breaban, Mihaela Elena
    Luchian, Henri
    GECCO-2011: PROCEEDINGS OF THE 13TH ANNUAL GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, 2011, : 1227 - 1234
  • [37] Distributed threshold k-means clustering for privacy preserving data mining
    Baby, Vadlana
    Chandra, N. Subhash
    2016 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2016, : 2286 - 2289
  • [38] Distributed Algorithm for Text Documents Clustering Based on k-Means Approach
    Sarnovsky, Martin
    Carnoka, Noema
    INFORMATION SYSTEMS ARCHITECTURE AND TECHNOLOGY, ISAT 2015, PT II, 2016, 430 : 165 - 174
  • [39] Fast and exact out-of-core and distributed k-means clustering
    Ruoming Jin
    Anjan Goswami
    Gagan Agrawal
    Knowledge and Information Systems, 2006, 10 : 17 - 40
  • [40] Fast and exact out-of-core and distributed k-means clustering
    Jin, Ruoming
    Goswami, Anjan
    Agrawal, Gagan
    KNOWLEDGE AND INFORMATION SYSTEMS, 2006, 10 (01) : 17 - 40