Distributed k-means Clustering with Low Transmission Cost

Cited by: 5
Authors
Naldi, Murilo Coelho [1]
Gabrielli Barreto Campello, Ricardo Jose [2]
Affiliations
[1] Fed Univ Vicosa UFV, Dept Exact & Technol Sci, Rio Paranaiba, Brazil
[2] Univ Sao Paulo, Inst Math & Comp Sci, Sao Paulo, Brazil
Funding
Sao Paulo Research Foundation (Brazil)
Keywords
clustering; k-means; distributed data sets; low data transfer; EFFICIENCY; ALGORITHM;
DOI
10.1109/BRACIS.2013.20
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Dealing with large amounts of data is one of the challenges for clustering, and it often requires large data sets to be distributed across separate repositories. However, most clustering techniques require the data to be centralized. One of them, k-means, has been named one of the most influential data mining algorithms. Although exact distributed versions of the k-means algorithm have been proposed, the algorithm is still sensitive to the selection of the initial cluster prototypes and requires that the number of clusters be specified in advance. Additionally, distributed versions of clustering algorithms usually require multiple rounds of data transmission. This work tackles the problem of generating an approximated model for distributed clustering, based on k-means, for scenarios where the number of clusters of the distributed data is unknown and the data transmission rate is low or costly. A collection of algorithms is proposed to combine k-means clustering for each distributed subset of the data with a single round of communication. These algorithms are compared from two perspectives: the theoretical one, through asymptotic complexity analyses; and the experimental one, through a comparative evaluation of results obtained from experiments and statistical tests.
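The single-round idea in the abstract can be illustrated with a minimal sketch. This is not the authors' collection of algorithms: it assumes a fixed, known number of clusters (the paper targets the case where it is unknown), and the names `weighted_kmeans`, `local_summary`, and `merge_summaries` are hypothetical. Each site clusters its own data and transmits only its centroids and cluster sizes once; a coordinator then clusters those weighted centroids.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def weighted_kmeans(points, weights, k, iters=50, seed=0):
    # Lloyd's algorithm on weighted points; returns (centroids, cluster weights).
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    dim = len(points[0])
    totals = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            totals[j] += w
            for d in range(dim):
                sums[j][d] += w * p[d]
        # An empty cluster keeps its previous centroid.
        new = [[s / totals[j] for s in sums[j]] if totals[j] > 0 else centroids[j]
               for j in range(k)]
        if new == centroids:
            break
        centroids = new
    return centroids, totals

def local_summary(site_data, k_local, seed=0):
    # Runs at each site; only this summary is transmitted (assumed protocol).
    return weighted_kmeans(site_data, [1.0] * len(site_data), k_local, seed=seed)

def merge_summaries(summaries, k_global, seed=0):
    # Runs at the coordinator: cluster the received centroids, weighted by size.
    points, weights = [], []
    for centroids, counts in summaries:
        for c, n in zip(centroids, counts):
            if n > 0:
                points.append(tuple(c))
                weights.append(n)
    return weighted_kmeans(points, weights, k_global, seed=seed)

# Toy data: two sites, each holding points from two well-separated groups.
site_a = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
site_b = [(0.5, 0.5), (1, 1), (0, 0.5), (9, 9), (9.5, 10), (10, 9)]
summaries = [local_summary(site_a, 2), local_summary(site_b, 2)]
global_centroids, global_weights = merge_summaries(summaries, 2)
```

In this sketch each site sends `k_local * (dim + 1)` numbers exactly once, so the transmission cost does not depend on how many objects the site holds. The result is an approximation of centralized k-means, not an exact equivalent.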
Pages: 70-75
Number of pages: 6