Distributed k-means Clustering with Low Transmission Cost

Cited by: 5
Authors
Naldi, Murilo Coelho [1]
Gabrielli Barreto Campello, Ricardo Jose [2]
Affiliations
[1] Fed Univ Vicosa UFV, Dept Exact & Technol Sci, Rio Paranaiba, Brazil
[2] Univ Sao Paulo, Inst Math & Comp Sci, Sao Paulo, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil;
Keywords
clustering; k-means; distributed data sets; low data transfer; EFFICIENCY; ALGORITHM;
DOI
10.1109/BRACIS.2013.20
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Handling large amounts of data is one of the challenges of clustering, and it often requires large data sets to be distributed across separate repositories. However, most clustering techniques require the data to be centralized. One of them, k-means, has been ranked among the most influential data mining algorithms. Although exact distributed versions of the k-means algorithm have been proposed, the algorithm remains sensitive to the selection of the initial cluster prototypes and requires the number of clusters to be specified in advance. Additionally, distributed versions of clustering algorithms usually require multiple rounds of data transmission. This work tackles the problem of generating an approximate model for distributed clustering, based on k-means, for scenarios where the number of clusters in the distributed data is unknown and data transmission is slow or costly. A collection of algorithms is proposed to combine k-means clustering for each distributed subset of the data with a single round of communication. These algorithms are compared from two perspectives: the theoretical one, through asymptotic complexity analyses; and the experimental one, through a comparative evaluation of results obtained from experiments and statistical tests.
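The single-round idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a fixed, known k (the paper targets the case where k is unknown), and the names `weighted_kmeans`, `local_summary`, and `merge_summaries` are hypothetical. Each site clusters its own data and transmits only its centroids and cluster sizes once; a coordinator then clusters those centroids, weighted by size.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def weighted_kmeans(points, weights, k, iters=50, seed=0):
    """Lloyd's k-means on weighted points (illustrative helper).

    Returns (centroids, totals), where totals[j] is the weight assigned
    to centroid j in the last assignment pass.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    dim = len(points[0])
    totals = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            # Assign the point to its nearest centroid.
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            totals[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        # Weighted mean per cluster; an empty cluster keeps its old centroid.
        new = [[s / totals[j] for s in sums[j]] if totals[j] > 0 else centroids[j]
               for j in range(k)]
        if new == centroids:
            break
        centroids = new
    return centroids, totals

def local_summary(data, k):
    """Run at each site: cluster local data, return only centroids and sizes."""
    centroids, counts = weighted_kmeans(data, [1.0] * len(data), k)
    return [(c, n) for c, n in zip(centroids, counts) if n > 0]

def merge_summaries(summaries, k):
    """Run at the coordinator after the single transmission round."""
    points = [tuple(c) for s in summaries for c, _ in s]
    weights = [n for s in summaries for _, n in s]
    return weighted_kmeans(points, weights, k)

# Two hypothetical sites, each holding part of two well-separated groups.
site_a = [(0, 0), (0, 1), (10, 10), (10, 11)]
site_b = [(1, 0), (1, 1), (11, 10), (11, 11)]
summaries = [local_summary(site_a, 2), local_summary(site_b, 2)]
global_centroids, global_sizes = merge_summaries(summaries, 2)
print(sorted(global_centroids), global_sizes)
```

In this sketch each site sends k centroids and k counts instead of its raw points, so the transmission cost depends on k and the dimensionality rather than on the size of the local data set. The result is an approximation of centralized k-means, not an exact equivalent.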
Pages: 70-75
Page count: 6