Fault tolerant decentralised K-Means clustering for asynchronous large-scale networks

Cited by: 44
Authors
Di Fatta, Giuseppe [1]
Blasa, Francesco [2]
Cafiero, Simone [2]
Fortino, Giancarlo [2]
Affiliations
[1] Univ Reading, Sch Syst Engn, Reading, Berks, England
[2] Univ Calabria, Dipartimento Elettron Informat & Sistemist, I-87030 Commenda Di Rende, Italy
Keywords
Distributed clustering; K-Means; Peer-to-peer data mining; Gossip protocols; Epidemic protocols; Extreme scale computing;
DOI
10.1016/j.jpdc.2012.09.009
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. The lack of scalable and fault tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (Epidemic K-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against the state of the art sampling methods and shows that the proposed method overcomes the limitations of the sampling-based approaches for skewed clusters distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale. (C) 2012 Elsevier Inc. All rights reserved.
Pages: 317-329
Number of pages: 13
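
The abstract describes a K-Means variant that replaces global communication with epidemic (gossip) exchanges between peers. This record does not give the algorithm itself, so the following is only a minimal sketch of the general idea, not the authors' Epidemic K-Means. It assumes push-pull pairwise averaging as the gossip primitive, assumes that a lost exchange is dropped as a whole (so both peers keep their old estimates), and uses hypothetical names throughout (`gossip_average`, `local_stats`, `gossip_kmeans`, `rounds`, `loss`). Each node computes per-cluster coordinate sums and counts over its own points, the nodes average these vectors by gossip, and every node then derives centroids as the ratio of averaged sum to averaged count.

```python
import random


def gossip_average(values, rounds, loss=0.0, rng=random):
    """Push-pull gossip: every node repeatedly averages its vector with a random peer.

    Assumption: an exchange is atomic. If it is lost, neither peer updates,
    so the network-wide sum of the estimates is preserved.
    """
    est = [list(v) for v in values]
    n = len(est)
    if n < 2:
        return est
    for _ in range(rounds):
        for i in range(n):
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1  # pick a peer other than i
            if rng.random() < loss:
                continue  # simulated message loss
            avg = [(a + b) / 2 for a, b in zip(est[i], est[j])]
            est[i], est[j] = avg, list(avg)
    return est


def local_stats(points, centroids):
    """Per cluster: d coordinate sums followed by a point count, flattened."""
    k, d = len(centroids), len(centroids[0])
    stats = [0.0] * (k * (d + 1))
    for p in points:
        nearest = min(
            range(k),
            key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
        )
        base = nearest * (d + 1)
        for t in range(d):
            stats[base + t] += p[t]
        stats[base + d] += 1.0
    return stats


def gossip_kmeans(node_data, init_centroids, iterations=10, rounds=30,
                  loss=0.0, seed=0):
    """Each node keeps its own centroid copy and updates it from gossip estimates."""
    rng = random.Random(seed)
    n, k, d = len(node_data), len(init_centroids), len(init_centroids[0])
    cents = [[list(c) for c in init_centroids] for _ in range(n)]
    for _ in range(iterations):
        stats = [local_stats(node_data[i], cents[i]) for i in range(n)]
        est = gossip_average(stats, rounds, loss, rng)
        for i in range(n):
            for c in range(k):
                base = c * (d + 1)
                count = est[i][base + d]
                if count > 1e-12:  # keep the old centroid for an empty cluster
                    cents[i][c] = [est[i][base + t] / count for t in range(d)]
    return cents
```

Because both the sums and the counts converge to network averages, the unknown network size cancels in the ratio, so no node needs to know how many peers exist. Calling `gossip_kmeans(node_data, [(2.0,), (8.0,)], loss=0.2)` with one list of point tuples per node returns one centroid list per node. How closely these agree with a centralised run depends on `rounds` and on the loss model assumed here.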