Fault tolerant decentralised K-Means clustering for asynchronous large-scale networks

Cited by: 44

Authors:

Di Fatta, Giuseppe [1]

Blasa, Francesco [2]

Cafiero, Simone [2]

Fortino, Giancarlo [2]

Affiliations:

[1] Univ Reading, Sch Syst Engn, Reading, Berks, England

[2] Univ Calabria, Dipartimento Elettron Informat & Sistemist, I-87030 Commenda Di Rende, Italy

Source:

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING | 2013 / Vol. 73 / Issue 03

Keywords:

Distributed clustering; K-Means; Peer-to-peer data mining; Gossip protocols; Epidemic protocols; Extreme scale computing;

DOI:

10.1016/j.jpdc.2012.09.009

Chinese Library Classification (CLC) number:

TP301 [Theory and methods];

Discipline classification code:

081202;

Abstract:

The K-Means algorithm for cluster analysis is one of the most influential and popular data mining methods. Its straightforward parallel formulation is well suited for distributed memory systems with reliable interconnection networks, such as massively parallel processors and clusters of workstations. However, in large-scale geographically distributed systems the straightforward parallel algorithm can be rendered useless by a single communication failure or high latency in communication paths. The lack of scalable and fault tolerant global communication and synchronisation methods in large-scale systems has hindered the adoption of the K-Means algorithm for applications in large networked systems such as wireless sensor networks, peer-to-peer systems and mobile ad hoc networks. This work proposes a fully distributed K-Means algorithm (Epidemic K-Means) which does not require global communication and is intrinsically fault tolerant. The proposed distributed K-Means algorithm provides a clustering solution which can approximate the solution of an ideal centralised algorithm over the aggregated data as closely as desired. A comparative performance analysis is carried out against state-of-the-art sampling methods and shows that the proposed method overcomes the limitations of sampling-based approaches for skewed cluster distributions. The experimental analysis confirms that the proposed algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale. (C) 2012 Elsevier Inc. All rights reserved.
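
The abstract does not give the details of Epidemic K-Means, so the following is only an illustrative sketch of the general idea of replacing global communication with gossip, not the authors' algorithm. It assumes a simple pairwise-averaging gossip step, a fully connected overlay, synchronous rounds, and no message loss or node failures. All function names (local_stats, gossip_average, centroids_from, gossip_kmeans, central_kmeans) and parameters (iters, rounds, seed) are hypothetical. Each node computes per-cluster sums and counts over its local points only; gossip then drives every node's copy towards the network-wide average, and the ratio of averaged sum to averaged count approximates the centralised centroid.

```python
import random

def local_stats(points, centroids):
    """Per-cluster coordinate sums and counts over one node's local points."""
    k, d = len(centroids), len(centroids[0])
    stats = [0.0] * (k * (d + 1))  # per cluster: d coordinate sums, then a count
    for p in points:
        # index of the nearest centroid by squared Euclidean distance
        c = min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
        base = c * (d + 1)
        for t in range(d):
            stats[base + t] += p[t]
        stats[base + d] += 1.0
    return stats

def gossip_average(vectors, rounds, rng):
    """Assumed gossip step: each node averages its vector with a random peer."""
    n = len(vectors)
    vecs = [v[:] for v in vectors]
    for _ in range(rounds):
        for i in range(n):
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1  # skip self so that j != i
            avg = [(a + b) / 2 for a, b in zip(vecs[i], vecs[j])]
            vecs[i], vecs[j] = avg, avg[:]
    return vecs

def centroids_from(stats, old, d):
    """Centroid = sum / count; keep the old centroid if the cluster is empty."""
    new = []
    for c, prev in enumerate(old):
        base = c * (d + 1)
        count = stats[base + d]
        new.append([stats[base + t] / count for t in range(d)]
                   if count > 0 else prev[:])
    return new

def gossip_kmeans(node_data, init, iters=10, rounds=30, seed=0):
    """Every node keeps its own centroid estimates; no global reduction is used."""
    rng = random.Random(seed)
    d = len(init[0])
    cents = [[c[:] for c in init] for _ in node_data]
    for _ in range(iters):
        stats = [local_stats(pts, cs) for pts, cs in zip(node_data, cents)]
        stats = gossip_average(stats, rounds, rng)
        cents = [centroids_from(s, cs, d) for s, cs in zip(stats, cents)]
    return cents

def central_kmeans(points, init, iters=10):
    """Centralised reference over the aggregated data, for comparison only."""
    d = len(init[0])
    cents = [c[:] for c in init]
    for _ in range(iters):
        cents = centroids_from(local_stats(points, cents), cents, d)
    return cents
```

In this sketch, increasing rounds tightens the agreement between each node's centroids and the output of central_kmeans, which mirrors the abstract's claim that the centralised solution can be approximated as closely as desired. Tolerance to message loss and node failures is not modelled here.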

Pages: 317-329

Number of pages: 13

50 items in total

[1] Scalable k-means for large-scale clustering
Ming, Yuewei
Zhu, En
Wang, Mao
Liu, Qiang
Liu, Xinwang
Yin, Jianping
[J]. INTELLIGENT DATA ANALYSIS, 2019, 23 (04) : 825 - 838
[2] Compressed K-Means for Large-Scale Clustering
Shen, Xiaobo
Liu, Weiwei
Tsang, Ivor
Shen, Fumin
Sun, Quan-Sen
[J]. THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 2527 - 2533
[3] Large-scale k-means clustering via variance reduction
Zhao, Yawei
Ming, Yuewei
Liu, Xinwang
Zhu, En
Zhao, Kaikai
Yin, Jianping
[J]. NEUROCOMPUTING, 2018, 307 : 184 - 194
[4] Regularized and Sparse Stochastic K-Means for Distributed Large-Scale Clustering
Jumutc, Vilen
Langone, Rocco
Suykens, Johan A. K.
[J]. PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015, : 2535 - 2540
[5] Fast K-means for Large Scale Clustering
Hu, Qinghao
Wu, Jiaxiang
Bai, Lu
Zhang, Yifan
Cheng, Jian
[J]. CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2017, : 2099 - 2102
[6] Extractive Text Summarization on Large-scale Dataset Using K-Means Clustering
Ti-Hon Nguyen
Thanh-Nghi Do
[J]. ADVANCES AND TRENDS IN ARTIFICIAL INTELLIGENCE: THEORY AND PRACTICES IN ARTIFICIAL INTELLIGENCE, 2022, 13343 : 737 - 746
[7] Large-scale k-means clustering with user-centric privacy preservation
Sakuma, Jun
Kobayashi, Shigenobu
[J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2008, 5012 : 320 - 332
[8] K-Means Spreading Factor Allocation for Large-Scale LoRa Networks
Ullah, Muhammad Asad
Iqbal, Junnaid
Hoeller, Arliones
Souza, Richard Demo
Alves, Hirley
[J]. SENSORS, 2019, 19 (21)
[9] A SAMPLING APPROXIMATION FOR LARGE-SCALE K-MEANS
Phoungphol, Piyaphol
[J]. ICAART: PROCEEDINGS OF THE 4TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, VOL 1, 2012, : 324 - 327
[10] A large scale clustering scheme for kernel K-Means
Zhang, R
Rudnicky, AI
[J]. 16TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL IV, PROCEEDINGS, 2002, : 289 - 292