Communication-Efficient k-Means for Edge-Based Machine Learning

Cited by: 1
Authors
Lu, Hanlin [1 ]
He, Ting [1 ]
Wang, Shiqiang [2 ]
Liu, Changchang [2 ]
Mahdavi, Mehrdad [1 ]
Narayanan, Vijaykrishnan [1 ]
Chan, Kevin S. [3 ]
Pasteris, Stephen [4 ]
Affiliations
[1] Penn State Univ, University Pk, PA 16802 USA
[2] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
[3] Army Res Lab, Adelphi, MD 20783 USA
[4] UCL, London WC1E 6EA, England
Funding
US National Science Foundation
Keywords
k-Means; dimensionality reduction; coreset; random projection; quantization; edge-based machine learning; JOHNSON-LINDENSTRAUSS; CORESETS;
DOI
10.1109/TPDS.2022.3144595
CLC Number
TP301 [Theory, Methods]
Discipline Code
081202
Abstract
We consider the problem of computing the k-means centers for a large high-dimensional dataset in the context of edge-based machine learning, where data sources offload machine learning computation to nearby edge servers. k-Means computation is fundamental to many data analytics tasks, and the capability of computing provably accurate k-means centers by leveraging the computational power of the edge servers, at a low communication and computation cost to the data sources, will greatly improve the performance of these analytics. We propose to let the data sources send small summaries, generated by joint dimensionality reduction (DR), cardinality reduction (CR), and quantization (QT), to support approximate k-means computation at reduced complexity and communication cost. By analyzing the complexity, the communication cost, and the approximation error of k-means algorithms based on carefully designed compositions of DR/CR/QT methods, we show that: (i) it is possible to compute near-optimal k-means centers at near-linear complexity and a constant or logarithmic communication cost, (ii) the order of applying DR and CR significantly affects the complexity and the communication cost, and (iii) combining DR/CR methods with a properly configured quantizer can further reduce the communication cost without compromising the other performance metrics. Our theoretical analysis has been validated through experiments based on real datasets.
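The summary-then-cluster pipeline described in the abstract can be illustrated with a minimal sketch. This is not the paper's exact construction: a Johnson-Lindenstrauss random projection stands in for DR, a uniformly sampled weighted subset stands in for the sensitivity-based coresets the paper analyzes (CR), and a uniform scalar quantizer with a fixed bit budget stands in for QT; all sizes (`m`, `s`, `b`) are illustrative choices, not the paper's parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy high-dimensional dataset held at a data source: n points in d dims,
# drawn around k loose clusters.
n, d, k = 500, 100, 3
X = rng.normal(size=(n, d)) + rng.integers(0, k, size=n)[:, None] * 5.0

# 1) Dimensionality reduction (DR): Johnson-Lindenstrauss random projection
#    from d dims down to m dims.
m = 10
P = rng.normal(size=(d, m)) / np.sqrt(m)
X_dr = X @ P

# 2) Cardinality reduction (CR): keep a weighted subset of s points.
#    Uniform sampling is a stand-in for a sensitivity-based coreset.
s = 50
idx = rng.choice(n, size=s, replace=False)
C = X_dr[idx]
w = np.full(s, n / s)  # each sampled point represents n/s original points

# 3) Quantization (QT): uniform scalar quantizer, b bits per coordinate.
b = 8
lo, hi = C.min(), C.max()
levels = 2**b - 1
Cq = np.round((C - lo) / (hi - lo) * levels) / levels * (hi - lo) + lo

# The edge server runs weighted Lloyd's k-means on the tiny summary only.
centers = Cq[rng.choice(s, size=k, replace=False)]
for _ in range(20):
    dist = ((Cq[:, None, :] - centers[None]) ** 2).sum(-1)  # (s, k)
    assign = dist.argmin(1)
    for j in range(k):
        mask = assign == j
        if mask.any():
            centers[j] = np.average(Cq[mask], axis=0, weights=w[mask])

print(centers.shape)  # k centers in the reduced space
```

The source transmits only the quantized summary (`s` points in `m` dims at `b` bits each) instead of the full `n`-by-`d` dataset, which is the communication saving the paper quantifies.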
Pages: 2509 - 2523
Page count: 15
Related Papers
50 records total
  • [41] Communication-Efficient Federated Edge Learning via Optimal Probabilistic Device Scheduling
    Zhang, Maojun
    Zhu, Guangxu
    Wang, Shuai
    Jiang, Jiamo
    Liao, Qing
    Zhong, Caijun
    Cui, Shuguang
    [J]. IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2022, 21 (10) : 8536 - 8551
  • [42] Communication-Efficient Federated Learning for Digital Twin Edge Networks in Industrial IoT
    Lu, Yunlong
    Huang, Xiaohong
    Zhang, Ke
    Maharjan, Sabita
    Zhang, Yan
    [J]. IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2021, 17 (08) : 5709 - 5718
  • [43] SHARE: Shaping Data Distribution at Edge for Communication-Efficient Hierarchical Federated Learning
    Deng, Yongheng
    Lyu, Feng
    Ren, Ju
    Zhang, Yongmin
    Zhou, Yuezhi
    Zhang, Yaoxue
    Yang, Yuanyuan
    [J]. 2021 IEEE 41ST INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2021), 2021, : 24 - 34
  • [44] High-Dimensional Stochastic Gradient Quantization for Communication-Efficient Edge Learning
    Du, Yuqing
    Yang, Sheng
    Huang, Kaibin
[J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2020, 68 : 2128 - 2142
  • [45] Communication-Efficient Distributed Learning: An Overview
    Cao, Xuanyu
    Basar, Tamer
    Diggavi, Suhas
    Eldar, Yonina C.
    Letaief, Khaled B.
    Poor, H. Vincent
    Zhang, Junshan
    [J]. IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2023, 41 (04) : 851 - 873
  • [46] MLK-means - A hybrid machine learning based k-means clustering algorithm for document clustering
    Perumal, Pitchandi
    Nedunchezhian, Raju
[J]. International Journal of Computer Science Issues, 2012, 9 (5-2) : 164 - 173
  • [47] Communication-Efficient Federated Learning and Permissioned Blockchain for Digital Twin Edge Networks
    Lu, Yunlong
    Huang, Xiaohong
    Zhang, Ke
    Maharjan, Sabita
    Zhang, Yan
    [J]. IEEE INTERNET OF THINGS JOURNAL, 2021, 8 (04) : 2276 - 2288
  • [48] Joint Model Pruning and Device Selection for Communication-Efficient Federated Edge Learning
    Liu, Shengli
    Yu, Guanding
    Yin, Rui
    Yuan, Jiantao
    Shen, Lei
    Liu, Chonghe
    [J]. IEEE TRANSACTIONS ON COMMUNICATIONS, 2022, 70 (01) : 231 - 244
  • [49] Efficient k-Means on GPUs
    Lutz, Clemens
    Bress, Sebastian
    Rabl, Tilmann
    Zeuch, Steffen
    Markl, Volker
    [J]. 14TH INTERNATIONAL WORKSHOP ON DATA MANAGEMENT ON NEW HARDWARE (DAMON 2018), 2018,
  • [50] Robust and Communication-Efficient Collaborative Learning
    Reisizadeh, Amirhossein
    Taheri, Hossein
    Mokhtari, Aryan
    Hassani, Hamed
    Pedarsani, Ramtin
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32