Communication-Efficient k-Means for Edge-Based Machine Learning

Cited by: 1
Authors
Lu, Hanlin [1 ]
He, Ting [1 ]
Wang, Shiqiang [2 ]
Liu, Changchang [2 ]
Mahdavi, Mehrdad [1 ]
Narayanan, Vijaykrishnan [1 ]
Chan, Kevin S. [3 ]
Pasteris, Stephen [4 ]
Affiliations
[1] Penn State Univ, University Pk, PA 16802 USA
[2] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
[3] Army Res Lab, Adelphi, MD 20783 USA
[4] UCL, London WC1E 6EA, England
Funding
US National Science Foundation
Keywords
k-Means; dimensionality reduction; coreset; random projection; quantization; edge-based machine learning; JOHNSON-LINDENSTRAUSS; CORESETS;
DOI
10.1109/TPDS.2022.3144595
CLC Number
TP301 [Theory, Methods]
Discipline Code
081202
Abstract
We consider the problem of computing the k-means centers for a large high-dimensional dataset in the context of edge-based machine learning, where data sources offload machine learning computation to nearby edge servers. k-Means computation is fundamental to many data analytics tasks, and the capability of computing provably accurate k-means centers by leveraging the computational power of the edge servers, at a low communication and computation cost to the data sources, will greatly improve the performance of these analytics. We propose to let the data sources send small summaries, generated by joint dimensionality reduction (DR), cardinality reduction (CR), and quantization (QT), to support approximate k-means computation at reduced complexity and communication cost. By analyzing the complexity, the communication cost, and the approximation error of k-means algorithms based on carefully designed compositions of DR/CR/QT methods, we show that: (i) it is possible to compute near-optimal k-means centers at near-linear complexity and a constant or logarithmic communication cost, (ii) the order of applying DR and CR significantly affects the complexity and the communication cost, and (iii) combining DR/CR methods with a properly configured quantizer can further reduce the communication cost without compromising the other performance metrics. Our theoretical analysis has been validated through experiments based on real datasets.
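The summary-then-cluster pipeline described in the abstract can be illustrated with a minimal sketch. This is not the paper's exact construction: a Johnson-Lindenstrauss random projection stands in for DR, a uniformly sampled weighted subset stands in for the sensitivity-based coresets the paper analyzes (CR), and a uniform scalar quantizer with a fixed bit budget stands in for QT; all sizes (`m`, `s`, `b`) are illustrative choices, not the paper's parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy high-dimensional dataset held at a data source: n points in d dims,
# drawn around k loose clusters.
n, d, k = 500, 100, 3
X = rng.normal(size=(n, d)) + rng.integers(0, k, size=n)[:, None] * 5.0

# 1) Dimensionality reduction (DR): Johnson-Lindenstrauss random projection
#    from d dims down to m dims.
m = 10
P = rng.normal(size=(d, m)) / np.sqrt(m)
X_dr = X @ P

# 2) Cardinality reduction (CR): keep a weighted subset of s points.
#    Uniform sampling is a stand-in for a sensitivity-based coreset.
s = 50
idx = rng.choice(n, size=s, replace=False)
C = X_dr[idx]
w = np.full(s, n / s)  # each sampled point represents n/s original points

# 3) Quantization (QT): uniform scalar quantizer, b bits per coordinate.
b = 8
lo, hi = C.min(), C.max()
levels = 2**b - 1
Cq = np.round((C - lo) / (hi - lo) * levels) / levels * (hi - lo) + lo

# The edge server runs weighted Lloyd's k-means on the tiny summary only.
centers = Cq[rng.choice(s, size=k, replace=False)]
for _ in range(20):
    dist = ((Cq[:, None, :] - centers[None]) ** 2).sum(-1)  # (s, k)
    assign = dist.argmin(1)
    for j in range(k):
        mask = assign == j
        if mask.any():
            centers[j] = np.average(Cq[mask], axis=0, weights=w[mask])

print(centers.shape)  # k centers in the reduced space
```

The source transmits only the quantized summary (`s` points in `m` dims at `b` bits each) instead of the full `n`-by-`d` dataset, which is the communication saving the paper quantifies.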
Pages: 2509 - 2523
Page count: 15
Related Papers
50 records total
  • [41] Communication-Efficient Federated Edge Learning via Optimal Probabilistic Device Scheduling
    Zhang, Maojun
    Zhu, Guangxu
    Wang, Shuai
    Jiang, Jiamo
    Liao, Qing
    Zhong, Caijun
    Cui, Shuguang
    [J]. IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2022, 21 (10) : 8536 - 8551
  • [42] Communication-Efficient Federated Learning for Digital Twin Edge Networks in Industrial IoT
    Lu, Yunlong
    Huang, Xiaohong
    Zhang, Ke
    Maharjan, Sabita
    Zhang, Yan
    [J]. IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2021, 17 (08) : 5709 - 5718
  • [43] SHARE: Shaping Data Distribution at Edge for Communication-Efficient Hierarchical Federated Learning
    Deng, Yongheng
    Lyu, Feng
    Ren, Ju
    Zhang, Yongmin
    Zhou, Yuezhi
    Zhang, Yaoxue
    Yang, Yuanyuan
    [J]. 2021 IEEE 41ST INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2021), 2021, : 24 - 34
  • [44] High-Dimensional Stochastic Gradient Quantization for Communication-Efficient Edge Learning
    Du, Yuqing
    Yang, Sheng
    Huang, Kaibin
[J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2020, 68 : 2128 - 2142
  • [45] Communication-Efficient Distributed Learning: An Overview
    Cao, Xuanyu
    Basar, Tamer
    Diggavi, Suhas
    Eldar, Yonina C.
    Letaief, Khaled B.
    Poor, H. Vincent
    Zhang, Junshan
    [J]. IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2023, 41 (04) : 851 - 873
  • [46] MLK-means - A hybrid machine learning based k-means clustering algorithm for document clustering
    Perumal, Pitchandi
    Nedunchezhian, Raju
[J]. International Journal of Computer Science Issues, 2012, 9 (5-2) : 164 - 173
  • [47] Communication-Efficient Federated Learning and Permissioned Blockchain for Digital Twin Edge Networks
    Lu, Yunlong
    Huang, Xiaohong
    Zhang, Ke
    Maharjan, Sabita
    Zhang, Yan
    [J]. IEEE INTERNET OF THINGS JOURNAL, 2021, 8 (04) : 2276 - 2288
  • [48] Joint Model Pruning and Device Selection for Communication-Efficient Federated Edge Learning
    Liu, Shengli
    Yu, Guanding
    Yin, Rui
    Yuan, Jiantao
    Shen, Lei
    Liu, Chonghe
    [J]. IEEE TRANSACTIONS ON COMMUNICATIONS, 2022, 70 (01) : 231 - 244
  • [49] Efficient k-Means on GPUs
    Lutz, Clemens
    Bress, Sebastian
    Rabl, Tilmann
    Zeuch, Steffen
    Markl, Volker
    [J]. 14TH INTERNATIONAL WORKSHOP ON DATA MANAGEMENT ON NEW HARDWARE (DAMON 2018), 2018,
  • [50] Robust and Communication-Efficient Collaborative Learning
    Reisizadeh, Amirhossein
    Taheri, Hossein
    Mokhtari, Aryan
    Hassani, Hamed
    Pedarsani, Ramtin
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32