Two-Stage Clustering for Federated Learning with Pseudo Mini-batch SGD Training on Non-IID Data

Cited by: 0
Authors
Weng, Jianqing [1]
Su, Songzhi [1]
Fan, Xiaoliang [1]
Affiliations
[1] Xiamen Univ, Xiamen 361005, Peoples R China
Keywords
Federated learning; Clustering; Non-IID data
DOI
10.1007/978-981-19-4546-5_3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The statistical heterogeneity problem in federated learning is mainly caused by skew in the data distribution across clients. In this paper, we first identify a connection between the discrepancy of clients' data distributions and the divergence of their models. Based on this insight, we introduce a K-center clustering method that groups clients by the similarity of their local update parameters, which effectively reduces data distribution skew. Second, we provide a theoretical proof that a more uniform data distribution among the clients in training slows the growth of model divergence, thereby improving training performance in Non-IID settings. We therefore randomly divide the clients of each first-stage cluster into multiple fine-grained clusters to flatten the original data distribution. Finally, to fully leverage the data in each fine-grained cluster, we propose an intra-cluster training method named pseudo mini-batch SGD training, which performs general mini-batch SGD training on each fine-grained cluster while keeping the data local. With the two-stage clustering mechanism, the negative effect of Non-IID data can be steadily eliminated. Experiments on two federated learning benchmarks, FEMNIST and CelebA, as well as on a manually constructed Non-IID dataset based on CIFAR10, show that the proposed method significantly improves training efficiency on Non-IID data and outperforms several widely used federated baselines.
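The two clustering stages described in the abstract can be illustrated with a minimal sketch. It is one reading of the abstract only, not the authors' implementation: it assumes each client's local update is available as a flat vector, uses a greedy farthest-first heuristic for the K-center step, and splits each first-stage cluster into random groups of a fixed size. The names `k_center`, `two_stage_clusters`, and `group_size` are hypothetical.

```python
import numpy as np


def k_center(points, k, seed=0):
    """Greedy farthest-first K-center heuristic (an assumed variant).

    Returns the indices of the chosen centers and, for every point,
    the index (0..k-1) of its nearest center.
    """
    rng = np.random.default_rng(seed)
    centers = [int(rng.integers(len(points)))]  # arbitrary first center
    dist = np.linalg.norm(points - points[centers[0]], axis=1)
    while len(centers) < k:
        nxt = int(np.argmax(dist))  # point farthest from all current centers
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
    # Distance from every point to every center, then nearest-center labels.
    d = np.linalg.norm(points[:, None, :] - points[centers][None, :, :], axis=2)
    return centers, np.argmin(d, axis=1)


def two_stage_clusters(updates, k, group_size, seed=0):
    """Stage 1: K-center on client update vectors.
    Stage 2: random split of each cluster into fine-grained groups.
    """
    rng = np.random.default_rng(seed)
    _, labels = k_center(updates, k, seed)
    groups = []
    for c in range(k):
        members = np.flatnonzero(labels == c)
        rng.shuffle(members)  # random division within the cluster
        for i in range(0, len(members), group_size):
            groups.append(members[i:i + group_size].tolist())
    return labels, groups


if __name__ == "__main__":
    # Hypothetical data: 12 clients whose updates fall into two distinct groups.
    rng = np.random.default_rng(1)
    updates = np.vstack([rng.normal(0, 0.1, (6, 4)), rng.normal(10, 0.1, (6, 4))])
    labels, groups = two_stage_clusters(updates, k=2, group_size=3)
    print(labels, groups)
```

Each fine-grained group would then be the unit on which the intra-cluster training step runs; how pseudo mini-batch SGD aggregates within a group is not specified in the abstract and is left out of this sketch.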
Pages: 29-43
Number of pages: 15
Related papers
50 in total
  • [41] Is Non-IID Data a Threat in Federated Online Learning to Rank?
    Wang, Shuyi
    Zuccon, Guido
    PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 2801 - 2813
  • [42] FedRL: Improving the Performance of Federated Learning with Non-IID Data
    Kang, Yufei
    Li, Baochun
    Zeyl, Timothy
    2022 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM 2022), 2022, : 3023 - 3028
  • [43] Feature Matching Data Synthesis for Non-IID Federated Learning
    Li, Zijian
    Sun, Yuchang
    Shao, Jiawei
    Mao, Yuyi
    Wang, Jessie Hui
    Zhang, Jun
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2024, 23 (10) : 9352 - 9367
  • [45] Data independent warmup scheme for non-IID federated learning
    Arafeh, Mohamad
    Ould-Slimane, Hakima
    Otrok, Hadi
    Mourad, Azzam
    Talhi, Chamseddine
    Damiani, Ernesto
    INFORMATION SCIENCES, 2023, 623 : 342 - 360
  • [46] FedPD: A Federated Learning Framework With Adaptivity to Non-IID Data
    Zhang, Xinwei
    Hong, Mingyi
    Dhople, Sairaj
    Yin, Wotao
    Liu, Yang
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2021, 69 (69) : 6055 - 6070
  • [47] Heterogeneous Federated Learning for Non-IID Smartwatch Data Classification
    Syu J.
    Lin J.C.
    IEEE Internet of Things Journal, 2024, 11 (18) : 1 - 1
  • [48] Ensemble Federated Learning With Non-IID Data in Wireless Networks
    Zhao, Zhongyuan
    Wang, Jingyi
    Hong, Wei
    Quek, Tony Q. S.
    Ding, Zhiguo
    Peng, Mugen
    IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2024, 23 (04) : 3557 - 3571
  • [49] Privacy-Enhanced Federated Learning for Non-IID Data
    Tan, Qingjie
    Wu, Shuhui
    Tao, Yuanhong
    MATHEMATICS, 2023, 11 (19)
  • [50] Adaptive Federated Learning on Non-IID Data With Resource Constraint
    Zhang, Jie
    Guo, Song
    Qu, Zhihao
    Zeng, Deze
    Zhan, Yufeng
    Liu, Qifeng
    Akerkar, Rajendra
    IEEE TRANSACTIONS ON COMPUTERS, 2022, 71 (07) : 1655 - 1667