Two-Stage Clustering for Federated Learning with Pseudo Mini-batch SGD Training on Non-IID Data

Times cited: 0

Authors:

Weng, Jianqing [1]

Su, Songzhi [1]

Fan, Xiaoliang [1]

Affiliation:

[1] Xiamen Univ, Xiamen 361005, Peoples R China

Source:

COMPUTER SUPPORTED COOPERATIVE WORK AND SOCIAL COMPUTING, CHINESECSCW 2021, PT I | 2022 / Vol. 1491

Keywords:

Federated learning; Clustering; Non-IID data;

DOI:

10.1007/978-981-19-4546-5_3

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The statistical heterogeneity problem in federated learning is mainly caused by skewed data distributions across clients. In this paper, we first discover a connection between the discrepancy of clients' data distributions and the divergence of their models. Based on this insight, we introduce a K-center clustering method that builds client groups according to the similarity of their local update parameters, which effectively reduces the data distribution skewness. Second, we provide a theoretical proof that a more uniform data distribution among the clients in training can slow the growth of model divergence, thereby improving training performance in Non-IID environments. We therefore randomly divide the clients of each first-stage cluster into multiple fine-grained clusters to flatten the original data distribution. Finally, to fully leverage the data in each fine-grained cluster, we propose an intra-cluster training method named pseudo mini-batch SGD training, which performs standard mini-batch SGD training on each fine-grained cluster while the data are kept locally. With this two-stage clustering mechanism, the negative effect of Non-IID data can be steadily eliminated. Experiments on two federated learning benchmarks, FEMNIST and CelebA, as well as on a manually constructed Non-IID dataset based on CIFAR10, show that the proposed method significantly improves training efficiency on Non-IID data and outperforms several widely used federated baselines.
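
The abstract names the ingredients of the method but not their exact rules. The Python sketch below is one possible reading, not the authors' implementation. It makes four assumptions:

- Client similarity is the Euclidean distance between flattened local updates.
- The K-center step uses Gonzalez's greedy farthest-first traversal.
- Fine-grained clusters are formed by dealing each first-stage cluster's shuffled clients round-robin across the groups.
- "Mini-batch SGD with data kept locally" is a batch-size-weighted average of per-client mean gradients.

All function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Distance between two flattened local-update vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_center(updates: List[Vector], k: int, first: int = 0) -> List[int]:
    """Greedy farthest-first traversal, Gonzalez's 2-approximation for k-center.

    Returns the indices of the clients chosen as cluster centers.
    """
    centers = [first]
    # Distance from each client to its nearest center chosen so far.
    nearest = [euclidean(u, updates[first]) for u in updates]
    while len(centers) < min(k, len(updates)):
        far = max(range(len(updates)), key=lambda i: nearest[i])
        if nearest[far] == 0.0:  # every client coincides with a center already
            break
        centers.append(far)
        nearest = [min(d, euclidean(u, updates[far])) for d, u in zip(nearest, updates)]
    return centers


def assign_to_centers(updates: List[Vector], centers: List[int]) -> List[List[int]]:
    """Stage 1: put every client into the cluster of its closest center."""
    clusters: Dict[int, List[int]] = {c: [] for c in centers}
    for i, u in enumerate(updates):
        closest = min(centers, key=lambda c: euclidean(u, updates[c]))
        clusters[closest].append(i)
    return list(clusters.values())


def fine_grained_groups(clusters: List[List[int]], num_groups: int,
                        rng: random.Random) -> List[List[int]]:
    """Stage 2 (assumed reading): shuffle each coarse cluster and deal its
    clients round-robin across num_groups groups, so that every group mixes
    clients from different first-stage clusters."""
    groups: List[List[int]] = [[] for _ in range(num_groups)]
    slot = 0
    for cluster in clusters:
        members = list(cluster)
        rng.shuffle(members)
        for client in members:
            groups[slot % num_groups].append(client)
            slot += 1
    return groups


def pseudo_minibatch_grad(mean_grads: List[Vector], batch_sizes: List[int]) -> List[float]:
    """Combine per-client mean gradients, weighted by local batch size.

    The result equals the mean gradient over the union of the clients' batches,
    i.e. one ordinary mini-batch SGD gradient, with no raw data exchanged.
    """
    total = sum(batch_sizes)
    dim = len(mean_grads[0])
    return [sum(g[j] * n for g, n in zip(mean_grads, batch_sizes)) / total
            for j in range(dim)]
```

For example, with k = 3, six clients whose updates form three tight pairs yield three first-stage clusters. Dealing them into two groups then gives each fine-grained cluster exactly one client from every first-stage cluster. The weighted average in `pseudo_minibatch_grad` matches the pooled mini-batch gradient only if each client reports the mean, not the sum, of its per-sample gradients.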

Pages: 29-43

Number of pages: 15

Related literature: 50 items in total (items 41-50 listed)

[41] Is Non-IID Data a Threat in Federated Online Learning to Rank?
Wang, Shuyi
Zuccon, Guido
PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 2801 - 2813
[42] FedRL: Improving the Performance of Federated Learning with Non-IID Data
Kang, Yufei
Li, Baochun
Zeyl, Timothy
2022 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM 2022), 2022, : 3023 - 3028
[43] Feature Matching Data Synthesis for Non-IID Federated Learning
Li, Zijian
Sun, Yuchang
Shao, Jiawei
Mao, Yuyi
Wang, Jessie Hui
Zhang, Jun
IEEE TRANSACTIONS ON MOBILE COMPUTING, 2024, 23 (10) : 9352 - 9367
[44] FedKT: Federated learning with knowledge transfer for non-IID data
Yu, Bin
2025, 159
[45] Data independent warmup scheme for non-IID federated learning
Arafeh, Mohamad
Ould-Slimane, Hakima
Otrok, Hadi
Mourad, Azzam
Talhi, Chamseddine
Damiani, Ernesto
INFORMATION SCIENCES, 2023, 623 : 342 - 360
[46] FedPD: A Federated Learning Framework With Adaptivity to Non-IID Data
Zhang, Xinwei
Hong, Mingyi
Dhople, Sairaj
Yin, Wotao
Liu, Yang
IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2021, 69 : 6055 - 6070
[47] Heterogeneous Federated Learning for Non-IID Smartwatch Data Classification
Syu, J.
Lin, J. C.
IEEE INTERNET OF THINGS JOURNAL, 2024, 11 (18)
[48] Ensemble Federated Learning With Non-IID Data in Wireless Networks
Zhao, Zhongyuan
Wang, Jingyi
Hong, Wei
Quek, Tony Q. S.
Ding, Zhiguo
Peng, Mugen
IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2024, 23 (04) : 3557 - 3571
[49] Privacy-Enhanced Federated Learning for Non-IID Data
Tan, Qingjie
Wu, Shuhui
Tao, Yuanhong
MATHEMATICS, 2023, 11 (19)
[50] Adaptive Federated Learning on Non-IID Data With Resource Constraint
Zhang, Jie
Guo, Song
Qu, Zhihao
Zeng, Deze
Zhan, Yufeng
Liu, Qifeng
Akerkar, Rajendra
IEEE TRANSACTIONS ON COMPUTERS, 2022, 71 (07) : 1655 - 1667
