Greedy centroid initialization for federated K-means

Cited by: 0
Authors
Yang, Kun [1]
Amiri, Mohammad Mohammadi [2]
Kulkarni, Sanjeev R. [1]
Affiliations
[1] Princeton Univ, 98 Charlton St, Princeton, NJ 08540 USA
[2] Rensselaer Polytech Inst, 110 8th St, Troy, NY 12180 USA
Keywords
K-means; Clustering; Federated learning; Machine learning; Security; Privacy
DOI
10.1007/s10115-024-02066-x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We study learning from unlabeled data distributed across clients in a federated fashion, where raw data do not leave the corresponding devices. We develop a K-means clustering algorithm within this federated setting, where the local datasets are clustered at the clients and a server generates the global clusters after aggregating the local ones. Given the importance of initialization to the federated K-means algorithm (FKM), our objective is to find better initial centroids by utilizing the local data stored on each client. To this end, we start the centroid initialization at the clients rather than at the server, since the server initially lacks any preliminary insight into the clients' data. The clients first select their local initial clusters and subsequently share their clustering information (including cluster centroids and sizes) with the server. The server then employs a greedy algorithm to determine the global initial centroids based on the information received from the clients. We refer to this idea as G-FKM. Numerical results obtained from both synthetic and public datasets show that the proposed algorithm converges faster, with a reduced within-cluster sum of squares (WCSS) and a higher adjusted Rand index, compared to three distinct federated K-means variants. This improvement comes at the relatively low cost of sending limited additional information from the clients to the server, rather than conducting the initialization entirely at the server. Furthermore, we observe that the proposed algorithm performs better than the centralized algorithm in cases where the data distribution across the clients is highly skewed.
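The abstract does not spell out the greedy rule the server applies, so the following is only an illustrative sketch of the two-stage flow it describes: clients cluster locally and report centroids and sizes, and the server greedily picks K global initial centroids from the pooled reports. The selection rule used here (start from the largest reported cluster, then repeatedly take the candidate with the largest size-weighted squared distance to the centroids already chosen) is an assumption, not the authors' G-FKM procedure. The function names `local_clusters` and `greedy_global_init` are hypothetical.

```python
import numpy as np


def local_clusters(X, k, n_iter=10, seed=0):
    """Client side: plain Lloyd's K-means on local data.

    Returns the local centroids and the cluster sizes from the final
    assignment step; only these summaries would be sent to the server.
    """
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    sizes = np.bincount(labels, minlength=k)
    return centroids, sizes


def greedy_global_init(centroids, sizes, k):
    """Server side: greedy pick of k initial centroids (assumed rule).

    centroids: pooled client centroids, shape (m, d)
    sizes:     matching cluster sizes, shape (m,)
    """
    centroids = np.asarray(centroids, dtype=float)
    sizes = np.asarray(sizes, dtype=float)
    chosen = [int(sizes.argmax())]  # start from the largest reported cluster
    # Squared distance from each candidate to its nearest chosen centroid.
    min_d = ((centroids - centroids[chosen[0]]) ** 2).sum(axis=1)
    while len(chosen) < k:
        # Assumed score: cluster size times squared distance to the chosen set.
        nxt = int((sizes * min_d).argmax())
        chosen.append(nxt)
        d_new = ((centroids - centroids[nxt]) ** 2).sum(axis=1)
        min_d = np.minimum(min_d, d_new)
    return centroids[chosen]


# Pooled reports from several hypothetical clients: two near-duplicate pairs
# and one distinct cluster.
pooled_centroids = np.array(
    [[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.0, 10.1], [-10.0, 10.0]]
)
pooled_sizes = np.array([50, 40, 30, 30, 20])
init = greedy_global_init(pooled_centroids, pooled_sizes, k=3)
```

In this toy input the near-duplicate centroids reported by different clients are not selected twice, because a candidate close to an already chosen centroid receives a score near zero. Only centroids and sizes are exchanged, which matches the abstract's point that raw data stay on the devices.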
Pages: 3393-3425
Number of pages: 33