Greedy centroid initialization for federated K-means

Authors
Yang, Kun [1]
Amiri, Mohammad Mohammadi [2]
Kulkarni, Sanjeev R. [1]
Affiliations
[1] Princeton Univ, 98 Charlton St, Princeton, NJ 08540 USA
[2] Rensselaer Polytech Inst, 110 8th St, Troy, NY 12180 USA
Keywords
K-means; Clustering; Federated learning; Machine learning; Security; Privacy
DOI
10.1007/s10115-024-02066-x
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We study learning from unlabeled data distributed across clients in a federated fashion, where raw data do not leave the corresponding devices. We develop a K-means clustering algorithm for this federated setting in which the local datasets are clustered at the clients and a server generates the global clusters after aggregating the local ones. Given the importance of initialization for the federated K-means algorithm (FKM), our objective is to find better initial centroids by utilizing the local data stored on each client. To this end, we start the centroid initialization at the clients rather than at the server, since the server initially lacks any insight into the clients' data. The clients first select their local initial clusters and then share their clustering information (including cluster centroids and sizes) with the server. The server then employs a greedy algorithm to determine the global initial centroids based on the information received from the clients. We refer to this idea as G-FKM. Numerical results on both synthetic and public datasets show that the proposed algorithm converges faster and achieves a lower within-cluster sum of squares (WCSS) and a higher adjusted Rand index than three distinct federated K-means variants. This improvement comes at the relatively low cost of sending limited additional information from the clients to the server, rather than conducting the initialization entirely at the server. Furthermore, we observe that the proposed algorithm performs better than the centralized algorithm when the data distribution across the clients is highly skewed.
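The client-then-server flow described in the abstract can be illustrated with a minimal sketch. The abstract does not state which greedy rule the server applies, so the code below assumes one possible choice: start from the centroid of the largest reported cluster, then repeatedly pick the candidate that maximizes cluster size times squared distance to the nearest centroid already chosen. The function names `local_kmeans` and `greedy_init` and the toy data are hypothetical and are not taken from the paper.

```python
import math
import random


def local_kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's K-means on one client's data.

    Returns (centroids, sizes): the information a client would share.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest current centroid.
            j = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[j].append(p)
        # Recompute each centroid as the mean of its cluster (keep old one if empty).
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    sizes = [len(cl) for cl in clusters]
    return centroids, sizes


def greedy_init(reports, k):
    """Pick k global initial centroids from the clients' reports.

    ASSUMPTION: size-weighted farthest-first selection; the paper's
    actual greedy rule may differ.
    """
    cands = [
        (c, n)
        for cents, sizes in reports
        for c, n in zip(cents, sizes)
        if n > 0
    ]
    # Seed with the centroid of the largest local cluster.
    chosen = [max(cands, key=lambda cn: cn[1])[0]]
    while len(chosen) < k:
        # Score = cluster size * squared distance to the nearest chosen centroid.
        best = max(
            cands,
            key=lambda cn: cn[1] * min(math.dist(cn[0], g) for g in chosen) ** 2,
        )
        chosen.append(best[0])
    return chosen


# Toy example: two clients with skewed local data (hypothetical values).
client_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
client_b = [(10.0, 0.0), (10.1, 0.0), (10.0, 0.1), (5.0, 5.1)]
reports = [local_kmeans(client_a, 2), local_kmeans(client_b, 2)]
init = greedy_init(reports, 3)
print(init)
```

Only centroids and cluster sizes cross the client boundary in this sketch, which mirrors the abstract's point that raw data stay on the devices. The centroids returned by `greedy_init` would then seed the federated K-means rounds.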
Pages: 3393-3425
Page count: 33