Greedy centroid initialization for federated K-means

Cited by: 0
Authors
Yang, Kun [1]
Amiri, Mohammad Mohammadi [2]
Kulkarni, Sanjeev R. [1]
Affiliations
[1] Princeton Univ, 98 Charlton St, Princeton, NJ 08540 USA
[2] Rensselaer Polytech Inst, 110 8th St, Troy, NY 12180 USA
Keywords
K-means; Clustering; Federated learning; Machine learning; Security; Privacy
DOI
10.1007/s10115-024-02066-x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We study learning from unlabeled data distributed across clients in a federated setting, where raw data never leave the devices that hold them. We develop a K-means clustering algorithm for this setting in which each client clusters its local dataset and a server generates the global clusters by aggregating the local ones. Because initialization strongly affects the federated K-means algorithm (FKM), our objective is to find better initial centroids by exploiting the local data stored on each client. To this end, we start centroid initialization at the clients rather than at the server, since the server initially has no insight into the clients' data. The clients first select their local initial clusters and then share their clustering information (including cluster centroids and sizes) with the server. The server then applies a greedy algorithm to determine the global initial centroids from the information received from the clients. We refer to this approach as G-FKM. Numerical results on both synthetic and public datasets show that the proposed algorithm converges faster, achieving a lower within-cluster sum of squares (WCSS) and a higher adjusted Rand index than three other federated K-means variants. This improvement comes at a relatively low cost: the clients send a limited amount of additional information to the server, instead of the initialization being performed entirely at the server. Furthermore, we observe that the proposed algorithm outperforms the centralized algorithm when the data distribution across the clients is highly skewed.
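The client-then-server initialization described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' G-FKM: the local procedure on each client (k-means++ seeding plus a few Lloyd iterations), the greedy server rule (repeatedly adding the client centroid that most reduces the size-weighted squared-distance cost over all client centroids), the sum-and-count aggregation in `fkm_round`, the synthetic data, and every function name here are hypothetical choices made for illustration.

```python
import numpy as np


def sq_dists(X, C):
    # Squared Euclidean distance from every row of X to every row of C: shape (len(X), len(C)).
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)


def kmeans_pp_seed(X, k, rng):
    # Standard k-means++ seeding on one client's local data.
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = sq_dists(X, np.array(centers)).min(axis=1)
        total = d2.sum()
        probs = d2 / total if total > 0 else None  # None -> uniform choice
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers, dtype=float)


def local_clusters(X, k, rng, iters=10):
    # Client step (assumed): seed, run a few Lloyd iterations, and return only
    # centroids and cluster sizes; the raw rows of X never leave the client.
    C = kmeans_pp_seed(X, k, rng)
    for _ in range(iters):
        labels = sq_dists(X, C).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old centroid if a cluster empties
                C[j] = X[labels == j].mean(axis=0)
    sizes = np.bincount(sq_dists(X, C).argmin(axis=1), minlength=k)
    return C, sizes


def weighted_cost(P, w, centers):
    # Size-weighted squared distance from each client centroid to its nearest chosen center.
    return float((w * sq_dists(P, centers).min(axis=1)).sum())


def greedy_global_init(P, w, k):
    # Server step (hypothetical greedy rule): add, one at a time, the candidate
    # centroid that yields the lowest weighted cost together with those already chosen.
    keep = w > 0
    P, w = P[keep], w[keep]
    chosen = []
    for _ in range(k):
        costs = [np.inf if i in chosen else weighted_cost(P, w, P[chosen + [i]])
                 for i in range(len(P))]
        chosen.append(int(np.argmin(costs)))
    return P[chosen]


def fkm_round(client_data, C):
    # One federated round (assumed form): clients report per-cluster sums and counts,
    # and the server sets each global centroid to the pooled mean of its points.
    sums, counts = np.zeros_like(C), np.zeros(len(C))
    for X in client_data:
        labels = sq_dists(X, C).argmin(axis=1)
        for j in range(len(C)):
            sums[j] += X[labels == j].sum(axis=0)
            counts[j] += np.sum(labels == j)
    new_C = C.copy()
    nonempty = counts > 0
    new_C[nonempty] = sums[nonempty] / counts[nonempty, None]
    return new_C


def wcss(client_data, C):
    # Within-cluster sum of squares over all clients' data.
    return float(sum(sq_dists(X, C).min(axis=1).sum() for X in client_data))


rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])


def blob(m, n):
    # n synthetic points around true center m.
    return means[m] + 0.5 * rng.standard_normal((n, 2))


# Skewed split: each client holds mostly one blob and a few points of another.
clients = [np.vstack([blob(0, 200), blob(1, 20)]),
           np.vstack([blob(1, 200), blob(2, 20)]),
           np.vstack([blob(2, 200), blob(0, 20)])]
k = 3
local = [local_clusters(X, k, rng) for X in clients]
P = np.vstack([c for c, _ in local])
w = np.concatenate([s for _, s in local]).astype(float)
C0 = greedy_global_init(P, w, k)
C = C0
for _ in range(5):
    C = fkm_round(clients, C)
print(C0.round(2))
print(wcss(clients, C0), wcss(clients, C))
```

Weighting each candidate by its cluster size lets the server prefer centroids that summarize many points, using only the centroids and sizes the clients share. The demo data are deliberately skewed across clients, echoing the non-IID setting the abstract highlights; because `fkm_round` computes exact pooled means, each round behaves like a Lloyd step and cannot increase the WCSS.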
Pages: 3393-3425
Number of pages: 33