Greedy centroid initialization for federated K-means

Times cited: 0

Authors:

Yang, Kun [1]

Amiri, Mohammad Mohammadi [2]

Kulkarni, Sanjeev R. [1]

Affiliations:

[1] Princeton Univ, 98 Charlton St, Princeton, NJ 08540 USA

[2] Rensselaer Polytech Inst, 110 8th St, Troy, NY 12180 USA

Source:

KNOWLEDGE AND INFORMATION SYSTEMS | 2024 / Vol. 66 / Issue 06

Keywords:

K-means; Clustering; Federated learning; Machine learning; SECURITY; PRIVACY;

DOI:

10.1007/s10115-024-02066-x

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We study learning from unlabeled data distributed across clients in a federated fashion, where raw data do not leave the corresponding devices. We develop a K-means clustering algorithm for this federated setting in which the local datasets are clustered at the clients and a server generates the global clusters by aggregating the local ones. Given the importance of initialization for the federated K-means algorithm (FKM), our objective is to find better initial centroids by utilizing the local data stored on each client. To this end, we start the centroid initialization at the clients rather than at the server, since the server initially has no insight into the clients' data. The clients first select their local initial clusters and then share their clustering information (including cluster centroids and sizes) with the server. The server then employs a greedy algorithm to determine the global initial centroids from the information received from the clients. We refer to this approach as G-FKM. Numerical results on both synthetic and public datasets show that the proposed algorithm converges faster. It also achieves a lower within-cluster sum of squares (WCSS) and a higher adjusted Rand index than three distinct federated K-means variants. This improvement comes at the relatively low cost of sending a limited amount of additional information from the clients to the server, compared with conducting the initialization entirely at the server. Furthermore, we observed that the proposed algorithm performs better than the centralized algorithm when the data distribution across the clients is highly skewed.
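The abstract describes G-FKM only at a high level. Clients cluster locally and share (centroid, size) pairs, and the server then greedily chooses K global initial centroids. The Python sketch below illustrates that data flow under stated assumptions. It is not the authors' algorithm. Several choices are hypothetical and made only for illustration:

- The local step: farthest-first seeding followed by a few Lloyd iterations.
- The server's greedy criterion: repeatedly add the received centroid that most reduces the size-weighted squared distance from all received centroids to their nearest chosen one.
- The names `client_summary`, `server_greedy_init`, `wcss`, and `blob`.

The toy data are synthetic and do not come from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Summary = Tuple[Point, int]  # (local centroid, local cluster size): all a client shares


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(data: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j])) for p in data]


def client_summary(data: Sequence[Point], k: int, iters: int = 10) -> List[Summary]:
    """Client side. Assumption: farthest-first seeding plus a few Lloyd steps.
    Raw points never leave this function; only (centroid, size) pairs are returned."""
    centroids: List[Point] = [data[0]]
    while len(centroids) < min(k, len(data)):
        # Add the point farthest from all current seeds (hypothetical local init).
        centroids.append(max(data, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        labels = assign(data, centroids)
        for j in range(len(centroids)):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old centroid if its cluster is empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    labels = assign(data, centroids)
    sizes = [labels.count(j) for j in range(len(centroids))]
    return [(c, n) for c, n in zip(centroids, sizes) if n > 0]


def server_greedy_init(summaries: Sequence[Summary], k: int) -> List[Point]:
    """Server side. Hypothetical greedy rule: at each step add the received centroid that
    most lowers the size-weighted squared distance from every received centroid to its
    nearest chosen centroid."""
    candidates = list(dict.fromkeys(c for c, _ in summaries))  # drop exact duplicates
    chosen: List[Point] = []
    nearest = [math.inf] * len(summaries)  # each summary's distance to its nearest chosen centroid
    for _ in range(min(k, len(candidates))):
        def cost(cand: Point) -> float:
            return sum(n * min(d, sq_dist(c, cand)) for (c, n), d in zip(summaries, nearest))
        pick = min((c for c in candidates if c not in chosen), key=cost)
        chosen.append(pick)
        nearest = [min(d, sq_dist(c, pick)) for (c, _), d in zip(summaries, nearest)]
    return chosen


def wcss(data: Sequence[Point], centroids: Sequence[Point]) -> float:
    """Within-cluster sum of squares: each point's squared distance to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in data)


def blob(cx: float, cy: float) -> List[Point]:
    """Nine points on a 3x3 grid centred at (cx, cy); toy data, not from the paper."""
    return [(cx + dx, cy + dy) for dx in (-0.5, 0.0, 0.5) for dy in (-0.5, 0.0, 0.5)]


# Skewed toy split: each client sees only two of the three blobs.
client_a = blob(0, 0) + blob(10, 0)
client_b = blob(0, 10) + blob(10, 0)
summaries = client_summary(client_a, k=2) + client_summary(client_b, k=2)
init = server_greedy_init(summaries, k=3)
print(sorted(init))  # [(0.0, 0.0), (0.0, 10.0), (10.0, 0.0)]
print(wcss(client_a + client_b, init))  # 12.0
```

In this toy run, client A never sees the blob at (0, 10) and client B never sees the one at (0, 0). Each client sends only six numbers (two 2-D centroids and two sizes). From these summaries alone, the server recovers all three centres without receiving any raw points. Such centroids would then seed the ordinary federated K-means rounds.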

Pages: 3393-3425

Number of pages: 33
