Parallel and accurate k-means algorithm on CPU-GPU architectures for spectral clustering

Cited by: 4
Authors
He, Guanlin [1 ]
Vialle, Stephane [1 ]
Baboulin, Marc [2 ]
Affiliations
[1] Univ Paris Saclay, CentraleSupelec, CNRS, LISN, Orsay, France
[2] Univ Paris Saclay, CNRS, LISN, Orsay, France
Source
Keywords
heterogeneous CPU-GPU computing; k-means algorithm; parallel code optimization; spectral clustering; unsupervised machine learning;
DOI
10.1002/cpe.6621
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
k-Means is a standard algorithm for clustering data. It generally constitutes the final step in a more complex chain of high-quality spectral clustering. However, this chain suffers from a lack of scalability when addressing large datasets. This can be overcome by also applying the k-means algorithm as a preprocessing step to reduce the number of input data instances. We propose parallel optimization techniques for the k-means algorithm on CPU and GPU. In particular, we use a two-step summation method with package processing to handle the effect of rounding errors that may occur during the phase of updating cluster centroids. Our experiments on synthetic and real-world datasets containing millions of instances exhibit a speedup of up to 7× for the k-means iteration time on GPU versus 20/40 CPU threads using AVX units, and achieve double-precision accuracy with single-precision computations.
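The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind a two-step (package-wise) summation, not the authors' CPU or GPU code. It emulates single-precision arithmetic in pure Python and compares a naive running sum with a sum computed per package and then across packages. The helper names (f32, sum_f32, two_step_sum_f32), the package size of 1024, and the one-dimensional test data are all assumptions made for illustration.

```python
import math
import struct


def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sum_f32(values):
    """Naive running sum; every partial result is rounded to float32."""
    acc = 0.0
    for v in values:
        acc = f32(acc + v)
    return acc


def two_step_sum_f32(values, package_size):
    """Step 1: sum each package on its own. Step 2: sum the package sums.

    Both steps use emulated float32 arithmetic. Keeping each accumulator
    small relative to its addends limits the rounding error per addition.
    """
    package_sums = [
        sum_f32(values[i:i + package_size])
        for i in range(0, len(values), package_size)
    ]
    return sum_f32(package_sums)


# Hypothetical data: one coordinate of about a million points assigned
# to a single cluster, all stored in single precision.
n = 1 << 20
coords = [f32(0.1)] * n

# Reference centroid coordinate computed with exact double summation.
exact_mean = math.fsum(coords) / n

naive_mean = f32(sum_f32(coords) / n)
two_step_mean = f32(two_step_sum_f32(coords, 1024) / n)

naive_err = abs(naive_mean - exact_mean)
two_step_err = abs(two_step_mean - exact_mean)

print("naive float32 error:   ", naive_err)
print("two-step float32 error:", two_step_err)
```

In this sketch the naive accumulator grows large compared with each added value, so low-order bits are lost at every addition, whereas the package sums stay small and the second step performs far fewer additions. The package size is a tunable assumption here; the abstract does not state which value the authors use.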
Pages: 19