Efficient online spherical K-means clustering

Cited by: 0
Authors
Zhong, S [1]
Affiliations
[1] Florida Atlantic Univ, Dept Comp Sci & Engn, Boca Raton, FL 33431 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The spherical k-means algorithm, i.e., the k-means algorithm with cosine similarity, is a popular method for clustering high-dimensional text data. In this algorithm, each document, as well as each cluster mean, is represented as a high-dimensional unit-length vector. However, it has mainly been used in batch mode: each cluster mean vector is updated only after all document vectors have been assigned, as the (normalized) average of all the document vectors assigned to that cluster. This paper investigates an online version of the spherical k-means algorithm based on the well-known Winner-Take-All competitive learning. In this online algorithm, each cluster centroid is incrementally updated given a document. We demonstrate that the online spherical k-means algorithm can achieve significantly better clustering results than the batch version, especially when an annealing-type learning rate schedule is used. We also present heuristics that improve the speed with almost no loss of clustering quality.
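The online procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact update rule or learning rate schedule, so the rule used here (add the scaled document vector to the winning centroid, then renormalize) and the exponential decay from `eta0` to `eta_f` are assumptions made for illustration. The function and parameter names are hypothetical.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are copied unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a, b):
    """Dot product; for unit vectors this equals the cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def nearest(centroids, x):
    """Index of the centroid with the highest cosine similarity to unit vector x."""
    return max(range(len(centroids)), key=lambda k: dot(centroids[k], x))


def online_spherical_kmeans(docs, k, epochs=10, eta0=1.0, eta_f=0.01, seed=0):
    """Winner-take-all online updates with an annealed learning rate.

    Assumptions (not taken from the abstract): the update
    mu <- normalize(mu + eta * x) and an exponential decay of eta
    from eta0 to eta_f over all update steps.
    """
    rng = random.Random(seed)
    docs = [normalize(d) for d in docs]
    # Initialize centroids with k distinct documents chosen at random.
    centroids = [list(d) for d in rng.sample(docs, k)]
    total_steps = epochs * len(docs)
    step = 0
    for _ in range(epochs):
        order = list(range(len(docs)))
        rng.shuffle(order)
        for i in order:
            x = docs[i]
            eta = eta0 * (eta_f / eta0) ** (step / total_steps)
            w = nearest(centroids, x)  # only the winning centroid is updated
            centroids[w] = normalize(
                [m + eta * xi for m, xi in zip(centroids[w], x)]
            )
            step += 1
    labels = [nearest(centroids, x) for x in docs]
    return centroids, labels
```

Calling `online_spherical_kmeans(docs, k=2)` on a list of equal-length numeric vectors returns the unit-length centroids and one cluster label per document. In contrast to the batch mode described in the abstract, a centroid moves immediately after each document is seen rather than once per pass over the data.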
Pages: 3180-3185
Number of pages: 6
Related papers
50 items in total
  • [21] An Efficient K-means Clustering Algorithm on MapReduce
    Li, Qiuhong
    Wang, Peng
    Wang, Wei
    Hu, Hao
    Li, Zhongsheng
    Li, Junxian
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2014, PT I, 2014, 8421 : 357 - 371
  • [22] Online k-means Clustering on Arbitrary Data Streams
    Bhattacharjee, Robi
    Imola, Jacob John
    Moshkovitz, Michal
    Dasgupta, Sanjoy
    [J]. INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 201, 2023, 201 : 204 - 236
  • [23] A novel approach for initializing the spherical K-means clustering algorithm
    Duwairi, Rehab
    Abu-Rahmeh, Mohammed
    [J]. SIMULATION MODELLING PRACTICE AND THEORY, 2015, 54 : 49 - 63
  • [24] An efficient K-means clustering algorithm for tall data
    Capo, Marco
    Perez, Aritz
    Lozano, Jose A.
    [J]. DATA MINING AND KNOWLEDGE DISCOVERY, 2020, 34 (03) : 776 - 811
  • [26] MARIGOLD: Efficient k-means Clustering in High Dimensions
    Mortensen, Kasper Overgaard
    Zardbani, Fatemeh
    Haque, Mohammad Ahsanul
    Agustsson, Steinn Ymir
    Mottin, Davide
    Hofmann, Philip
    Karras, Panagiotis
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2023, 16 (07) : 1740 - 1748
  • [27] An efficient k-means clustering algorithm: Analysis and implementation
    Kanungo, T
    Mount, DM
    Netanyahu, NS
    Piatko, CD
    Silverman, R
    Wu, AY
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2002, 24 (07) : 881 - 892
  • [28] An effective and efficient hierarchical K-means clustering algorithm
    Qi, Jianpeng
    Yu, Yanwei
    Wang, Lihong
    Liu, Jinglei
    Wang, Yingjie
    [J]. INTERNATIONAL JOURNAL OF DISTRIBUTED SENSOR NETWORKS, 2017, 13 (08) : 1 - 17
  • [29] Efficient image segmentation and implementation of K-means clustering
    Deeparani, K.
    Sudhakar, P.
    [J]. MATERIALS TODAY-PROCEEDINGS, 2021, 45 : 8076 - 8079
  • [30] An efficient approximation to the K-means clustering for massive data
    Capo, Marco
    Perez, Aritz
    Lozano, Jose A.
    [J]. KNOWLEDGE-BASED SYSTEMS, 2017, 117 : 56 - 69