A Clustering Algorithm for Automatically Determining the Number of Clusters Based on Coefficient of Variation

Cited by: 4
Authors
Liu, Tengteng [1]
Qu, Shouning [2]
Zhang, Kun [1]
Affiliations
[1] Univ Jinan, Sch Informat Sci & Engn, Jinan, Shandong, Peoples R China
[2] Univ Jinan, Sch Informat Sci & Engn, Shandong Prov Key Lab Network Based Intelligent, Jinan 250022, Shandong, Peoples R China
Source
PROCEEDINGS OF THE 2018 2ND INTERNATIONAL CONFERENCE ON BIG DATA RESEARCH (ICBDR 2018) | 2018
Keywords
Clustering; K-means plus; Density index; Coefficient of variation;
DOI
10.1145/3291801.3291825
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The k-means algorithm is a typical partition-based clustering algorithm. The k-means++ algorithm is a high-quality variant that addresses the sensitivity of the traditional k-means algorithm to its initial centers. However, the original k-means++ algorithm is sensitive to outliers and requires the number of clusters to be set manually. We propose an improved k-means++ clustering algorithm, named CV-means++, that automatically determines the number of clusters based on the coefficient of variation. First, we propose a method that selects initial centers using the density index of data points, which avoids selecting abnormal data. Second, we introduce the coefficient of variation and evaluate the relationship between the average intra-cluster coefficient of variation and the smallest inter-cluster coefficient of variation of k+ (k+ >> k) clusters to determine whether the number of clusters is optimal. Experiments performed.
Pages: 100-106
Number of pages: 7
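The two building blocks named in the abstract, k-means++ seeding and the coefficient of variation, can be sketched as follows. This is a minimal illustration and not the authors' CV-means++: the seeding is the standard distance-squared-weighted k-means++ procedure (without the density-index step the abstract mentions), and `intra_cluster_cv`, which applies the coefficient of variation to point-to-centroid distances, is an assumed reading, since the record does not give the paper's exact definitions. All function names are hypothetical.

```python
import math
import random
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for zero mean")
    return statistics.pstdev(values) / mean


def kmeanspp_seeds(points, k, rng=None):
    """Standard k-means++ seeding (not the paper's density-index variant).

    Assumes `points` holds at least k distinct coordinate tuples.
    """
    rng = rng or random.Random(0)
    centers = [rng.choice(points)]
    # Squared distance from every point to its nearest chosen center.
    d2 = [math.dist(p, centers[0]) ** 2 for p in points]
    while len(centers) < k:
        # Sample the next center with probability proportional to d2.
        nxt = rng.choices(points, weights=d2, k=1)[0]
        centers.append(nxt)
        d2 = [min(old, math.dist(p, nxt) ** 2) for old, p in zip(d2, points)]
    return centers


def centroid(cluster):
    """Coordinate-wise mean of a list of points."""
    return tuple(statistics.fmean(axis) for axis in zip(*cluster))


def intra_cluster_cv(cluster):
    """Assumed definition: CV of the distances from points to the centroid."""
    c = centroid(cluster)
    return coefficient_of_variation([math.dist(p, c) for p in cluster])


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeanspp_seeds(pts, 2))
    print(coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9]))
```

A low intra-cluster value under this assumed definition means the points sit at similar distances from their centroid. How the paper combines intra-cluster and inter-cluster values to choose the number of clusters is not specified in this record.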