Enhancing the effectiveness of clustering with spectra analysis

Cited by: 17
Authors
Li, Wenyuan [1]
Ng, Wee-Keong
Liu, Ying
Ong, Kok-Leong
Affiliations
[1] Univ Texas Dallas, Dept Comp Sci, Richardson, TX 75083 USA
[2] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
[3] Deakin Univ, Sch Engn & Informat Technol, Geelong, Vic 3217, Australia
Keywords
clustering; spectral methods; eigenvalues; eigenvectors
DOI
10.1109/TKDE.2007.1066
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
For many clustering algorithms, such as K-Means, EM, and CLOPE, there is usually a requirement to set some parameters. Often, these parameters directly or indirectly control the number of clusters, that is, k, to return. In the presence of different data characteristics and analysis contexts, it is often difficult for the user to estimate the number of clusters in the data set. This is especially true in text collections such as Web documents, images, or biological data. In an effort to improve the effectiveness of clustering, we seek the answer to a fundamental question: How can we effectively estimate the number of clusters in a given data set? We propose an efficient method based on spectra analysis of eigenvalues (not eigenvectors) of the data set as the solution to the above. We first present the relationship between a data set and its underlying spectra with theoretical and experimental results. We then show how our method is capable of suggesting a range of k that is well suited to different analysis contexts. Finally, we conclude with further empirical results to show how the answer to this fundamental question enhances the clustering process for large text collections.
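The abstract does not spell out the procedure, so the following is only an illustrative sketch of the general idea of reading a candidate k off an eigenvalue spectrum. It uses the common eigengap heuristic on a normalized Gaussian affinity matrix, which is an assumption and not necessarily the authors' method. The function name `estimate_k_by_eigengap` and the parameters `sigma` and `max_k` are hypothetical.

```python
import numpy as np

def estimate_k_by_eigengap(X, sigma=1.0, max_k=10):
    """Suggest a cluster count from eigenvalues only (eigengap heuristic).

    Illustrative assumption: Gaussian affinities and symmetric
    normalization; this is not taken from the paper itself.
    """
    # Pairwise squared Euclidean distances between the rows of X.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    # Gaussian affinity matrix; sigma controls the neighborhood scale.
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    # Symmetric normalization D^{-1/2} W D^{-1/2}; its largest eigenvalue is 1.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Only eigenvalues are computed; eigenvectors are never formed.
    eigvals = np.linalg.eigvalsh(S)[::-1]  # sorted in descending order
    top = eigvals[: max_k + 1]
    # The largest drop between consecutive eigenvalues marks the suggested k.
    gaps = top[:-1] - top[1:]
    return int(np.argmax(gaps)) + 1, top

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    X = np.vstack([c + 0.3 * rng.standard_normal((30, 2)) for c in centers])
    k, top = estimate_k_by_eigengap(X)
    print("suggested k:", k)
```

Inspecting several of the largest gaps in `top`, rather than only the single largest, is one way to obtain a range of candidate k values instead of a single number.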
Pages: 887-902
Number of pages: 16