Enhancing the effectiveness of clustering with spectra analysis

Cited by: 17
Authors
Li, Wenyuan [1]
Ng, Wee-Keong
Liu, Ying
Ong, Kok-Leong
Affiliations
[1] Univ Texas Dallas, Dept Comp Sci, Richardson, TX 75083 USA
[2] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
[3] Deakin Univ, Sch Engn & Informat Technol, Geelong, Vic 3217, Australia
Keywords
clustering; spectral methods; eigenvalues; eigenvectors
DOI
10.1109/TKDE.2007.1066
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
For many clustering algorithms, such as K-Means, EM, and CLOPE, there is usually a requirement to set some parameters. Often, these parameters directly or indirectly control the number of clusters, that is, k, to return. In the presence of different data characteristics and analysis contexts, it is often difficult for the user to estimate the number of clusters in the data set. This is especially true in text collections such as Web documents, images, or biological data. In an effort to improve the effectiveness of clustering, we seek the answer to a fundamental question: How can we effectively estimate the number of clusters in a given data set? We propose an efficient method based on spectra analysis of eigenvalues (not eigenvectors) of the data set as the solution to the above. We first present the relationship between a data set and its underlying spectra with theoretical and experimental results. We then show how our method is capable of suggesting a range of k that is well suited to different analysis contexts. Finally, we conclude with further empirical results to show how the answer to this fundamental question enhances the clustering process for large text collections.
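As a rough illustration of estimating k from eigenvalues alone, the sketch below applies the widely used eigengap heuristic to a normalized similarity matrix. It is not the procedure from the paper, which this record does not describe in detail. The function name `estimate_k_by_eigengap`, the Gaussian similarity, and the parameters `sigma` and `max_k` are all assumptions made for this example.

```python
import numpy as np


def estimate_k_by_eigengap(points, sigma=1.0, max_k=10):
    """Hypothetical helper: guess k from the largest gap in the eigenvalue spectrum.

    This is the generic eigengap heuristic, used here as a stand-in;
    it is not claimed to be the method proposed in the paper.
    """
    # Gaussian similarity between all pairs of points (assumed kernel).
    sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    similarity = np.exp(-sq_dists / (2.0 * sigma ** 2))

    # Symmetric normalization D^{-1/2} W D^{-1/2}; the largest eigenvalue is 1.
    inv_sqrt_degree = 1.0 / np.sqrt(similarity.sum(axis=1))
    normalized = similarity * inv_sqrt_degree[:, None] * inv_sqrt_degree[None, :]

    # Only eigenvalues are needed, so eigenvectors are never computed.
    eigenvalues = np.sort(np.linalg.eigvalsh(normalized))[::-1]

    # The position of the largest drop among the leading eigenvalues suggests k.
    top = eigenvalues[: max_k + 1]
    gaps = top[:-1] - top[1:]
    return int(np.argmax(gaps)) + 1, eigenvalues


if __name__ == "__main__":
    # Synthetic data: three well-separated 2-D blobs (illustrative only).
    rng = np.random.default_rng(0)
    centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    data = np.vstack([c + 0.3 * rng.standard_normal((30, 2)) for c in centers])
    k, spectrum = estimate_k_by_eigengap(data)
    print("suggested k:", k)
    print("leading eigenvalues:", np.round(spectrum[:5], 3))
```

When clusters are well separated, the normalized matrix is close to block diagonal and each block contributes one eigenvalue near 1, so the number of eigenvalues before the first large drop is a natural candidate for k. Inspecting several of the larger gaps, rather than only the largest, gives a range of candidate values instead of a single answer.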
Pages: 887-902
Page count: 16