A document clustering spectral algorithm that uses evidence accumulation

Authors
Xu S. [1, 2]
Lu Z.-M. [1]
Zhang C.-X. [3]
Gu G.-C. [1]
Zhang Q. [1]
Affiliations
[1] School of Information Engineering, Yancheng Institute of Technology
[2] College of Computer Science and Technology, Harbin Engineering University
[3] School of Computer Science and Technology, Harbin University of Science and Technology
Keywords
Clustering analysis; Document clustering; Evidence accumulation; Spectral clustering; Spherical K-means
DOI
10.3969/j.issn.1006-7043.2010.08.010
Abstract
A known weakness of spectral clustering is the difficulty of choosing a suitable similarity measure. To address this, a spectral algorithm for document clustering based on evidence accumulation is proposed. The algorithm first runs spherical K-means on the document set multiple times. The partition produced by each run is treated as evidence of whether two documents belong in the same cluster. From this accumulated evidence, the similarity matrix and the normalized Laplacian matrix of the documents are constructed. Experiments on the Text REtrieval Conference (TREC) and Reuters document sets show that the proposed algorithm outperforms both the hierarchical clustering algorithms and the K-means algorithm provided in the CLUTO general-purpose clustering toolkit.
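The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not give the number of runs, the exact similarity definition, or the spectral variant, so the following are assumptions: similarity is the fraction of runs in which two documents share a cluster (a co-association matrix in the sense of evidence accumulation), the embedding rows are normalized to unit length, and the final grouping reuses spherical K-means on the embedding. All function names are hypothetical.

```python
import numpy as np

def normalize_rows(X):
    """Scale each row to unit Euclidean length (rows are assumed nonzero)."""
    X = np.asarray(X, dtype=float)
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def spherical_kmeans(X, k, rng, n_iter=20):
    """Cluster unit-length rows of X by cosine similarity; return labels."""
    # Fancy indexing copies, so updating centroids does not modify X.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # For unit vectors the dot product equals the cosine similarity.
        labels = np.argmax(X @ centroids.T, axis=1)
        for j in range(k):
            total = X[labels == j].sum(axis=0)
            norm = np.linalg.norm(total)
            if norm > 0:  # keep the old centroid if the cluster is empty
                centroids[j] = total / norm
    return labels

def co_association(X, k, n_runs, rng):
    """Assumed similarity: fraction of runs that co-cluster each pair."""
    n = len(X)
    S = np.zeros((n, n))
    for _ in range(n_runs):
        labels = spherical_kmeans(X, k, rng)
        S += labels[:, None] == labels[None, :]
    return S / n_runs

def spectral_embed(S, k):
    """Rows of the k eigenvectors of the normalized Laplacian with the
    smallest eigenvalues, rescaled to unit length (an assumed variant)."""
    d = S.sum(axis=1)  # degrees are >= 1 because the diagonal of S is 1
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(S)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    U = vecs[:, :k]
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def cluster_documents(X, k, n_runs=10, seed=0):
    """Accumulate evidence, embed spectrally, then cluster the embedding."""
    rng = np.random.default_rng(seed)
    X = normalize_rows(X)
    S = co_association(X, k, n_runs, rng)
    E = spectral_embed(S, k)
    return spherical_kmeans(E, k, rng), S

if __name__ == "__main__":
    # Toy term-count vectors: two groups with disjoint vocabularies.
    docs = np.array([[2, 1, 0, 0]] * 3 + [[0, 0, 1, 3]] * 3)
    labels, S = cluster_documents(docs, k=2, n_runs=5)
    print(labels)
    print(S)
```

Because the similarity matrix is built from the partitions themselves, no kernel or distance threshold has to be chosen by hand, which is the point the abstract makes about similarity measures.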
Pages: 1043-1047
Page count: 4