Towards effective document clustering: A constrained K-means based approach

Cited by: 30
Authors
Hu, Guobiao [1, 2]
Zhou, Shuigeng [1, 2]
Guan, Jihong [3 ]
Hu, Xiaohua [4 ]
Affiliations
[1] Fudan Univ, Dept Comp Sci & Engn, Shanghai 200433, Peoples R China
[2] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Shanghai 200433, Peoples R China
[3] Tongji Univ, Dept Comp Sci & Technol, Shanghai 200092, Peoples R China
[4] Drexel Univ, Coll Informat Sci & Technol, Philadelphia, PA 19104 USA
Funding
National Natural Science Foundation of China
Keywords
document clustering; semi-supervised learning; spectral relaxation; clustering with prior knowledge
DOI
10.1016/j.ipm.2008.03.001
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Document clustering is an important tool for organizing and browsing document collections. In real applications, limited knowledge about the cluster membership of a small number of documents is often available, for example that certain pairs of documents belong to the same cluster. This kind of prior knowledge can serve as constraints on the clustering process. We integrate the constraints into the trace formulation of the K-means objective, the sum of squared Euclidean distances. The combined criterion function is then transformed into a trace maximization problem, which is optimized by eigen-decomposition. Our experimental evaluation shows that the proposed semi-supervised clustering method achieves better performance than three existing methods. (C) 2008 Elsevier Ltd. All rights reserved.
Pages: 1397-1409
Number of pages: 13
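
The abstract describes the approach only at a high level: pairwise constraints are folded into the trace form of the K-means objective, and the resulting trace maximization is solved by eigen-decomposition. The sketch below illustrates that general idea and is not the authors' exact formulation. It relies on the standard spectral relaxation of K-means (maximize trace(Y^T G Y) over orthonormal Y, where G is the Gram matrix of the documents) and assumes that each must-link pair simply adds a reward `alpha` to the corresponding entries of G. The function name `constrained_spectral_kmeans`, the parameter `alpha`, the row normalization, and the final K-means step on the eigenvector embedding are all illustrative choices, not taken from the paper.

```python
import numpy as np

def constrained_spectral_kmeans(X, k, must_link, alpha=1.0, n_iter=50):
    """Illustrative sketch: spectral relaxation of K-means with must-link pairs.

    X         : (n_docs, n_features) array, one document vector per row
    k         : number of clusters
    must_link : list of (i, j) index pairs assumed to share a cluster
    alpha     : assumed weight of the constraint term (hypothetical parameter)
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # Gram matrix of the documents; relaxed K-means maximizes trace(Y^T G Y).
    G = X @ X.T

    # Assumption: each must-link pair adds a symmetric reward to the criterion.
    M = np.zeros((n, n))
    for i, j in must_link:
        M[i, j] = M[j, i] = 1.0
    A = G + alpha * M

    # The relaxed maximizer is spanned by the top-k eigenvectors of A.
    # np.linalg.eigh returns eigenvalues in ascending order.
    _, vecs = np.linalg.eigh(A)
    Y = vecs[:, -k:]

    # Normalize rows so documents are compared by direction in the embedding.
    Y = Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)

    # Deterministic farthest-point initialization of the k centers.
    centers = [Y[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)

    # Plain Lloyd iterations on the embedded rows to recover discrete labels.
    for _ in range(n_iter):
        dists = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            Y[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

Given a small term-weight matrix and a few known same-cluster pairs, a call might look like `constrained_spectral_kmeans(X, k=2, must_link=[(0, 1), (3, 4)])`. Raising `alpha` makes the constraints dominate the document similarities, so in practice it would need tuning.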