Towards effective document clustering: A constrained K-means based approach

Times cited: 30
Authors
Hu, Guobiao [1, 2]
Zhou, Shuigeng [1, 2]
Guan, Jihong [3]
Hu, Xiaohua [4]
Affiliations
[1] Fudan Univ, Dept Comp Sci & Engn, Shanghai 200433, Peoples R China
[2] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Shanghai 200433, Peoples R China
[3] Tongji Univ, Dept Comp Sci & Technol, Shanghai 200092, Peoples R China
[4] Drexel Univ, Coll Informat Sci & Technol, Philadelphia, PA 19104 USA
Funding
National Natural Science Foundation of China
Keywords
document clustering; semi-supervised learning; spectral relaxation; clustering with prior knowledge
DOI
10.1016/j.ipm.2008.03.001
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Document clustering is an important tool for organizing and browsing document collections. In real applications, limited knowledge about the cluster membership of a small number of documents is often available, such as pairs of documents that belong to the same cluster. This kind of prior knowledge can serve as constraints for the clustering process. We integrate the constraints into the trace formulation of the K-means criterion, the sum of squared Euclidean distances. The combined criterion function is then transformed into a trace maximization problem, which is optimized by eigen-decomposition. Our experimental evaluation shows that the proposed semi-supervised clustering method can achieve better performance than three existing methods. (C) 2008 Elsevier Ltd. All rights reserved.
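The abstract compresses the method into a few sentences, so the following is a minimal sketch of the general idea under stated assumptions, not the authors' exact formulation. It relies on a standard identity: if the document vectors are the rows of X and H is the scaled cluster-indicator matrix (H[i, c] = 1/sqrt(n_c) when document i is in cluster c of size n_c), the K-means sum of squared distances equals trace(X X^T) - trace(H^T X X^T H). Minimizing it is therefore a trace maximization, and relaxing H to any n x k matrix with orthonormal columns makes the top-k eigenvectors optimal. How the constraints enter is an assumption here: the hypothetical function `constrained_spectral_kmeans` adds a symmetric matrix W (+`weight` for must-link pairs, -`weight` for cannot-link pairs; `weight` is a hypothetical parameter) to the Gram matrix. It turns the relaxed solution back into hard clusters by running plain K-means on the row-normalized eigenvectors, a common post-processing step that the paper may not use.

```python
import numpy as np


def _kmeans_rows(Y, k, n_iter=100):
    """Plain Lloyd's K-means on the rows of Y, with deterministic farthest-point init."""
    centers = [Y[0]]
    for _ in range(1, k):
        # squared distance from each row to its nearest already-chosen center
        d = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    labels = np.zeros(len(Y), dtype=int)
    for _ in range(n_iter):
        dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            Y[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def constrained_spectral_kmeans(X, k, must_link=(), cannot_link=(), weight=1.0):
    """Hypothetical sketch: maximize trace(H^T (X X^T + W) H) by eigen-decomposition."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    gram = X @ X.T  # unconstrained K-means maximizes trace(H^T gram H)
    W = np.zeros((n, n))  # assumed constraint matrix, not the paper's exact weighting
    for i, j in must_link:
        W[i, j] = W[j, i] = weight  # rewards putting i and j in the same cluster
    for i, j in cannot_link:
        W[i, j] = W[j, i] = -weight  # penalizes putting i and j in the same cluster
    M = gram + W
    # eigh returns eigenvalues in ascending order; the last k eigenvectors
    # maximize trace(H^T M H) subject to H^T H = I (the spectral relaxation)
    _, vecs = np.linalg.eigh(M)
    H = vecs[:, -k:]
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for all-zero rows
    return _kmeans_rows(H / norms, k)


# Toy term vectors: documents 0-2 share terms 0-1, documents 3-5 share terms 2-3.
docs = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 0.9],
    [0.0, 0.0, 0.9, 1.0],
])
labels = constrained_spectral_kmeans(docs, k=2, must_link=[(0, 1)], cannot_link=[(0, 3)])
# expected partition: documents 0-2 in one cluster, documents 3-5 in the other
print(labels)
```

Adding W directly to the Gram matrix keeps the problem a single symmetric eigenproblem, which is why the trace form is convenient: the constraints change the matrix being decomposed, not the optimization procedure.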
Pages: 1397-1409
Number of pages: 13