Robust Document Clustering by Exploiting Feature Diversity in Cluster Ensembles

Cited by: 0
Authors
Sevillano, Xavier [1]
Cobo, German [1]
Alias, Francesc [1]
Claudi Socoro, Joan [1]
Affiliation
[1] Univ Ramon Llull, Enginyeria & Arquitectura Salle, Dept Comunicaciones & Teoria Serial, Pg Bonanova,8, Barcelona 08022, Spain
Source
Keywords
Document representation; clustering; cluster ensembles
DOI
Not available
Chinese Library Classification
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
The performance of document clustering systems is conditioned by the use of optimal text representations, which are not only difficult to determine beforehand but may also vary from one clustering problem to another. This work presents an approach based on feature diversity and cluster ensembles as a first step towards building document clustering systems that behave robustly across different clustering problems. Experiments conducted on three binary clustering problems of increasing difficulty show that the proposed method is i) robust to near-optimal model order selection, and ii) able to detect constructive interactions between different document representations, and is thus capable of yielding consensus clusterings superior to any of the individual clusterings available.
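The idea of combining clusterings obtained from different document representations can be illustrated with a minimal sketch. The abstract does not specify the consensus function used, so the code below substitutes a simple co-association (evidence accumulation) scheme: two documents end up in the same consensus cluster when more than half of the individual clusterings agree on grouping them. The function names, the 0.5 threshold, and the three toy partitions (labelled as coming from hypothetical term-count, TF-IDF, and LSA representations) are all assumptions made for illustration and are not taken from the paper.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of partitions in which each pair of items shares a cluster."""
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    return [[c / m for c in row] for row in counts]


def consensus(partitions, threshold=0.5):
    """Merge items co-clustered in more than `threshold` of the partitions.

    Hypothetical consensus function (co-association + union-find),
    not the method described in the paper.
    """
    n = len(partitions[0])
    ca = co_association(partitions)
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if ca[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel roots as 0, 1, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


# Toy data: six documents, true grouping {0, 1, 2} and {3, 4, 5}.
# Each partition stands for a clustering run on a different
# (hypothetical) document representation; label values are arbitrary.
p_terms = [0, 0, 0, 1, 1, 1]   # correct
p_tfidf = [1, 1, 0, 0, 0, 0]   # document 2 misplaced
p_lsa = [0, 1, 0, 1, 1, 1]     # document 1 misplaced

print(consensus([p_terms, p_tfidf, p_lsa]))
```

In this toy case the two imperfect partitions err on different documents, so the majority vote over pairs recovers the correct grouping. Real cluster ensembles typically use more elaborate consensus functions, and the threshold-based linking shown here can merge clusters too eagerly when the individual clusterings disagree heavily.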
Pages: 169-176
Page count: 8