Robust Document Clustering by Exploiting Feature Diversity in Cluster Ensembles

Cited by: 0
Authors
Sevillano, Xavier [1]
Cobo, German [1]
Alias, Francesc [1]
Claudi Socoro, Joan [1]
Affiliations
[1] Univ Ramon Llull, Enginyeria & Arquitectura Salle, Dept Comunicaciones & Teoria Serial, Pg Bonanova,8, Barcelona 08022, Spain
Source
Keywords
Document representation; clustering; cluster ensembles;
DOI
Not available
Chinese Library Classification number
H0 [Linguistics];
Discipline classification code
030303 ; 0501 ; 050102 ;
Abstract
The performance of document clustering systems is conditioned by the use of optimal text representations, which are not only difficult to determine beforehand, but also may vary from one clustering problem to another. This work presents an approach based on feature diversity and cluster ensembles as a first step towards building document clustering systems that behave robustly across different clustering problems. Experiments conducted on three binary clustering problems of increasing difficulty show that the proposed method is i) robust to near-optimal model order selection, and ii) able to detect constructive interactions between different document representations, thus being capable of yielding consensus clusterings superior to any of the individual clusterings available.
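The idea of combining clusterings obtained from different document representations can be illustrated with a small sketch. This record does not state which consensus function the authors use, so the code below assumes a simple majority-vote co-association scheme: two documents are linked when more than half of the individual clusterings place them in the same cluster, and the consensus clusters are the connected components of those links. The function name `consensus` and the toy label vectors are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def consensus(partitions):
    """Majority-vote consensus over several clusterings of the same documents.

    `partitions` is a list of label lists, one per individual clustering
    (e.g. one per document representation). Label values need not be aligned
    across clusterings, because only co-membership of pairs is compared.
    """
    n = len(partitions[0])
    m = len(partitions)

    # Union-find structure over document indices.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Link every pair that a strict majority of clusterings puts together.
    for i, j in combinations(range(n), 2):
        votes = sum(1 for labels in partitions if labels[i] == labels[j])
        if 2 * votes > m:
            parent[find(i)] = find(j)

    # Relabel components 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Made-up toy data: 8 documents, true clusters {0,1,2,3} and {4,5,6,7}.
# Each hypothetical clustering misplaces a different document.
p1 = [1, 0, 0, 0, 1, 1, 1, 1]  # document 0 misplaced
p2 = [1, 0, 1, 1, 0, 0, 0, 0]  # labels swapped, document 1 misplaced
p3 = [0, 0, 1, 0, 1, 1, 1, 1]  # document 2 misplaced

result = consensus([p1, p2, p3])
print(result)
```

In this toy case every individual clustering contains one error, yet the consensus separates the two groups cleanly. Note that linking by connected components can merge clusters when errors in different clusterings chain together, so this is only an illustration of the principle, not a robust consensus function.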
Pages: 169-176
Number of pages: 8
Related papers
50 in total
  • [21] Using Ontology and Cluster Ensembles for Geospatial Clustering Analysis
    Wang, Xin
    Gu, Wei
    INTELLIGENT COMPUTING METHODOLOGIES, ICIC 2017, PT III, 2017, 10363 : 400 - 410
  • [22] Text Document Clustering: The Application of Cluster Analysis to Textual Document
    2016, Institute of Electrical and Electronics Engineers Inc., United States
  • [23] Text Document Clustering: The Application of Cluster Analysis to Textual Document
    Reddy, Venkata Srikanth
    Kinnicutt, Patrick
    Lee, Roger
    2016 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE & COMPUTATIONAL INTELLIGENCE (CSCI), 2016, : 1174 - 1179
  • [24] A Visual Approach to Improve Clustering Based on Cluster Ensembles
    Zhou, Jianping
    Konecni, Shawn
    Marx, Kenneth
    Grinstein, Georges
    VISUALIZATION AND DATA ANALYSIS 2010, 2010, 7530
  • [25] Use of Ontology and Cluster Ensembles for Geospatial Clustering Analysis
    Gu, Wei
    Zhang, Zhilin
    Wang, Baijie
    Wang, Xin
    ADVANCES IN ARTIFICIAL INTELLIGENCE, CANADIAN AI 2014, 2014, 8436 : 119 - 130
  • [26] Semantic Feature Reduction in Chinese Document Clustering
    Meng, Xianjun
    Chen, Qingcai
    Wang, Xiaolong
    2008 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), VOLS 1-6, 2008, : 3720 - 3725
  • [27] A Feature Selection for Korean Web Document Clustering
    Park, Heum
    Kim, Young-Gi
    Kwon, Hyuk-Chul
    IECON 2004: 30TH ANNUAL CONFERENCE OF IEEE INDUSTRIAL ELECTRONICS SOCIETY, VOL 3, 2004, : 2650 - 2654
  • [28] Document classification: An approach using feature clustering
    Harish, B.S.
    Udayasri, B.
    Advances in Intelligent Systems and Computing, 2014, 235 : 163 - 173
  • [29] LDA Based Feature Selection for Document Clustering
    Kumar, B. Shravan
    Ravi, Vadlamani
    COMPUTE'17: PROCEEDINGS OF THE 10TH ANNUAL ACM INDIA COMPUTE CONFERENCE, 2017, : 125 - 130
  • [30] Exploiting ensemble diversity for automatic feature extraction
    Brown, G
    Yao, X
    Wyatt, J
    Wersing, H
    Sendhoff, B
    ICONIP'02: PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON NEURAL INFORMATION PROCESSING: COMPUTATIONAL INTELLIGENCE FOR THE E-AGE, 2002, : 1786 - 1790