Robust Document Clustering by Exploiting Feature Diversity in Cluster Ensembles

Authors
Sevillano, Xavier [1]
Cobo, German [1]
Alias, Francesc [1]
Claudi Socoro, Joan [1]
Affiliation
[1] Univ Ramon Llull, Enginyeria & Arquitectura Salle, Dept Comunicaciones & Teoria Serial, Pg Bonanova,8, Barcelona 08022, Spain
Source
PROCESAMIENTO DEL LENGUAJE NATURAL | 2006 / Issue 37
Keywords
Document representation; clustering; cluster ensembles
DOI
Not available
Chinese Library Classification (CLC) number
H0 [Linguistics]
Discipline classification code
030303 ; 0501 ; 050102 ;
Abstract
The performance of document clustering systems is conditioned by the use of optimal text representations, which are not only difficult to determine beforehand, but also may vary from one clustering problem to another. This work presents an approach based on feature diversity and cluster ensembles as a first step towards building document clustering systems that behave robustly across different clustering problems. Experiments conducted on three binary clustering problems of increasing difficulty show that the proposed method is i) robust to near-optimal model order selection, and ii) able to detect constructive interactions between different document representations, thus being capable of yielding consensus clusterings superior to any of the individual clusterings available.
Pages: 169-176
Page count: 8