Boosting Discrimination Information Based Document Clustering Using Consensus and Classification

Cited by: 6
Authors
Sheri, Ahmad Muqeem [1]
Rafique, Muhammad Aasim [2]
Hassan, Malik Tahir [3]
Junejo, Khurum Nazir [4]
Jeon, Moongu [5]
Affiliations
[1] Natl Univ Sci & Technol, Mil Coll Signals, Dept Comp Software Engn, Islamabad, Pakistan
[2] Quaid i Azam Univ, Dept Comp Sci, Islamabad, Pakistan
[3] Univ Management & Technol, Sch Syst & Technol, Dept Software Engn, Lahore, Pakistan
[4] Ibex CX, Lahore, Pakistan
[5] GIST, Sch Elect Engn & Comp Sci, Gwangju, South Korea
Keywords
Consensus clustering; discrimination information; document clustering; evidence combination; knowledge reuse; mining methods and algorithms; text mining; feature selection
DOI
10.1109/ACCESS.2019.2923462
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The quality of document clustering depends on an adequate choice of term discrimination information measure (DIM). Making the right choice is empirical in nature, and experts rely on the characteristics of the data in the documents to guess at a viable solution. A DIM that performs consistently across data sets is therefore a mere conjecture, and the information measure must be selected intelligently. In this work, we propose an automated consensus-building method based on a text classifier. Two distinct DIMs construct basic partitions of the documents and form base clusters. The consensus-building method uses the cluster information to find concordant documents, which constitute a dataset for training the text classifier. The classifier then predicts labels for the discordant documents from the earlier clustering stage and forms new clusters. Experiments are performed with eight standard data sets to test the efficacy of the proposed technique. The improvement observed when applying the proposed consensus clustering demonstrates its superiority over the individual base results. Relative Risk (RR) and Measurement of Discrimination Information (MDI) are the two discrimination information measures used to obtain the base clustering solutions in our experiments.
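The pipeline described in the abstract (two base partitions, concordant documents as training data, a classifier relabeling the discordant documents) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say which text classifier is used or how cluster ids from the two partitions are matched, so the multinomial naive Bayes classifier and the brute-force label alignment below are assumptions. The inputs `labels_a` and `labels_b` are hypothetical stand-ins for the RR-based and MDI-based base clusterings, which are not computed here, and all function names are illustrative.

```python
import math
from collections import Counter
from itertools import permutations


def align_labels(labels_a, labels_b, k):
    """Relabel labels_b so its cluster ids overlap labels_a as much as possible.

    Brute force over all k! permutations; only sensible for small k (assumption).
    """
    best_perm, best_overlap = None, -1
    for perm in permutations(range(k)):
        overlap = sum(1 for a, b in zip(labels_a, labels_b) if a == perm[b])
        if overlap > best_overlap:
            best_perm, best_overlap = perm, overlap
    return [best_perm[b] for b in labels_b]


def train_naive_bayes(docs, labels, k):
    """Collect per-class term counts for a multinomial naive Bayes model."""
    vocab = {term for doc in docs for term in doc}
    term_counts = [Counter() for _ in range(k)]
    class_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        term_counts[label].update(doc)
    return vocab, term_counts, class_counts


def predict(model, doc, k):
    """Return the class with the highest add-one-smoothed log posterior."""
    vocab, term_counts, class_counts = model
    n_docs = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c in range(k):
        if class_counts[c] == 0:
            continue  # no concordant documents for this cluster
        total = sum(term_counts[c].values())
        score = math.log(class_counts[c] / n_docs)
        for term in doc:
            if term in vocab:
                score += math.log((term_counts[c][term] + 1) / (total + len(vocab)))
        if score > best_score:
            best_class, best_score = c, score
    return best_class


def consensus_clustering(docs, labels_a, labels_b, k):
    """Keep labels where both partitions agree; classify the rest."""
    aligned_b = align_labels(labels_a, labels_b, k)
    concordant = [i for i in range(len(docs)) if labels_a[i] == aligned_b[i]]
    model = train_naive_bayes(
        [docs[i] for i in concordant], [labels_a[i] for i in concordant], k
    )
    final = list(labels_a)
    for i in range(len(docs)):
        if labels_a[i] != aligned_b[i]:  # discordant document
            final[i] = predict(model, docs[i], k)
    return final


# Toy tokenized documents and two hypothetical base partitions.
docs = [
    ["ball", "goal", "team"],
    ["goal", "match", "team"],
    ["vote", "election", "party"],
    ["party", "vote", "law"],
    ["team", "match", "ball"],
    ["election", "law", "vote"],
]
labels_a = [0, 0, 1, 1, 1, 1]  # stand-in for one DIM-based clustering
labels_b = [1, 1, 0, 0, 1, 1]  # stand-in for the other (cluster ids swapped)

print(consensus_clustering(docs, labels_a, labels_b, 2))  # -> [0, 0, 1, 1, 0, 1]
```

In this toy run the two partitions agree on the first four documents after alignment, so those four train the classifier, and the last two documents, on which the partitions disagree, receive predicted labels. For larger numbers of clusters, an assignment solver would be a more practical way to align labels than enumerating permutations.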
Pages: 78954-78962
Page count: 9