Boosting Discrimination Information Based Document Clustering Using Consensus and Classification

Cited by: 6
Authors
Sheri, Ahmad Muqeem [1]
Rafique, Muhammad Aasim [2]
Hassan, Malik Tahir [3]
Junejo, Khurum Nazir [4]
Jeon, Moongu [5]
Affiliations
[1] Natl Univ Sci & Technol, Mil Coll Signals, Dept Comp Software Engn, Islamabad, Pakistan
[2] Quaid i Azam Univ, Dept Comp Sci, Islamabad, Pakistan
[3] Univ Management & Technol, Sch Syst & Technol, Dept Software Engn, Lahore, Pakistan
[4] Ibex CX, Lahore, Pakistan
[5] GIST, Sch Elect Engn & Comp Sci, Gwangju, South Korea
Keywords
Consensus clustering; discrimination information; document clustering; evidence combination; knowledge reuse; mining methods and algorithms; text mining; feature selection
DOI
10.1109/ACCESS.2019.2923462
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Choosing an adequate term discrimination information measure (DIM) is essential for reliable document clustering. The right choice is made empirically, with experts relying on the characteristics of the data in the documents to guess a viable solution. A DIM that works consistently across clustering tasks therefore remains a conjecture, and the information measure must be selected intelligently. In this work, we propose an automated consensus-building method based on a text classifier. Two distinct DIMs construct basic partitions of the documents and form base clusters. The consensus-building method uses the cluster information to find concordant documents, which constitute a dataset for training the text classifier. The classifier then predicts labels for the discordant documents from the earlier clustering stage and forms new clusters. Experiments on eight standard data sets test the efficacy of the proposed technique. Relative Risk (RR) and Measurement of Discrimination Information (MDI) are the two DIMs used to obtain the base clustering solutions in our experiments. The improvement observed with the proposed consensus clustering demonstrates its advantage over the individual base results.
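The consensus step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how cluster labels from the two base solutions are matched or which text classifier is used, so the greedy overlap matching in `align_labels` and the small multinomial naive Bayes classifier are assumptions chosen for illustration. The base clusterings themselves (RR-based and MDI-based in the paper) are taken as given inputs, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def align_labels(labels_a, labels_b):
    """Rename cluster ids of clustering B to the ids of clustering A.

    Assumption: greedy one-to-one matching by largest document overlap.
    Clusters of B left unmatched map to None and never count as agreement.
    """
    overlap = Counter(zip(labels_b, labels_a))
    mapping, used = {}, set()
    for (b, a), _ in overlap.most_common():
        if b not in mapping and a not in used:
            mapping[b] = a
            used.add(a)
    return [mapping.get(b) for b in labels_b]


def train_nb(docs, labels):
    """Fit a multinomial naive Bayes model on tokenized documents."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, y in zip(docs, labels):
        word_counts[y].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict_nb(model, tokens):
    """Return the most probable class (add-one smoothing); None if untrained."""
    class_docs, word_counts, vocab = model
    n_docs = sum(class_docs.values())
    best, best_score = None, -math.inf
    for y, count in class_docs.items():
        total = sum(word_counts[y].values())
        score = math.log(count / n_docs)
        for t in tokens:
            if t in vocab:  # ignore words never seen in training
                score += math.log((word_counts[y][t] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = y, score
    return best


def consensus_clustering(docs, labels_a, labels_b):
    """Keep labels where both base clusterings agree; classify the rest."""
    aligned_b = align_labels(labels_a, labels_b)
    agree = [a == b for a, b in zip(labels_a, aligned_b)]
    # Concordant documents form the training set for the classifier.
    train_docs = [d for d, ok in zip(docs, agree) if ok]
    train_y = [a for a, ok in zip(labels_a, agree) if ok]
    model = train_nb(train_docs, train_y)
    # Discordant documents receive the classifier's predicted label.
    return [a if ok else predict_nb(model, d)
            for d, a, ok in zip(docs, labels_a, agree)]


# Toy data: two base clusterings that disagree on the last two documents.
docs = [
    ["goal", "match", "team"], ["team", "score", "goal"],
    ["vote", "party", "election"], ["election", "vote", "law"],
    ["match", "score", "team"], ["party", "law", "vote"],
]
labels_a = [0, 0, 1, 1, 1, 1]
labels_b = [5, 5, 7, 7, 5, 5]
print(consensus_clustering(docs, labels_a, labels_b))
```

In this toy run the first four documents are concordant and train the classifier, and the last two are relabeled by it. If the two base clusterings share no concordant documents, the sketch has nothing to train on and returns None for every discordant document.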
Pages: 78954-78962
Page count: 9