Boosting Discrimination Information Based Document Clustering Using Consensus and Classification

Cited by: 6
Authors
Sheri, Ahmad Muqeem [1 ]
Rafique, Muhammad Aasim [2 ]
Hassan, Malik Tahir [3 ]
Junejo, Khurum Nazir [4 ]
Jeon, Moongu [5 ]
Affiliations
[1] Natl Univ Sci & Technol, Mil Coll Signals, Dept Comp Software Engn, Islamabad, Pakistan
[2] Quaid i Azam Univ, Dept Comp Sci, Islamabad, Pakistan
[3] Univ Management & Technol, Sch Syst & Technol, Dept Software Engn, Lahore, Pakistan
[4] Ibex CX, Lahore, Pakistan
[5] GIST, Sch Elect Engn & Comp Sci, Gwangju, South Korea
Keywords
Consensus clustering; discrimination information; document clustering; evidence combination; knowledge reuse; mining methods and algorithms; text mining; FEATURE-SELECTION
DOI
10.1109/ACCESS.2019.2923462
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The choice of term discrimination information measure (DIM) largely determines the quality of document clustering. Making the right choice is an empirical exercise: experts rely on the characteristics of the data in the documents to guess at a viable measure. No single DIM can therefore be assumed to perform consistently, and the information measure must be selected intelligently. In this work, we propose an automated consensus-building method based on a text classifier. Two distinct DIMs each construct a basic partition of the documents, forming the base clusters. The consensus-building method uses the cluster information to find concordant documents, which constitute a data set for training the text classifier. The classifier then predicts labels for the discordant documents from the earlier clustering stage and forms new clusters. Experiments on eight standard data sets test the efficacy of the proposed technique. The proposed consensus clustering improves on the results of the individual base clusterings. Relative Risk (RR) and Measurement of Discrimination Information (MDI) are the two discrimination information measures used to obtain the base clustering solutions in our experiments.
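The pipeline described in the abstract can be illustrated with a minimal Python sketch. Several details are assumptions rather than the authors' method: the abstract does not say how cluster labels from the two base partitions are matched, so the sketch aligns them by brute-force maximum overlap; "concordant" is taken to mean that both aligned partitions assign a document to the same cluster; the text classifier is a small multinomial naive Bayes model written for this example; and the toy documents and base labels are invented. The function names (align_labels, train_naive_bayes, predict, consensus_clustering) are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations


def align_labels(base_a, base_b, k):
    """Relabel base_b to maximise agreement with base_a.

    Assumption: brute force over all k! label permutations (small k only).
    """
    best_perm, best_overlap = None, -1
    for perm in permutations(range(k)):
        overlap = sum(1 for a, b in zip(base_a, base_b) if a == perm[b])
        if overlap > best_overlap:
            best_perm, best_overlap = perm, overlap
    return [best_perm[b] for b in base_b]


def train_naive_bayes(docs, labels):
    """Collect the counts needed by a multinomial naive Bayes classifier."""
    vocab = {w for d in docs for w in d}
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    for d, y in zip(docs, labels):
        word_counts[y].update(d)
    return vocab, word_counts, class_counts


def predict(model, doc):
    """Return the most probable class for doc, using add-one smoothing."""
    vocab, word_counts, class_counts = model
    n_docs = sum(class_counts.values())
    scores = {}
    for y, n_y in class_counts.items():
        total = sum(word_counts[y].values()) + len(vocab)
        score = math.log(n_y / n_docs)
        for w in doc:
            if w in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[y][w] + 1) / total)
        scores[y] = score
    return max(scores, key=scores.get)


def consensus_clustering(docs, base_a, base_b, k):
    """Keep concordant labels; let the classifier relabel discordant docs."""
    aligned_b = align_labels(base_a, base_b, k)
    concordant = {i for i, (a, b) in enumerate(zip(base_a, aligned_b)) if a == b}
    model = train_naive_bayes(
        [docs[i] for i in sorted(concordant)],
        [base_a[i] for i in sorted(concordant)],
    )
    final = list(base_a)
    for i, doc in enumerate(docs):
        if i not in concordant:
            final[i] = predict(model, doc)
    return final


# Invented toy data: tokenised documents and two base partitions that
# stand in for the clusterings produced by two different DIMs.
docs = [
    ["goal", "match", "team"],
    ["team", "score", "goal"],
    ["match", "team", "player"],
    ["stock", "market", "price"],
    ["market", "trade", "stock"],
    ["price", "trade", "goal"],  # the two partitions disagree on this one
]
base_a = [0, 0, 0, 1, 1, 0]
base_b = [1, 1, 1, 0, 0, 0]  # same grouping idea, but labels are swapped

print(consensus_clustering(docs, base_a, base_b, k=2))  # [0, 0, 0, 1, 1, 1]
```

In this toy run the first five documents are concordant after alignment and form the training set; the sixth is discordant and the classifier assigns it to cluster 1. For more than a handful of clusters, the brute-force alignment would need to be replaced by an assignment algorithm.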
Pages: 78954-78962
Number of pages: 9