Boosting Discrimination Information Based Document Clustering Using Consensus and Classification

Cited by: 6
Authors
Sheri, Ahmad Muqeem [1]
Rafique, Muhammad Aasim [2]
Hassan, Malik Tahir [3]
Junejo, Khurum Nazir [4]
Jeon, Moongu [5]
Affiliations
[1] Natl Univ Sci & Technol, Mil Coll Signals, Dept Comp Software Engn, Islamabad, Pakistan
[2] Quaid i Azam Univ, Dept Comp Sci, Islamabad, Pakistan
[3] Univ Management & Technol, Sch Syst & Technol, Dept Software Engn, Lahore, Pakistan
[4] Ibex CX, Lahore, Pakistan
[5] GIST, Sch Elect Engn & Comp Sci, Gwangju, South Korea
Keywords
Consensus clustering; discrimination information; document clustering; evidence combination; knowledge reuse; mining methods and algorithms; text mining; feature selection
DOI
10.1109/ACCESS.2019.2923462
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
An adequate choice of term discrimination information measure (DIM) is a prerequisite for good document clustering. Making the right choice is an empirical exercise: experts rely on the characteristics of the data in the documents to speculate on a viable option. Consequently, committing to a single DIM for clustering is mere conjecture, and the information measure calls for intelligent selection. In this work, we propose an automated consensus-building method based on a text classifier. Two distinct DIMs are used to construct base partitions of the documents, forming the base clusters. The consensus-building method uses this cluster information to identify concordant documents, which constitute the training set for the text classifier. The classifier then predicts labels for the discordant documents from the earlier clustering stage, forming new clusters. Experiments on eight standard data sets test the efficacy of the proposed technique, and the improvement observed with the proposed consensus clustering demonstrates its superiority over the individual base clustering results. Relative Risk (RR) and Measurement of Discrimination Information (MDI) are the two discrimination information measures used to obtain the base clustering solutions in our experiments.
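The abstract outlines the consensus step only briefly. The following is a minimal sketch of one possible reading, not the authors' implementation. It assumes that both base clusterings (for example, one from RR and one from MDI) have the same number of clusters k and that cluster ids are aligned by brute-force permutation matching, which is feasible only for small k. Because the record does not name the text classifier, a hand-written multinomial naive Bayes stands in for it. All function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations
import math


def align_labels(ref, other, k):
    """Relabel `other` so its cluster ids best match `ref`.
    Assumption: brute force over all k! permutations (small k only)."""
    best_perm, best_agree = None, -1
    for perm in permutations(range(k)):
        agree = sum(1 for r, o in zip(ref, other) if perm[o] == r)
        if agree > best_agree:
            best_perm, best_agree = perm, agree
    return [best_perm[o] for o in other]


def train_nb(docs, labels, alpha=1.0):
    """Fit a Laplace-smoothed multinomial naive Bayes on tokenized docs.
    Hypothetical stand-in for the paper's unspecified text classifier."""
    vocab = {t for d in docs for t in d}
    prior = Counter(labels)
    counts = defaultdict(Counter)
    for d, y in zip(docs, labels):
        counts[y].update(d)
    model = {}
    for y in prior:
        total = sum(counts[y].values()) + alpha * len(vocab)
        log_lik = {t: math.log((counts[y][t] + alpha) / total) for t in vocab}
        model[y] = (math.log(prior[y] / len(labels)), log_lik)
    return model


def predict_nb(model, doc):
    """Return the most probable cluster; terms unseen in training are ignored."""
    def score(y):
        log_prior, log_lik = model[y]
        return log_prior + sum(log_lik[t] for t in doc if t in log_lik)
    return max(model, key=score)


def consensus_cluster(docs, labels_a, labels_b, k):
    """Train on concordant documents (both base clusterings agree),
    then let the classifier relabel the discordant ones."""
    labels_b = align_labels(labels_a, labels_b, k)
    concordant = [i for i in range(len(docs)) if labels_a[i] == labels_b[i]]
    model = train_nb([docs[i] for i in concordant],
                     [labels_a[i] for i in concordant])
    final = list(labels_a)
    for i in range(len(docs)):
        if labels_a[i] != labels_b[i]:
            final[i] = predict_nb(model, docs[i])
    return final


# Toy usage with hypothetical base clusterings.
docs = [
    ["ball", "goal", "team"],
    ["goal", "match", "team"],
    ["vote", "party", "election"],
    ["party", "vote", "policy"],
    ["team", "vote", "goal"],  # the two base clusterings disagree here
]
labels_rr = [0, 0, 1, 1, 0]   # hypothetical RR-based base clustering
labels_mdi = [1, 1, 0, 0, 0]  # hypothetical MDI-based clustering, ids permuted
print(consensus_cluster(docs, labels_rr, labels_mdi, k=2))  # → [0, 0, 1, 1, 0]
```

Here the first four documents are concordant once the MDI cluster ids are swapped, so they form the training set. The fifth document is discordant and is assigned by the classifier. Label alignment is needed because cluster ids from independent runs are arbitrary. For larger k, an optimal assignment solver would replace the permutation search.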
Pages: 78954-78962
Number of pages: 9