Boosting Discrimination Information Based Document Clustering Using Consensus and Classification

Cited by: 6

Authors:

Sheri, Ahmad Muqeem [1]

Rafique, Muhammad Aasim [2]

Hassan, Malik Tahir [3]

Junejo, Khurum Nazir [4]

Jeon, Moongu [5]

Affiliations:

[1] Natl Univ Sci & Technol, Mil Coll Signals, Dept Comp Software Engn, Islamabad, Pakistan

[2] Quaid i Azam Univ, Dept Comp Sci, Islamabad, Pakistan

[3] Univ Management & Technol, Sch Syst & Technol, Dept Software Engn, Lahore, Pakistan

[4] Ibex CX, Lahore, Pakistan

[5] GIST, Sch Elect Engn & Comp Sci, Gwangju, South Korea

Source:

IEEE ACCESS | 2019 / Vol. 7

Keywords:

Consensus clustering; discrimination information; document clustering; evidence combination; knowledge reuse; mining methods and algorithms; text mining; FEATURE-SELECTION;

DOI:

10.1109/ACCESS.2019.2923462

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

An adequate choice of term discrimination information measure (DIM) is a prerequisite for good document clustering. Making the right choice is an empirical exercise: experts rely on the characteristics of the document data to guess a viable measure. Thus, a DIM that is consistently suitable for clustering is a mere conjecture, and the measure must be selected intelligently. In this work, we propose an automated consensus-building method based on a text classifier. Two distinct DIMs construct base partitions of the documents, forming the base clusters. The consensus-building method uses the information from these clusters to find concordant documents, which both base partitions assign to the same cluster, and uses them as a data set to train the text classifier. The classifier then predicts labels for the discordant documents left over from the earlier clustering stage, forming new clusters. Experiments on eight standard data sets test the efficacy of the proposed technique, and the improvement observed with the proposed consensus clustering demonstrates its superiority over the individual base clustering results. Relative Risk (RR) and Measurement of Discrimination Information (MDI) are the two DIMs used to obtain the base clustering solutions in our experiments.
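The consensus step described in the abstract (keep documents on which the two base partitions agree, train a classifier on them, let it relabel the rest) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: documents are pre-tokenized term lists; the two base clusterings are given as integer label lists (the names `labels_rr` and `labels_mdi` are hypothetical); cluster ids from the two partitions are matched greedily by overlap, since the abstract does not say how they are matched; and a multinomial Naive Bayes classifier stands in for the text classifier, which the abstract does not name.

```python
from collections import Counter, defaultdict
import math


def align_labels(labels_a, labels_b):
    """Relabel partition B so its cluster ids match partition A.

    Assumption: greedy one-to-one matching on co-occurrence counts; the
    abstract does not state how base clusters are matched. Unmatched B
    clusters map to -1, so their documents count as discordant.
    """
    overlap = Counter(zip(labels_b, labels_a))
    mapping, used_a = {}, set()
    for (b, a), _count in overlap.most_common():  # largest overlaps first
        if b not in mapping and a not in used_a:
            mapping[b] = a
            used_a.add(a)
    return [mapping.get(b, -1) for b in labels_b]


def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha smoothing (stand-in classifier)."""
    doc_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, y in zip(docs, labels):
        term_counts[y].update(tokens)
        vocab.update(tokens)
    model = {}
    for y, n_y in doc_counts.items():
        denom = sum(term_counts[y].values()) + alpha * len(vocab)
        log_prior = math.log(n_y / len(docs))
        log_lik = {t: math.log((term_counts[y][t] + alpha) / denom) for t in vocab}
        model[y] = (log_prior, log_lik)
    return model


def predict_nb(model, tokens):
    """Return the class with the highest log posterior; unseen terms are ignored."""
    def log_posterior(y):
        log_prior, log_lik = model[y]
        return log_prior + sum(log_lik[t] for t in tokens if t in log_lik)
    return max(model, key=log_posterior)


def consensus_clustering(docs, labels_rr, labels_mdi):
    """Keep concordant assignments; let a classifier relabel discordant documents."""
    aligned = align_labels(labels_rr, labels_mdi)
    agree = [a == b for a, b in zip(labels_rr, aligned)]
    train_idx = [i for i, ok in enumerate(agree) if ok]
    if not train_idx:
        raise ValueError("no concordant documents to train the classifier on")
    model = train_nb([docs[i] for i in train_idx], [labels_rr[i] for i in train_idx])
    final = list(labels_rr)
    for i, ok in enumerate(agree):
        if not ok:
            final[i] = predict_nb(model, docs[i])
    return final


docs = [
    "ball goal match team".split(),
    "team score goal".split(),
    "vote election party".split(),
    "party senate vote".split(),
    "goal team vote".split(),
]
labels_rr = [0, 0, 1, 1, 1]   # hypothetical base clustering from the RR measure
labels_mdi = [5, 5, 7, 7, 5]  # hypothetical base clustering from the MDI measure
print(consensus_clustering(docs, labels_rr, labels_mdi))  # [0, 0, 1, 1, 0]
```

In this toy input the two base partitions disagree only on the last document; the classifier, trained on the four concordant documents, assigns it to cluster 0. An optimal (Hungarian) matching such as `scipy.optimize.linear_sum_assignment` could replace the greedy label alignment when the numbers of clusters are large.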

Pages: 78954-78962

Number of pages: 9