A three-stage unsupervised dimension reduction method for text clustering

Cited by: 32
Authors
Bharti, Kusum Kumari [1 ]
Singh, P. K. [1 ]
Affiliations
[1] ABV Indian Inst Informat Technol & Management Gwalior, Computat Intelligence & Data Min Res Lab, Gwalior, MP, India
Keywords
Feature selection; Feature extraction; Dimension reduction; Sparsity; Three-stage model; Text clustering; FEATURE-SELECTION; MUTUAL INFORMATION; ALGORITHM;
DOI
10.1016/j.jocs.2013.11.007
Chinese Library Classification (CLC) number
TP39 [Computer Applications];
Subject classification codes
081203 ; 0835 ;
Abstract
Dimension reduction is a well-known pre-processing step in text clustering that removes irrelevant, redundant, and noisy features without sacrificing the performance of the underlying algorithm. Dimension reduction methods are primarily classified as feature selection (FS) methods and feature extraction (FE) methods. Although FS methods are robust against irrelevant features, they occasionally fail to retain important information present in the original feature space. On the other hand, although FE methods reduce the dimensionality of the feature space without losing much information, they are significantly affected by irrelevant features. The one-stage models (FS or FE methods alone) and the two-stage models (combinations of FS and FE methods) proposed in the literature are not sufficient to fulfil all the above-mentioned requirements of dimension reduction. Therefore, we propose three-stage dimension reduction models that remove irrelevant, redundant, and noisy features from the original feature space without losing much valuable information. These models incorporate the advantages of both FS and FE methods to create a low-dimensional feature subspace. Experiments on three well-known benchmark text datasets with different characteristics show that the proposed three-stage models significantly improve the performance of the clustering algorithm as measured by micro F-score, macro F-score, and total execution time. (C) 2013 Elsevier B.V. All rights reserved.
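The abstract does not specify which techniques fill each stage, so the following is a minimal Python sketch of an FS -> FE -> clustering pipeline in the same spirit: it assumes an unsupervised term-variance criterion for feature selection, truncated SVD (LSA) for feature extraction, and k-means for clustering, with a naive majority-vote cluster-to-class mapping for scoring. The dataset, the chosen criteria, and all parameter values (20000 terms, top 2000 selected, 100 components, 3 clusters) are illustrative assumptions, not the paper's settings.

    # Sketch of a three-stage FS -> FE -> clustering pipeline (assumed methods, not the paper's exact stages).
    import numpy as np
    from sklearn.datasets import fetch_20newsgroups
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.cluster import KMeans
    from sklearn.metrics import f1_score

    # A small public text corpus as a stand-in for the benchmark datasets used in the paper.
    data = fetch_20newsgroups(subset="train",
                              categories=["sci.space", "rec.autos", "talk.politics.misc"],
                              remove=("headers", "footers", "quotes"))
    X = TfidfVectorizer(max_features=20000, stop_words="english").fit_transform(data.data)

    # Stage 1: feature selection -- keep the top-k terms by variance (an assumed unsupervised criterion).
    variances = np.asarray(X.power(2).mean(axis=0) - np.square(X.mean(axis=0))).ravel()
    top_k = np.argsort(variances)[::-1][:2000]
    X_fs = X[:, top_k]

    # Stage 2: feature extraction -- project the selected terms into a low-dimensional
    # subspace with truncated SVD (LSA), an assumed FE choice.
    X_fe = TruncatedSVD(n_components=100, random_state=0).fit_transform(X_fs)

    # Stage 3: clustering in the reduced feature subspace.
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_fe)

    # Micro/macro F-scores require a cluster-to-class assignment; a majority-vote
    # mapping is used here purely for illustration.
    mapping = {c: np.bincount(data.target[labels == c]).argmax() for c in np.unique(labels)}
    pred = np.array([mapping[c] for c in labels])
    print("micro F:", f1_score(data.target, pred, average="micro"))
    print("macro F:", f1_score(data.target, pred, average="macro"))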
Pages: 156-169
Number of pages: 14
Related papers
50 records in total
  • [1] A Two-Stage Unsupervised Dimension Reduction Method for Text Clustering
    Bharti, Kusum Kumari
    Singh, Pramod Kumar
    [J]. PROCEEDINGS OF SEVENTH INTERNATIONAL CONFERENCE ON BIO-INSPIRED COMPUTING: THEORIES AND APPLICATIONS (BIC-TA 2012), VOL 2, 2013, 202 : 529 - 542
  • [2] A Three-Stage Matting Method
    Chen, Xiao
    He, Fazhi
    Yu, Haiping
    [J]. IEEE ACCESS, 2017, 5 : 27732 - 27739
  • [3] Document representation and dimension reduction for text clustering
    Shafiei, Mahdi
    Wang, Singer
    Zhang, Roger
    Milios, Evangelos
    Tang, Bin
    Tougas, Jane
    Spiteri, Ray
    [J]. 2007 IEEE 23RD INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOP, VOLS 1-2, 2007, : 770 - 779
  • [4] Three-Stage Method of Text Region Extraction from Diagram Raster Images
    Sas, Jerzy
    Zolnierek, Andrzej
    [J]. PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON COMPUTER RECOGNITION SYSTEMS CORES 2013, 2013, 226 : 527 - 538
  • [5] Text Document Preprocessing and Dimension Reduction Techniques for Text Document Clustering
    Kadhim, Ammar Ismael
    Cheah, Yu-N
    Ahamed, Nurul Hashimah
    [J]. PROCEEDINGS 2014 4TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE WITH APPLICATIONS IN ENGINEERING AND TECHNOLOGY ICAIET 2014, 2014, : 69 - 73
  • [6] An effective dimension reduction algorithm for clustering Arabic text
    Mohamed, A. A.
    [J]. EGYPTIAN INFORMATICS JOURNAL, 2020, 21 (01) : 1 - 5
  • [7] A Three-Stage method for Data Text Mining: Using UGC in Business Intelligence Analysis
    Ramon Saura, Jose
    Bennett, Dag R.
    [J]. SYMMETRY-BASEL, 2019, 11 (04):
  • [8] Three-stage knowledge acquisition method
    Cao, Cungen
    Liu, Wei
    [J]. Journal of Computer Science and Technology, 1995, 10 (03): : 274 - 280
  • [9] A Three-Stage Knowledge Acquisition Method
    Cao, Cungen
    Liu, Wei
    [J]. Journal of Computer Science & Technology, 1995, (03) : 274 - 280
  • [10] Hybrid dimension reduction by integrating feature selection with feature extraction method for text clustering
    Bharti, Kusum Kumari
    Singh, Pramod Kumar
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (06) : 3105 - 3114