Variable Global Feature Selection Scheme for automatic classification of text documents

Cited by: 67
Authors
Agnihotri, Deepak [1]
Verma, Kesari [1]
Tripathi, Priyanka [2]
Affiliations
[1] Natl Inst Technol Raipur, Dept Comp Applicat, Raipur 492010, CG, India
[2] Natl Inst Tech Teachers Training & Res Bhopal, Department Comp Engn & Applicat, Bhopal 462002, MP, India
Keywords
Feature selection; Text document classification; Text mining; Text analysis
DOI
10.1016/j.eswa.2017.03.057
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection is important for speeding up Automatic Text Document Classification (ATDC). At present, the most common approach to selecting discriminating features is the Global Filter-based Feature Selection Scheme (GFSS). The GFSS assigns a score to each feature based on its discriminating power and selects the top-N features from the feature set, where N is an empirically determined number. As a result, the features of a few classes may be discarded either partially or completely. The Improved Global Feature Selection Scheme (IGFSS) solves this issue by selecting an equal number of representative features from all the classes. However, it struggles with unbalanced datasets that have a large number of classes, because the distribution of features across these classes is highly variable. In this case, choosing an equal number of features from each class may exclude some important features from the classes containing a higher number of features. To overcome this problem, we propose a novel Variable Global Feature Selection Scheme (VGFSS) that selects a variable number of features from each class based on the distribution of terms in the classes. It ensures that a minimum number of terms is selected from each class. Numerical results on benchmark datasets show the effectiveness of the proposed VGFSS algorithm over classical information science methods and IGFSS. (C) 2017 Elsevier Ltd. All rights reserved.
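The three selection strategies contrasted in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names, the toy scores, the assignment of each feature to a single class, and the proportional quota rule with a floor of `min_per_class` are all assumptions made for illustration, since the abstract does not specify how the per-class counts are derived.

```python
from collections import defaultdict


def group_by_class(scores, feature_class):
    """Group features by class, best-scoring first within each class."""
    groups = defaultdict(list)
    for feature, cls in feature_class.items():
        groups[cls].append(feature)
    for features in groups.values():
        features.sort(key=scores.get, reverse=True)
    return groups


def select_global(scores, n):
    """GFSS-style: keep the n best-scoring features, ignoring classes."""
    return set(sorted(scores, key=scores.get, reverse=True)[:n])


def select_equal(scores, feature_class, n):
    """IGFSS-style: give every class the same quota of features."""
    groups = group_by_class(scores, feature_class)
    quota = n // len(groups)
    return {f for features in groups.values() for f in features[:quota]}


def select_variable(scores, feature_class, n, min_per_class=1):
    """VGFSS-style (assumed rule): quota proportional to class size,
    but never below min_per_class. The total may differ slightly from n."""
    groups = group_by_class(scores, feature_class)
    total = sum(len(features) for features in groups.values())
    selected = set()
    for features in groups.values():
        quota = max(min_per_class, n * len(features) // total)
        selected.update(features[:quota])
    return selected


# Hypothetical toy data: class "A" owns six features, class "B" only two.
scores = {"a1": 0.9, "a2": 0.8, "a3": 0.7, "a4": 0.6,
          "a5": 0.5, "a6": 0.4, "b1": 0.3, "b2": 0.2}
feature_class = {name: name[0].upper() for name in scores}

print(sorted(select_global(scores, 4)))
print(sorted(select_equal(scores, feature_class, 4)))
print(sorted(select_variable(scores, feature_class, 4)))
```

On this toy data the global top-4 drops class B entirely, the equal quota keeps two features per class and so loses a stronger class A feature, and the variable quota keeps three features from A and one from B.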
Pages: 268-281
Number of pages: 14