An improved global feature selection scheme for text classification

Times cited: 138
Authors
Uysal, Alper Kursat [1]
Affiliations
[1] Anadolu Univ, Dept Comp Engn, Eskisehir, Turkey
Keywords
Global feature selection; Filter; Text classification; Pattern recognition; FEATURE-EXTRACTION; INFORMATION GAIN; ALGORITHM; IMPACT;
DOI
10.1016/j.eswa.2015.08.050
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection is a well-established remedy for the high dimensionality of the feature space, and the feature selection methods most often preferred for text classification are filter-based. In a common filter-based feature selection scheme, each feature is assigned a score according to its discriminative power, and the features are sorted in descending order of score. The last step adds the top-N features to the feature set, where N is generally determined empirically. This paper proposes an improved global feature selection scheme (IGFSS) that modifies this last step in order to obtain a more representative feature set. Although the feature set constructed by a common feature selection scheme successfully represents some of the classes, a number of classes may not be represented at all. IGFSS therefore aims to improve the classification performance of global feature selection methods by creating a feature set that represents all classes almost equally. For this purpose, IGFSS uses a local feature selection method to label features according to their discriminative power on classes, and these labels are used while producing the feature sets. Experimental results on well-known benchmark datasets with various classifiers indicate that IGFSS improves classification performance in terms of two widely used metrics, Micro-F1 and Macro-F1. (C) 2015 Elsevier Ltd. All rights reserved.
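The contrast drawn in the abstract between plain top-N selection and a class-balanced selection can be illustrated with a small sketch. This is not the paper's IGFSS procedure, whose details are not given here. It assumes that each feature already has a global score and a class label (in the paper the labels come from a local feature selection method), and that "almost equally" is approximated by a per-class quota of ceil(N / number of classes). The function names, the toy scores, and the toy labels are all hypothetical.

```python
from collections import defaultdict


def top_n_global(scores, n):
    """Common filter scheme: rank features by global score, keep the top n."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n]


def balanced_top_n(scores, labels, n):
    """Illustrative balanced variant (an assumption, not the paper's method).

    Walks the global ranking but lets each class label contribute at most
    ceil(n / number_of_classes) features. May return fewer than n features
    if some class has too few labeled features.
    """
    classes = set(labels.values())
    quota = -(-n // len(classes))  # ceiling division
    taken = defaultdict(int)
    selected = []
    for feature in sorted(scores, key=scores.get, reverse=True):
        if len(selected) == n:
            break
        label = labels[feature]
        if taken[label] < quota:
            selected.append(feature)
            taken[label] += 1
    return selected


# Hypothetical global scores and class labels for eight terms.
scores = {
    "goal": 0.9, "match": 0.8, "team": 0.7, "coach": 0.6,
    "stock": 0.5, "vote": 0.4, "bank": 0.3, "senate": 0.2,
}
labels = {
    "goal": "sport", "match": "sport", "team": "sport", "coach": "sport",
    "stock": "finance", "bank": "finance",
    "vote": "politics", "senate": "politics",
}

plain = top_n_global(scores, 3)
balanced = balanced_top_n(scores, labels, 3)
print(plain)     # all three features are labeled "sport"
print(balanced)  # one feature per class
```

With these toy values the plain top-3 set contains only sport terms, so the finance and politics classes are not represented, while the quota-based variant picks one term per class.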
Pages: 82-92
Number of pages: 11