Feature selection via maximizing global information gain for text classification

Cited by: 96
Authors
Shang, Changxing [1 ,2 ,3 ]
Li, Min [1 ,2 ]
Feng, Shengzhong [1 ]
Jiang, Qingshan [1 ]
Fan, Jianping [1 ]
Affiliations
[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
[2] Chinese Acad Sci, Grad Sch, Beijing 100080, Peoples R China
[3] Zhengzhou Inst Informat Sci & Technol, Zhengzhou 450001, Peoples R China
Funding
US National Science Foundation;
Keywords
Feature selection; Text classification; High dimensionality; Distributional clustering; Information bottleneck;
DOI
10.1016/j.knosys.2013.09.019
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feature selection is a vital preprocessing step in text classification, used to mitigate the curse of dimensionality. Most existing metrics (such as information gain) evaluate features individually and completely ignore the redundancy between them. This can decrease the overall discriminative power because one feature's predictive power is weakened by others. On the other hand, although higher-order algorithms (such as mRMR) take redundancy into account, their high computational complexity makes them impractical in the text domain. This paper proposes a novel metric called global information gain (GIG), which avoids redundancy naturally. An efficient feature selection method called maximizing global information gain (MGIG) is also given. We compare MGIG with four other algorithms on six datasets; the experimental results show that MGIG achieves better results than the other methods in most cases. Moreover, MGIG runs significantly faster than traditional higher-order algorithms, which makes it a suitable choice for feature selection in the text domain. (C) 2013 Elsevier B.V. All rights reserved.
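The abstract's point that per-feature metrics ignore redundancy can be illustrated with a small sketch. The code below is not the paper's GIG or MGIG method; it only computes ordinary information gain, IG(C; t) = H(C) - H(C | t), for each term on its own. The toy corpus, the term names, and the helper names `entropy` and `information_gain` are all assumptions made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature, labels):
    """IG(C; t) = H(C) - H(C | t) for one discrete feature column."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [c for f, c in zip(feature, labels) if f == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional


# Hypothetical toy corpus: 1 = term occurs in the document, 0 = it does not.
labels = ["sport", "sport", "sport", "tech", "tech", "tech"]
terms = {
    "goal":   [1, 1, 1, 0, 0, 0],
    "goals":  [1, 1, 1, 0, 0, 0],  # exact duplicate of "goal"
    "server": [0, 0, 1, 1, 1, 0],
}

# Score every term independently and rank by score, highest first.
scores = {t: information_gain(col, labels) for t, col in terms.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

# Joint information gain of the duplicated pair, treating the two
# columns as a single combined feature.
joint = information_gain(list(zip(terms["goal"], terms["goals"])), labels)
```

In this toy data the two duplicated terms receive identical scores and occupy the top two ranks, while their joint information gain is no higher than that of either term alone. A per-feature ranking therefore spends a selection slot on a term that adds nothing, which is the kind of redundancy the abstract describes.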
Pages: 298-309
Number of pages: 12