Feature selection via maximizing global information gain for text classification

Cited by: 96
Authors
Shang, Changxing [1 ,2 ,3 ]
Li, Min [1 ,2 ]
Feng, Shengzhong [1 ]
Jiang, Qingshan [1 ]
Fan, Jianping [1 ]
Affiliations
[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
[2] Chinese Acad Sci, Grad Sch, Beijing 100080, Peoples R China
[3] Zhengzhou Inst Informat Sci & Technol, Zhengzhou 450001, Peoples R China
Funding
US National Science Foundation;
Keywords
Feature selection; Text classification; High dimensionality; Distributional clustering; Information bottleneck;
DOI
10.1016/j.knosys.2013.09.019
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feature selection is a vital preprocessing step for text classification, used to mitigate the curse of dimensionality. Most existing metrics (such as information gain) evaluate features individually and completely ignore the redundancy between them. This can decrease the overall discriminative power, because one feature's predictive power is weakened by the others. On the other hand, although higher-order algorithms (such as mRMR) do take redundancy into account, their high computational complexity renders them impractical in the text domain. This paper proposes a novel metric called global information gain (GIG), which avoids redundancy naturally, together with an efficient feature selection method called maximizing global information gain (MGIG). We compare MGIG with four other algorithms on six datasets; the experimental results show that MGIG performs better than the other methods in most cases. Moreover, MGIG runs significantly faster than traditional higher-order algorithms, which makes it a proper choice for feature selection in the text domain. (C) 2013 Elsevier B.V. All rights reserved.
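The record gives no formula for GIG itself. As background for the contrast the abstract draws, the sketch below implements the classical per-term information gain baseline (a well-known metric, not the paper's method): IG(t) = H(C) - H(C|t) for binary term occurrence. The function name `information_gain` and the toy matrix are illustrative assumptions, not from the paper.

```python
import numpy as np

def information_gain(X, y):
    """Per-feature information gain IG(t) = H(C) - H(C|t) over a binary
    document-term matrix X (n_docs x n_terms) and class labels y.
    Each term is scored independently of all others, which is exactly
    the redundancy-blind behavior the abstract criticizes."""
    X = np.asarray(X, dtype=bool)
    y = np.asarray(y)
    classes = np.unique(y)

    def entropy(counts):
        counts = counts[counts > 0]
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    # Class entropy H(C)
    h_c = entropy(np.array([(y == c).sum() for c in classes]))

    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        present = X[:, j]
        p_t = present.mean()  # P(term occurs in a document)
        h_pres = entropy(np.array([(y[present] == c).sum()
                                   for c in classes])) if p_t > 0 else 0.0
        h_abs = entropy(np.array([(y[~present] == c).sum()
                                  for c in classes])) if p_t < 1 else 0.0
        # H(C|t) = P(t) * H(C | t present) + P(~t) * H(C | t absent)
        scores[j] = h_c - (p_t * h_pres + (1.0 - p_t) * h_abs)
    return scores

# Toy example: columns 0 and 1 are redundant copies, yet both receive
# the same (maximal) score as column 2, so top-k selection by IG alone
# happily keeps both.
X = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]])
y = np.array([0, 0, 1, 1])
print(information_gain(X, y))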
Pages: 298-309
Page count: 12