Feature selection via maximizing global information gain for text classification

Cited by: 96
Authors
Shang, Changxing [1 ,2 ,3 ]
Li, Min [1 ,2 ]
Feng, Shengzhong [1 ]
Jiang, Qingshan [1 ]
Fan, Jianping [1 ]
Affiliations
[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen 518055, Peoples R China
[2] Chinese Acad Sci, Grad Sch, Beijing 100080, Peoples R China
[3] Zhengzhou Inst Informat Sci & Technol, Zhengzhou 450001, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Feature selection; Text classification; High dimensionality; Distributional clustering; Information bottleneck;
DOI
10.1016/j.knosys.2013.09.019
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Code
081104; 0812; 0835; 1405;
Abstract
Feature selection is a vital preprocessing step for the text classification task, used to alleviate the curse of dimensionality. Most existing metrics (such as information gain) evaluate features individually and completely ignore the redundancy between them. This can decrease the overall discriminative power, because one feature's predictive power is weakened by others. On the other hand, although higher-order algorithms (such as mRMR) do take redundancy into account, their high computational complexity makes them impractical in the text domain. This paper proposes a novel metric called global information gain (GIG), which avoids redundancy naturally. An efficient feature selection method called maximizing global information gain (MGIG) is also given. We compare MGIG with four other algorithms on six datasets; the experimental results show that MGIG outperforms the other methods in most cases. Moreover, MGIG runs significantly faster than traditional higher-order algorithms, which makes it a proper choice for feature selection in the text domain. (C) 2013 Elsevier B.V. All rights reserved.
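As background for the abstract's claim, the baseline it improves on is the classic per-feature information gain, which scores each feature in isolation (and thus cannot see redundancy between features). Below is a minimal sketch of that standard IG computation; it is not the paper's GIG metric, and the function names and toy data are illustrative assumptions, not from the paper.

```python
# Classic (per-feature) information gain for feature selection.
# IG(Y; X) = H(Y) - H(Y | X); each feature is scored independently,
# so two perfectly redundant features receive the same high score.
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy H(Y) of a label sequence, in bits."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG of one discrete feature (e.g. 0/1 term presence) w.r.t. labels."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


# Toy corpus: two term-presence features over four documents.
docs = [(1, 0), (1, 0), (0, 1), (0, 1)]
labels = ["sports", "sports", "politics", "politics"]

# Rank features by IG; here both features are perfectly predictive
# (and perfectly redundant), so plain IG scores both at 1.0 bit.
scores = [information_gain([d[j] for d in docs], labels) for j in range(2)]
```

Note how both features get the maximal score even though keeping one makes the other useless; this is exactly the redundancy that individual-feature metrics ignore and that the paper's GIG/MGIG approach is designed to handle.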
Pages: 298-309
Page count: 12
Related Papers
50 records in total
  • [41] Comparison on Feature Selection Methods for Text Classification
    Liu, Wenkai
    Xiao, Jiongen
    Hong, Ming
    2020 THE 4TH INTERNATIONAL CONFERENCE ON MANAGEMENT ENGINEERING, SOFTWARE ENGINEERING AND SERVICE SCIENCES (ICMSS 2020), 2020, : 82 - 86
  • [42] Efficient Method for Feature Selection in Text Classification
    Sun, Jian
    Zhang, Xiang
    Liao, Dan
    Chang, Victor
    2017 INTERNATIONAL CONFERENCE ON ENGINEERING AND TECHNOLOGY (ICET), 2017,
  • [43] A Bayesian feature selection paradigm for text classification
    Feng, Guozhong
    Guo, Jianhua
    Jing, Bing-Yi
    Hao, Lizhu
    INFORMATION PROCESSING & MANAGEMENT, 2012, 48 (02) : 283 - 302
  • [44] A new feature selection method for text classification
    Uchyigit, Gulden
    Clark, Keith
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2007, 21 (02) : 423 - 438
  • [45] Text feature selection method for hierarchical classification
    Zhu, Cui-Ling
    Ma, Jun
    Zhang, Dong-Mei
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2011, 24 (01): : 103 - 110
  • [46] Feature Selection Method of Text Tendency Classification
    Li, Yanling
    Dai, Guanzhong
    Li, Gang
    FIFTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 2, PROCEEDINGS, 2008, : 34 - +
  • [47] An enhanced feature selection method for text classification
    Kang, Jinbeom
    Lee, Eunshil
    Hong, Kwanghee
    Park, Jeahyun
    Kim, Taehwan
    Park, Juyoung
    Choi, Joongmin
    Yang, Jaeyoung
    PROCEEDINGS OF THE SECOND IASTED INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE, 2006, : 36 - 41
  • [48] Effective feature selection technique for text classification
    Seetha, Hari
    Murty, M. Narasimha
    Saravanan, R.
    INTERNATIONAL JOURNAL OF DATA MINING MODELLING AND MANAGEMENT, 2015, 7 (03) : 165 - 184
  • [49] A feature selection and classification technique for text categorization
    Girgis, MR
    Aly, AA
    INTERNATIONAL JOURNAL OF COOPERATIVE INFORMATION SYSTEMS, 2003, 12 (04) : 441 - 454
  • [50] Feature selection improves text classification accuracy
[Anonymous]
    IEEE INTELLIGENT SYSTEMS, 2005, 20 (06) : 75 - 75