A Comprehensive Study of Eleven Feature Selection Algorithms and their Impact on Text Classification

Cited by: 0

Authors:

Vora, Suchi [1]

Yang, Hui [1]

Affiliations:

[1] San Francisco State Univ, Dept Comp Sci, San Francisco, CA 94132 USA

Source:

2017 COMPUTING CONFERENCE | 2017

Keywords:

feature selection/ranking algorithms; classification algorithms; comparison and evaluation

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Feature selection has been routinely used as a preprocessing step to remove irrelevant features and conquer the "curse of dimensionality". In contrast to dimensionality reduction techniques such as PCA, the resulting features from feature selection are selected from the original feature space; hence, easy to interpret. A large host of feature selection algorithms has been proposed in the literature. This has created a critical issue: which algorithm should one use? Moreover, how does a feature selection method affect the performance of a given classification algorithm? This paper addresses these issues by (1) presenting an open source software system that integrates eleven feature selection algorithms and five common classifiers; and (2) systematically comparing and evaluating the selected features and their impact over these five classifiers using five datasets. Specifically, this system includes ten commonly adopted filter-based feature selection algorithms: ChiSquare, Information Gain, Fisher Score, Gini Index, Kruskal-Wallis, Laplacian Score, ReliefF, FCBF, CFS, and mRmR. It also includes one state-of-the-art embedded approach built upon Random Forests. The five classifiers are SVM, Random Forests, Naive Bayes, kNN and C4.5 Decision Tree. Comprehensive evaluations consisting of around 1000 experiments were conducted over five text datasets. Several approximately equivalent groups (AEG), where algorithms in the same group select highly similar features, have been identified. Suitable feature-selection-classifier combinations have also been identified. For example, Chi-square and Information Gain form an AEG. Furthermore, Gini Index or Kruskal-Wallis together with SVM often produces classification performance that is comparable with or better than using all the original features. Such results will provide empirical guidelines for the data analytic community.
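The idea of an approximately equivalent group can be illustrated with a small sketch: score every term with two filter methods, keep the top k from each, and measure how much the two selections overlap. The code below is only an illustration and is not the software system described in the abstract. The toy document-term data, the function names, the choice of k = 2, and the use of Jaccard similarity as the overlap measure are all assumptions made for this example; the abstract does not say how feature-set similarity was measured.

```python
import math

def contingency(column, labels):
    """Count the 2x2 table for a binary feature and a binary label.

    Returns (a, b, c, d): a = feature present, positive class;
    b = present, negative; c = absent, positive; d = absent, negative.
    """
    a = b = c = d = 0
    for x, y in zip(column, labels):
        if x and y:
            a += 1
        elif x:
            b += 1
        elif y:
            c += 1
        else:
            d += 1
    return a, b, c, d

def chi_square(column, labels):
    """Chi-square statistic of a 2x2 table (no continuity correction)."""
    a, b, c, d = contingency(column, labels)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom

def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((k / total) * math.log2(k / total) for k in counts if k > 0)

def information_gain(column, labels):
    """Label entropy minus label entropy conditioned on the feature."""
    a, b, c, d = contingency(column, labels)
    n = a + b + c + d
    h_label = entropy([a + c, b + d])
    h_cond = ((a + b) / n) * entropy([a, b]) + ((c + d) / n) * entropy([c, d])
    return h_label - h_cond

def top_k(scores, k):
    """Indices of the k highest scores; ties go to the lower index."""
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return set(ranked[:k])

def jaccard(s, t):
    """Jaccard similarity of two sets (assumed overlap measure)."""
    union = s | t
    return len(s & t) / len(union) if union else 1.0

# Hypothetical toy data: 8 documents, 4 binary term-presence features.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
terms = [
    [1, 1, 1, 1, 0, 0, 0, 0],  # term 0: perfectly aligned with the label
    [1, 1, 1, 0, 0, 0, 0, 1],  # term 1: mostly aligned
    [1, 0, 1, 0, 1, 0, 1, 0],  # term 2: independent of the label
    [1, 1, 0, 0, 1, 0, 0, 0],  # term 3: weakly aligned
]

chi_scores = [chi_square(col, labels) for col in terms]
ig_scores = [information_gain(col, labels) for col in terms]

chi_top = top_k(chi_scores, 2)
ig_top = top_k(ig_scores, 2)
overlap = jaccard(chi_top, ig_top)
print(sorted(chi_top), sorted(ig_top), overlap)
```

On this toy data both scores rank the same two terms highest, so the overlap is complete. On real text collections the overlap depends on the data and on k, which is why the paper evaluates it empirically.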


Pages: 440-449

Page count: 10
