Selecting critical features for data classification based on machine learning methods

Cited by: 0

Authors:

Rung-Ching Chen

Christine Dewi

Su-Wen Huang

Rezzy Eko Caraka

Affiliations:

[1] Chaoyang University of Technology, Department of Information Management

[2] Satya Wacana Christian University, Faculty of Information Technology

[3] Office of General Affairs

[4] Taichung Veterans General Hospital, Taiwan

Source:

Journal of Big Data, Vol. 7

Keywords:

Random Forest; Feature selection; SVM; Classification; KNN; LDA

DOI:

Not available

Abstract:

Feature selection is especially important for data sets with many variables and features. It eliminates unimportant variables and improves both the accuracy and the performance of classification. Random Forest has emerged as a useful algorithm that can handle feature selection even with a large number of variables. In this paper, we use three popular datasets with a large number of variables (Bank Marketing, Car Evaluation Database, Human Activity Recognition Using Smartphones) to conduct the experiments. Feature selection is essential for four main reasons: it simplifies the model by reducing the number of parameters, decreases the training time, reduces overfitting by enhancing generalization, and avoids the curse of dimensionality. In addition, we evaluate and compare the accuracy and performance of several classification models: Random Forest (RF), Support Vector Machines (SVM), K-Nearest Neighbors (KNN), and Linear Discriminant Analysis (LDA). The model with the highest accuracy is considered the best classifier. In practice, this paper adopts Random Forest to select the important features for classification. Our experiments present a comparative study of the RF algorithm from different perspectives. Furthermore, we compare the results on each dataset with and without feature selection by the RF methods varImp(), Boruta, and Recursive Feature Elimination (RFE) to obtain the best percentage accuracy and kappa. Experimental results demonstrate that Random Forest achieves better performance in all experiment groups.
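
The abstract ranks classifiers by accuracy and kappa. As a minimal sketch of those two metrics, assuming "kappa" means Cohen's kappa (agreement between predicted and true labels, corrected for chance agreement), the following standard-library Python computes both. The function names and the toy labels are hypothetical and are not taken from the paper, whose experiments use its own tooling and datasets.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement (the accuracy); p_e is the agreement
    expected by chance from the marginal label frequencies.
    """
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Counter returns 0 for labels that never appear in the predictions.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Both label vectors contain one identical class: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


# Toy labels (illustrative only, not data from the paper).
y_true = ["yes", "yes", "no", "no", "yes", "no", "no", "no"]
y_pred = ["yes", "no", "no", "no", "yes", "no", "yes", "no"]

print(f"accuracy = {accuracy(y_true, y_pred):.4f}")
print(f"kappa    = {cohen_kappa(y_true, y_pred):.4f}")
```

On these toy labels the accuracy is 0.75 while kappa is about 0.47, which illustrates why kappa is a useful companion metric: it discounts the agreement a classifier would reach by chance given the class frequencies.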

Citation:

[1] Selecting critical features for data classification based on machine learning methods
Chen, Rung-Ching
Dewi, Christine
Huang, Su-Wen
Caraka, Rezzy Eko
[J]. JOURNAL OF BIG DATA, 2020, 7 (01)