Selecting critical features for data classification based on machine learning methods

Authors
Rung-Ching Chen
Christine Dewi
Su-Wen Huang
Rezzy Eko Caraka
Affiliations
[1] Chaoyang University of Technology,Department of Information Management
[2] Satya Wacana Christian University,Faculty of Information Technology
[3] Office of General Affairs
[4] Taichung Veterans General Hospital Taiwan
Source
Keywords
Random Forest; Features selection; SVM; Classification; KNN; LDA;
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Feature selection is especially important for data sets with many variables and features. It eliminates unimportant variables and improves both the accuracy and the performance of classification. Random Forest (RF) has emerged as a useful algorithm that can handle feature selection even with a large number of variables. In this paper, we conduct experiments on three popular datasets with a large number of variables: Bank Marketing, Car Evaluation Database, and Human Activity Recognition Using Smartphones. Feature selection is essential for four main reasons: it simplifies the model by reducing the number of parameters, decreases the training time, reduces overfitting by enhancing generalization, and avoids the curse of dimensionality. We also evaluate and compare the accuracy and performance of four classification models: Random Forest (RF), Support Vector Machines (SVM), K-Nearest Neighbors (KNN), and Linear Discriminant Analysis (LDA). The model with the highest accuracy is considered the best classifier. In practice, this paper adopts Random Forest to select the important features for classification, and our experiments present a comparative study of the RF algorithm from different perspectives. Furthermore, we compare the results on each dataset with and without selection of essential features by the RF-based methods varImp(), Boruta, and Recursive Feature Elimination (RFE), to obtain the best accuracy (in percent) and kappa. Experimental results demonstrate that Random Forest achieves better performance in all experiment groups.
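
The abstract names accuracy and kappa as the measures used to compare classifiers. As a rough illustration only, the sketch below computes both from a list of true and predicted labels in plain Python. It assumes that "kappa" means Cohen's kappa, which the abstract does not state explicitly, and the function names `accuracy` and `cohen_kappa` and the toy labels are hypothetical, not taken from the paper (whose named methods, such as varImp(), are R functions).

```python
from collections import Counter
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement (the accuracy); p_e is the agreement
    expected by chance from the marginal label frequencies.
    """
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if math.isclose(p_e, 1.0):
        # Only one class appears on both sides, so kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Toy, made-up labels for an imbalanced two-class problem.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

print(accuracy(y_true, y_pred))               # 0.9
print(round(cohen_kappa(y_true, y_pred), 3))  # 0.615
```

In this toy case the accuracy is 0.9 while kappa is only about 0.615, because much of the agreement on the majority class would be expected by chance. That gap is one reason to report both measures on imbalanced data.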
Related papers
50 in total
  • [1] Selecting critical features for data classification based on machine learning methods
    Chen, Rung-Ching
    Dewi, Christine
    Huang, Su-Wen
    Caraka, Rezzy Eko
    [J]. JOURNAL OF BIG DATA, 2020, 7 (01)
  • [2] Machine Learning Methods for Fear Classification Based on Physiological Features
    Petrescu, Livia
    Petrescu, Catalin
    Oprea, Ana
    Mitrut, Oana
    Moise, Gabriela
    Moldoveanu, Alin
    Moldoveanu, Florica
    [J]. SENSORS, 2021, 21 (13)
  • [3] Ship Classification Based on AIS Data and Machine Learning Methods
    Huang, I-Lun
    Lee, Man-Chun
    Nieh, Chung-Yuan
    Huang, Juan-Chen
    [J]. ELECTRONICS, 2024, 13 (01)
  • [4] Ship classification based on trajectory data with machine-learning methods
    Kraus, Paul
    Mohrdieck, Camilla
    Schwenker, Friedhelm
    [J]. 2018 19TH INTERNATIONAL RADAR SYMPOSIUM (IRS), 2018,
  • [5] Machine Learning Methods Based Preprocessing to Improve Categorical Data Classification
    Ruiz-Chavez, Zoila
    Salvador-Meneses, Jaime
    Garcia-Rodriguez, Jose
    [J]. INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2018, PT I, 2018, 11314 : 297 - 304
  • [6] Evaluation and classification of otoneurological data with new data analysis methods based on machine learning
    Siermala, Markku
    Juhola, Martti
    Laurikkala, Jorma
    Iltanen, Kati
    Kentala, Erna
    Pyykkoe, Mari
    [J]. INFORMATION SCIENCES, 2007, 177 (09) : 1963 - 1976
  • [7] DIMENSIONALITY AND NUMBER OF FEATURES IN LEARNING-MACHINE CLASSIFICATION METHODS
    RITTER, GL
    WOODRUFF, HB
    [J]. ANALYTICAL CHEMISTRY, 1977, 49 (13) : 2116 - 2118
  • [8] Data Augmentation Methods for Machine-learning-based Classification of Bio-signals
    Sakai, Asuka
    Minoda, Yuki
    Morikawa, Koji
    [J]. 2017 10TH BIOMEDICAL ENGINEERING INTERNATIONAL CONFERENCE (BMEICON), 2017,
  • [9] Classification performance of machine learning methods in different data structures
    Aglarci, Ali Vasfi
    Bal, Cengiz
    [J]. COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2023, 53 (12) : 6471 - 6489
  • [10] Extreme learning machine based transfer learning for data classification
    Li, Xiaodong
    Mao, Weijie
    Jiang, Wei
    [J]. NEUROCOMPUTING, 2016, 174 : 203 - 210