Selecting critical features for data classification based on machine learning methods

Authors
Rung-Ching Chen
Christine Dewi
Su-Wen Huang
Rezzy Eko Caraka
Affiliations
[1] Chaoyang University of Technology, Department of Information Management
[2] Satya Wacana Christian University, Faculty of Information Technology
[3] Office of General Affairs
[4] Taichung Veterans General Hospital Taiwan
Keywords
Random Forest; Feature selection; SVM; Classification; KNN; LDA
DOI
Not available
Abstract
Feature selection is especially important for data sets with many variables and features. It eliminates unimportant variables and improves both the accuracy and the performance of classification. Random Forest has emerged as a useful algorithm that can handle feature selection even with a large number of variables. In this paper, we conduct experiments on three popular datasets with a large number of variables: Bank Marketing, Car Evaluation Database, and Human Activity Recognition Using Smartphones. Feature selection is essential for four main reasons: it simplifies the model by reducing the number of parameters, decreases training time, reduces overfitting by enhancing generalization, and avoids the curse of dimensionality. We also evaluate and compare the accuracy and performance of several classification models: Random Forest (RF), Support Vector Machines (SVM), K-Nearest Neighbors (KNN), and Linear Discriminant Analysis (LDA). The model with the highest accuracy is taken to be the best classifier. In practice, this paper adopts Random Forest to select the important features for classification. Our experiments present a comparative study of the RF algorithm from different perspectives. Furthermore, we compare results on each dataset with and without selection of the essential features by the RF-based methods varImp(), Boruta, and Recursive Feature Elimination (RFE) to obtain the best percentage accuracy and kappa. Experimental results demonstrate that Random Forest achieves better performance in all experiment groups.
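The abstract ranks classifiers by percentage accuracy and kappa but does not spell out how either is computed. The sketch below assumes that "kappa" means Cohen's kappa, which is observed agreement corrected for the agreement expected by chance. The function names and the label vectors are hypothetical illustrations, not taken from the paper or its datasets.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: sum over labels of P(true = k) * P(pred = k).
    p_expected = sum(
        (true_counts[k] / n) * (pred_counts[k] / n)
        for k in set(true_counts) | set(pred_counts)
    )
    if p_expected == 1.0:
        # Degenerate case: a single label on both sides.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical labels for illustration only (assumption, not paper data).
y_true = ["yes", "yes", "no", "no", "no", "no", "yes", "no"]
y_pred = ["yes", "no", "no", "no", "no", "no", "yes", "yes"]
acc = accuracy(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
print(f"accuracy={acc:.3f} kappa={kappa:.3f}")
```

Kappa is lower than accuracy here because part of the raw agreement would occur by chance given the class proportions, which is why reporting both is informative on imbalanced data.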