A review of random forest-based feature selection methods for data science education and applications

Times cited: 1
Authors
Iranzad, Reza [1]
Liu, Xiao [2]
Affiliations
[1] FedEx Express, Memphis, TN USA
[2] Georgia Inst Technol, H Milton Stewart Sch Ind & Syst Engn, Atlanta, GA 30332 USA
Funding
U.S. National Science Foundation
Keywords
Random forest; Feature selection; Feature importance; Classification; Data science education; Data science consulting projects; Capstone projects; Variable selection; Gene selection
DOI
10.1007/s41060-024-00509-w
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Random forest (RF) is one of the most popular statistical learning methods in both data science education and applications. Feature selection enabled by RF is often among the very first tasks in a data science project, such as a college capstone project or an industry consulting project. The goal of this paper is to provide a comprehensive review of 12 RF-based feature selection methods for classification problems. The review provides the necessary description of each method and its software packages. We show that different methods typically do not provide consistent feature selection results, and that model performance also varies when different RF-based feature selection approaches are employed. This observation suggests that caution must be exercised when performing feature selection with RF. Feature selection cannot be done blindly without a sound understanding of the methods adopted, which, in our observation, is not always the case in industry and in many senior capstone projects. The paper serves as a one-stop reference where students, data science consultants, engineers, and data scientists can access the basic ideas behind these methods, the advantages and limitations of different approaches, and the software packages that implement them.
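To make the observation about inconsistent selections concrete, here is a minimal sketch, assuming Python with scikit-learn installed. It fits a single random forest and compares the top-k features chosen by two common RF importance measures: impurity-based importance (`feature_importances_`) and permutation importance (`sklearn.inspection.permutation_importance`). These two measures, the synthetic dataset, and the helper names `top_k` and `compare_rf_importances` are illustrative assumptions. They are not claimed to be among, or to reproduce, the 12 methods reviewed in the paper.

```python
# Sketch: compare top-k features selected by two RF importance measures.
# Assumptions: scikit-learn and NumPy are installed; synthetic data stands in
# for a real project dataset; helper names are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split


def top_k(scores, k):
    """Return the indices of the k largest scores as a set."""
    return set(np.argsort(scores)[::-1][:k].tolist())


def compare_rf_importances(k=5, seed=0):
    # Synthetic binary problem: 20 features, 5 informative, 5 redundant.
    X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                               n_redundant=5, random_state=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                              random_state=seed)

    rf = RandomForestClassifier(n_estimators=200, random_state=seed)
    rf.fit(X_tr, y_tr)

    # Measure 1: impurity-based (mean decrease in impurity) importance.
    mdi = rf.feature_importances_
    # Measure 2: permutation importance evaluated on held-out data.
    perm = permutation_importance(rf, X_te, y_te, n_repeats=10,
                                  random_state=seed)

    a = top_k(mdi, k)
    b = top_k(perm.importances_mean, k)
    # Jaccard overlap of the two selected subsets (1.0 = identical).
    jaccard = len(a & b) / len(a | b)
    return a, b, jaccard


if __name__ == "__main__":
    mdi_set, perm_set, overlap = compare_rf_importances()
    print("MDI top-k:", sorted(mdi_set))
    print("Permutation top-k:", sorted(perm_set))
    print(f"Jaccard overlap: {overlap:.2f}")
```

A Jaccard overlap below 1 means the two measures select different subsets from the same fitted forest. The exact value depends on the data, the random seed, and `k`.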
Pages: 15