A tree approach for variable selection and its random forest

Cited by: 0
Authors
Liu, Yu [1]
Qin, Xu [1]
Cai, Zhibo [2, 3]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Math Sci, 2006 Xiyuan Ave, Chengdu 611731, Sichuan, Peoples R China
[2] Renmin Univ China, Ctr Appl Stat, 59 Zhongguancun St, Beijing 100872, Peoples R China
[3] Renmin Univ China, Sch Stat, 59 Zhongguancun St, Beijing 100872, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Binary partition; Classification and regression tree; Mutual information; Random forests; Sure independence screening; Models
DOI
10.1016/j.csda.2024.108068
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Sure Independence Screening (SIS) provides a fast and efficient ranking of variable importance in ultra-high dimensional regression. However, classical SIS cannot eliminate false importance in the ranking, a problem that is exacerbated in nonparametric settings. To address this, a novel screening approach called SIS-tree is proposed, which partitions the sample into subsets sequentially and creates a tree-like structure of sub-samples. SIS-tree is straightforward to implement and can be integrated with various measures of dependence. Theoretical results are established to support the approach, including its "sure screening property". SIS-tree is further extended to a forest with improved performance. Simulations demonstrate that the proposed methods substantially improve on existing SIS methods. The selection of a cutoff for the screening is also investigated through theoretical justification and experimental study. As a direct application, classification of high-dimensional data is considered, and the screening and cutoff are found to substantially improve the performance of existing classifiers.
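For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of classical marginal-correlation screening only. It is not the SIS-tree or forest procedure of the paper, whose details are not given in this record. The function name `sis_rank`, the cutoff `d`, and the simulated data are all illustrative assumptions.

```python
import numpy as np

def sis_rank(X, y, d):
    """Rank the columns of X by |Pearson correlation| with y and keep the top d.

    Illustrative helper (hypothetical name); assumes no column of X is constant.
    """
    Xc = X - X.mean(axis=0)          # center each predictor
    yc = y - y.mean()                # center the response
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = (Xc.T @ yc) / denom       # marginal correlation of each column with y
    order = np.argsort(-np.abs(corr))  # indices sorted by decreasing |correlation|
    return order[:d], corr

# Simulated data (an assumption for illustration): only columns 4 and 17 matter.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 4] - 2.0 * X[:, 17] + 0.1 * rng.standard_normal(n)

selected, corr = sis_rank(X, y, d=2)
print(sorted(selected.tolist()))
```

The cutoff `d` is fixed by hand here; the abstract notes that the paper studies how to choose the screening cutoff, which this sketch does not attempt.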
Pages: 19
Related papers
50 in total
  • [21] A comparison of random forest variable selection methods for regression modeling of continuous outcomes
    O'Connell, Nathaniel S.
    Jaeger, Byron C.
    Bullock, Garrett S.
    Speiser, Jaime Lynn
    BRIEFINGS IN BIOINFORMATICS, 2025, 26 (02)
  • [22] Random Forest Tree Based Approach for Blast Design in Surface Mine
    Mishra A.K.
    Ramteke S.V.
    Sen P.
    Verma A.K.
    Geotechnical and Geological Engineering, 2018, 36 (3) : 1647 - 1664
  • [23] A Systematic Approach for Variable Selection With Random Forests: Achieving Stable Variable Importance Values
    Behnamian, Amir
    Millard, Koreen
    Banks, Sarah N.
    White, Lori
    Richardson, Murray
    Pasher, Jon
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2017, 14 (11) : 1988 - 1992
  • [24] An Outlier Ranking Tree Selection Approach to Extreme Pruning of Random Forests
    Fawagreh, Khaled
    Gaber, Mohamed Medhat
    Elyan, Eyad
    ENGINEERING APPLICATIONS OF NEURAL NETWORKS, EANN 2016, 2016, 629 : 267 - 282
  • [25] A random forest algorithm under the ensemble approach for feature selection and classification
    Kharwar, Ankit
    Thakor, Devendra
    INTERNATIONAL JOURNAL OF COMMUNICATION NETWORKS AND DISTRIBUTED SYSTEMS, 2023, 29 (04) : 426 - 447
  • [26] A Guided Random Forest based Feature Selection Approach for Activity Recognition
    Uddin, Md. Taufeeq
    Uddin, Md. Azher
    2ND INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING AND INFORMATION COMMUNICATION TECHNOLOGY (ICEEICT 2015), 2015
  • [27] Improved random forest classification approach based on hybrid clustering selection
    Yuan, Dong
    Huang, Jian
    Yang, Xu
    Cui, Jiarui
    2020 CHINESE AUTOMATION CONGRESS (CAC 2020), 2020 : 1559 - 1563
  • [28] Design of a Database-Driven Modeling based on Variable Selection using a Random Forest
    Imaji, Hiromu
    Kinoshita, Takuya
    Yamamoto, Toru
    2018 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2018, : 215 - 220
  • [29] A combination strategy of random forest and back propagation network for variable selection in spectral calibration
    Chen, Huazhou
    Liu, Xiaoke
    Jia, Zhen
    Liu, Zhenyao
    Shi, Kai
    Cai, Ken
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2018, 182 : 101 - 108
  • [30] A Classification Study of Respiratory Syncytial Virus (RSV) Inhibitors by Variable Selection with Random Forest
    Hao, Ming
    Li, Yan
    Wang, Yonghua
    Zhang, Shuwei
    INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, 2011, 12 (02) : 1259 - 1280