A tree approach for variable selection and its random forest

Times cited: 0
Authors
Liu, Yu [1]
Qin, Xu [1]
Cai, Zhibo [2, 3]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Math Sci, 2006 Xiyuan Ave, Chengdu 611731, Sichuan, Peoples R China
[2] Renmin Univ China, Ctr Appl Stat, 59 Zhongguancun St, Beijing 100872, Peoples R China
[3] Renmin Univ China, Sch Stat, 59 Zhongguancun St, Beijing 100872, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Binary partition; Classification and regression tree; Mutual information; Random forests; Sure independence screening; Models
DOI
10.1016/j.csda.2024.108068
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Sure Independence Screening (SIS) provides a fast and efficient ranking of variable importance for ultra-high-dimensional regression. However, classical SIS cannot eliminate false importance in the ranking, a problem that is exacerbated in nonparametric settings. To address this, a novel screening approach called SIS-tree is proposed: the sample is partitioned into subsets sequentially, creating a tree-like structure of sub-samples. SIS-tree is straightforward to implement and can be integrated with various measures of dependence. Theoretical results supporting the approach are established, including its "sure screening property". SIS-tree is further extended to a forest with improved performance. Simulations demonstrate that the proposed methods substantially improve on existing SIS methods. The choice of a cutoff for the screening is also investigated, both through theoretical justification and through experiments. As a direct application, classification of high-dimensional data is considered, and the screening and cutoff are found to substantially improve the performance of existing classifiers.
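The abstract takes classical SIS as its starting point. In its original form (Fan and Lv, 2008), SIS ranks predictors by the magnitude of their marginal correlation with the response and keeps the top d. The sketch below, in plain Python with no third-party libraries, implements only that classical baseline. It is not the authors' SIS-tree or its forest, whose partitioning rules and dependence measures are not described in this record. The helper names (`pearson`, `sis_rank`, `sis_select`) and the toy generator `make_data` with its decoy column x5 are hypothetical illustrations. They show one familiar way false importance can arise: an irrelevant variable that is correlated with a relevant one inherits a high marginal score.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def sis_rank(X, y):
    """Return column indices of X (a list of rows) sorted by |marginal correlation| with y."""
    p = len(X[0])
    scores = [(abs(pearson([row[j] for row in X], y)), j) for j in range(p)]
    # Largest score first; ties broken by the smaller column index.
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [j for _, j in scores]


def sis_select(X, y, d):
    """Classical SIS screening step: keep the d top-ranked variables."""
    return sis_rank(X, y)[:d]


def make_data(n=200, p=50, seed=0):
    """Hypothetical toy design: y depends on x0 and x3 only.
    x5 is a noisy copy of x0, so it is unimportant but strongly correlated with y."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    for row in X:
        row[5] = row[0] + 0.3 * rng.gauss(0.0, 1.0)
    y = [3.0 * row[0] - 2.0 * row[3] + rng.gauss(0.0, 1.0) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_data()
    # x5 is not in the model, yet it is expected to rank ahead of the true variable x3.
    print(sis_select(X, y, 3))
```

Replacing `pearson` with another dependence measure (for example, an estimate of mutual information, one of the keywords above) keeps the same rank-and-truncate structure. The abstract states that SIS-tree can likewise be combined with various measures of dependence.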
Pages: 19
Related papers
(50 in total)
  • [41] PREDICTION OF PIVOTAL RESPONSE TREATMENT OUTCOME WITH TASK FMRI USING RANDOM FOREST AND VARIABLE SELECTION
    Zhuang, Juntang
    Dvornek, Nicha C.
    Li, Xiaoxiao
    Yang, Daniel
    Ventola, Pamela
    Duncan, James S.
    2018 IEEE 15TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI 2018), 2018: 97-100
  • [42] A random forest-based selection of optically variable AGN in the VST-COSMOS field
    De Cicco, D.
    Bauer, F. E.
    Paolillo, M.
    Cavuoti, S.
    Sanchez-Saez, P.
    Brandt, W. N.
    Pignata, G.
    Vaccari, M.
    Radovich, M.
    ASTRONOMY & ASTROPHYSICS, 2021, 645
  • [43] Using Random Forest with Improved Variable Selection to Predict the Compressive Strength of Concrete with Lithium Slag
    Wei L.
    Huang L.
    Zeng L.
    Cailiao Daobao/Materials Reports, 2024, 38(09)
  • [44] Variable Selection Using Mean Decrease Accuracy And Mean Decrease Gini Based on Random Forest
    Han, Hong
    Guo, Xiaoling
    Yu, Hua
    PROCEEDINGS OF 2016 IEEE 7TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND SERVICE SCIENCE (ICSESS 2016), 2016: 219-224
  • [45] Random forest variable selection in spatial malaria transmission modelling in Mpumalanga Province, South Africa
    Kapwata, Thandi
    Gebreslasie, Michael T.
    GEOSPATIAL HEALTH, 2016, 11(03): 251-262
  • [46] Discrimination of cracked soybean seeds by near-infrared spectroscopy and random forest variable selection
    Wang, Liusan
    Huang, Ziliang
    Wang, Rujing
    INFRARED PHYSICS & TECHNOLOGY, 2021, 115
  • [47] Variable selection using random forests
    Genuer, Robin
    Poggi, Jean-Michel
    Tuleau-Malot, Christine
    PATTERN RECOGNITION LETTERS, 2010, 31(14): 2225-2236
  • [48] The random simulation algorithm for variable selection
    Zhang, Shangli
    Ke, Zhenglin
    Wei, Gongding
    Zhang, Lili
    Journal of Information and Computational Science, 2012, 9(17): 5119-5125
  • [49] Variable selection using random forests
    Sandri, Marco
    Zuccolotto, Paola
    DATA ANALYSIS, CLASSIFICATION AND THE FORWARD SEARCH, 2006: 263+
  • [50] On Dynamic Selection of Subspace for Random Forest
    Adnan, Md Nasim
    ADVANCED DATA MINING AND APPLICATIONS, ADMA 2014, 2014, 8933: 370-379