On the behaviour of permutation-based variable importance measures in random forest clustering

Cited by: 6
Authors
Nembrini, Stefano [1 ]
Affiliations
[1] Univ Florida, Coll Med, Emerging Pathogens Inst, Dept Pathol, Gainesville, FL 32610 USA
Keywords
random forest clustering; variable importance measures; variable selection
DOI
10.1002/cem.3135
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Unsupervised random forest (RF) is a popular clustering method that can be implemented by artificially creating a two-class problem. Variable importance measures (VIMs) can be used to determine which variables are relevant for defining the RF dissimilarity, but they have received less attention in this setting than in the supervised case. Here, I show that the sampling schemes used to generate the artificial data (including the original one) can influence the behaviour of the permutation importance in a way that can affect conclusions on variable relevance, and I propose a solution. Generating the artificial data using a Bayesian bootstrap keeps the desirable properties of the permutation VIM.
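The two-class construction mentioned in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function names are hypothetical, and drawing one set of Dirichlet(1, ..., 1) weights per variable is an assumption about how a Bayesian bootstrap might be applied here, since the abstract does not give those details. The first scheme samples each variable independently from its empirical marginal, which is the commonly described original approach; the second replaces uniform resampling with Bayesian bootstrap weights. A classifier would then be trained to separate the observed rows from the synthetic ones.

```python
import random


def synthetic_marginal(data, rng):
    """Sample each column independently, with replacement, from its
    empirical marginal (the commonly described original scheme)."""
    n, p = len(data), len(data[0])
    cols = [[row[j] for row in data] for j in range(p)]
    sampled = [rng.choices(col, k=n) for col in cols]
    # Transpose back to a list of rows.
    return [[sampled[j][i] for j in range(p)] for i in range(n)]


def dirichlet_weights(n, rng):
    """Draw Dirichlet(1, ..., 1) weights by normalising exponential draws."""
    g = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(g)
    return [x / total for x in g]


def synthetic_bayesian_bootstrap(data, rng):
    """Resample each column using Bayesian bootstrap weights.
    ASSUMPTION: one weight vector per column; the abstract does not say."""
    n, p = len(data), len(data[0])
    sampled = []
    for j in range(p):
        col = [row[j] for row in data]
        weights = dirichlet_weights(n, rng)
        sampled.append(rng.choices(col, weights=weights, k=n))
    return [[sampled[j][i] for j in range(p)] for i in range(n)]


def two_class_problem(data, synthetic):
    """Stack observed rows (label 1) and synthetic rows (label 0)."""
    X = [list(row) for row in data] + [list(row) for row in synthetic]
    y = [1] * len(data) + [0] * len(synthetic)
    return X, y


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy data: two strongly dependent variables.
    observed = [[float(i), float(2 * i)] for i in range(10)]
    synthetic = synthetic_bayesian_bootstrap(observed, rng)
    X, y = two_class_problem(observed, synthetic)
    print(len(X), "rows,", sum(y), "observed,", len(y) - sum(y), "synthetic")
```

Because each column is resampled separately, the synthetic rows keep the marginal values but lose the dependence between variables, which is what gives the classifier something to learn. Permutation importance computed on that classifier is the quantity whose behaviour the abstract discusses.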
Pages: 5
Related papers
50 in total
  • [21] Consistent and unbiased variable selection under independent features using Random Forest permutation importance
    Ramosaj, Burim
    Pauly, Markus
    BERNOULLI, 2023, 29 (03) : 2101 - 2118
  • [22] Variable Importance Measure System Based on Advanced Random Forest
    Song, Shufang
    He, Ruyang
    Shi, Zhaoyin
    Zhang, Weiya
    CMES-COMPUTER MODELING IN ENGINEERING & SCIENCES, 2021, 128 (01) : 65 - 85
  • [23] MMD-based Variable Importance for Distributional Random Forest
    Benard, Clement
    Naf, Jeffrey
    Josse, Julie
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [24] Multi-user detection for random permutation-based multiple access
    Coulon, M
    Roviras, D
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PROCEEDINGS: SIGNAL PROCESSING FOR COMMUNICATIONS SPECIAL SESSIONS, 2003, : 61 - 64
  • [25] Letter to the Editor: On the stability and ranking of predictors from random forest variable importance measures
    Nicodemus, Kristin K.
    BRIEFINGS IN BIOINFORMATICS, 2011, 12 (04) : 369 - 373
  • [26] Dissolved oxygen prediction model based on variable importance measures and random forest: A case study of Shenzhen Bay
    Yang, Ming-Yue
    Mao, Xian-Zhong
    Zhongguo Huanjing Kexue/China Environmental Science, 2022, 42 (08) : 3876 - 3881
  • [27] VARIABLE INTERACTION MEASURES WITH RANDOM FOREST CLASSIFIERS
    Kelly, Cassidy
    Okada, Kazunori
    2012 9TH IEEE INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI), 2012, : 154 - 157
  • [28] Estimating neuronal variable importance with random forest
    Oh, J
    Laubach, M
    Luczak, A
    PROCEEDINGS OF THE IEEE 29TH ANNUAL NORTHEAST BIOENGINEERING CONFERENCE, 2003, : 33 - 34
  • [29] RFCell: A Gene Selection Approach for scRNA-seq Clustering Based on Permutation and Random Forest
    Zhao, Yuan
    Fang, Zhao-Yu
    Lin, Cui-Xiang
    Deng, Chao
    Xu, Yun-Pei
    Li, Hong-Dong
    FRONTIERS IN GENETICS, 2021, 12
  • [30] Random Forest Variable Importance Measures for Spatial Dynamics: Case Studies from Urban Demography
    Georgati, Marina
    Hansen, Henning Sten
    Kessler, Carsten
    ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2023, 12 (11)