A New Feature Sampling Method in Random Forests for Predicting High-Dimensional Data

Cited by: 7
Authors
Thanh-Tung Nguyen [1]
Zhao, He [2]
Huang, Joshua Zhexue [3]
Thuy Thi Nguyen [4]
Li, Mark Junjie [3]
Affiliations
[1] Thuyloi Univ, Fac Comp Sci & Engn, Hanoi, Vietnam
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen, Peoples R China
[3] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen, Peoples R China
[4] Vietnam Natl Univ Agr, Fac Informat Technol, Hanoi, Vietnam
Keywords
Subspace feature selection; Regression; Classification; Random forests; Data mining; High-dimensional data; SELECTION;
DOI
10.1007/978-3-319-18032-8_36
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Random Forests (RF) models have been shown to perform well in both classification and regression. However, because RF randomizes both the bagged samples and the feature selection, its performance can deteriorate on high-dimensional data. In this paper, we propose a new feature sampling approach that lets RF deal with high-dimensional data. We first use p-values to assess feature importance and find a cut-off between informative and less informative features. The set of informative features is then further partitioned into two groups, highly informative and informative features, using some statistical measures. When sampling the feature subspace for learning RFs, features from all three groups are taken into account. The new subspace sampling method maintains the diversity and randomness of the forest and enables one to generate trees with a lower prediction error. In addition, quantile regression is employed to obtain predictions in the regression problem for robustness to outliers. The experimental results demonstrated that the proposed approach for learning random forests significantly reduced prediction errors and outperformed most existing random forests when dealing with high-dimensional data.
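The three-group subspace sampling described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact statistical measures, so everything specific here is an assumption made for illustration: the function names `partition_features` and `sample_subspace`, the p-value cut-off `alpha = 0.05`, the use of the median importance score to split informative features into two groups, and the rule of drawing one feature from each non-empty group before filling the rest of the subspace at random. It is not the authors' implementation.

```python
import numpy as np


def partition_features(p_values, scores, alpha=0.05):
    """Split feature indices into (high, medium, weak) groups.

    Assumptions (not from the source): `alpha` is the p-value cut-off, and
    the median importance score separates highly informative features from
    merely informative ones.
    """
    p_values = np.asarray(p_values, dtype=float)
    scores = np.asarray(scores, dtype=float)
    informative = np.flatnonzero(p_values < alpha)
    weak = np.flatnonzero(p_values >= alpha)
    if informative.size == 0:
        empty = np.array([], dtype=int)
        return empty, empty, weak
    threshold = np.median(scores[informative])
    high = informative[scores[informative] > threshold]
    medium = informative[scores[informative] <= threshold]
    return high, medium, weak


def sample_subspace(groups, mtry, rng):
    """Draw up to `mtry` distinct features, covering every non-empty group."""
    non_empty = [g for g in groups if g.size > 0]
    if not non_empty or mtry <= 0:
        return []
    # One feature from each non-empty group (hypothetical allocation rule).
    chosen = [int(rng.choice(g)) for g in non_empty[:mtry]]
    # Fill the remaining slots uniformly from the features not yet chosen.
    pool = np.setdiff1d(np.concatenate(non_empty), chosen)
    n_extra = min(mtry - len(chosen), pool.size)
    if n_extra > 0:
        extra = rng.choice(pool, size=n_extra, replace=False)
        chosen.extend(int(i) for i in extra)
    return sorted(chosen)


# Toy data: six features with made-up p-values and importance scores.
rng = np.random.default_rng(0)
p = np.array([0.001, 0.2, 0.03, 0.5, 0.0001, 0.04])
s = np.array([9.0, 1.0, 3.0, 0.5, 12.0, 2.5])
high, medium, weak = partition_features(p, s)
subspace = sample_subspace((high, medium, weak), mtry=4, rng=rng)
```

In a forest, a subspace like this would be drawn afresh at each split or for each tree. Guaranteeing at least one feature per group is one plausible way to keep informative features in play while preserving randomness.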
Pages: 459-470
Number of pages: 12
Related papers
50 in total
  • [31] A filter feature selection for high-dimensional data
    Janane, Fatima Zahra
    Ouaderhman, Tayeb
    Chamlal, Hasna
    [J]. JOURNAL OF ALGORITHMS & COMPUTATIONAL TECHNOLOGY, 2023, 17
  • [32] Feature selection for high-dimensional temporal data
    Tsagris, Michail
    Lagani, Vincenzo
    Tsamardinos, Ioannis
    [J]. BMC BIOINFORMATICS, 2018, 19
  • [33] Feature Selection with High-Dimensional Imbalanced Data
    Van Hulse, Jason
    Khoshgoftaar, Taghi M.
    Napolitano, Amri
    Wald, Randall
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2009), 2009, : 507 - 514
  • [34] Feature selection for high-dimensional temporal data
    Michail Tsagris
    Vincenzo Lagani
    Ioannis Tsamardinos
    [J]. BMC Bioinformatics, 19
  • [35] FEATURE SELECTION FOR HIGH-DIMENSIONAL DATA ANALYSIS
    Verleysen, Michel
    [J]. ECTA 2011/FCTA 2011: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON EVOLUTIONARY COMPUTATION THEORY AND APPLICATIONS AND INTERNATIONAL CONFERENCE ON FUZZY COMPUTATION THEORY AND APPLICATIONS, 2011,
  • [36] Predicting Health Outcomes from High-Dimensional Longitudinal Health Histories Using Relational Random Forests
    Shahn, Zach
    Ryan, Patrick
    Madigan, David
    [J]. STATISTICAL ANALYSIS AND DATA MINING, 2015, 8 (02) : 128 - 136
  • [37] Sequential random k-nearest neighbor feature selection for high-dimensional data
    Park, Chan Hee
    Kim, Seoung Bum
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (05) : 2336 - 2342
  • [38] Stratified feature sampling method for ensemble clustering of high dimensional data
    Jing, Liping
    Tian, Kuang
    Huang, Joshua Z.
    [J]. PATTERN RECOGNITION, 2015, 48 (11) : 3688 - 3702
  • [39] A Feature Grouping Method for Ensemble Clustering of High-Dimensional Genomic Big Data
    Farid, Dewan Md.
    Nowe, Ann
    Manderick, Bernard
    [J]. PROCEEDINGS OF 2016 FUTURE TECHNOLOGIES CONFERENCE (FTC), 2016, : 260 - 268
  • [40] An ensemble feature selection method for high-dimensional data based on sort aggregation
    Wang, Jie
    Xu, Jing
    Zhao, Chengan
    Peng, Yan
    Wang, Hongpeng
    [J]. SYSTEMS SCIENCE & CONTROL ENGINEERING, 2019, 7 (02) : 32 - 39