HDSI: High dimensional selection with interactions algorithm on feature selection and testing

Cited by: 12
Authors
Jain, Rahi [1]
Xu, Wei [1, 2]
Affiliations
[1] Princess Margaret Canc Res Ctr, Biostat Dept, Toronto, ON, Canada
[2] Univ Toronto, Dalla Lana Sch Publ Hlth, Toronto, ON, Canada
Source
PLOS ONE | 2021 / Vol. 16 / Issue 02
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
RANDOM SUBSPACE METHOD; VARIABLE SELECTION; REGRESSION; LASSO; REGULARIZATION; MODELS; SHRINKAGE;
DOI
10.1371/journal.pone.0246159
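Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07; 0710; 09;
Abstract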
Feature selection on high-dimensional data along with the interaction effects is a critical challenge for classical statistical learning techniques. Existing feature selection algorithms such as random LASSO leverage the capability of LASSO to handle high-dimensional data. However, these techniques have two main limitations, namely the inability to consider interaction terms and the lack of a statistical test for determining the significance of selected features. This study proposes the High Dimensional Selection with Interactions (HDSI) algorithm, a new feature selection method that can handle high-dimensional data, incorporate interaction terms, provide statistical inference for the selected features and leverage the capability of existing classical statistical techniques. The method allows the application of any statistical technique, such as LASSO or subset selection, on multiple bootstrapped samples, each containing randomly selected features. Each bootstrap sample incorporates interaction terms for the randomly sampled features. The selected features from each model are pooled and their statistical significance is determined. The statistically significant features are used as the final output of the approach, and their final coefficients are estimated using appropriate statistical techniques. The performance of HDSI is evaluated using both simulated data and real studies. In general, HDSI outperforms commonly used algorithms such as LASSO, subset selection, adaptive LASSO, random LASSO and group LASSO.
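The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices here are assumptions made only for illustration: ordinary least squares stands in for the base technique (the abstract mentions LASSO and subset selection), interactions are limited to pairwise products, and a term is called significant when the percentile interval of its pooled coefficient estimates excludes zero. The function name `hdsi_sketch` and the parameters `n_boot`, `q` and `alpha` are hypothetical.

```python
import itertools
import numpy as np

def hdsi_sketch(X, y, n_boot=200, q=3, alpha=0.05, seed=0):
    """Illustrative HDSI-style selection (assumed details, see the text above)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pooled = {}  # term (tuple of column indices) -> coefficient estimates
    for _ in range(n_boot):
        rows = rng.integers(0, n, size=n)                      # bootstrap the rows
        cols = np.sort(rng.choice(p, size=q, replace=False))   # random feature subset
        # Main effects plus all pairwise interactions of the sampled features.
        terms = [(int(j),) for j in cols]
        terms += [(int(a), int(b)) for a, b in itertools.combinations(cols, 2)]
        Xb = X[rows]
        design = np.column_stack(
            [np.ones(n)] + [Xb[:, list(t)].prod(axis=1) for t in terms]
        )
        # Assumed base learner: ordinary least squares.
        beta, *_ = np.linalg.lstsq(design, y[rows], rcond=None)
        for term, b in zip(terms, beta[1:]):
            pooled.setdefault(term, []).append(b)
    # Assumed significance rule: percentile interval must exclude zero.
    selected = []
    for term, values in pooled.items():
        lo, hi = np.quantile(values, [alpha / 2, 1 - alpha / 2])
        if lo > 0 or hi < 0:
            selected.append(term)
    return sorted(selected)

# Usage on simulated data with one main effect and one interaction.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 6))
y = 2.0 * X[:, 0] - 3.0 * X[:, 1] * X[:, 2] + rng.normal(scale=0.5, size=300)
selected = hdsi_sketch(X, y)
print(selected)
```

Each returned tuple is either a single column index (a main effect) or a pair of indices (an interaction). Following the abstract, a final step would refit a model on the selected terms to estimate their coefficients; that step is omitted here for brevity.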
Pages: 17
Related papers
50 in total
  • [1] RHDSI: A novel dimensionality reduction based algorithm on high dimensional feature selection with interactions
    Jain, Rahi
    Xu, Wei
    [J]. INFORMATION SCIENCES, 2021, 574 : 590 - 605
  • [2] Overview Of Feature Subset Selection Algorithm For High Dimensional Data
    Gandhi, Swati S.
    Prabhune, S. S.
    [J]. PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON INVENTIVE SYSTEMS AND CONTROL (ICISC 2017), 2017, : 618 - 623
  • [3] Multiobjective optimization algorithm with dynamic operator selection for feature selection in high-dimensional classification
    Wei, Wenhong
    Xuan, Manlin
    Li, Lingjie
    Lin, Qiuzhen
    Ming, Zhong
    Coello, Carlos A. Coello
    [J]. APPLIED SOFT COMPUTING, 2023, 143
  • [4] A New Evolutionary Multitasking Algorithm for High-Dimensional Feature Selection
    Liu, Ping
    Xu, Bangxin
    Xu, Wenwen
    [J]. IEEE ACCESS, 2024, 12 : 89856 - 89872
  • [5] A two-stage clonal selection algorithm for local feature selection on high-dimensional data
    Wang, Yi
    Tian, Hao
    Li, Tao
    Liu, Xiaojie
    [J]. INFORMATION SCIENCES, 2024, 677
  • [6] An Evolutionary Multitasking Algorithm With Multiple Filtering for High-Dimensional Feature Selection
    Li, Lingjie
    Xuan, Manlin
    Lin, Qiuzhen
    Jiang, Min
    Ming, Zhong
    Tan, Kay Chen
    [J]. IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2023, 27 (04) : 802 - 816
  • [7] A PSO Based Hybrid Feature Selection Algorithm for High-Dimensional Classification
    Binh Tran
    Zhang, Mengjie
    Xue, Bing
    [J]. 2016 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2016, : 3801 - 3808
  • [8] Improving Evolutionary Algorithm Performance for Feature Selection in High-Dimensional Data
    Cilia, N.
    De Stefano, C.
    Fontanella, F.
    di Freca, A. Scotto
    [J]. APPLICATIONS OF EVOLUTIONARY COMPUTATION, EVOAPPLICATIONS 2018, 2018, 10784 : 439 - 454
  • [9] Feature Selection in High Dimensional Data by a Filter-Based Genetic Algorithm
    De Stefano, Claudio
    Fontanella, Francesco
    di Freca, Alessandra Scotto
    [J]. APPLICATIONS OF EVOLUTIONARY COMPUTATION, EVOAPPLICATIONS 2017, PT I, 2017, 10199 : 506 - 521
  • [10] Feature Selection in High Dimensional Data: A Review
    Silaich, Sarita
    Gupta, Suneet
    [J]. THIRD CONGRESS ON INTELLIGENT SYSTEMS, CIS 2022, VOL 1, 2023, 608 : 703 - 717