HDSI: High dimensional selection with interactions algorithm on feature selection and testing

Cited by: 12
Authors
Jain, Rahi [1]
Xu, Wei [1, 2]
Affiliations
[1] Princess Margaret Canc Res Ctr, Biostat Dept, Toronto, ON, Canada
[2] Univ Toronto, Dalla Lana Sch Publ Hlth, Toronto, ON, Canada
Source
PLOS ONE | 2021 / Vol. 16 / Issue 2
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
RANDOM SUBSPACE METHOD; VARIABLE SELECTION; REGRESSION; LASSO; REGULARIZATION; MODELS; SHRINKAGE;
DOI
10.1371/journal.pone.0246159
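Chinese Library Classification (CLC) numbers
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [Natural Sciences (General)];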
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
Feature selection on high-dimensional data with interaction effects is a critical challenge for classical statistical learning techniques. Existing feature selection algorithms, such as random LASSO, leverage LASSO's ability to handle high-dimensional data. However, random LASSO has two main limitations: it cannot consider interaction terms, and it lacks a statistical test for determining the significance of the selected features. This study proposes the High Dimensional Selection with Interactions (HDSI) algorithm, a new feature selection method that can handle high-dimensional data, incorporate interaction terms, provide statistical inference for the selected features, and leverage existing classical statistical techniques. The method allows any statistical technique, such as LASSO or subset selection, to be applied to multiple bootstrap samples, each of which contains a random subset of features. Each bootstrap sample is augmented with interaction terms among its randomly sampled features. The features selected by each model are pooled and their statistical significance is determined. The statistically significant features form the final output of the approach, and their final coefficients are estimated using appropriate statistical techniques. HDSI is evaluated on both simulated data and real studies; in general, it outperforms commonly used algorithms such as LASSO, subset selection, adaptive LASSO, random LASSO and group LASSO.
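The pipeline described in the abstract (bootstrap samples with random feature subsets, added interaction terms, a per-sample selector, pooling, and a significance screen) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The helper names (`lasso_cd`, `add_pairwise_interactions`, `hdsi_sketch`, `refit_ols`) and the tuning values (`n_boot`, `q`, `alpha`, `level`) are hypothetical. Only pairwise interactions are generated, and a hand-written coordinate-descent LASSO stands in for "any statistical technique". The significance screen is a percentile interval on the pooled coefficients that must exclude zero, which is a stand-in and not necessarily the test used in the paper. Ordinary least squares is assumed for the final coefficient estimates. The sketch requires NumPy.

```python
import itertools
import numpy as np


def lasso_cd(X, y, alpha, max_iter=200, tol=1e-6):
    """Coordinate-descent LASSO: minimizes (1/2n)||y - Xb||^2 + alpha*||b||_1.

    Assumes the columns of X are standardized and y is centered (no intercept).
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = np.array(y, dtype=float)  # residual y - Xb, starting from b = 0
    for _ in range(max_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0:
                continue  # constant column: coefficient stays at 0
            r += X[:, j] * b[j]  # partial residual without feature j
            rho = X[:, j] @ r / n
            new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            max_step = max(max_step, abs(new - b[j]))
            b[j] = new
            r -= X[:, j] * b[j]
        if max_step < tol:
            break
    return b


def add_pairwise_interactions(Z, names):
    """Append all pairwise products of the columns of Z (hypothetical helper)."""
    cols, out = [Z], list(names)
    for i, j in itertools.combinations(range(Z.shape[1]), 2):
        cols.append(Z[:, [i]] * Z[:, [j]])
        out.append(f"{names[i]}:{names[j]}")
    return np.hstack(cols), out


def standardize(Z):
    sd = Z.std(axis=0)
    sd[sd == 0] = 1.0
    return (Z - Z.mean(axis=0)) / sd


def hdsi_sketch(X, y, n_boot=200, q=4, alpha=0.05, level=0.95, seed=0):
    """Bootstrap + random feature subsets (q <= p) + interactions + LASSO, pooled.

    The zero-excluding percentile interval is an assumed significance rule.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pooled = {}  # term name -> coefficients from every model that contained it
    for _ in range(n_boot):
        rows = rng.integers(0, n, size=n)  # bootstrap resample of rows
        feats = np.sort(rng.choice(p, size=q, replace=False))  # random features
        Z, names = add_pairwise_interactions(X[np.ix_(rows, feats)],
                                             [f"x{k}" for k in feats])
        yb = y[rows] - y[rows].mean()
        coef = lasso_cd(standardize(Z), yb, alpha)
        for name, c in zip(names, coef):
            pooled.setdefault(name, []).append(c)
    tail = (1 - level) / 2
    selected = []
    for name, vals in pooled.items():
        lo, hi = np.quantile(vals, [tail, 1 - tail])
        if lo > 0 or hi < 0:  # interval excludes zero -> keep the term
            selected.append(name)
    return sorted(selected)


def refit_ols(X, y, selected):
    """Refit ordinary least squares (with intercept) on the selected terms."""
    cols = [np.prod(X[:, [int(t[1:]) for t in name.split(":")]], axis=1)
            for name in selected]
    D = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    return dict(zip(["intercept"] + selected, beta))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 10))
    y = 3 * X[:, 0] + 2 * X[:, 1] * X[:, 2] + rng.standard_normal(200)
    selected = hdsi_sketch(X, y)
    print(selected)
    print(refit_ols(X, y, selected))
```

Because each bootstrap model sees only `q` of the `p` features, it carries q(q-1)/2 interaction columns instead of p(p-1)/2 (6 instead of 45 for q = 4, p = 10). The trade-off is that an interaction is evaluated only in bootstraps that draw both of its parent features, so `n_boot` must be large enough for every pair to appear several times.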
Pages: 17