HDSI: High dimensional selection with interactions algorithm on feature selection and testing

Times cited: 12
Authors
Jain, Rahi [1]
Xu, Wei [1, 2]
Affiliations
[1] Princess Margaret Canc Res Ctr, Biostat Dept, Toronto, ON, Canada
[2] Univ Toronto, Dalla Lana Sch Publ Hlth, Toronto, ON, Canada
Source
PLOS ONE | 2021 / Vol. 16 / Issue 02
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
RANDOM SUBSPACE METHOD; VARIABLE SELECTION; REGRESSION; LASSO; REGULARIZATION; MODELS; SHRINKAGE;
DOI
10.1371/journal.pone.0246159
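Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07; 0710; 09;
Abstract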
Feature selection on high-dimensional data in the presence of interaction effects is a critical challenge for classical statistical learning techniques. Existing feature selection algorithms such as random LASSO leverage the capability of LASSO to handle high-dimensional data. However, the technique has two main limitations: it cannot consider interaction terms, and it lacks a statistical test for determining the significance of the selected features. This study proposes the High Dimensional Selection with Interactions (HDSI) algorithm, a new feature selection method that can handle high-dimensional data, incorporate interaction terms, provide statistical inference for the selected features, and leverage the capability of existing classical statistical techniques. The method allows any statistical technique, such as LASSO or subset selection, to be applied to multiple bootstrapped samples, each of which contains randomly selected features. Each bootstrap sample incorporates interaction terms for its randomly sampled features. The features selected by each model are pooled and their statistical significance is determined. The statistically significant features form the final output of the approach, and their final coefficients are estimated using appropriate statistical techniques. The performance of HDSI is evaluated using both simulated data and real studies. In general, HDSI outperforms commonly used algorithms such as LASSO, subset selection, adaptive LASSO, random LASSO and group LASSO.
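The procedure outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: ordinary least squares stands in for the base technique (the abstract names LASSO and subset selection), only pairwise interactions are generated, and a term is called significant when the percentile interval of its pooled coefficients excludes zero. The function name hdsi_sketch and the parameters n_boot, q and alpha are hypothetical.

```python
import itertools
import numpy as np


def hdsi_sketch(X, y, n_boot=300, q=4, alpha=0.05, seed=0):
    """Illustrative HDSI-style selection (assumptions: OLS base learner,
    pairwise interactions, percentile-interval significance rule)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    pooled = {}  # term (tuple of feature indices) -> list of coefficients

    for _ in range(n_boot):
        rows = rng.integers(0, n, size=n)                 # bootstrap rows
        feats = np.sort(rng.choice(p, size=q, replace=False))  # random features
        # Main effects plus all pairwise interactions of the sampled features.
        terms = [(j,) for j in feats] + list(itertools.combinations(feats, 2))
        Xb, yb = X[rows], y[rows]
        cols = [Xb[:, list(t)].prod(axis=1) for t in terms]
        design = np.column_stack([np.ones(n)] + cols)     # intercept first
        coef, *_ = np.linalg.lstsq(design, yb, rcond=None)
        for t, c in zip(terms, coef[1:]):
            key = tuple(int(j) for j in t)
            pooled.setdefault(key, []).append(float(c))

    # Keep terms whose pooled coefficient interval does not contain zero.
    selected = {}
    for term, coefs in pooled.items():
        lo, hi = np.quantile(coefs, [alpha / 2, 1 - alpha / 2])
        if lo > 0 or hi < 0:
            selected[term] = (float(lo), float(hi))
    return selected


# Toy data (assumed): y depends on feature 0 and on the interaction 1 x 2.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 6))
y = 2.0 * X[:, 0] + 3.0 * X[:, 1] * X[:, 2] + rng.normal(scale=0.5, size=300)

selected = hdsi_sketch(X, y)
for term, (lo, hi) in sorted(selected.items()):
    print(term, round(lo, 2), round(hi, 2))
```

On this toy data the main effect (0,) and the interaction (1, 2) should be retained. Null terms can occasionally pass an interval rule of this kind, so the sketch should not be read as controlling false discoveries. The abstract's last step, re-estimating the final coefficients on the selected terms, is omitted here.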
Pages: 17