HDSI: High dimensional selection with interactions algorithm on feature selection and testing

Cited by: 12
Authors
Jain, Rahi [1 ]
Xu, Wei [1 ,2 ]
Affiliations
[1] Princess Margaret Canc Res Ctr, Biostat Dept, Toronto, ON, Canada
[2] Univ Toronto, Dalla Lana Sch Publ Hlth, Toronto, ON, Canada
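Source
PLOS ONE | 2021 / Vol. 16 / No. 2
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
RANDOM SUBSPACE METHOD; VARIABLE SELECTION; REGRESSION; LASSO; REGULARIZATION; MODELS; SHRINKAGE
DOI
10.1371/journal.pone.0246159
Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences (general)]
Discipline classification codes
07; 0710; 09
Abstract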
Feature selection on high-dimensional data with interaction effects is a critical challenge for classical statistical learning techniques. Existing feature selection algorithms such as random LASSO leverage the LASSO's ability to handle high-dimensional data. However, random LASSO has two main limitations: it cannot consider interaction terms, and it lacks a statistical test for determining the significance of the selected features. This study proposes the High Dimensional Selection with Interactions (HDSI) algorithm, a new feature selection method that can handle high-dimensional data, incorporate interaction terms, provide statistical inference for the selected features, and leverage the capabilities of existing classical statistical techniques. The method allows any statistical technique, such as LASSO or subset selection, to be applied to multiple bootstrap samples, each containing a randomly selected subset of features. Each bootstrap sample incorporates interaction terms for its randomly sampled features. The features selected by each model are pooled, and their statistical significance is determined. The statistically significant features form the final output of the approach, and their final coefficients are estimated using appropriate statistical techniques. The performance of HDSI is evaluated on both simulated data and real studies. In general, HDSI outperforms commonly used algorithms such as LASSO, subset selection, adaptive LASSO, random LASSO, and group LASSO.
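The pipeline described in the abstract has several stages: bootstrap samples, a random feature subset per sample, added interaction terms, a base learner per sample, pooling, and a significance check. A minimal plain-Python sketch of that pipeline follows. It is an illustration under stated assumptions, not the authors' implementation. A LASSO fitted by coordinate descent stands in as the per-sample learner. Only pairwise interactions among the sampled features are added. The significance step is replaced by a simple stand-in: a percentile interval of the pooled bootstrap coefficients that must exclude zero. The paper's actual test may differ. The names `hdsi_sketch` and `lasso_cd`, and the parameters `q`, `lam`, and `alpha`, are hypothetical. The final refit of coefficients on the selected terms is omitted.

```python
import itertools
import math
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, max_iter=100, tol=1e-6):
    """LASSO by cyclic coordinate descent, minimizing
    (1/2n)*||y - Xb||^2 + lam*||b||_1 on column-centered data (no intercept)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual for the starting point b = 0
    col_sq = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    b = [0.0] * p
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column: leave its coefficient at zero
            # correlation of column j with the partial residual (j's own term added back)
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                b[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def hdsi_sketch(X, y, n_boot=100, q=4, lam=0.2, alpha=0.05, seed=0):
    """Hypothetical HDSI-style selection: random feature subsets plus pairwise
    interactions on bootstrap samples, a LASSO per sample, pooled coefficients."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    pooled = {}  # term (tuple of feature indices) -> coefficients across bootstraps
    for _ in range(n_boot):
        feats = sorted(rng.sample(range(p), q))           # random feature subset
        terms = [(j,) for j in feats] + list(itertools.combinations(feats, 2))
        rows = [rng.randrange(n) for _ in range(n)]       # bootstrap resample of rows
        Xb = [[math.prod(X[i][k] for k in t) for t in terms] for i in rows]
        yb = [y[i] for i in rows]
        for t, c in zip(terms, lasso_cd(Xb, yb, lam)):
            pooled.setdefault(t, []).append(c)
    selected = []
    for t, coefs in pooled.items():
        coefs.sort()
        lo = coefs[int(alpha / 2 * (len(coefs) - 1))]
        hi = coefs[int((1 - alpha / 2) * (len(coefs) - 1))]
        if lo > 0 or hi < 0:                              # stand-in test: interval excludes zero
            selected.append(t)
    return sorted(selected)


if __name__ == "__main__":
    # Simulated data (illustrative only): y depends on x0 and on the x1*x2 interaction
    data_rng = random.Random(1)
    X = [[data_rng.gauss(0, 1) for _ in range(6)] for _ in range(100)]
    y = [2 * r[0] + 1.5 * r[1] * r[2] + data_rng.gauss(0, 1) for r in X]
    print(hdsi_sketch(X, y))
```

Each term is only assessed on the bootstrap samples in which all of its features were drawn. As a result, an interaction needs enough samples that contain both of its parents, and this is one reason the number of bootstraps matters. Because the stand-in interval check is a per-term test, it can occasionally admit a noise term.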
Pages: 17
Related papers (50 in total)
  • [41] Feature selection for high-dimensional data in astronomy
    Zheng, Hongwen
    Zhang, Yanxia
    [J]. ADVANCES IN SPACE RESEARCH, 2008, 41 (12) : 1960 - 1964
  • [42] Feature selection for high-dimensional imbalanced data
    Yin, Liuzhi
    Ge, Yong
    Xiao, Keli
    Wang, Xuehua
    Quan, Xiaojun
    [J]. NEUROCOMPUTING, 2013, 105 : 3 - 11
  • [43] High-dimensional feature selection for genomic datasets
    Afshar, Majid
    Usefi, Hamid
    [J]. KNOWLEDGE-BASED SYSTEMS, 2020, 206
  • [44] A High Performance Algorithm for Text Feature Automatic Selection
    Dai, Jin
    He, Zhongshi
    Hu, Feng
    [J]. ISIP: 2009 INTERNATIONAL SYMPOSIUM ON INFORMATION PROCESSING, PROCEEDINGS, 2009, : 414 - +
  • [45] Review on Feature Selection Methods in High Dimensional Domains
    Devika, U. K.
    Babu, Sheeba
    Kizhakkethottam, Jubilant J.
    [J]. PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON SOFT-COMPUTING AND NETWORKS SECURITY (ICSNS 2015), 2015,
  • [46] Feature selection for high-dimensional temporal data
    Tsagris, Michail
    Lagani, Vincenzo
    Tsamardinos, Ioannis
    [J]. BMC BIOINFORMATICS, 2018, 19
  • [47] A filter feature selection for high-dimensional data
    Janane, Fatima Zahra
    Ouaderhman, Tayeb
    Chamlal, Hasna
    [J]. JOURNAL OF ALGORITHMS & COMPUTATIONAL TECHNOLOGY, 2023, 17
  • [48] Feature Selection for Problem Decomposition on High Dimensional Optimization
    Reta, Pedro
    Landa, Ricardo
    [J]. 2014 IEEE SYMPOSIUM ON SWARM INTELLIGENCE (SIS), 2014, : 298 - 304
  • [49] Unsupervised Feature Selection in High Dimensional Spaces and Uncertainty
    Villar, Jose R.
    Suarez, Maria R.
    Sedano, Javier
    Mateos, Felipe
    [J]. HYBRID ARTIFICIAL INTELLIGENCE SYSTEMS, 2009, 5572 : 565 - +
  • [50] New heuristics in feature selection for high dimensional data
    Ruiz, Roberto
    [J]. AI COMMUNICATIONS, 2007, 20 (02) : 129 - 131