Combination of Ensembles of Regularized Regression Models with Resampling-Based Lasso Feature Selection in High Dimensional Data

Citations: 18
Authors
Patil, Abhijeet R. [1 ]
Kim, Sangjin [2 ]
Affiliations
[1] Univ Texas El Paso, Computat Sci, El Paso, TX 79968 USA
[2] Univ Texas El Paso, Dept Math Sci, El Paso, TX 79968 USA
Keywords
ensembles; feature selection; high-throughput; gene expression data; resampling; lasso; adaptive lasso; elastic net; SCAD; MCP; LOGISTIC-REGRESSION; VARIABLE SELECTION; CLASSIFICATION;
DOI
10.3390/math8010110
Chinese Library Classification
O1 [Mathematics];
Discipline Classification Code
0701; 070101;
Abstract
In high-dimensional data, the performance of a classifier depends largely on the selection of important features. Most individual classifiers paired with existing feature selection (FS) methods do not perform well on highly correlated data. Obtaining important features with an FS method and selecting the best-performing classifier is a challenging task in high-throughput data. In this article, we propose a combination of resampling-based least absolute shrinkage and selection operator (LASSO) feature selection (RLFS) and an ensemble of regularized regression models (ERRM) capable of handling data with high correlation structures. The ERRM boosts prediction accuracy using the top-ranked features obtained from RLFS. The RLFS applies the lasso penalty under the sure independence screening (SIS) condition to select the top k ranked features. The ERRM comprises five individual penalty-based classifiers: LASSO, adaptive LASSO (ALASSO), elastic net (ENET), smoothly clipped absolute deviation (SCAD), and minimax concave penalty (MCP), and is built on the ideas of bagging and rank aggregation. In simulation studies and an application to smokers' cancer gene expression data, the proposed combination of ERRM with RLFS achieved superior accuracy and geometric mean.
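The resampling-based lasso ranking step described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: it repeatedly bootstraps the data, fits an L1-penalized logistic regression on each resample, and ranks features by how often they receive a nonzero coefficient. The function name `rlfs_rank`, the number of resamples, and the penalty strength `C` are all assumed for illustration.

```python
# Hedged sketch of resampling-based lasso feature ranking (RLFS-style).
# Assumptions (not from the paper): 50 bootstrap resamples, C=0.5, and
# ranking by selection frequency rather than the paper's exact criterion.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def rlfs_rank(X, y, n_resamples=50, top_k=10, seed=0):
    """Rank features by how often the lasso selects them across resamples."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_resamples):
        idx = rng.choice(n, size=n, replace=True)          # bootstrap resample
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
        clf.fit(X[idx], y[idx])
        counts += np.abs(clf.coef_.ravel()) > 1e-8         # selected this round
    return np.argsort(counts)[::-1][:top_k]                # top-k most stable

# Synthetic high-dimensional data: 120 samples, 200 features, 5 informative.
X, y = make_classification(n_samples=120, n_features=200, n_informative=5,
                           random_state=0)
top = rlfs_rank(X, y)
print(top)
```

In the paper's pipeline, the features ranked this way would then be passed to the ensemble of five regularized classifiers, whose bagged predictions are combined by rank aggregation.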
Pages: 23
Related Papers (50 records)
  • [1] Mares, Mihaela A.; Guo, Yike. Resampling-based Variable Selection with Lasso for p >> n and Partially Linear Models. 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), 2015: 1076-1082
  • [2] Amaratunga, Dhammika; Cabrera, Javier; Lee, Yung-Seop. Resampling-Based Similarity Measures for High-Dimensional Data. Journal of Computational Biology, 2015, 22(1): 54-62
  • [3] De Bin, Riccardo; Janitza, Silke; Sauerbrei, Willi; Boulesteix, Anne-Laure. Subsampling versus Bootstrapping in Resampling-Based Model Selection for Multivariable Regression. Biometrics, 2016, 72(1): 272-280
  • [4] Emmert-Streib, Frank; Dehmer, Matthias. High-Dimensional LASSO-Based Computational Regression Models: Regularization, Shrinkage, and Selection. Machine Learning and Knowledge Extraction, 2019, 1(1): 359-383
  • [5] Urda, Daniel; Franco, Leonardo; Jerez, Jose M. Classification of high dimensional data using LASSO ensembles. 2017 IEEE Symposium Series on Computational Intelligence (SSCI), 2017: 1548-1554
  • [6] Ogutu, Joseph O.; Schulz-Streeck, Torben; Piepho, Hans-Peter. Genomic selection using regularized linear regression models: ridge regression, lasso, elastic net and their extensions. BMC Proceedings, 6(Suppl 2)
  • [7] Li, Shanshan; Yu, Jian; Kang, Huimin; Liu, Jianfeng. Genomic Selection in Chinese Holsteins Using Regularized Regression Models for Feature Selection of Whole Genome Sequencing Data. Animals, 2022, 12(18)
  • [8] Guan, Boxin; Zhao, Yuhai; Yin, Ying; Li, Yuan. A differential evolution based feature combination selection algorithm for high-dimensional data. Information Sciences, 2021, 547: 870-886
  • [9] Mahajan, Shubham; Pandit, Amit Kant. Analysis of high dimensional data using feature selection models. International Journal of Nanotechnology, 2023, 20(1-4): 116-128
  • [10] Li, Longhai; Yao, Weixin. Fully Bayesian logistic regression with hyper-LASSO priors for high-dimensional feature selection. Journal of Statistical Computation and Simulation, 2018, 88(14): 2827-2851