Ensemble Linear Subspace Analysis of High-Dimensional Data

Cited by: 3
Authors
Ahmed, S. Ejaz [1]
Amiri, Saeid [2]
Doksum, Kjell [3]
Affiliations
[1] Brock Univ, Dept Math & Stat, St Catharines, ON L2S 3A1, Canada
[2] Polytech Montreal, Dept Civil Geol & Min Engn, Montreal, PQ H3T 1J4, Canada
[3] Univ Wisconsin, Dept Stat, Madison, WI 53706 USA
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
ensembling; high-dimensional data; Lasso; elastic net; penalty methods; prediction; random subspaces; variable selection; model selection; recovery
DOI
10.3390/e23030324
Chinese Library Classification number
O4 [Physics]
Discipline classification code
0702
Abstract
Regression models provide prediction frameworks for multivariate mutual information analysis that uses information concepts when choosing covariates (also called features) that are important for analysis and prediction. We consider a high-dimensional regression framework where the number of covariates (p) exceeds the sample size (n). Recent work in high-dimensional regression analysis has embraced an ensemble subspace approach that consists of selecting random subsets of covariates, each containing fewer than p covariates, doing statistical analysis on each subset, and then merging the results from the subsets. We examine conditions under which penalty methods such as Lasso perform better when used in the ensemble approach by computing mean squared prediction errors for simulations and a real data example. Linear models with both random and fixed designs are considered. We examine two versions of penalty methods: one where the tuning parameter is selected by cross-validation, and one where the final predictor is a trimmed average of individual predictors corresponding to the members of a set of fixed tuning parameters. We find that the ensemble approach improves on penalty methods for several important real data and model scenarios. The improvement occurs when covariates are strongly associated with the response and when the complexity of the model is high. In such cases, the trimmed average version of ensemble Lasso is often the best predictor.
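The random-subspace ensemble with a trimmed average over fixed tuning parameters, as summarized in the abstract, can be illustrated with a small pure-Python sketch. This is not the authors' implementation. The coordinate-descent Lasso solver, the default subspace size of p // 2, the number of subspaces, the lambda grid, and the choice to pool the predictions of every (subspace, lambda) pair into one trimmed mean (10% trimmed from each tail) are all hypothetical choices made for illustration. Only the trimmed-average variant is sketched; the cross-validated variant is not shown.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100, tol=1e-8):
    """Lasso by cyclic coordinate descent, no intercept.

    Minimizes (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.
    Assumes y and the columns of X are (approximately) centered.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - X b; b starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # (1/n) x_j^T (partial residual that excludes covariate j)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def trimmed_mean(values, trim):
    """Mean after dropping the lowest and highest `trim` fraction (0 <= trim < 0.5)."""
    v = sorted(values)
    k = int(trim * len(v))
    kept = v[k:len(v) - k] if len(v) - 2 * k > 0 else v
    return sum(kept) / len(kept)


def fit_subspace_ensemble(X, y, lambdas, n_subspaces=10, subspace_size=None, seed=0):
    """Fit one Lasso per (random covariate subset, lambda) pair."""
    p = len(X[0])
    # Hypothetical default: each subspace uses half of the p covariates.
    k = subspace_size or max(1, p // 2)
    rng = random.Random(seed)
    members = []
    for _ in range(n_subspaces):
        cols = rng.sample(range(p), k)  # random subspace of k < p covariates
        Xs = [[row[j] for j in cols] for row in X]
        members.append((cols, [lasso_cd(Xs, y, lam) for lam in lambdas]))
    return members


def predict(members, x, trim=0.1):
    """Assumption: pool every (subspace, lambda) prediction into one trimmed mean."""
    preds = []
    for cols, coefs in members:
        for b in coefs:
            preds.append(sum(x[j] * bj for j, bj in zip(cols, b)))
    return trimmed_mean(preds, trim)


if __name__ == "__main__":
    rng = random.Random(1)
    n, p = 30, 40  # p > n: more covariates than observations
    beta = [3.0, -2.0, 1.5] + [0.0] * (p - 3)  # hypothetical sparse truth

    def simulate(m):
        X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(m)]
        y = [sum(a * c for a, c in zip(row, beta)) + rng.gauss(0, 0.5) for row in X]
        return X, y

    X_train, y_train = simulate(n)
    X_test, y_test = simulate(50)
    members = fit_subspace_ensemble(X_train, y_train, lambdas=[0.05, 0.1, 0.2, 0.4])
    mspe = sum((predict(members, x) - t) ** 2 for x, t in zip(X_test, y_test)) / len(y_test)
    print(f"ensemble trimmed-average MSPE: {mspe:.3f}")
```

Running the file fits the ensemble to simulated data with p = 40 covariates and n = 30 observations and prints a test-set mean squared prediction error. The number depends on the random seed and the hypothetical settings above; it is not a result from the paper. Pooling all (subspace, lambda) predictions before trimming is one plausible reading of "a trimmed average of individual predictors corresponding to the members of a set of fixed tuning parameters"; trimming within each subspace first would be an equally reasonable alternative.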
Pages: 15
Related papers (50 in total)
  • [31] Density-connected subspace clustering for high-dimensional data
    Kailing, K
    Kriegel, HP
    Kröger, P
    PROCEEDINGS OF THE FOURTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2004 : 246 - 256
  • [32] Multivariate functional subspace classification for high-dimensional longitudinal data
    Fukuda, Tatsuya
    Matsui, Hidetoshi
    Takada, Hiroya
    Misumi, Toshihiro
    Konishi, Sadanori
    JAPANESE JOURNAL OF STATISTICS AND DATA SCIENCE, 2024, 7 (01) : 1 - 16
  • [33] A modular eigen subspace scheme for high-dimensional data classification
    Chang, YL
    Han, CC
    Jou, FD
    Fan, KC
    Chen, KS
    Chang, JH
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2004, 20 (07) : 1131 - 1143
  • [34] High-Dimensional Matched Subspace Detection When Data are Missing
    Balzano, Laura
    Recht, Benjamin
    Nowak, Robert
    2010 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY, 2010 : 1638 - 1642
  • [35] Subspace-Weighted Consensus Clustering for High-Dimensional Data
    Cai, Xiaosha
    Huang, Dong
    ADVANCED DATA MINING AND APPLICATIONS, 2020, 12447 : 3 - 16
  • [36] A subspace ensemble framework for classification with high dimensional missing data
    Gao, Hang
    Jian, Songlei
    Peng, Yuxing
    Liu, Xinwang
    MULTIDIMENSIONAL SYSTEMS AND SIGNAL PROCESSING, 2017, 28 (04) : 1309 - 1324
  • [38] On sparse linear discriminant analysis algorithm for high-dimensional data classification
    Ng, Michael K.
    Liao, Li-Zhi
    Zhang, Leihong
    NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS, 2011, 18 (02) : 223 - 235
  • [39] A Survey on High-Dimensional Subspace Clustering
    Qu, Wentao
    Xiu, Xianchao
    Chen, Huangyue
    Kong, Lingchen
    MATHEMATICS, 2023, 11 (02)
  • [40] Subspace Estimation From Incomplete Observations: A High-Dimensional Analysis
    Wang, Chuang
    Eldar, Yonina C.
    Lu, Yue M.
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2018, 12 (06) : 1240 - 1252