Ensemble Linear Subspace Analysis of High-Dimensional Data

Cited by: 3
Authors
Ahmed, S. Ejaz [1]
Amiri, Saeid [2]
Doksum, Kjell [3]
Affiliations
[1] Brock Univ, Dept Math & Stat, St Catharines, ON L2S 3A1, Canada
[2] Polytech Montreal, Dept Civil Geol & Min Engn, Montreal, PQ H3T 1J4, Canada
[3] Univ Wisconsin, Dept Stat, Madison, WI 53706 USA
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
ensembling; high-dimensional data; Lasso; elastic net; penalty methods; prediction; random subspaces; variable selection; model selection; recovery
DOI
10.3390/e23030324
Chinese Library Classification number
O4 [Physics];
Discipline classification code
0702;
Abstract
Regression models provide prediction frameworks for multivariate mutual information analysis that uses information concepts when choosing covariates (also called features) that are important for analysis and prediction. We consider a high-dimensional regression framework where the number of covariates (p) exceeds the sample size (n). Recent work in high-dimensional regression analysis has embraced an ensemble subspace approach that consists of selecting random subsets of covariates with fewer than p covariates, doing statistical analysis on each subset, and then merging the results from the subsets. We examine conditions under which penalty methods such as Lasso perform better when used in the ensemble approach by computing mean squared prediction errors for simulations and a real data example. Linear models with both random and fixed designs are considered. We examine two versions of penalty methods: one where the tuning parameter is selected by cross-validation, and one where the final predictor is a trimmed average of individual predictors corresponding to the members of a set of fixed tuning parameters. We find that the ensemble approach improves on penalty methods for several important real data and model scenarios. The improvement occurs when covariates are strongly associated with the response and when the complexity of the model is high. In such cases, the trimmed average version of ensemble Lasso is often the best predictor.
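The ensemble subspace idea described in the abstract (fit a penalized regression on random subsets of covariates, then merge the predictions) can be illustrated with a minimal, dependency-free Python sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names (`lasso_cd`, `trimmed_mean`, `ensemble_lasso_predict`), the subset size, the number of subsets, the trimming fraction, the plain coordinate-descent Lasso solver without an intercept, and the choice to pool predictions across both subsets and tuning parameters before trimming.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate descent for (1/(2n))*||y - X b||^2 + lam*||b||_1.

    Illustrative solver only: no intercept, fixed number of sweeps.
    X is a list of rows, y a list of responses.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b, with b = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta


def trimmed_mean(values, trim=0.1):
    """Mean after dropping a fraction `trim` of values from each tail."""
    s = sorted(values)
    k = int(len(s) * trim)
    kept = s[k:len(s) - k] if len(s) > 2 * k else s
    return sum(kept) / len(kept)


def ensemble_lasso_predict(X, y, X_new, lambdas, n_subsets=10,
                           subset_size=None, trim=0.1, seed=0):
    """Random-subspace ensemble of Lasso fits with trimmed-average merging.

    Assumption: predictions from every (subset, lambda) pair are pooled
    and merged by a trimmed mean; the paper may merge differently.
    """
    rng = random.Random(seed)
    p = len(X[0])
    m = subset_size or max(1, p // 3)  # hypothetical default subset size
    preds = [[] for _ in X_new]
    for _ in range(n_subsets):
        cols = rng.sample(range(p), m)  # random subset of covariates
        X_sub = [[row[j] for j in cols] for row in X]
        for lam in lambdas:
            beta = lasso_cd(X_sub, y, lam)
            for k, row in enumerate(X_new):
                preds[k].append(sum(b * row[j] for b, j in zip(beta, cols)))
    return [trimmed_mean(v, trim) for v in preds]
```

A call such as `ensemble_lasso_predict(X, y, X_new, lambdas=[0.01, 0.1, 1.0])` returns one merged prediction per row of `X_new`. In practice a tested solver (for example, a library Lasso implementation) would replace `lasso_cd`; the hand-written version is used here only to keep the sketch self-contained.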
Pages: 15
Related papers
50 in total
  • [21] Learning distance to subspace for the nearest subspace methods in high-dimensional data classification
    Zhu, Rui
    Dong, Mingzhi
    Xue, Jing-Hao
    INFORMATION SCIENCES, 2019, 481 : 69 - 80
  • [22] Optimal Linear Discriminant Analysis for High-Dimensional Functional Data
    Xue, Kaijie
    Yang, Jin
    Yao, Fang
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2024, 119 (546) : 1055 - 1064
  • [23] Scaling log-linear analysis to high-dimensional data
    Petitjean, Francois
    Webb, Geoffrey I.
    Nicholson, Ann E.
    2013 IEEE 13TH INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2013, : 597 - 606
  • [24] Generalized Linear Discriminant Analysis for High-Dimensional Genomic Data
    Li, Sisi
    Lewinger, Juan Pablo
    GENETIC EPIDEMIOLOGY, 2017, 41 (07) : 704 - 704
  • [25] Generalized linear discriminant analysis for high-dimensional genomic data
    Li, Sisi
    Lewinger, Juan Pablo
    GENETIC EPIDEMIOLOGY, 2018, 42 (07) : 713 - 713
  • [26] High-Dimensional Spatial Simulation Ensemble Analysis
    Dahshan, Mai
    House, Leanna
    Polys, Nicholas
    PROCEEDINGS OF THE 9TH ACM SIGSPATIAL INTERNATIONAL WORKSHOP ON ANALYTICS FOR BIG GEOSPATIAL DATA, BIGSPATIAL 2020, 2020,
  • [27] Ensemble Clustering for Boundary Detection in High-Dimensional Data
    Anagnostou, Panagiotis
    Pavlidis, Nicos G.
    Tasoulis, Sotiris
    MACHINE LEARNING, OPTIMIZATION, AND DATA SCIENCE, LOD 2023, PT II, 2024, 14506 : 324 - 333
  • [28] A generic framework for efficient subspace clustering of high-dimensional data
    Kriegel, HP
    Kröger, P
    Renz, M
    Wurst, S
    Fifth IEEE International Conference on Data Mining, Proceedings, 2005, : 250 - 257
  • [29] Ensemble of sparse classifiers for high-dimensional biological data
    Kim, Sunghan
    Scalzo, Fabien
    Telesca, Donatello
    Hu, Xiao
    INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2015, 12 (02) : 167 - 183
  • [30] ICE: Incremental Subspace Clustering of High-Dimensional Categorical Data
    Pang, Ning
    Zhang, Chaowei
    Zhang, Jifu
    Qin, Xiao
    INTERNATIONAL JOURNAL OF UNCERTAINTY FUZZINESS AND KNOWLEDGE-BASED SYSTEMS, 2025, 33 (01) : 87 - 118