Ensemble Linear Subspace Analysis of High-Dimensional Data

Cited by: 3
Authors
Ahmed, S. Ejaz [1]
Amiri, Saeid [2]
Doksum, Kjell [3]
Affiliations
[1] Brock Univ, Dept Math & Stat, St Catharines, ON L2S 3A1, Canada
[2] Polytech Montreal, Dept Civil Geol & Min Engn, Montreal, PQ H3T 1J4, Canada
[3] Univ Wisconsin, Dept Stat, Madison, WI 53706 USA
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
ensembling; high-dimensional data; Lasso; elastic net; penalty methods; prediction; random subspaces; VARIABLE SELECTION; MODEL SELECTION; RECOVERY; LASSO;
DOI
10.3390/e23030324
Chinese Library Classification
O4 [Physics];
Discipline classification code
0702 ;
Abstract
Regression models provide prediction frameworks for multivariate mutual information analysis that uses information concepts when choosing covariates (also called features) that are important for analysis and prediction. We consider a high-dimensional regression framework in which the number of covariates (p) exceeds the sample size (n). Recent work in high-dimensional regression analysis has embraced an ensemble subspace approach that consists of selecting random subsets of covariates with fewer than p covariates, performing statistical analysis on each subset, and then merging the results from the subsets. We examine conditions under which penalty methods such as Lasso perform better when used in the ensemble approach by computing mean squared prediction errors for simulations and a real data example. Linear models with both random and fixed designs are considered. We examine two versions of penalty methods: one where the tuning parameter is selected by cross-validation, and one where the final predictor is a trimmed average of individual predictors corresponding to the members of a set of fixed tuning parameters. We find that the ensemble approach improves on penalty methods for several important real data and model scenarios. The improvement occurs when covariates are strongly associated with the response and when the complexity of the model is high. In such cases, the trimmed average version of ensemble Lasso is often the best predictor.
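The ensemble subspace idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`lasso_cd`, `ensemble_subspace_predict`, `trimmed_mean`, `trimmed_lambda_predict`), the default subset size, the number of subsets, the trimming fraction, and the way the trimmed average is combined with the subspace ensemble are all assumptions made for illustration. The Lasso is fitted with a plain coordinate-descent loop so that the example depends only on NumPy.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Lasso for (1/2n)||y - Xb||^2 + lam*||b||_1 via cyclic coordinate descent.
    Assumes X and y are already centered (no intercept is fitted)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = np.asarray(y, dtype=float).copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation of column j with the partial residual
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def ensemble_subspace_predict(X, y, X_new, lam, n_subsets=50,
                              subset_size=None, seed=0):
    """Average Lasso predictions fitted on random subsets of covariates.
    The default subset size n // 2 is a hypothetical choice, not from the paper."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    k = subset_size or max(1, min(p, n // 2))
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    preds = np.empty((n_subsets, X_new.shape[0]))
    for b in range(n_subsets):
        idx = rng.choice(p, size=k, replace=False)  # random covariate subset
        beta = lasso_cd(Xc[:, idx], yc, lam)
        preds[b] = y_mean + (X_new[:, idx] - x_mean[idx]) @ beta
    return preds.mean(axis=0)  # merge the subset results by simple averaging

def trimmed_mean(values, trim=0.1):
    """Mean along axis 0 after dropping a `trim` fraction from each tail."""
    v = np.sort(values, axis=0)
    cut = int(trim * v.shape[0])
    return v[cut: v.shape[0] - cut].mean(axis=0)

def trimmed_lambda_predict(X, y, X_new, lambdas, trim=0.1, **kw):
    """Trimmed average of ensemble predictors over a fixed grid of tuning
    parameters (an assumed reading of the second variant in the abstract)."""
    preds = np.stack([ensemble_subspace_predict(X, y, X_new, lam, **kw)
                      for lam in lambdas])
    return trimmed_mean(preds, trim)
```

With a training matrix `X` of shape (n, p), a response `y`, and new rows `X_new`, a call such as `trimmed_lambda_predict(X, y, X_new, [0.05, 0.1, 0.2])` returns one prediction per row of `X_new`. Simple averaging is used to merge subsets here only because it is the most basic choice; the paper may merge results differently.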
Pages: 15
Related papers
50 in total
  • [41] Ensemble of Trees for Classifying High-Dimensional Imbalanced Genomic Data
    Farid, Dewan Md.
    Nowe, Ann
    Manderick, Bernard
    PROCEEDINGS OF SAI INTELLIGENT SYSTEMS CONFERENCE (INTELLISYS) 2016, VOL 1, 2018, 15 : 172 - 187
  • [42] Subspace Clustering for High-Dimensional Data Using Cluster Structure Similarity
    Fatehi, Kavan
    Rezvani, Mohsen
    Fateh, Mansoor
    Pajoohan, Mohammad-Reza
    INTERNATIONAL JOURNAL OF INTELLIGENT INFORMATION TECHNOLOGIES, 2018, 14 (03) : 38 - 55
  • [43] Dimension Reconstruction for Visual Exploration of Subspace Clusters in High-dimensional Data
    Zhou, Fangfang
    Li, Juncai
    Huang, Wei
    Zhao, Ying
    Yuan, Xiaoru
    Liang, Xing
    Shi, Yang
    2016 IEEE PACIFIC VISUALIZATION SYMPOSIUM (PACIFICVIS), 2016, : 128 - 135
  • [44] An entropy weighting mixture model for subspace clustering of high-dimensional data
    Peng, Liuqing
    Zhang, Junying
    PATTERN RECOGNITION LETTERS, 2011, 32 (08) : 1154 - 1161
  • [45] Dynamic Sparse Subspace Clustering for Evolving High-Dimensional Data Streams
    Sui, Jinping
    Liu, Zhen
    Liu, Li
    Jung, Alexander
    Li, Xiang
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (06) : 4173 - 4186
  • [46] A Compressed PCA Subspace Method for Anomaly Detection in High-Dimensional Data
    Ding, Qi
    Kolaczyk, Eric D.
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2013, 59 (11) : 7419 - 7433
  • [47] Accelerating Density-Based Subspace Clustering in High-Dimensional Data
    Prinzbach, Juergen
    Lauer, Tobias
    Kiefer, Nicolas
    21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS ICDMW 2021, 2021, : 474 - 481
  • [48] A novel ensemble method for high-dimensional genomic data classification
    Espichan, Alexandra
    Villanueva, Edwin
    PROCEEDINGS 2018 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2018, : 2229 - 2236
  • [49] Visualisation of High-Dimensional Data Using an Ensemble of Neural Networks
    Gianniotis, Nikolaos
    Riggelsen, Carsten
    PROCEEDINGS OF THE 2013 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND ENSEMBLE LEARNING (CIEL), 2013, : 17 - 24
  • [50] Ensemble of penalized logistic models for classification of high-dimensional data
    Ijaz, Musarrat
    Asghar, Zahid
    Gul, Asma
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2021, 50 (07) : 2072 - 2088