High-dimensional regression with potential prior information on variable importance

Cited: 0
Authors
Stokell, Benjamin G. [1 ]
Shah, Rajen D. [1 ]
Affiliations
[1] Univ Cambridge, Cambridge, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
High-dimensional data; Low variance filter; Lasso; Ridge regression; Missing data; Corrupted data; MODEL SELECTION; LASSO;
DOI
10.1007/s11222-022-10110-5
Chinese Library Classification
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
In a variety of settings, vague prior information may be available on the importance of predictors in high-dimensional regression. Examples include the ordering of the variables offered by their empirical variances (which is typically discarded through standardisation), the lag of predictors when fitting autoregressive models in time series settings, or the level of missingness of the variables. Whilst such orderings may not match the true importance of the variables, we argue that there is little to be lost, and potentially much to be gained, by using them. We propose a simple scheme involving fitting a sequence of models indicated by the ordering. We show that when ridge regression is used, the computational cost of fitting all models is no more than that of a single ridge regression fit, and we describe a strategy for Lasso regression that uses previous fits to greatly speed up fitting the entire sequence of models. We propose to select a final estimator by cross-validation, and provide a general result on the quality of the estimator, chosen from among M competing estimators, that performs best on a test set in a high-dimensional linear regression setting. Our result requires no sparsity assumptions and shows that only a log M price is incurred relative to the unknown best estimator. We demonstrate the effectiveness of our approach when applied to missing or corrupted data and in time series settings. An R package is available on GitHub.
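To make the scheme concrete, below is a minimal Python sketch, not the authors' R package: the function select_prefix_by_cv and all other names are illustrative, and the nested-prefix formulation is one reading of the abstract. It orders the predictors by their empirical variances, fits ridge regression on each nested prefix of the ordering, and picks the prefix length by cross-validation. Note that this naive loop refits every prefix from scratch, whereas the abstract states that the entire ridge sequence can be obtained for the cost of a single fit.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def select_prefix_by_cv(X, y, order, alpha=1.0, cv=5):
    """Fit ridge regression on each nested prefix of `order` and return the
    prefix length k minimising cross-validated mean squared error."""
    cv_mse = []
    for k in range(1, len(order) + 1):
        Xk = X[:, order[:k]]  # keep the k putatively most important variables
        scores = cross_val_score(Ridge(alpha=alpha), Xk, y, cv=cv,
                                 scoring="neg_mean_squared_error")
        cv_mse.append(-scores.mean())
    return int(np.argmin(cv_mse)) + 1, cv_mse

# Toy example: order variables by decreasing empirical variance, one of the
# candidate orderings mentioned in the abstract.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.normal(size=(n, p)) * rng.uniform(0.5, 3.0, size=p)  # unequal variances
beta = np.concatenate([np.full(5, 2.0), np.zeros(p - 5)])    # 5 true signals
y = X @ beta + rng.normal(size=n)
order = np.argsort(-X.var(axis=0))  # most variable first
k, _ = select_prefix_by_cv(X, y, order)
print(f"cross-validation selects the first k = {k} ordered predictors")

Because the prefixes are nested, the variance-based ordering here need not match the true importance of the variables; as the abstract argues, cross-validation over the sequence still protects against a poor ordering at only a logarithmic price in the number of candidate models.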
Pages: 12