Influence Diagnostics for High-Dimensional Lasso Regression

Cited by: 8

Authors:

Rajaratnam, Bala [1]

Roberts, Steven [2]

Sparks, Doug [1]

Yu, Honglin [2]

Affiliations:

[1] Univ Calif Davis, Dept Stat, Davis, CA 95616 USA

[2] Australian Natl Univ, Coll Business & Econ, Res Sch Finance Actuarial Studies & Stat, Canberra, ACT, Australia

Source:

JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS | 2019 / Vol. 28 / No. 04

Funding:

National Science Foundation (USA);

Keywords:

Large p small n; Model selection; Regression diagnostics; Shrinkage; BAYESIAN INFORMATION CRITERIA; MODEL SELECTION; LEAST ANGLE; SHRINKAGE;

DOI:

10.1080/10618600.2019.1598869

Chinese Library Classification (CLC):

O21 [Probability and Mathematical Statistics]; C8 [Statistics];

Discipline classification codes:

020208 ; 070103 ; 0714 ;

Abstract:

The increased availability of high-dimensional data, and appeal of a "sparse" solution has made penalized likelihood methods commonplace. Arguably the most widely utilized of these methods is ℓ1 regularization, popularly known as the lasso. When the lasso is applied to high-dimensional data, observations are relatively few; thus, each observation can potentially have tremendous influence on model selection and inference. Hence, a natural question in this context is the identification and assessment of influential observations. We address this by extending the framework for assessing estimation influence in traditional linear regression, and demonstrate that it is equally, if not more, relevant for assessing model selection influence for high-dimensional lasso regression. Within this framework, we propose four new "deletion methods" for gauging the influence of an observation on lasso model selection: df-model, df-regpath, df-cvpath, and df-lambda. Asymptotic cut-offs for each measure, even when p → ∞, are developed. We illustrate that in high-dimensional settings, individual observations can have a tremendous impact on lasso model selection. We demonstrate that application of our measures can help reveal relationships in high-dimensional real data that may otherwise remain hidden. Supplementary materials for this article are available online.
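
The case-deletion idea summarized in the abstract can be illustrated with a minimal sketch. This is not the paper's definition of df-model or of any of the four measures, and it does not use the paper's cut-offs. It only shows the generic mechanism: refit the lasso with one observation removed and count how many predictors enter or leave the selected model. The function names (`lasso_cd`, `support`, `deletion_model_changes`), the fixed penalty `lam`, the absence of an intercept, and the simulated data are all assumptions made for illustration.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/(2n))*||y - X b||^2 + lam*||b||_1 by cyclic
    coordinate descent (no intercept; X is a list of rows)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n
            rho += col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def support(beta):
    """Indices of predictors with a nonzero coefficient."""
    return {j for j, b in enumerate(beta) if b != 0.0}


def deletion_model_changes(X, y, lam):
    """For each observation i, refit without row i and count predictors
    whose selection status differs from the full-data fit."""
    full = support(lasso_cd(X, y, lam))
    changes = []
    for i in range(len(X)):
        X_del = X[:i] + X[i + 1:]
        y_del = y[:i] + y[i + 1:]
        changes.append(len(full ^ support(lasso_cd(X_del, y_del, lam))))
    return changes


# Hypothetical simulated data with more predictors than observations.
random.seed(0)
n, p = 12, 20
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.5 * row[1] + random.gauss(0, 0.1) for row in X]
y[0] += 8.0  # perturb one response to mimic an unusual observation

changes = deletion_model_changes(X, y, lam=0.1)
for i, c in enumerate(changes):
    print(f"observation {i}: {c} predictors change selection status")
```

Holding `lam` fixed across refits is a simplification chosen for this sketch; re-tuning the penalty after each deletion (for example by cross-validation) would be a different design and would measure a different kind of influence.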


Pages: 877 - 890

Page count: 14

50 records in total

[31] Lasso inference for high-dimensional time series
Adamek, Robert
Smeekes, Stephan
Wilms, Ines
[J]. JOURNAL OF ECONOMETRICS, 2023, 235 (02) : 1114 - 1143
[32] High-dimensional posterior consistency of the Bayesian lasso
Dasgupta, Shibasish
[J]. COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2016, 45 (22) : 6700 - 6708
[33] High-dimensional graphs and variable selection with the Lasso
Meinshausen, Nicolai
Buehlmann, Peter
[J]. ANNALS OF STATISTICS, 2006, 34 (03) : 1436 - 1462
[34] Lasso penalized semiparametric regression on high-dimensional recurrent event data via coordinate descent
Wu, Tong Tong
[J]. JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2013, 83 (06) : 1145 - 1155
[36] Fully Bayesian logistic regression with hyper-LASSO priors for high-dimensional feature selection
Li, Longhai
Yao, Weixin
[J]. JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2018, 88 (14) : 2827 - 2851
[37] Sparse and debiased lasso estimation and inference for high-dimensional composite quantile regression with distributed data
Hou, Zhaohan
Ma, Wei
Wang, Lei
[J]. TEST, 2023, 32 (04) : 1230 - 1250
[38] Regression on High-dimensional Inputs
Kuleshov, Alexander
Bernstein, Alexander
[J]. 2016 IEEE 16TH INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW), 2016 : 732 - 739
[39] On inference in high-dimensional regression
Battey, Heather S.
Reid, Nancy
[J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2023, 85 (01) : 149 - 175
[40] Weighted Lasso subsampling for high dimensional regression
Uraibi, Hassan S.
[J]. ELECTRONIC JOURNAL OF APPLIED STATISTICAL ANALYSIS, 2019, 12 (01) : 69 - 84
