The Optimal Ridge Penalty for Real-world High-dimensional Data Can Be Zero or Negative due to the Implicit Ridge Regularization
Cited by: 0
Authors:
Kobak, Dmitry [1]; Lomond, Jonathan [1]; Sanchez, Benoit [1]
Affiliations:
[1] Univ Tubingen, Inst Ophthalm Res, Otfried Muller Str 25, D-72076 Tubingen, Germany
Funding:
US National Institutes of Health;
Keywords:
High-dimensional; ridge regression; regularization; REGRESSION; SELECTION
DOI:
not available
CLC Classification:
TP [Automation Technology, Computer Technology];
Discipline Code:
0812;
Abstract:
Conventional wisdom in statistical learning holds that large models require strong regularization to prevent overfitting. Here we show that this rule can be violated by linear regression in the underdetermined n << p situation under realistic conditions. Using simulations and real-life high-dimensional datasets, we demonstrate that an explicit positive ridge penalty can fail to provide any improvement over the minimum-norm least squares estimator. Moreover, the optimal value of the ridge penalty in this situation can be negative. This happens when the high-variance directions in the predictor space can predict the response variable, which is often the case in real-world high-dimensional data. In this regime, low-variance directions provide an implicit ridge regularization and can make any further positive ridge penalty detrimental. We prove that augmenting any linear model with random covariates and using the minimum-norm estimator is asymptotically equivalent to adding the ridge penalty. We use a spiked covariance model as an analytically tractable example and prove that the optimal ridge penalty in this case is negative when n << p.
Pages: 16
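The abstract's central claim lends itself to a quick numerical check. Below is a minimal simulation sketch (not the authors' code; the spiked-covariance construction, spike strength, noise level, and the lambda values are illustrative assumptions). It uses the kernel form of ridge, beta = X^T (X X^T + lambda*I)^{-1} y, which at lambda = 0 with n < p recovers the minimum-norm least squares estimator and remains well defined for moderately negative lambda.

import numpy as np

rng = np.random.default_rng(0)
n, p, n_test = 50, 500, 1000

# Spiked covariance: one high-variance direction u also drives the response.
u = rng.standard_normal(p)
u /= np.linalg.norm(u)
spike = 10.0  # assumed spike strength (standard deviation along u)

def sample(m):
    Z = rng.standard_normal((m, p))
    X = Z + (spike - 1.0) * np.outer(Z @ u, u)  # inflate variance along u
    y = X @ u + 0.5 * rng.standard_normal(m)    # signal lives in the spike
    return X, y

X, y = sample(n)
X_test, y_test = sample(n_test)

# Kernel form of ridge: beta = X^T (X X^T + lambda*I)^{-1} y.
# At lambda = 0 (with n < p) this is the minimum-norm least squares solution;
# it stays well defined for lambda > -(smallest eigenvalue of X X^T).
K = X @ X.T
s_min = np.linalg.eigvalsh(K)[0]

for lam in (-0.5 * s_min, 0.0, 10.0, 100.0):
    beta = X.T @ np.linalg.solve(K + lam * np.eye(n), y)
    mse = np.mean((X_test @ beta - y_test) ** 2)
    print(f"lambda = {lam:9.2f}   test MSE = {mse:.4f}")

Whether the test error is minimized at a zero or negative lambda in this sketch depends on the assumed spike strength and noise level, mirroring the paper's point that low-variance directions already supply implicit ridge regularization when the signal lies along high-variance directions.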