Fast Cross-validation for Multi-penalty High-dimensional Ridge Regression

Cited by: 16
Authors
van de Wiel, Mark A. [1 ]
van Nee, Mirrelijn M. [1 ]
Rauschenberger, Armin [2 ]
Affiliations
[1] Univ Amsterdam, Med Ctr, Dept Epidemiol & Data Sci, Amsterdam, Netherlands
[2] Univ Luxembourg, Luxembourg Ctr Syst Biomed LCSB, Esch Sur Alzette, Luxembourg
Keywords
Cancer genomics; High-dimensional prediction; Iterative weighted least squares; Marginal likelihood; Multi-view learning; PREDICTION; ALGORITHM; LASSO;
DOI
10.1080/10618600.2021.1904962
CLC classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline codes
020208; 070103; 0714
Abstract
High-dimensional prediction with multiple data types needs to account for potentially strong differences in predictive signal. Ridge regression is a simple model for high-dimensional data that has challenged the predictive performance of many more complex models and learners, and that allows inclusion of data type-specific penalties. The largest challenge for multi-penalty ridge is to optimize these penalties efficiently in a cross-validation (CV) setting, in particular for GLM and Cox ridge regression, which require an additional estimation loop by iterative weighted least squares (IWLS). Our main contribution is a computationally very efficient formula for the multi-penalty, sample-weighted hat-matrix, as used in the IWLS algorithm. As a result, nearly all computations are in low-dimensional space, rendering a speed-up of several orders of magnitude. We developed a flexible framework that facilitates multiple types of response, unpenalized covariates, several performance criteria and repeated CV. Extensions to paired and preferential data types are included and illustrated on several cancer genomics survival prediction problems. Moreover, we present similar computational shortcuts for maximum marginal likelihood and Bayesian probit regression. The corresponding R-package, multiridge, serves as a versatile standalone tool, but also as a fast benchmark for other more complex models and multi-view learners. Supplementary materials for this article are available online.
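The computational idea described in the abstract, keeping nearly all heavy computations in n-dimensional (sample) space, can be illustrated with the standard dual (push-through/Woodbury) ridge identity: for block penalties lambda_b and IWLS weight matrix W = diag(w), the map from the working response z to the fitted linear predictor X beta-hat equals H = K (K + W^{-1})^{-1}, where K = sum_b X_b X_b^T / lambda_b is only n x n. The Python sketch below is an illustration under these assumptions, with hypothetical names (multipenalty_hat, X_blocks, lambdas, w); it is not the paper's exact hat-matrix formula nor code from the multiridge package.

import numpy as np

def multipenalty_hat(X_blocks, lambdas, w):
    # Sample-weighted hat matrix for block-wise (multi-penalty) ridge,
    # computed entirely in n x n space: H = K (K + W^{-1})^{-1},
    # with K = sum_b X_b X_b^T / lambda_b and W = diag(w) the IWLS weights.
    # H maps the IWLS working response z to the fitted linear predictor.
    # Illustrative sketch only, not the multiridge package's internal code.
    n = X_blocks[0].shape[0]
    K = sum(Xb @ Xb.T / lb for Xb, lb in zip(X_blocks, lambdas))  # n x n
    return K @ np.linalg.solve(K + np.diag(1.0 / w), np.eye(n))

# Toy example: two data types with very different dimensions and penalties.
rng = np.random.default_rng(1)
n = 50
X1 = rng.normal(size=(n, 2000))   # e.g. a high-dimensional omics block
X2 = rng.normal(size=(n, 100))    # e.g. a smaller clinical block
w = np.ones(n)                    # IWLS weights (all 1 for a Gaussian response)
H = multipenalty_hat([X1, X2], lambdas=[10.0, 1.0], w=w)
print(H.shape)                    # (50, 50); no p x p matrix is ever formed

The per-fold hat matrices needed for cross-validation only require recomputing this n x n system for the training samples, which is what makes penalty optimization over many (lambda_1, ..., lambda_B) candidates cheap.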
Pages: 835-847
Number of pages: 13
Related articles
50 in total
  • [21] KERNEL RIDGE REGRESSION WITH AUTOCORRELATION PRIOR: OPTIMAL MODEL AND CROSS-VALIDATION
    Tanaka, Akira
    Imai, Hideyuki
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020: 3872 - 3876
  • [22] Generalized Cross-Validation for Simultaneous Optimization of Tuning Parameters in Ridge Regression
    Roozbeh, M.
    Arashi, M.
    Hamzah, N. A.
    IRANIAN JOURNAL OF SCIENCE AND TECHNOLOGY TRANSACTION A-SCIENCE, 2020, 44 (02): 473 - 485
  • [23] Fast Cross-Validation
    Liu, Yong
    Lin, Hailun
    Ding, Lizhong
    Wang, Weiping
    Liao, Shizhong
    PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018: 2497 - 2503
  • [24] HIGH-DIMENSIONAL ASYMPTOTICS OF PREDICTION: RIDGE REGRESSION AND CLASSIFICATION
    Dobriban, Edgar
    Wager, Stefan
    ANNALS OF STATISTICS, 2018, 46 (01): 247 - 279
  • [25] Robust Leave-One-Out Cross-Validation for High-Dimensional Bayesian Models
    Silva, Luca Alessandro
    Zanella, Giacomo
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2024, 119 (547) : 2369 - 2381
  • [26] CROSS-VALIDATION IN STEPWISE REGRESSION
    SALAHUDDIN
    HAWKES, AG
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 1991, 20 (04) : 1163 - 1182
  • [27] Can we globally optimize cross-validation loss? Quasiconvexity in ridge regression
    Stephenson, William T.
    Frangella, Zachary
    Udell, Madeleine
    Broderick, Tamara
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [28] nestedcv: an R package for fast implementation of nested cross-validation with embedded feature selection designed for transcriptomics and high-dimensional data
    Lewis, Myles J.
    Spiliopoulou, Athina
    Goldmann, Katriona
    Pitzalis, Costantino
    McKeigue, Paul
    Barnes, Michael R.
    BIOINFORMATICS ADVANCES, 2023, 3 (01)
  • [29] Nested cross-validation with ensemble feature selection and classification model for high-dimensional biological data
    Zhong, Yi
    Chalise, Prabhakar
    He, Jianghua
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2023, 52 (01) : 110 - 125
  • [30] Pricing high-dimensional American options by kernel ridge regression
    Hu, Wenbin
    Zastawniak, Tomasz
    QUANTITATIVE FINANCE, 2020, 20 (05) : 851 - 865