AN ANALYSIS OF THE COST OF HYPERPARAMETER SELECTION VIA SPLIT-SAMPLE VALIDATION, WITH APPLICATIONS TO PENALIZED REGRESSION

Cited by: 2
Authors
Feng, Jean [1]
Simon, Noah [1]
Affiliations
[1] Univ Washington, Dept Biostat, Hlth Sci Bldg F-650, Box 357232, Seattle, WA 98195 USA
Keywords
Cross-validation; regression; regularization; generalized cross-validation; variable selection
DOI
10.5705/ss.202017.0310
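Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract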
In a regression setting, a model estimation procedure constructs a model from training data for a given set of hyperparameters. The optimal hyperparameters, which minimize the generalization error of the model, are usually unknown. Thus, in practice, they are often estimated using split-sample validation. However, how the generalization error of the selected model grows with the number of hyperparameters to be estimated remains an open question. To address this, we establish finite-sample oracle inequalities for selection based on a single training/test split and on cross-validation. We show that if the model estimation procedures are smoothly parameterized by the hyperparameters, the error incurred from tuning the hyperparameters shrinks at a near-parametric rate. Hence, for semiparametric and nonparametric model estimation procedures with a fixed number of hyperparameters, this additional error is negligible. For parametric model estimation procedures, adding a hyperparameter is roughly equivalent to adding a parameter to the model itself. In addition, we specialize these ideas to penalized regression problems with multiple penalty parameters. We establish that the fitted models are Lipschitz in the penalty parameters and, thus, that our oracle inequalities apply. This result encourages the development of regularization methods with many penalty parameters.
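As a rough illustration of the procedure the abstract analyzes, the sketch below selects two penalty parameters by a single training/validation split. It is a minimal example under stated assumptions, not the authors' method or code: the model (ridge regression with one penalty parameter per feature group), the simulated data, the grid, and the function names `fit_group_ridge` and `select_by_split_sample` are all hypothetical choices made for this example.

```python
import itertools
import numpy as np


def fit_group_ridge(X, y, groups, lams):
    """Closed-form ridge fit with one penalty parameter per feature group.

    Illustrative model choice (an assumption of this sketch).
    """
    penalty = np.zeros(X.shape[1])
    for g, lam in zip(groups, lams):
        penalty[g] = lam  # every feature in group g shares the penalty lam
    return np.linalg.solve(X.T @ X + np.diag(penalty), X.T @ y)


def select_by_split_sample(X, y, groups, grid, train_frac=0.5, seed=0):
    """Choose penalty parameters by minimizing validation mean squared error."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_train = int(train_frac * len(y))
    tr, va = idx[:n_train], idx[n_train:]

    val_errors = {}
    # Search the full grid: one candidate tuple of penalties per grid point.
    for lams in itertools.product(grid, repeat=len(groups)):
        beta = fit_group_ridge(X[tr], y[tr], groups, lams)
        val_errors[lams] = float(np.mean((y[va] - X[va] @ beta) ** 2))

    best = min(val_errors, key=val_errors.get)
    return best, val_errors


# Simulated data (hypothetical): the first feature group carries signal,
# the second is pure noise.
rng = np.random.default_rng(1)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta_true = np.concatenate([np.ones(5), np.zeros(5)])
y = X @ beta_true + rng.normal(size=n)

groups = [np.arange(0, 5), np.arange(5, 10)]
grid = [0.01, 0.1, 1.0, 10.0, 100.0]

best_lams, val_errors = select_by_split_sample(X, y, groups, grid)
print("selected penalties:", best_lams)
print("validation MSE:", val_errors[best_lams])
```

Each additional penalty parameter multiplies the size of the grid, which is why a grid search is used here only for clarity. The abstract's question is how much the generalization error of the selected model grows as more such parameters are tuned on the validation set.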
Pages: 511-530
Page count: 20