An asymptotic and empirical smoothing parameters selection method for smoothing spline ANOVA models in large samples

Times cited: 8
Authors
Sun, Xiaoxiao [1]
Zhong, Wenxuan [2]
Ma, Ping [2]
Affiliations
[1] Univ Arizona, Dept Epidemiol & Biostat, 1295 North Martin Ave, Tucson, AZ 85724 USA
[2] Univ Georgia, Dept Stat, 310 Herty Dr, Athens, GA 30602 USA
Funding
US National Institutes of Health; US National Science Foundation
Keywords
Asymptotic analysis; Generalized cross-validation; Smoothing parameters selection; Smoothing spline ANOVA model; Subsample; Penalized likelihood; Regression; Computation; Scale
DOI
10.1093/biomet/asaa047
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Large samples are generated routinely from various sources. Classic statistical models, such as smoothing spline ANOVA models, are not well equipped to analyse such large samples because of high computational costs. In particular, the daunting computational cost of selecting smoothing parameters renders smoothing spline ANOVA models impractical. In this article, we develop an asympirical, i.e., asymptotic and empirical, smoothing parameters selection method for smoothing spline ANOVA models in large samples. The idea of our approach is to use asymptotic analysis to show that the optimal smoothing parameter is a polynomial function of the sample size and an unknown constant. The unknown constant is then estimated through empirical subsample extrapolation. The proposed method significantly reduces the computational burden of selecting smoothing parameters in high-dimensional and large samples. We show that smoothing parameters chosen by the proposed method tend to the optimal smoothing parameters that minimize a specific risk function. In addition, the estimator based on the proposed smoothing parameters achieves the optimal convergence rate. Extensive simulation studies demonstrate the numerical advantage of the proposed method over competing methods in terms of relative efficacy and running time. In an application to molecular dynamics data containing nearly one million observations, the proposed method has the best prediction performance.
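The two-step idea in the abstract (fix the functional form of the optimal smoothing parameter by asymptotics, then estimate only the unknown constant from subsamples) can be illustrated with a minimal Python sketch. It is not the authors' estimator. It assumes the form lambda_opt(n) = C * n**(-rate), with the exponent `rate` supplied as a known input. The per-subsample selector (for example GCV run on a random subsample of size m) is treated as a hypothetical callback `select_on_subsample`. The function name `extrapolate_smoothing_parameter`, the log-scale averaging of C, and the value rate = 0.8 in the demo are all illustrative assumptions.

```python
import math
import random
from typing import Callable, Sequence


def extrapolate_smoothing_parameter(
    n_full: int,
    subsample_sizes: Sequence[int],
    select_on_subsample: Callable[[int], float],
    rate: float,
    replicates: int = 5,
) -> float:
    """Extrapolate a full-sample smoothing parameter from subsample selections.

    Assumption (illustrative): lambda_opt(n) = C * n**(-rate), where `rate`
    is known from asymptotic analysis and only C is estimated empirically.
    `select_on_subsample(m)` is a hypothetical callback that draws a
    subsample of size m and returns the smoothing parameter chosen on it.
    """
    log_c_estimates = []
    for m in subsample_sizes:
        for _ in range(replicates):
            lam_m = select_on_subsample(m)
            if lam_m <= 0:
                raise ValueError("smoothing parameters must be positive")
            # Invert lambda_m = C * m**(-rate):  log C = log lambda_m + rate * log m
            log_c_estimates.append(math.log(lam_m) + rate * math.log(m))
    # Average on the log scale (a geometric mean of the C estimates).
    log_c = sum(log_c_estimates) / len(log_c_estimates)
    # Plug the estimated constant back into the asymptotic form at n_full.
    return math.exp(log_c - rate * math.log(n_full))


if __name__ == "__main__":
    rng = random.Random(0)

    def noisy_selector(m: int) -> float:
        # Hypothetical stand-in for GCV on a subsample of size m:
        # true constant 2.0, perturbed by multiplicative noise.
        return 2.0 * m ** (-0.8) * math.exp(rng.gauss(0.0, 0.1))

    lam_full = extrapolate_smoothing_parameter(
        n_full=1_000_000,
        subsample_sizes=[500, 1000, 2000],
        select_on_subsample=noisy_selector,
        rate=0.8,
    )
    print(f"extrapolated lambda for n = 1e6: {lam_full:.3e}")
```

Averaging log C rather than C keeps the estimate positive and damps the influence of a single unusually large subsample selection. A least-squares fit of log lambda on log m over several subsample sizes would be an alternative that also lets the data estimate the exponent, at the cost of relying less on the asymptotic analysis.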
Pages: 149-166
Number of pages: 18