Optimal Tuning for Divide-and-conquer Kernel Ridge Regression with Massive Data

Cited by: 0
Authors
Xu, Ganggang [1 ]
Shang, Zuofeng [2 ]
Cheng, Guang [3 ]
Affiliations
[1] SUNY Binghamton, Dept Math Sci, Binghamton, NY 13902 USA
[2] IUPUI, Dept Math Sci, Indianapolis, IN USA
[3] Purdue Univ, Dept Stat, W Lafayette, IN 47907 USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018
Keywords
ASYMPTOTIC OPTIMALITY;
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Divide-and-conquer is a powerful approach for analyzing large and massive data. In the nonparametric regression setting, although various theoretical frameworks have been established to achieve optimality in estimation or hypothesis testing, how to choose the tuning parameter in a practically effective way remains an open problem. In this paper, we propose a data-driven, divide-and-conquer procedure for selecting the tuning parameters in kernel ridge regression by modifying the popular Generalized Cross-Validation criterion (GCV; Wahba, 1990). While the proposed criterion is computationally scalable for massive data sets, it is also shown under mild conditions to be asymptotically optimal, in the sense that minimizing the proposed distributed GCV (dGCV) criterion is equivalent to minimizing the true global conditional empirical loss of the averaged function estimator, extending the existing optimality results of GCV to the divide-and-conquer framework.
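To make the divide-and-conquer tuning idea concrete, the sketch below fits kernel ridge regression independently on m partitions, averages the local estimators, and selects the regularization parameter with a GCV-style criterion pooled across partitions. This is a minimal illustration under stated assumptions, not the paper's method: the Gaussian kernel, the pooling of residual sums of squares and effective degrees of freedom, and all function names (gaussian_kernel, distributed_gcv, averaged_estimator) are placeholders, and the exact dGCV formula of Xu, Shang, and Cheng (2018) is not reproduced here.

# Illustrative sketch only: divide-and-conquer kernel ridge regression with a
# GCV-style criterion aggregated across partitions. The aggregation below
# (summing residuals and effective degrees of freedom over partitions) is a
# simplifying assumption, NOT the exact dGCV criterion from the paper.
import numpy as np

def gaussian_kernel(X, Z, bandwidth=1.0):
    # Gaussian (RBF) kernel matrix between the rows of X and Z.
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * bandwidth ** 2))

def krr_smoother(K, lam):
    # Smoother ("hat") matrix A(lam) = (K + n*lam*I)^{-1} K for one partition.
    n = K.shape[0]
    return np.linalg.solve(K + n * lam * np.eye(n), K)

def distributed_gcv(partitions, lam):
    # GCV-style score pooled over partitions: (RSS/N) / (1 - trace/N)^2.
    rss, trace, N = 0.0, 0.0, 0
    for X_j, y_j in partitions:
        K_j = gaussian_kernel(X_j, X_j)
        A_j = krr_smoother(K_j, lam)
        resid = y_j - A_j @ y_j
        rss += resid @ resid
        trace += np.trace(A_j)
        N += len(y_j)
    return (rss / N) / (1.0 - trace / N) ** 2

def averaged_estimator(partitions, lam):
    # Fit KRR on each partition, then return the averaged function estimator.
    fits = []
    for X_j, y_j in partitions:
        n_j = len(y_j)
        K_j = gaussian_kernel(X_j, X_j)
        alpha_j = np.linalg.solve(K_j + n_j * lam * np.eye(n_j), y_j)
        fits.append((X_j, alpha_j))
    def f_bar(X_new):
        preds = [gaussian_kernel(X_new, X_j) @ a_j for X_j, a_j in fits]
        return np.mean(preds, axis=0)
    return f_bar

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, m = 1200, 6                      # total sample size, number of partitions
    X = rng.uniform(0, 1, size=(N, 1))
    y = np.sin(2 * np.pi * X[:, 0]) + 0.3 * rng.standard_normal(N)
    parts = [(X[s], y[s]) for s in np.array_split(rng.permutation(N), m)]
    lam_grid = np.logspace(-6, 0, 25)
    scores = [distributed_gcv(parts, lam) for lam in lam_grid]
    lam_hat = lam_grid[int(np.argmin(scores))]
    f_bar = averaged_estimator(parts, lam_hat)
    print("selected lambda:", lam_hat)
    print("averaged fit at x = 0.25:", f_bar(np.array([[0.25]])))

In this sketch each partition only touches its own kernel matrix, so the cost of evaluating the criterion scales with the partition size rather than the full sample size, which is the computational point the abstract emphasizes.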
Pages: 9