Generalization error of random feature and kernel methods: Hypercontractivity and kernel matrix concentration

Cited by: 34
Authors
Mei, Song [1 ]
Misiakiewicz, Theodor [2 ]
Montanari, Andrea [2 ,3 ]
Affiliations
[1] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[2] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[3] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
Keywords
Random features; Kernel methods; Generalization error; High dimensional limit; Inequalities
DOI
10.1016/j.acha.2021.12.003
Chinese Library Classification (CLC) number
O29 [Applied Mathematics]
Subject classification code
070104
Abstract
Consider the classical supervised learning problem: we are given data (y_i, x_i), i ≤ n, with y_i a response and x_i ∈ X a covariate vector, and we aim to learn a model f : X → R to predict future responses. Random features methods map the covariate vector x_i to a point φ(x_i) in a higher-dimensional space R^N via a random featurization map φ. We study random features methods in conjunction with ridge regression in the feature space R^N. This can be viewed as a finite-dimensional approximation of kernel ridge regression (KRR), or as a stylized model for neural networks in the so-called lazy training regime. We define a class of problems satisfying certain spectral conditions on the underlying kernels, together with a hypercontractivity assumption on the associated eigenfunctions; these conditions are verified by classical high-dimensional examples. Under these conditions, we prove a sharp characterization of the error of random features ridge regression. In particular, we address two fundamental questions: (1) What is the generalization error of KRR? (2) How big must N be for the random features approximation to achieve the same error as KRR? In this setting, we prove that KRR is well approximated by a projection onto the top ℓ eigenfunctions of the kernel, where ℓ depends on the sample size n. We show that the test error of random features ridge regression is dominated by its approximation error, and is larger than the error of KRR, as long as N ≤ n^(1-δ) for some δ > 0; we characterize this gap. For N ≥ n^(1+δ), random features achieve the same error as the corresponding KRR, and further increasing N does not lead to a significant change in test error. (c) 2021 Elsevier Inc. All rights reserved.
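The tradeoff the abstract describes, with random features ridge regression approaching KRR once the number of features N is large relative to the sample size n, can be reproduced numerically in a few lines. The sketch below is a minimal illustration and not the authors' code: it uses random Fourier features for the Gaussian kernel (one concrete featurization whose limiting kernel is known in closed form; the paper's setting is more general), and the dimensions d, n, N, the bandwidth sigma, and the ridge parameter lam are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, n_test, lam, sigma = 20, 400, 200, 1e-3, 2.0  # illustrative sizes

# Synthetic regression data (illustrative target, not from the paper).
beta = rng.standard_normal(d)

def sample(m):
    X = rng.standard_normal((m, d))
    y = np.tanh(X @ beta) + 0.1 * rng.standard_normal(m)
    return X, y

X, y = sample(n)
Xt, yt = sample(n_test)

def rbf_kernel(A, B):
    # Gaussian kernel K(x, x') = exp(-||x - x'||^2 / (2 sigma^2)).
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

# Kernel ridge regression: f(.) = K(., X) (K + n*lam*I)^{-1} y.
alpha = np.linalg.solve(rbf_kernel(X, X) + n * lam * np.eye(n), y)
krr_err = np.mean((rbf_kernel(Xt, X) @ alpha - yt) ** 2)

def rf_error(N):
    # Random Fourier features: phi(x) = sqrt(2/N) cos(W x + b), with
    # W_ij ~ N(0, 1/sigma^2) and b_j ~ Unif[0, 2 pi], so that
    # E[phi(x) . phi(x')] equals the Gaussian kernel above.
    W = rng.standard_normal((N, d)) / sigma
    b = rng.uniform(0.0, 2.0 * np.pi, N)
    feat = lambda A: np.sqrt(2.0 / N) * np.cos(A @ W.T + b)
    Phi = feat(X)
    # Ridge regression in feature space, with the same lam scaling as KRR,
    # so the predictor converges to the KRR predictor as N -> infinity.
    a = np.linalg.solve(Phi.T @ Phi + n * lam * np.eye(N), Phi.T @ y)
    return np.mean((feat(Xt) @ a - yt) ** 2)

for N in (50, 400, 3200):
    print(f"N={N:5d}  RF test MSE={rf_error(N):.4f}  KRR test MSE={krr_err:.4f}")
```

With these choices, the random features test error decreases toward the KRR error as N grows past n, a small-scale analogue of the N ≥ n^(1+δ) regime characterized in the paper.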
Pages: 3-84
Page count: 82
Related papers
50 items in total
  • [41] Generalization error rates in kernel regression: the crossover from the noiseless to noisy regime
    Cui, Hugo
    Loureiro, Bruno
    Krzakala, Florent
    Zdeborova, Lenka
    JOURNAL OF STATISTICAL MECHANICS-THEORY AND EXPERIMENT, 2022, 2022 (11)
  • [42] RANDOM MATRIX ASYMPTOTICS OF INNER PRODUCT KERNEL SPECTRAL CLUSTERING
    Ali, Hafiz Tiomoko
    Kammoun, Abla
    Couillet, Romain
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 2441 - 2445
  • [43] Improvement of the kernel minimum squared error model for fast feature extraction
    Wang, Jinghua
    Wang, Peng
    Li, Qin
    You, Jane
    NEURAL COMPUTING & APPLICATIONS, 2013, 23 (01): 53 - 59
  • [45] Randomized Feature Engineering as a Fast and Accurate Alternative to Kernel Methods
    Wang, Suhang
    Aggarwal, Charu
    Liu, Huan
    KDD'17: PROCEEDINGS OF THE 23RD ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2017, : 485 - 494
  • [46] Feature Encoding Methods Evaluation based on Multiple kernel Learning
    Zhao, Ziyi
    Shi, Dan
    Huo, Hong
    Fang, Tao
    PROCEEDINGS OF 2018 10TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND COMPUTING (ICMLC 2018), 2018, : 209 - 213
  • [47] Error analysis of kernel/GP methods for nonlinear and parametric PDEs
    Batlle, Pau
    Chen, Yifan
    Hosseini, Bamdad
    Owhadi, Houman
    Stuart, Andrew M.
    JOURNAL OF COMPUTATIONAL PHYSICS, 2024, 520
  • [48] A Review of Kernel Methods for Feature Extraction in Nonlinear Process Monitoring
    Pilario, Karl Ezra
    Shafiee, Mahmood
    Cao, Yi
    Lao, Liyun
    Yang, Shuang-Hua
    PROCESSES, 2020, 8 (01)
  • [49] Gabor feature based face recognition using kernel methods
    Shen, LL
    Bai, L
    SIXTH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION, PROCEEDINGS, 2004, : 170 - 176
  • [50] Large-scale Online Kernel Learning with Random Feature Reparameterization
    Tu Dinh Nguyen
    Le, Trung
    Bui, Hung
    Phung, Dinh
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 2543 - 2549