Generalization error of random feature and kernel methods: Hypercontractivity and kernel matrix concentration

Cited by: 34
Authors
Mei, Song [1 ]
Misiakiewicz, Theodor [2 ]
Montanari, Andrea [2 ,3 ]
Affiliations
[1] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[2] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[3] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
Keywords
Random features; Kernel methods; Generalization error; High-dimensional limit; Inequalities
DOI
10.1016/j.acha.2021.12.003
Chinese Library Classification (CLC)
O29 [Applied Mathematics]
Subject Classification Code
070104
Abstract
Consider the classical supervised learning problem: we are given data $(y_i, x_i)$, $i \le n$, with $y_i$ a response and $x_i \in \mathcal{X}$ a covariates vector, and we try to learn a model $f : \mathcal{X} \to \mathbb{R}$ to predict future responses. Random features methods map the covariates vector $x_i$ to a point $\phi(x_i)$ in a higher-dimensional space $\mathbb{R}^N$, via a random featurization map $\phi$. We study the use of random features methods in conjunction with ridge regression in the feature space $\mathbb{R}^N$. This can be viewed as a finite-dimensional approximation of kernel ridge regression (KRR), or as a stylized model for neural networks in the so-called lazy training regime. We define a class of problems satisfying certain spectral conditions on the underlying kernels, and a hypercontractivity assumption on the associated eigenfunctions. These conditions are verified by classical high-dimensional examples. Under these conditions, we prove a sharp characterization of the error of random features ridge regression. In particular, we address two fundamental questions: (1) What is the generalization error of KRR? (2) How big should $N$ be for the random features approximation to achieve the same error as KRR? In this setting, we prove that KRR is well approximated by a projection onto the top $\ell$ eigenfunctions of the kernel, where $\ell$ depends on the sample size $n$. We show that the test error of random features ridge regression is dominated by its approximation error and is larger than the error of KRR as long as $N \le n^{1-\delta}$ for some $\delta > 0$. We characterize this gap. For $N \ge n^{1+\delta}$, random features achieve the same error as the corresponding KRR, and further increasing $N$ does not lead to a significant change in test error. (c) 2021 Elsevier Inc. All rights reserved.
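To make the two estimators in the abstract concrete, here is a minimal numerical sketch (not the authors' code) of random features ridge regression next to the KRR it approximates. The ReLU activation, the uniform-on-the-sphere covariates, the toy target function, and all sizes ($d$, $n$, $N$) and the ridge level $\lambda$ are illustrative assumptions; the closed-form arc-cosine kernel used for KRR is the $N \to \infty$ limit of these particular ReLU features.

```python
# A minimal sketch (not the authors' code) contrasting random features (RF)
# ridge regression with the kernel ridge regression (KRR) it approximates.
# The ReLU features, sphere-distributed covariates, toy target, and all
# sizes (d, n, N) and ridge level lam below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, n, n_test, N, lam = 50, 500, 200, 2000, 1e-3

def sphere(m, d):
    """Sample m points uniformly on the sphere of radius sqrt(d)."""
    z = rng.standard_normal((m, d))
    return np.sqrt(d) * z / np.linalg.norm(z, axis=1, keepdims=True)

X, X_test = sphere(n, d), sphere(n_test, d)
f_star = lambda x: x[:, 0] + np.maximum(x[:, 1], 0.0)  # toy target function
y, y_test = f_star(X), f_star(X_test)

# Random featurization map phi(x) = relu(W x / sqrt(d)) / sqrt(N),
# scaled so that phi(x) . phi(x') -> K(x, x') as N -> infinity.
W = rng.standard_normal((N, d))
def phi(x):
    return np.maximum(x @ W.T / np.sqrt(d), 0.0) / np.sqrt(N)

# RF ridge regression: a_hat = argmin_a ||y - Phi a||^2 + lam ||a||^2.
Phi = phi(X)
a_hat = np.linalg.solve(Phi.T @ Phi + lam * np.eye(N), Phi.T @ y)
rf_mse = np.mean((phi(X_test) @ a_hat - y_test) ** 2)

def relu_kernel(A, B):
    """N -> infinity limit of the feature inner product: the degree-1
    arc-cosine kernel E_w[relu(w.x/sqrt(d)) relu(w.x'/sqrt(d))]."""
    na, nb = np.linalg.norm(A, axis=1), np.linalg.norm(B, axis=1)
    cos = np.clip((A @ B.T) / np.outer(na, nb), -1.0, 1.0)
    theta = np.arccos(cos)
    return np.outer(na, nb) * (np.sin(theta) + (np.pi - theta) * cos) / (2 * np.pi * d)

# KRR with the limiting kernel and the same ridge level: by the primal-dual
# identity, the RF predictor equals the KRR predictor with K replaced by
# the empirical feature Gram matrix Phi Phi^T, which concentrates around K.
K = relu_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(n), y)
krr_mse = np.mean((relu_kernel(X_test, X) @ alpha - y_test) ** 2)

print(f"RF ridge test MSE (N = {N}): {rf_mse:.4f}")
print(f"KRR test MSE:                {krr_mse:.4f}")
```

In this sketch, growing $N$ relative to $n$ should drive the RF error down toward the KRR error, consistent with the $N \ge n^{1+\delta}$ regime described above; the precise thresholds and the sharp error characterization are the subject of the paper.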
Pages: 3-84
Page count: 82
Related Papers
50 records in total
  • [31] Feature Selection with PSO and Kernel Methods for Hyperspectral Classification
    Tjiong, Anthony S. J.
    Monteiro, Sildomar T.
    2011 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2011, : 1762 - 1769
  • [32] Efficient χ² Kernel Linearization via Random Feature Maps
    Yuan, Xiao-Tong
    Wang, Zhenzhen
    Deng, Jiankang
    Liu, Qingshan
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2016, 27 (11) : 2448 - 2453
  • [33] On the Equivalence between Kernel Quadrature Rules and Random Feature Expansions
    Bach, Francis
    JOURNAL OF MACHINE LEARNING RESEARCH, 2017, 18
  • [34] Sparse random feature maps for the item-multiset kernel
    Atarashi, Kyohei
    Oyama, Satoshi
    Kurihara, Masahito
    NEURAL NETWORKS, 2021, 143 : 500 - 514
  • [35] An Empirical Study on The Properties of Random Bases for Kernel Methods
    Alber, Maximilian
    Kindermans, Pieter-Jan
    Schuett, Kristof T.
    Mueller, Klaus-Robert
    Sha, Fei
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [36] Evaluation of four Lagrangian particle concentration calculation methods: Box counting, Gaussian kernel, Uniform kernel and Parabolic kernel
    Yang, Li
    Wang, Cun-You
    Chen, Yi-Xue
    Zhuang, Shu-Han
    Li, Xin-Peng
    Fang, Sheng
    Zhongguo Huanjing Kexue/China Environmental Science, 2023, 43 (07) : 3404 - 3415
  • [37] Kernel nonnegative matrix factorization for spectral EEG feature extraction
    Lee, Hyekyoung
    Cichocki, Andrzej
    Choi, Seungjin
    NEUROCOMPUTING, 2009, 72 (13-15) : 3182 - 3190
  • [38] Combinatorial Kernel Matrix Model Selection Using Feature Distances
    Jia, Lei
    Liao, Shizhong
    INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTATION TECHNOLOGY AND AUTOMATION, VOL 1, PROCEEDINGS, 2008, : 40 - 43
  • [39] Generalization Error Rates in Kernel Regression: The Crossover from the Noiseless to Noisy Regime
    Cui, Hugo
    Loureiro, Bruno
    Krzakala, Florent
    Zdeborova, Lenka
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [40] Incomplete-view oriented kernel learning method with generalization error bound
    Tian, Y.
    Fu, S.
    Tang, J.
    INFORMATION SCIENCES, 2021, 581 : 951 - 977