Generalization error of random feature and kernel methods: Hypercontractivity and kernel matrix concentration

Cited by: 34
Authors
Mei, Song [1 ]
Misiakiewicz, Theodor [2 ]
Montanari, Andrea [2 ,3 ]
Affiliations
[1] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[2] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[3] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
Keywords
Random features; Kernel methods; Generalization error; High-dimensional limit; Inequalities
DOI
10.1016/j.acha.2021.12.003
Chinese Library Classification (CLC)
O29 [Applied Mathematics]
Subject Classification Code
070104
Abstract
Consider the classical supervised learning problem: we are given data $(y_i, x_i)$, $i \le n$, with $y_i$ a response and $x_i \in \mathcal{X}$ a covariates vector, and we try to learn a model $f : \mathcal{X} \to \mathbb{R}$ to predict future responses. Random features methods map the covariates vector $x_i$ to a point $\phi(x_i)$ in a higher-dimensional space $\mathbb{R}^N$, via a random featurization map $\phi$. We study the use of random features methods in conjunction with ridge regression in the feature space $\mathbb{R}^N$. This can be viewed as a finite-dimensional approximation of kernel ridge regression (KRR), or as a stylized model for neural networks in the so-called lazy training regime. We define a class of problems satisfying certain spectral conditions on the underlying kernels, and a hypercontractivity assumption on the associated eigenfunctions. These conditions are verified by classical high-dimensional examples. Under these conditions, we prove a sharp characterization of the error of random features ridge regression. In particular, we address two fundamental questions: (1) What is the generalization error of KRR? (2) How big should $N$ be for the random features approximation to achieve the same error as KRR? In this setting, we prove that KRR is well approximated by a projection onto the top $\ell$ eigenfunctions of the kernel, where $\ell$ depends on the sample size $n$. We show that the test error of random features ridge regression is dominated by its approximation error and is larger than the error of KRR as long as $N \le n^{1-\delta}$ for some $\delta > 0$. We characterize this gap. For $N \ge n^{1+\delta}$, random features achieve the same error as the corresponding KRR, and further increasing $N$ does not lead to a significant change in test error. (c) 2021 Elsevier Inc. All rights reserved.
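To make the two estimators in the abstract concrete, here is a minimal numerical sketch (not the authors' code) of random features ridge regression next to the KRR it approximates. The ReLU activation, the uniform-on-the-sphere covariates, the toy target function, and all sizes ($d$, $n$, $N$) and the ridge level $\lambda$ are illustrative assumptions; the closed-form arc-cosine kernel used for KRR is the $N \to \infty$ limit of these particular ReLU features.

```python
# A minimal sketch (not the authors' code) contrasting random features (RF)
# ridge regression with the kernel ridge regression (KRR) it approximates.
# The ReLU features, sphere-distributed covariates, toy target, and all
# sizes (d, n, N) and ridge level lam below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, n, n_test, N, lam = 50, 500, 200, 2000, 1e-3

def sphere(m, d):
    """Sample m points uniformly on the sphere of radius sqrt(d)."""
    z = rng.standard_normal((m, d))
    return np.sqrt(d) * z / np.linalg.norm(z, axis=1, keepdims=True)

X, X_test = sphere(n, d), sphere(n_test, d)
f_star = lambda x: x[:, 0] + np.maximum(x[:, 1], 0.0)  # toy target function
y, y_test = f_star(X), f_star(X_test)

# Random featurization map phi(x) = relu(W x / sqrt(d)) / sqrt(N),
# scaled so that phi(x) . phi(x') -> K(x, x') as N -> infinity.
W = rng.standard_normal((N, d))
def phi(x):
    return np.maximum(x @ W.T / np.sqrt(d), 0.0) / np.sqrt(N)

# RF ridge regression: a_hat = argmin_a ||y - Phi a||^2 + lam ||a||^2.
Phi = phi(X)
a_hat = np.linalg.solve(Phi.T @ Phi + lam * np.eye(N), Phi.T @ y)
rf_mse = np.mean((phi(X_test) @ a_hat - y_test) ** 2)

def relu_kernel(A, B):
    """N -> infinity limit of the feature inner product: the degree-1
    arc-cosine kernel E_w[relu(w.x/sqrt(d)) relu(w.x'/sqrt(d))]."""
    na, nb = np.linalg.norm(A, axis=1), np.linalg.norm(B, axis=1)
    cos = np.clip((A @ B.T) / np.outer(na, nb), -1.0, 1.0)
    theta = np.arccos(cos)
    return np.outer(na, nb) * (np.sin(theta) + (np.pi - theta) * cos) / (2 * np.pi * d)

# KRR with the limiting kernel and the same ridge level: by the primal-dual
# identity, the RF predictor equals the KRR predictor with K replaced by
# the empirical feature Gram matrix Phi Phi^T, which concentrates around K.
K = relu_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(n), y)
krr_mse = np.mean((relu_kernel(X_test, X) @ alpha - y_test) ** 2)

print(f"RF ridge test MSE (N = {N}): {rf_mse:.4f}")
print(f"KRR test MSE:                {krr_mse:.4f}")
```

In this sketch, growing $N$ relative to $n$ should drive the RF error down toward the KRR error, consistent with the $N \ge n^{1+\delta}$ regime described above; the precise thresholds and the sharp error characterization are the subject of the paper.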
Pages: 3-84
Page count: 82
Related Papers
50 records in total
  • [31] Feature Selection with PSO and Kernel Methods for Hyperspectral Classification
    Tjiong, Anthony S. J.
    Monteiro, Sildomar T.
    2011 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2011, : 1762 - 1769
  • [32] Efficient χ² Kernel Linearization via Random Feature Maps
    Yuan, Xiao-Tong
    Wang, Zhenzhen
    Deng, Jiankang
    Liu, Qingshan
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2016, 27 (11) : 2448 - 2453
  • [33] On the Equivalence between Kernel Quadrature Rules and Random Feature Expansions
    Bach, Francis
    JOURNAL OF MACHINE LEARNING RESEARCH, 2017, 18
  • [34] Sparse random feature maps for the item-multiset kernel
    Atarashi, Kyohei
    Oyama, Satoshi
    Kurihara, Masahito
    NEURAL NETWORKS, 2021, 143 : 500 - 514
  • [35] An Empirical Study on The Properties of Random Bases for Kernel Methods
    Alber, Maximilian
    Kindermans, Pieter-Jan
    Schuett, Kristof T.
    Mueller, Klaus-Robert
    Sha, Fei
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [36] Evaluation of four Lagrangian particle concentration calculation methods: Box counting, Gaussian kernel, Uniform kernel and Parabolic kernel
    Yang, Li
    Wang, Cun-You
    Chen, Yi-Xue
    Zhuang, Shu-Han
    Li, Xin-Peng
    Fang, Sheng
    Zhongguo Huanjing Kexue/China Environmental Science, 2023, 43 (07) : 3404 - 3415
  • [37] Kernel nonnegative matrix factorization for spectral EEG feature extraction
    Lee, Hyekyoung
    Cichocki, Andrzej
    Choi, Seungjin
    NEUROCOMPUTING, 2009, 72 (13-15) : 3182 - 3190
  • [38] Combinatorial Kernel Matrix Model Selection Using Feature Distances
    Jia, Lei
    Liao, Shizhong
    INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTATION TECHNOLOGY AND AUTOMATION, VOL 1, PROCEEDINGS, 2008, : 40 - 43
  • [39] Generalization Error Rates in Kernel Regression: The Crossover from the Noiseless to Noisy Regime
    Cui, Hugo
    Loureiro, Bruno
    Krzakala, Florent
    Zdeborova, Lenka
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [40] Incomplete-view oriented kernel learning method with generalization error bound
    Tian, Y.
    Fu, S.
    Tang, J.
    INFORMATION SCIENCES, 2021, 581 : 951 - 977