Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective

Cited by: 0
Authors
Ba, Jimmy [1 ,2 ,3 ]
Erdogdu, Murat A. [1 ,2 ]
Suzuki, Taiji [4 ,5 ]
Wang, Zhichao [6 ]
Wu, Denny [7 ,8 ]
Affiliations
[1] Univ Toronto, Toronto, ON, Canada
[2] Vector Inst, Toronto, ON, Canada
[3] xAI, Burlingame, CA USA
[4] Univ Tokyo, Tokyo, Japan
[5] RIKEN AIP, Tokyo, Japan
[6] Univ Calif San Diego, San Diego, CA USA
[7] New York Univ, New York, NY USA
[8] Flatiron Inst, New York, NY USA
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Largest eigenvalue;
DOI
Not available
Chinese Library Classification (CLC) Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We consider the problem of learning a single-index target function $f_* : \mathbb{R}^d \to \mathbb{R}$ under spiked covariance data:
$$f_*(x) = \sigma_*\left(\frac{\langle x, \mu\rangle}{\sqrt{1+\theta}}\right), \qquad x \sim \mathcal{N}(0, I_d + \theta \mu\mu^\top), \qquad \theta \asymp d^{\beta} \text{ for } \beta \in [0, 1),$$
where the link function $\sigma_* : \mathbb{R} \to \mathbb{R}$ is a degree-$p$ polynomial with information exponent $k$ (defined as the lowest degree appearing in the Hermite expansion of $\sigma_*$), so the target depends on the input $x$ only through its projection onto the spike (signal) direction $\mu \in \mathbb{R}^d$. In the proportional asymptotic limit where the number of training examples $n$ and the dimensionality $d$ jointly diverge, $n, d \to \infty$ with $n/d \to \psi \in (0, \infty)$, we ask the following question: how large must the spike magnitude $\theta$ be in order for (i) kernel methods and (ii) neural networks optimized by gradient descent to learn $f_*$? We show that for kernel ridge regression, $\beta \ge 1 - 1/p$ is both sufficient and necessary, whereas for two-layer neural networks trained with gradient descent, $\beta > 1 - 1/k$ suffices. Our results demonstrate that both kernel methods and neural networks benefit from low-dimensional structure in the data; furthermore, since $k \le p$ by definition, neural networks can adapt to such structure more effectively.
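A minimal sketch of the data model above, assuming Python/NumPy; the helper name make_spiked_data and the quadratic link sigma_*(t) = t^2 - 1 (for which p = k = 2) are illustrative choices, not taken from the paper. The last lines give a Monte Carlo check that the information exponent, i.e. the degree of the first nonzero probabilists' Hermite coefficient of sigma_*, equals 2 for this link.

    import numpy as np

    def make_spiked_data(n, d, beta, rng):
        # Spike magnitude theta = d^beta; spike direction mu uniform on the sphere.
        theta = float(d) ** beta
        mu = rng.standard_normal(d)
        mu /= np.linalg.norm(mu)
        # x = z + sqrt(theta) * s * mu has covariance I_d + theta * mu mu^T,
        # since z ~ N(0, I_d) and s ~ N(0, 1) are independent.
        z = rng.standard_normal((n, d))
        s = rng.standard_normal((n, 1))
        x = z + np.sqrt(theta) * s * mu[None, :]
        return x, mu, theta

    def sigma_star(t):
        # Illustrative link: He_2(t) = t^2 - 1, degree p = 2, information exponent k = 2.
        return t ** 2 - 1

    rng = np.random.default_rng(0)
    x, mu, theta = make_spiked_data(n=4096, d=512, beta=0.5, rng=rng)
    # <x, mu> / sqrt(1 + theta) is standard Gaussian, matching the model.
    y = sigma_star(x @ mu / np.sqrt(1.0 + theta))

    # Monte Carlo check of the information exponent: the first nonzero
    # coefficient E[sigma_*(G) He_j(G)], G ~ N(0, 1), sits at degree j = 2.
    g = rng.standard_normal(1_000_000)
    hermite = [np.ones_like(g), g, g ** 2 - 1, g ** 3 - 3 * g]  # He_0 .. He_3
    print([round(float(np.mean(sigma_star(g) * h)), 2) for h in hermite])
    # -> approximately [0.0, 0.0, 2.0, 0.0], so k = 2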
Pages: 30