Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective

Cited by: 0
Authors
Ba, Jimmy [1 ,2 ,3 ]
Erdogdu, Murat A. [1 ,2 ]
Suzuki, Taiji [4 ,5 ]
Wang, Zhichao [6 ]
Wu, Denny [7 ,8 ]
Affiliations
[1] Univ Toronto, Toronto, ON, Canada
[2] Vector Inst, Toronto, ON, Canada
[3] xAI, Burlingame, CA USA
[4] Univ Tokyo, Tokyo, Japan
[5] RIKEN AIP, Tokyo, Japan
[6] Univ Calif San Diego, San Diego, CA USA
[7] New York Univ, New York, NY USA
[8] Flatiron Inst, New York, NY USA
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
LARGEST EIGENVALUE;
DOI
Not available
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We consider the problem of learning a single-index target function $f_* : \mathbb{R}^d \to \mathbb{R}$ under spiked covariance data: $f_*(x) = \sigma_*\big(\langle x, \mu\rangle / \sqrt{1+\theta}\big)$, $x \sim \mathcal{N}(0, I_d + \theta \mu \mu^\top)$, $\theta \asymp d^\beta$ for $\beta \in [0, 1)$, where the link function $\sigma_* : \mathbb{R} \to \mathbb{R}$ is a degree-$p$ polynomial with information exponent $k$ (defined as the lowest degree in the Hermite expansion of $\sigma_*$), and the target depends on the projection of the input $x$ onto the spike (signal) direction $\mu \in \mathbb{R}^d$. In the proportional asymptotic limit where the number of training examples $n$ and the dimensionality $d$ jointly diverge, $n, d \to \infty$, $n/d \to \psi \in (0, \infty)$, we ask the following question: how large should the spike magnitude $\theta$ be in order for (i) kernel methods and (ii) neural networks optimized by gradient descent to learn $f_*$? We show that for kernel ridge regression, $\beta \ge 1 - 1/p$ is both sufficient and necessary, whereas for two-layer neural networks trained with gradient descent, $\beta > 1 - 1/k$ suffices. Our results demonstrate that both kernel methods and neural networks benefit from low-dimensional structure in the data; further, since $k \le p$ by definition, neural networks can adapt to such structure more effectively.
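To make the data model concrete, the following is a minimal sketch (not the authors' code) of sampling from the spiked covariance model and generating single-index labels. For illustration only, the link function $\sigma_*$ is taken to be the $k$-th probabilist's Hermite polynomial, so its information exponent equals $k$; numpy is assumed, and the function name sample_spiked_data is hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval

def sample_spiked_data(n, d, beta, k, seed=0):
    # Sketch of the paper's data model, under the assumptions stated above.
    rng = np.random.default_rng(seed)
    theta = d ** beta                        # spike magnitude theta ~ d^beta
    mu = rng.standard_normal(d)
    mu /= np.linalg.norm(mu)                 # unit-norm spike (signal) direction
    z = rng.standard_normal((n, d))          # isotropic part, covariance I_d
    s = rng.standard_normal(n)               # independent component along the spike
    # x = z + sqrt(theta) * s * mu has covariance I_d + theta * mu mu^T
    x = z + np.sqrt(theta) * s[:, None] * mu[None, :]
    # the normalized projection <x, mu> / sqrt(1 + theta) is standard Gaussian
    proj = (x @ mu) / np.sqrt(1.0 + theta)
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0
    y = hermeval(proj, coeffs)               # sigma_*(proj) = He_k(proj), information exponent k
    return x, y

# Example draw in the proportional regime n/d = psi = 4, with beta = 0.5 and k = 2.
x, y = sample_spiked_data(n=2000, d=500, beta=0.5, k=2)
```

Any choice of degree-$p$ polynomial link can be substituted by changing the Hermite coefficient vector; the example uses a pure $He_k$ term only to make the information exponent explicit.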
Pages: 30