Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective

Cited by: 0
Authors
Ba, Jimmy [1 ,2 ,3 ]
Erdogdu, Murat A. [1 ,2 ]
Suzuki, Taiji [4 ,5 ]
Wang, Zhichao [6 ]
Wu, Denny [7 ,8 ]
Affiliations
[1] Univ Toronto, Toronto, ON, Canada
[2] Vector Inst, Toronto, ON, Canada
[3] xAI, Burlingame, CA USA
[4] Univ Tokyo, Tokyo, Japan
[5] RIKEN AIP, Tokyo, Japan
[6] Univ Calif San Diego, San Diego, CA USA
[7] New York Univ, New York, NY USA
[8] Flatiron Inst, New York, NY USA
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
LARGEST EIGENVALUE;
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
We consider the problem of learning a single-index target function $f_* : \mathbb{R}^d \to \mathbb{R}$ under spiked covariance data: $f_*(x) = \sigma_*\!\left(\tfrac{1}{\sqrt{1+\theta}}\langle x, \mu\rangle\right)$, $x \sim \mathcal{N}(0, I_d + \theta\mu\mu^\top)$, $\theta \asymp d^{\beta}$ for $\beta \in [0,1)$, where the link function $\sigma_* : \mathbb{R} \to \mathbb{R}$ is a degree-$p$ polynomial with information exponent $k$ (defined as the lowest degree in the Hermite expansion of $\sigma_*$), and the target depends on the projection of the input $x$ onto the spike (signal) direction $\mu \in \mathbb{R}^d$. In the proportional asymptotic limit where the number of training examples $n$ and the dimensionality $d$ jointly diverge, $n, d \to \infty$, $n/d \to \psi \in (0, \infty)$, we ask the following question: how large should the spike magnitude $\theta$ be in order for (i) kernel methods and (ii) neural networks optimized by gradient descent to learn $f_*$? We show that for kernel ridge regression, $\beta \ge 1 - \tfrac{1}{p}$ is both sufficient and necessary, whereas for two-layer neural networks trained with gradient descent, $\beta > 1 - \tfrac{1}{k}$ suffices. Our results demonstrate that both kernel methods and neural networks benefit from low-dimensional structure in the data; moreover, since $k \le p$ by definition, neural networks can adapt to such structure more effectively.
Pages: 30
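The data-generating process described in the abstract is concrete enough to simulate. Below is a minimal sketch (not from the authors' code; the function names `sample_spiked_data` and `single_index_target` and all parameter choices are illustrative assumptions) that draws inputs from the spiked covariance model $\mathcal{N}(0, I_d + \theta\mu\mu^\top)$ with $\theta = d^{\beta}$ and evaluates the single-index target $f_*(x) = \sigma_*(\langle x, \mu\rangle/\sqrt{1+\theta})$ for a degree-3 Hermite link, i.e. $p = k = 3$.

```python
# Minimal sketch of the spiked covariance / single-index setup from the abstract.
# Not the authors' implementation; names and parameters are illustrative only.
import numpy as np

def sample_spiked_data(n, d, beta, rng):
    """Draw x ~ N(0, I_d + theta * mu mu^T) with spike magnitude theta = d**beta."""
    theta = d ** beta
    mu = rng.standard_normal(d)
    mu /= np.linalg.norm(mu)            # unit-norm spike (signal) direction
    z = rng.standard_normal((n, d))     # isotropic Gaussian component
    s = rng.standard_normal((n, 1))     # independent component along the spike
    x = z + np.sqrt(theta) * s * mu     # covariance is I_d + theta * mu mu^T
    return x, mu, theta

def single_index_target(x, mu, theta, link):
    """f_*(x) = sigma_*( <x, mu> / sqrt(1 + theta) ); the rescaling keeps the
    argument of the link function at unit variance."""
    return link(x @ mu / np.sqrt(1.0 + theta))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Example link: sigma_*(z) = z^3 - 3z (the 3rd Hermite polynomial), so the
    # polynomial degree is p = 3 and the information exponent is k = 3.
    link = lambda z: z ** 3 - 3 * z
    x, mu, theta = sample_spiked_data(n=2000, d=1000, beta=0.5, rng=rng)
    y = single_index_target(x, mu, theta, link)
    print(theta, y[:5])
```

Under this construction, varying `beta` toward the thresholds $1 - 1/p$ (for kernel ridge regression) and $1 - 1/k$ (for gradient-trained two-layer networks) is the regime the paper analyzes.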