Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective

Cited by: 0
Authors
Ba, Jimmy [1 ,2 ,3 ]
Erdogdu, Murat A. [1 ,2 ]
Suzuki, Taiji [4 ,5 ]
Wang, Zhichao [6 ]
Wu, Denny [7 ,8 ]
Affiliations
[1] Univ Toronto, Toronto, ON, Canada
[2] Vector Inst, Toronto, ON, Canada
[3] xAI, Burlingame, CA USA
[4] Univ Tokyo, Tokyo, Japan
[5] RIKEN AIP, Tokyo, Japan
[6] Univ Calif San Diego, San Diego, CA USA
[7] New York Univ, New York, NY USA
[8] Flatiron Inst, New York, NY USA
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Largest eigenvalue;
DOI
Not available
Chinese Library Classification (CLC) Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We consider the problem of learning a single-index target function $f_* : \mathbb{R}^d \to \mathbb{R}$ under spiked covariance data:
$$f_*(x) = \sigma_*\left(\frac{\langle x, \mu\rangle}{\sqrt{1+\theta}}\right), \qquad x \sim \mathcal{N}(0, I_d + \theta \mu\mu^\top), \qquad \theta \asymp d^{\beta} \text{ for } \beta \in [0, 1),$$
where the link function $\sigma_* : \mathbb{R} \to \mathbb{R}$ is a degree-$p$ polynomial with information exponent $k$ (defined as the lowest degree appearing in the Hermite expansion of $\sigma_*$), so the target depends on the input $x$ only through its projection onto the spike (signal) direction $\mu \in \mathbb{R}^d$. In the proportional asymptotic limit where the number of training examples $n$ and the dimensionality $d$ jointly diverge, $n, d \to \infty$ with $n/d \to \psi \in (0, \infty)$, we ask the following question: how large must the spike magnitude $\theta$ be in order for (i) kernel methods and (ii) neural networks optimized by gradient descent to learn $f_*$? We show that for kernel ridge regression, $\beta \ge 1 - 1/p$ is both sufficient and necessary, whereas for two-layer neural networks trained with gradient descent, $\beta > 1 - 1/k$ suffices. Our results demonstrate that both kernel methods and neural networks benefit from low-dimensional structure in the data; furthermore, since $k \le p$ by definition, neural networks can adapt to such structure more effectively.
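A minimal sketch of the data model above, assuming Python/NumPy; the helper name make_spiked_data and the quadratic link sigma_*(t) = t^2 - 1 (for which p = k = 2) are illustrative choices, not taken from the paper. The last lines give a Monte Carlo check that the information exponent, i.e. the degree of the first nonzero probabilists' Hermite coefficient of sigma_*, equals 2 for this link.

    import numpy as np

    def make_spiked_data(n, d, beta, rng):
        # Spike magnitude theta = d^beta; spike direction mu uniform on the sphere.
        theta = float(d) ** beta
        mu = rng.standard_normal(d)
        mu /= np.linalg.norm(mu)
        # x = z + sqrt(theta) * s * mu has covariance I_d + theta * mu mu^T,
        # since z ~ N(0, I_d) and s ~ N(0, 1) are independent.
        z = rng.standard_normal((n, d))
        s = rng.standard_normal((n, 1))
        x = z + np.sqrt(theta) * s * mu[None, :]
        return x, mu, theta

    def sigma_star(t):
        # Illustrative link: He_2(t) = t^2 - 1, degree p = 2, information exponent k = 2.
        return t ** 2 - 1

    rng = np.random.default_rng(0)
    x, mu, theta = make_spiked_data(n=4096, d=512, beta=0.5, rng=rng)
    # <x, mu> / sqrt(1 + theta) is standard Gaussian, matching the model.
    y = sigma_star(x @ mu / np.sqrt(1.0 + theta))

    # Monte Carlo check of the information exponent: the first nonzero
    # coefficient E[sigma_*(G) He_j(G)], G ~ N(0, 1), sits at degree j = 2.
    g = rng.standard_normal(1_000_000)
    hermite = [np.ones_like(g), g, g ** 2 - 1, g ** 3 - 3 * g]  # He_0 .. He_3
    print([round(float(np.mean(sigma_star(g) * h)), 2) for h in hermite])
    # -> approximately [0.0, 0.0, 2.0, 0.0], so k = 2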
Pages: 30