Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective

Cited by: 0
Authors
Ba, Jimmy [1 ,2 ,3 ]
Erdogdu, Murat A. [1 ,2 ]
Suzuki, Taiji [4 ,5 ]
Wang, Zhichao [6 ]
Wu, Denny [7 ,8 ]
Affiliations
[1] Univ Toronto, Toronto, ON, Canada
[2] Vector Inst, Toronto, ON, Canada
[3] xAI, Burlingame, CA USA
[4] Univ Tokyo, Tokyo, Japan
[5] RIKEN AIP, Tokyo, Japan
[6] Univ Calif San Diego, San Diego, CA USA
[7] New York Univ, New York, NY USA
[8] Flatiron Inst, New York, NY USA
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
LARGEST EIGENVALUE;
DOI
Not available
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We consider the problem of learning a single-index target function $f_* : \mathbb{R}^d \to \mathbb{R}$ under spiked covariance data: $f_*(x) = \sigma_*\big(\langle x, \mu\rangle / \sqrt{1+\theta}\big)$, $x \sim \mathcal{N}(0, I_d + \theta \mu \mu^\top)$, $\theta \asymp d^\beta$ for $\beta \in [0, 1)$, where the link function $\sigma_* : \mathbb{R} \to \mathbb{R}$ is a degree-$p$ polynomial with information exponent $k$ (defined as the lowest degree in the Hermite expansion of $\sigma_*$), and the target depends on the projection of the input $x$ onto the spike (signal) direction $\mu \in \mathbb{R}^d$. In the proportional asymptotic limit where the number of training examples $n$ and the dimensionality $d$ jointly diverge, $n, d \to \infty$, $n/d \to \psi \in (0, \infty)$, we ask the following question: how large should the spike magnitude $\theta$ be in order for (i) kernel methods and (ii) neural networks optimized by gradient descent to learn $f_*$? We show that for kernel ridge regression, $\beta \ge 1 - 1/p$ is both sufficient and necessary, whereas for two-layer neural networks trained with gradient descent, $\beta > 1 - 1/k$ suffices. Our results demonstrate that both kernel methods and neural networks benefit from low-dimensional structure in the data; further, since $k \le p$ by definition, neural networks can adapt to such structure more effectively.
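To make the data model concrete, the following is a minimal sketch (not the authors' code) of sampling from the spiked covariance model and generating single-index labels. For illustration only, the link function $\sigma_*$ is taken to be the $k$-th probabilist's Hermite polynomial, so its information exponent equals $k$; numpy is assumed, and the function name sample_spiked_data is hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval

def sample_spiked_data(n, d, beta, k, seed=0):
    # Sketch of the paper's data model, under the assumptions stated above.
    rng = np.random.default_rng(seed)
    theta = d ** beta                        # spike magnitude theta ~ d^beta
    mu = rng.standard_normal(d)
    mu /= np.linalg.norm(mu)                 # unit-norm spike (signal) direction
    z = rng.standard_normal((n, d))          # isotropic part, covariance I_d
    s = rng.standard_normal(n)               # independent component along the spike
    # x = z + sqrt(theta) * s * mu has covariance I_d + theta * mu mu^T
    x = z + np.sqrt(theta) * s[:, None] * mu[None, :]
    # the normalized projection <x, mu> / sqrt(1 + theta) is standard Gaussian
    proj = (x @ mu) / np.sqrt(1.0 + theta)
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0
    y = hermeval(proj, coeffs)               # sigma_*(proj) = He_k(proj), information exponent k
    return x, y

# Example draw in the proportional regime n/d = psi = 4, with beta = 0.5 and k = 2.
x, y = sample_spiked_data(n=2000, d=500, beta=0.5, k=2)
```

Any choice of degree-$p$ polynomial link can be substituted by changing the Hermite coefficient vector; the example uses a pure $He_k$ term only to make the information exponent explicit.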
Pages: 30