Learning in the Presence of Low-dimensional Structure: A Spiked Random Matrix Perspective

Cited by: 0
Authors
Ba, Jimmy [1 ,2 ,3 ]
Erdogdu, Murat A. [1 ,2 ]
Suzuki, Taiji [4 ,5 ]
Wang, Zhichao [6 ]
Wu, Denny [7 ,8 ]
Affiliations
[1] Univ Toronto, Toronto, ON, Canada
[2] Vector Inst, Toronto, ON, Canada
[3] xAI, Burlingame, CA USA
[4] Univ Tokyo, Tokyo, Japan
[5] RIKEN AIP, Tokyo, Japan
[6] Univ Calif San Diego, San Diego, CA USA
[7] New York Univ, New York, NY USA
[8] Flatiron Inst, New York, NY USA
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
LARGEST EIGENVALUE;
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
We consider the problem of learning a single-index target function $f_* : \mathbb{R}^d \to \mathbb{R}$ under spiked covariance data: $f_*(x) = \sigma_*\!\left(\tfrac{1}{\sqrt{1+\theta}}\langle x, \mu\rangle\right)$, $x \sim \mathcal{N}(0, I_d + \theta\mu\mu^\top)$, $\theta \asymp d^{\beta}$ for $\beta \in [0,1)$, where the link function $\sigma_* : \mathbb{R} \to \mathbb{R}$ is a degree-$p$ polynomial with information exponent $k$ (defined as the lowest degree in the Hermite expansion of $\sigma_*$), and the target depends on the projection of the input $x$ onto the spike (signal) direction $\mu \in \mathbb{R}^d$. In the proportional asymptotic limit where the number of training examples $n$ and the dimensionality $d$ jointly diverge, $n, d \to \infty$, $n/d \to \psi \in (0, \infty)$, we ask the following question: how large should the spike magnitude $\theta$ be in order for (i) kernel methods and (ii) neural networks optimized by gradient descent to learn $f_*$? We show that for kernel ridge regression, $\beta \ge 1 - \tfrac{1}{p}$ is both sufficient and necessary, whereas for two-layer neural networks trained with gradient descent, $\beta > 1 - \tfrac{1}{k}$ suffices. Our results demonstrate that both kernel methods and neural networks benefit from low-dimensional structure in the data; moreover, since $k \le p$ by definition, neural networks can adapt to such structure more effectively.
Pages: 30
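The data-generating process described in the abstract is concrete enough to simulate. Below is a minimal sketch (not from the authors' code; the function names `sample_spiked_data` and `single_index_target` and all parameter choices are illustrative assumptions) that draws inputs from the spiked covariance model $\mathcal{N}(0, I_d + \theta\mu\mu^\top)$ with $\theta = d^{\beta}$ and evaluates the single-index target $f_*(x) = \sigma_*(\langle x, \mu\rangle/\sqrt{1+\theta})$ for a degree-3 Hermite link, i.e. $p = k = 3$.

```python
# Minimal sketch of the spiked covariance / single-index setup from the abstract.
# Not the authors' implementation; names and parameters are illustrative only.
import numpy as np

def sample_spiked_data(n, d, beta, rng):
    """Draw x ~ N(0, I_d + theta * mu mu^T) with spike magnitude theta = d**beta."""
    theta = d ** beta
    mu = rng.standard_normal(d)
    mu /= np.linalg.norm(mu)            # unit-norm spike (signal) direction
    z = rng.standard_normal((n, d))     # isotropic Gaussian component
    s = rng.standard_normal((n, 1))     # independent component along the spike
    x = z + np.sqrt(theta) * s * mu     # covariance is I_d + theta * mu mu^T
    return x, mu, theta

def single_index_target(x, mu, theta, link):
    """f_*(x) = sigma_*( <x, mu> / sqrt(1 + theta) ); the rescaling keeps the
    argument of the link function at unit variance."""
    return link(x @ mu / np.sqrt(1.0 + theta))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Example link: sigma_*(z) = z^3 - 3z (the 3rd Hermite polynomial), so the
    # polynomial degree is p = 3 and the information exponent is k = 3.
    link = lambda z: z ** 3 - 3 * z
    x, mu, theta = sample_spiked_data(n=2000, d=1000, beta=0.5, rng=rng)
    y = single_index_target(x, mu, theta, link)
    print(theta, y[:5])
```

Under this construction, varying `beta` toward the thresholds $1 - 1/p$ (for kernel ridge regression) and $1 - 1/k$ (for gradient-trained two-layer networks) is the regime the paper analyzes.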