Deep learning: a statistical viewpoint

Cited by: 114
Authors
Bartlett, Peter L. [1 ]
Montanari, Andrea [2 ]
Rakhlin, Alexander [3 ,4 ]
Affiliations
[1] Univ Calif Berkeley, Dept Stat & EECS, Berkeley, CA 94720 USA
[2] Stanford Univ, Dept EE & Stat, Stanford, CA 94304 USA
[3] MIT, Dept Brain & Cognit Sci, Cambridge, MA 02139 USA
[4] MIT, Stat & Data Sci Ctr, Cambridge, MA 02139 USA
Funding
US National Science Foundation
Keywords
TRAINING NEURAL-NETWORKS; VC-DIMENSION; ORACLE INEQUALITIES; SAMPLE COMPLEXITY; LEAST-SQUARES; RISK; BOUNDS; REGRESSION; CLASSIFICATION; ERROR;
DOI
10.1017/S0962492921000027
Chinese Library Classification
O1 [Mathematics]
Discipline Classification Code
0701; 070101
Abstract
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting, that is, accurate predictions despite overfitting training data. In this article, we survey recent progress in statistical learning theory that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behaviour of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favourable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
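As a minimal illustration of the implicit regularization the abstract describes (a sketch of the general phenomenon, not code from the paper): for an overparametrized linear model with n < d, gradient descent on the squared loss started from zero never leaves the row space of the data matrix X, so it converges to the minimum-l2-norm interpolator w = X^T (X X^T)^{-1} y. The NumPy snippet below checks this numerically; the Gaussian data and all constants are illustrative choices, not taken from the paper.

```python
import numpy as np

# Illustrative sketch: gradient descent on an overparametrized least-squares
# problem, started from zero, converges to the minimum-l2-norm interpolator.
# Gaussian data and all constants are assumptions for the demo.
rng = np.random.default_rng(0)
n, d = 20, 200                      # n samples, d parameters (d >> n)
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Gradient descent on L(w) = 0.5 * ||X w - y||^2 from w = 0.
# grad L(w) = X^T (X w - y); a step below 2 / ||X||_op^2 guarantees convergence.
w = np.zeros(d)
step = 1.0 / np.linalg.norm(X, ord=2) ** 2
for _ in range(2000):
    w -= step * X.T @ (X @ w - y)

# Closed-form minimum-norm interpolator: X^T (X X^T)^{-1} y.
w_min = X.T @ np.linalg.solve(X @ X.T, y)

print("training residual:", np.linalg.norm(X @ w - y))         # ~0: perfect fit
print("gap to min-norm solution:", np.linalg.norm(w - w_min))  # ~0: implicit bias
```

Starting instead from a nonzero w0 yields the interpolator closest to w0 in Euclidean norm, which is why the zero initialization matters for the minimum-norm claim.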
Pages: 87-201
Page count: 115