Deep learning: a statistical viewpoint

Cited by: 114
Authors
Bartlett, Peter L. [1 ]
Montanari, Andrea [2 ]
Rakhlin, Alexander [3 ,4 ]
Affiliations
[1] Univ Calif Berkeley, Dept Stat & EECS, Berkeley, CA 94720 USA
[2] Stanford Univ, Dept EE & Stat, Stanford, CA 94304 USA
[3] MIT, Dept Brain & Cognit Sci, Cambridge, MA 02139 USA
[4] MIT, Stat & Data Sci Ctr, Cambridge, MA 02139 USA
Funding
U.S. National Science Foundation
Keywords
training neural networks; VC dimension; oracle inequalities; sample complexity; least squares; risk; bounds; regression; classification; error
DOI
10.1017/S0962492921000027
Chinese Library Classification
O1 [Mathematics]
Discipline code
0701; 070101
Abstract
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting, that is, accurate predictions despite overfitting training data. In this article, we survey recent progress in statistical learning theory that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behaviour of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favourable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
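To make the implicit-regularization claim concrete, here is a minimal numerical sketch (not taken from the paper; the data, dimensions, and step size are arbitrary illustrative choices): gradient descent on an overparametrized least-squares problem, initialized at zero, fits the training data perfectly and converges to the minimum-norm interpolating solution given by the pseudoinverse.

    # Minimal sketch (illustrative, not from the paper): implicit regularization
    # of gradient descent on overparametrized linear regression.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 20, 100                          # n samples, d parameters; d > n, so interpolation is possible
    X = rng.standard_normal((n, d))
    y = rng.standard_normal(n)

    w = np.zeros(d)                         # zero initialization keeps every iterate in the row space of X
    lr = 1e-2
    for _ in range(20000):
        w -= lr * X.T @ (X @ w - y) / n     # gradient step on the squared loss (1/2n)||Xw - y||^2

    w_min_norm = np.linalg.pinv(X) @ y      # minimum-norm interpolator

    print("training error:", np.linalg.norm(X @ w - y))                 # essentially zero: perfect fit
    print("gap to min-norm solution:", np.linalg.norm(w - w_min_norm))  # essentially zero

With d > n and X of full row rank, gradient descent started at zero never leaves the row space of X, so among all interpolating solutions it can only reach the one of minimum Euclidean norm; this is the simplest instance of the implicit regularization by gradient methods that the abstract describes for quadratic loss.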
Pages: 87-201
Number of pages: 115
Related papers
50 records in total
  • [31] Statistical and Deep Learning Approaches for Literary Genre Classification
    Goyal, Anshaj
    Prakash, V. Prem
    ADVANCES IN DATA AND INFORMATION SCIENCES, 2022, 318 : 297 - 305
  • [32] Error Analysis of Regularized Trigonometric Linear Regression With Unbounded Sampling: A Statistical Learning Viewpoint
    Scampicchio, Anna
    Arcari, Elena
    Zeilinger, Melanie N.
    IEEE CONTROL SYSTEMS LETTERS, 2023, 7 : 3066 - 3071
  • [33] Learning to predict sustainable aviation fuel properties: A deep uncertainty quantification viewpoint
    Oh, Ji-Hun
    Oldani, Anna
    Solecki, Alex
    Lee, Tonghun
    FUEL, 2024, 356
  • [34] A Lightweight Deep Learning Model for Vehicle Viewpoint Estimation from Dashcam Images
    Magistri, Simone
    Sambo, Francesco
    Schoen, Fabio
    de Andrade, Douglas Coimbra
    Simoncini, Matteo
    Caprasecca, Stefano
    Kubin, Luca
    Bravi, Luca
    Taccari, Leonardo
    2020 IEEE 23RD INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC), 2020,
  • [35] Collaborative Viewpoint Adjusting and Grasping via Deep Reinforcement Learning in Clutter Scenes
    Liu, Ning
    Guo, Cangui
    Liang, Rongzhao
    Li, Deping
    MACHINES, 2022, 10 (12)
  • [36] Viewpoint projection based deep feature learning for single and dyadic action recognition
    Keceli, Ali Seydi
    EXPERT SYSTEMS WITH APPLICATIONS, 2018, 104 : 235 - 243
  • [37] Statistical Machine Learning vs Deep Learning in Information Fusion: Competition or Collaboration?
    Guan, Ling
    Gao, Lei
    Elmadany, Nour El Din
    Liang, Chengwu
    IEEE 1ST CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL (MIPR 2018), 2018, : 251 - 256
  • [38] Statistical, machine learning and deep learning forecasting methods: Comparisons and ways forward
    Makridakis, Spyros
    Spiliotis, Evangelos
    Assimakopoulos, Vassilios
    Semenoglou, Artemios-Anargyros
    Mulder, Gary
    Nikolopoulos, Konstantinos
    JOURNAL OF THE OPERATIONAL RESEARCH SOCIETY, 2023, 74 (03) : 840 - 859
  • [39] Statistical estimation from an optimization viewpoint
    Roger J‐B Wets
    Annals of Operations Research, 1999, 85 : 79 - 101
  • [40] STATISTICAL REPORTING FROM THE BAYESIAN VIEWPOINT
    RAIFFA, H
    PUBLIC OPINION QUARTERLY, 1963, 27 (04) : 640 - 640