Deep learning: a statistical viewpoint

Cited: 114
Authors
Bartlett, Peter L. [1 ]
Montanari, Andrea [2 ]
Rakhlin, Alexander [3 ,4 ]
Affiliations
[1] Univ Calif Berkeley, Dept Stat & EECS, Berkeley, CA 94720 USA
[2] Stanford Univ, Dept EE & Stat, Stanford, CA 94304 USA
[3] MIT, Dept Brain & Cognit Sci, Cambridge, MA 02139 USA
[4] MIT, Stat & Data Sci Ctr, Cambridge, MA 02139 USA
Funding
National Science Foundation (US);
Keywords
TRAINING NEURAL-NETWORKS; VC-DIMENSION; ORACLE INEQUALITIES; SAMPLE COMPLEXITY; LEAST-SQUARES; RISK; BOUNDS; REGRESSION; CLASSIFICATION; ERROR;
DOI
10.1017/S0962492921000027
Chinese Library Classification
O1 [Mathematics];
Discipline classification code
0701; 070101;
Abstract
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting, that is, accurate predictions despite overfitting training data. In this article, we survey recent progress in statistical learning theory that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behaviour of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favourable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
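The implicit regularization the abstract describes can be illustrated in the simplest overparametrized setting. The sketch below (my own illustration, not taken from the paper) shows that for least squares with more parameters than samples, gradient descent initialized at zero converges to an interpolating solution of minimum Euclidean norm, i.e. the pseudoinverse fit — the iterates never leave the row space of the data matrix, so the limit is the min-norm interpolant:

```python
import numpy as np

# Minimal sketch of implicit regularization in overparametrized least squares:
# with n < d there are infinitely many interpolating solutions, yet gradient
# descent started from zero selects the minimum-L2-norm one.
rng = np.random.default_rng(0)
n, d = 10, 50                      # fewer samples than parameters
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

w = np.zeros(d)                    # zero init keeps iterates in the row space of X
lr = 0.01
for _ in range(20_000):
    w -= lr * X.T @ (X @ w - y) / n   # gradient of (1/2n) * ||Xw - y||^2

w_min_norm = np.linalg.pinv(X) @ y    # minimum-norm interpolant, in closed form

assert np.allclose(X @ w, y, atol=1e-4)        # perfect fit to training data
assert np.allclose(w, w_min_norm, atol=1e-3)   # matches the min-norm solution
```

The step size here is chosen well below 2 divided by the largest eigenvalue of X^T X / n, so the iteration converges; the number of steps and tolerances are illustrative, not prescriptive.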
Pages: 87-201 (115 pages)