Large-Scale Machine Learning with Stochastic Gradient Descent

Cited by: 3576
Authors
Bottou, Leon [1 ]
Affiliation
[1] NEC Labs Amer, Princeton, NJ 08542 USA
Keywords
stochastic gradient descent; online learning; efficiency;
DOI
10.1007/978-3-7908-2604-3_16
CLC Classification
TP301 [Theory and Methods];
Subject Classification
081202 ;
Abstract
During the last decade, data sizes have grown faster than the speed of processors. In this context, the capabilities of statistical machine learning methods are limited by computing time rather than by sample size. A more precise analysis uncovers qualitatively different tradeoffs for small-scale and large-scale learning problems. The large-scale case involves the computational complexity of the underlying optimization algorithm in non-trivial ways. Unlikely optimization algorithms such as stochastic gradient descent show amazing performance for large-scale problems. In particular, second-order stochastic gradient and averaged stochastic gradient are asymptotically efficient after a single pass on the training set.
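The one-pass efficiency claim in the abstract can be illustrated with a minimal sketch that is not taken from the paper itself: for the toy one-dimensional least-squares objective, the average of 0.5·(w − yᵢ)², plain SGD with step size 1/t reproduces the running sample mean exactly, which is the statistically efficient estimator here. The function name, data, and step-size schedule below are illustrative choices, not the paper's experimental setup.

```python
# Minimal sketch (hypothetical example, not from the paper): plain SGD and
# Polyak-Ruppert averaged SGD minimizing the average of 0.5*(w - y_i)^2.
import random

def sgd_mean(ys, average=False):
    """One pass of SGD over observations ys with step size 1/t."""
    w = 0.0      # current iterate
    w_bar = 0.0  # Polyak-Ruppert running average of the iterates
    for t, y in enumerate(ys, start=1):
        grad = w - y               # gradient of 0.5*(w - y)^2 at w
        w -= grad / t              # step size 1/t; makes w the running mean
        w_bar += (w - w_bar) / t   # incremental average of iterates
    return w_bar if average else w

# Usage: estimate the mean of noisy samples around 2.0 in a single pass.
rng = random.Random(0)
ys = [2.0 + rng.gauss(0.0, 0.1) for _ in range(1000)]
w_plain = sgd_mean(ys)
w_avg = sgd_mean(ys, average=True)
```

With the 1/t schedule, `w_plain` coincides with the sample mean after one pass; the averaged iterate `w_avg` lands close to it as well, which is the flavor of the single-pass asymptotic-efficiency result the abstract refers to.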
Pages: 177-186
Page count: 10
Related Papers
50 items in total
  • [31] Distributing the Stochastic Gradient Sampler for Large-Scale LDA
    Yang, Yuan
    Chen, Jianfei
    Zhu, Jun
    KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016, : 1975 - 1984
  • [32] Stochastic variance reduced gradient with hyper-gradient for non-convex large-scale learning
    Yang, Zhuang
    APPLIED INTELLIGENCE, 2023, 53 (23) : 28627 - 28641
  • [34] Large-Scale and Scalable Latent Factor Analysis via Distributed Alternative Stochastic Gradient Descent for Recommender Systems
    Shi, Xiaoyu
    He, Qiang
    Luo, Xin
    Bai, Yanan
    Shang, Mingsheng
    IEEE TRANSACTIONS ON BIG DATA, 2022, 8 (02) : 420 - 431
  • [35] Sufficient descent conjugate gradient methods for large-scale optimization problems
    Zheng, Xiuyun
    Liu, Hongwei
    Lu, Aiguo
    INTERNATIONAL JOURNAL OF COMPUTER MATHEMATICS, 2011, 88 (16) : 3436 - 3447
  • [36] A descent nonlinear conjugate gradient method for large-scale unconstrained optimization
    Yu, Gaohang
    Zhao, Yanlin
    Wei, Zengxin
    APPLIED MATHEMATICS AND COMPUTATION, 2007, 187 (02) : 636 - 643
  • [37] Large-Scale Stochastic Learning using GPUs
    Parnell, Thomas
    Dunner, Celestine
    Atasu, Kubilay
    Sifalakis, Manolis
    Pozidis, Haris
    2017 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2017, : 419 - 428
  • [38] Variance Counterbalancing for Stochastic Large-scale Learning
    Lagari, Pola Lydia
    Tsoukalas, Lefteri H.
    Lagaris, Isaac E.
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2020, 29 (05)
  • [39] Large Scale Optimization with Proximal Stochastic Newton-Type Gradient Descent
    Shi, Ziqiang
    Liu, Rujie
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2015, PT I, 2015, 9284 : 691 - 704
  • [40] Large scale semi-supervised linear SVM with stochastic gradient descent
    Zhou, Xin (zhouxin@mtlab.hit.edu.cn)
    Binary Information Press (09)