Large-Scale Machine Learning with Stochastic Gradient Descent

Cited by: 3576
Authors
Bottou, Leon [1 ]
Affiliation
[1] NEC Labs Amer, Princeton, NJ 08542 USA
Keywords
stochastic gradient descent; online learning; efficiency;
DOI
10.1007/978-3-7908-2604-3_16
CLC Classification
TP301 [Theory and Methods];
Subject Classification
081202 ;
Abstract
During the last decade, data sizes have grown faster than the speed of processors. In this context, the capabilities of statistical machine learning methods are limited by computing time rather than by sample size. A more precise analysis uncovers qualitatively different tradeoffs for small-scale and large-scale learning problems. The large-scale case involves the computational complexity of the underlying optimization algorithm in non-trivial ways. Unlikely optimization algorithms such as stochastic gradient descent show amazing performance for large-scale problems. In particular, second-order stochastic gradient and averaged stochastic gradient are asymptotically efficient after a single pass on the training set.
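The one-pass efficiency claim in the abstract can be illustrated with a minimal sketch that is not taken from the paper itself: for the toy one-dimensional least-squares objective, the average of 0.5·(w − yᵢ)², plain SGD with step size 1/t reproduces the running sample mean exactly, which is the statistically efficient estimator here. The function name, data, and step-size schedule below are illustrative choices, not the paper's experimental setup.

```python
# Minimal sketch (hypothetical example, not from the paper): plain SGD and
# Polyak-Ruppert averaged SGD minimizing the average of 0.5*(w - y_i)^2.
import random

def sgd_mean(ys, average=False):
    """One pass of SGD over observations ys with step size 1/t."""
    w = 0.0      # current iterate
    w_bar = 0.0  # Polyak-Ruppert running average of the iterates
    for t, y in enumerate(ys, start=1):
        grad = w - y               # gradient of 0.5*(w - y)^2 at w
        w -= grad / t              # step size 1/t; makes w the running mean
        w_bar += (w - w_bar) / t   # incremental average of iterates
    return w_bar if average else w

# Usage: estimate the mean of noisy samples around 2.0 in a single pass.
rng = random.Random(0)
ys = [2.0 + rng.gauss(0.0, 0.1) for _ in range(1000)]
w_plain = sgd_mean(ys)
w_avg = sgd_mean(ys, average=True)
```

With the 1/t schedule, `w_plain` coincides with the sample mean after one pass; the averaged iterate `w_avg` lands close to it as well, which is the flavor of the single-pass asymptotic-efficiency result the abstract refers to.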
Pages: 177-186
Page count: 10
Related Papers
50 items in total
  • [31] Distributing the Stochastic Gradient Sampler for Large-Scale LDA
    Yang, Yuan
    Chen, Jianfei
    Zhu, Jun
    KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016, : 1975 - 1984
  • [32] Stochastic variance reduced gradient with hyper-gradient for non-convex large-scale learning
    Yang, Zhuang
    APPLIED INTELLIGENCE, 2023, 53 (23) : 28627 - 28641
  • [34] Large-Scale and Scalable Latent Factor Analysis via Distributed Alternative Stochastic Gradient Descent for Recommender Systems
    Shi, Xiaoyu
    He, Qiang
    Luo, Xin
    Bai, Yanan
    Shang, Mingsheng
    IEEE TRANSACTIONS ON BIG DATA, 2022, 8 (02) : 420 - 431
  • [35] Sufficient descent conjugate gradient methods for large-scale optimization problems
    Zheng, Xiuyun
    Liu, Hongwei
    Lu, Aiguo
    INTERNATIONAL JOURNAL OF COMPUTER MATHEMATICS, 2011, 88 (16) : 3436 - 3447
  • [36] A descent nonlinear conjugate gradient method for large-scale unconstrained optimization
    Yu, Gaohang
    Zhao, Yanlin
    Wei, Zengxin
    APPLIED MATHEMATICS AND COMPUTATION, 2007, 187 (02) : 636 - 643
  • [37] Large-Scale Stochastic Learning using GPUs
    Parnell, Thomas
    Dunner, Celestine
    Atasu, Kubilay
    Sifalakis, Manolis
    Pozidis, Haris
    2017 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW), 2017, : 419 - 428
  • [38] Variance Counterbalancing for Stochastic Large-scale Learning
    Lagari, Pola Lydia
    Tsoukalas, Lefteri H.
    Lagaris, Isaac E.
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2020, 29 (05)
  • [39] Large Scale Optimization with Proximal Stochastic Newton-Type Gradient Descent
    Shi, Ziqiang
    Liu, Rujie
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2015, PT I, 2015, 9284 : 691 - 704
  • [40] Large scale semi-supervised linear SVM with stochastic gradient descent
    Zhou, Xin (zhouxin@mtlab.hit.edu.cn)
    Binary Information Press (09)