Adaptive Stochastic Gradient Descent for Deep Learning on Heterogeneous CPU+GPU Architectures

Cited by: 3
Authors
Ma, Yujing [1]
Rusu, Florin [1]
Wu, Kesheng [2]
Sim, Alexander [2]
Affiliations
[1] Univ Calif Merced, Merced, CA 95343, USA
[2] Lawrence Berkeley Natl Lab, Berkeley, CA, USA
Keywords
SGD; fully-connected MLP; adaptive batch size
DOI
10.1109/IPDPSW52791.2021.00012
CLC classification
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
The widely adopted practice is to train deep learning models on specialized hardware accelerators, e.g., GPUs or TPUs, because of their superior performance on linear algebra operations. However, this strategy does not make effective use of the extensive CPU and memory resources available by default on accelerated servers; these resources are used only for preprocessing, data transfer, and scheduling. In this paper, we study training algorithms for deep learning on heterogeneous CPU+GPU architectures. Our two-fold objective, maximizing the convergence rate and resource utilization simultaneously, makes the problem challenging. To allow a principled exploration of the design space, we first introduce a generic deep learning framework that exploits the difference in computational power and memory hierarchy between CPU and GPU through asynchronous message passing. Based on insights gained from experimentation with the framework, we design two heterogeneous asynchronous stochastic gradient descent (SGD) algorithms. The first algorithm, CPU+GPU Hogbatch, combines small batches on the CPU with large batches on the GPU in order to maximize the utilization of both resources. However, this generates an unbalanced distribution of model updates, which hinders statistical convergence. The second algorithm, Adaptive Hogbatch, assigns batches whose size evolves continuously with the relative speed of the CPU and GPU. This balances the ratio of model updates at the cost of a customizable decrease in utilization. We show that implementing these algorithms in the proposed CPU+GPU framework achieves both faster convergence and higher resource utilization than TensorFlow on several real datasets.
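To make the adaptive batch-size idea concrete, the following minimal Python sketch rebalances per-device batch sizes in proportion to measured throughput, so that the CPU and GPU finish their batches in roughly the same wall-clock time and contribute comparable numbers of model updates. This is an illustration only, not the authors' implementation; all names (DeviceStats, rebalance_batches, min_batch) and the numbers in the example are hypothetical.

# Minimal sketch: adapt per-device batch sizes from measured throughput.
# Hypothetical names; not the paper's Adaptive Hogbatch code, only an
# illustration of assigning batch sizes based on relative device speed.
from dataclasses import dataclass


@dataclass
class DeviceStats:
    name: str            # e.g., "cpu" or "gpu"
    batch_size: int      # batch size currently assigned to this device
    last_time: float     # wall-clock seconds spent on the last batch

    @property
    def throughput(self) -> float:
        # Examples processed per second on the last batch.
        return self.batch_size / max(self.last_time, 1e-9)


def rebalance_batches(devices, total_batch, min_batch=32):
    # Split total_batch across devices in proportion to measured throughput,
    # so each device finishes its share in roughly equal wall-clock time and
    # produces model updates at a comparable rate.
    total_tp = sum(d.throughput for d in devices)
    for d in devices:
        share = d.throughput / total_tp
        d.batch_size = max(min_batch, int(round(share * total_batch)))
    return devices


if __name__ == "__main__":
    # Assume the GPU processed 8x more examples than the CPU in the same time.
    cpu = DeviceStats("cpu", batch_size=256, last_time=0.80)    # ~320 ex/s
    gpu = DeviceStats("gpu", batch_size=2048, last_time=0.80)   # ~2560 ex/s
    for d in rebalance_batches([cpu, gpu], total_batch=4096):
        print(d.name, d.batch_size)   # cpu -> 455, gpu -> 3641

In a real training loop, last_time would be remeasured after every round, so the assigned batch sizes keep evolving with the relative speed of the devices, which is the behavior the abstract attributes to Adaptive Hogbatch.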
Pages: 6-15
Page count: 10