Towards an optimized distributed deep learning framework for a heterogeneous multi-GPU cluster

Cited by: 17
Authors
Kim, Youngrang [1 ]
Choi, Hyeonseong [1 ]
Lee, Jaehwan [1 ]
Kim, Jik-Soo [2 ]
Jei, Hyunseung [3 ]
Roh, Hongchan [3 ]
Affiliations
[1] Korea Aerosp Univ, Goyang Si, South Korea
[2] Myongji Univ, Yongin, South Korea
[3] SK Telecom ML Infra Lab, Seongnam Si, South Korea
Funding
National Research Foundation, Singapore
Keywords
Data parallel; Distributed deep learning; Heterogeneous cluster; Large-scale deep learning
DOI
10.1007/s10586-020-03144-9
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
This paper presents a novel "Distributed Deep Learning Framework" for a heterogeneous multi-GPU cluster that can effectively improve overall resource utilization without sacrificing training accuracy. Specifically, we employ a hybrid aggregation approach using parameter-server and all-reduce schemes in order to address potential performance degradation problems in running deep learning applications on a heterogeneous computing system. In addition, we design and implement an asynchronous large mini-batch training mechanism to maintain training accuracy for asynchronous data-parallel deep learning processing with enhanced collective communication capability based on MPI. We successfully implement our proposed framework on TensorFlow and perform extensive experiments in both homogeneous and heterogeneous computing systems. Evaluation results show that our proposed framework can improve computing performance by decreasing I/O bottlenecks and effectively increase resource utilization in the heterogeneous multi-GPU cluster.
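The hybrid aggregation idea in the abstract (all-reduce among similar workers, a parameter server across dissimilar ones) can be illustrated with a small sketch. The code below is not the authors' implementation; it is a minimal illustration using mpi4py and NumPy, assuming rank 0 acts as the parameter server and that a hypothetical GPU_TYPE environment variable labels each worker's device class.

```python
# Hypothetical sketch of a hybrid aggregation scheme (not the authors' code).
# Assumptions: rank 0 is the parameter server, every other rank is a worker,
# and a GPU_TYPE environment variable (invented here) labels each worker's
# device class so that workers of the same class form one homogeneous group.
import os
import numpy as np
from mpi4py import MPI

world = MPI.COMM_WORLD
rank = world.Get_rank()
is_server = rank == 0

# Split the workers into homogeneous sub-communicators by device class;
# the server takes no part in any group (MPI.UNDEFINED -> COMM_NULL).
gpu_type = int(os.environ.get("GPU_TYPE", "0"))
color = MPI.UNDEFINED if is_server else gpu_type
group = world.Split(color, key=rank)

# Let the server learn how many worker groups exist (one leader reports per group).
reported = world.gather(None if is_server else gpu_type, root=0)
n_groups = len({t for t in reported if t is not None}) if is_server else 0

params = np.zeros(4)          # toy "model" parameters
lr = 0.1                      # toy learning rate

if is_server:
    # Parameter server: one exchange per group leader; slow groups do not
    # block fast ones because each exchange is handled independently.
    status = MPI.Status()
    for _ in range(n_groups):
        grad = world.recv(source=MPI.ANY_SOURCE, tag=1, status=status)
        params -= lr * grad                       # SGD step on the group gradient
        world.send(params, dest=status.Get_source(), tag=2)
else:
    # Step 1: synchronous all-reduce inside the homogeneous group.
    local_grad = np.random.rand(4)                # stand-in for a backprop gradient
    group_grad = np.empty_like(local_grad)
    group.Allreduce(local_grad, group_grad, op=MPI.SUM)
    group_grad /= group.Get_size()
    # Step 2: only the group leader talks to the parameter server, then
    # broadcasts the refreshed parameters back to its group.
    if group.Get_rank() == 0:
        world.send(group_grad, dest=0, tag=1)
        params = world.recv(source=0, tag=2)
    params = group.bcast(params, root=0)

print(f"rank {rank}: params = {params}")
```

Grouping workers by device class keeps the synchronous all-reduce inside homogeneous groups, while the per-group exchange with the server lets fast groups proceed without waiting for slow ones, which is the gist of the hybrid scheme described in the abstract.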
Pages: 2287-2300
Page count: 14