Towards an optimized distributed deep learning framework for a heterogeneous multi-GPU cluster

Cited by: 17
Authors
Kim, Youngrang [1 ]
Choi, Hyeonseong [1 ]
Lee, Jaehwan [1 ]
Kim, Jik-Soo [2 ]
Jei, Hyunseung [3 ]
Roh, Hongchan [3 ]
Affiliations
[1] Korea Aerospace University, Goyang-si, South Korea
[2] Myongji University, Yongin, South Korea
[3] SK Telecom ML Infra Lab, Seongnam-si, South Korea
Funding
National Research Foundation of Singapore
Keywords
Data parallel; Distributed deep learning; Heterogeneous cluster; Large-scale deep learning
DOI
10.1007/s10586-020-03144-9
CLC classification
TP [Automation Technology, Computer Technology]
Discipline code
0812
Abstract
This paper presents a novel "Distributed Deep Learning Framework" for a heterogeneous multi-GPU cluster that can effectively improve overall resource utilization without sacrificing training accuracy. Specifically, we employ a hybrid aggregation approach that combines parameter-server and all-reduce schemes to address the performance degradation that can arise when running deep learning applications on a heterogeneous computing system. In addition, we design and implement an asynchronous large mini-batch training mechanism that maintains training accuracy for asynchronous data-parallel deep learning, supported by enhanced MPI-based collective communication. We implement our proposed framework on top of TensorFlow and perform extensive experiments on both homogeneous and heterogeneous computing systems. Evaluation results show that our framework improves computing performance by reducing I/O bottlenecks and effectively increases resource utilization in the heterogeneous multi-GPU cluster.
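To make the hybrid aggregation idea concrete, the sketch below shows one minimal, hypothetical arrangement: homogeneous "fast" workers average gradients with a synchronous MPI all-reduce among themselves, while "slow" workers push gradients to a parameter-server rank. The rank assignment, learning rate, and NumPy stand-in gradients are illustrative assumptions, not the paper's actual TensorFlow implementation.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Hypothetical role assignment (illustrative, not the paper's layout):
# rank 0 acts as a parameter server for "slow" GPUs; odd ranks form a
# homogeneous "fast" group that synchronizes gradients with all-reduce.
FAST = [r for r in range(1, size) if r % 2 == 1]
SLOW = [r for r in range(1, size) if r % 2 == 0]

fast_comm = MPI.COMM_NULL
if rank in FAST:
    fast_comm = comm.Create_group(comm.Get_group().Incl(FAST))

dim, lr = 4, 0.01
params = np.zeros(dim)

for step in range(3):
    grad = np.random.rand(dim)  # stand-in for a locally computed gradient
    if rank in FAST:
        # Synchronous all-reduce among the homogeneous fast workers.
        total = np.empty(dim)
        fast_comm.Allreduce(grad, total, op=MPI.SUM)
        params -= lr * total / fast_comm.Get_size()
    elif rank in SLOW:
        # Slow worker: push its gradient to the parameter server (rank 0).
        comm.Send(grad, dest=0, tag=step)
    else:
        # Rank 0, parameter server: apply each slow-worker gradient; a
        # truly asynchronous server would use non-blocking receives.
        for src in SLOW:
            buf = np.empty(dim)
            comm.Recv(buf, source=src, tag=step)
            params -= lr * buf

if rank == 0:
    print("parameter-server view of the model:", params)
```

Run, for example, with `mpirun -np 5 python hybrid_aggregation.py` (the script name is arbitrary): ranks 1 and 3 form the all-reduce group, while ranks 2 and 4 report to the parameter server at rank 0.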
Pages: 2287-2300
Number of pages: 14
Related papers
50 records in total
  • [41] Serving Heterogeneous Machine Learning Models on Multi-GPU Servers with Spatio-Temporal Sharing
    Choi, Seungbeom
    Lee, Sunho
    Kim, Yeonjae
    Park, Jongse
    Kwon, Youngjin
    Huh, Jaehyuk
    PROCEEDINGS OF THE 2022 USENIX ANNUAL TECHNICAL CONFERENCE, 2022: 199-215
  • [42] Multi-CPU/Multi-GPU Based Framework for Multimedia Processing
    Mahmoudi, Sidi Ahmed
    Manneback, Pierre
    COMPUTER SCIENCE AND ITS APPLICATIONS, CIIA 2015, 2015, 456: 54-65
  • [43] Distributed Multi-GPU Community Detection on Exascale Computing Platforms
    Sattar, Naw Safrin
    Lu, Hao
    Wang, Feiyi
    Halappanavar, Mahantesh
    2024 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, IPDPSW 2024, 2024: 815-824
  • [45] Hierarchical Heterogeneous Cluster Systems for Scalable Distributed Deep Learning
    Wang, Yibo
    Geng, Tongsheng
    Silva, Ericson
    Gaudiot, Jean-Luc
    2024 IEEE 27TH INTERNATIONAL SYMPOSIUM ON REAL-TIME DISTRIBUTED COMPUTING, ISORC 2024, 2024
  • [46] Multi-GPU performance of incompressible flow computation by lattice Boltzmann method on GPU cluster
    Wang, Xian
    Aoki, Takayuki
    PARALLEL COMPUTING, 2011, 37 (09): 521-535
  • [47] PARTANS: An Autotuning Framework for Stencil Computation on Multi-GPU Systems
    Lutz, Thibaut
    Fensch, Christian
    Cole, Murray
    ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2013, 9 (04)
  • [48] Strategies for maximizing utilization on multi-CPU and multi-GPU heterogeneous architectures
    Navarro, Angeles
    Vilches, Antonio
    Corbera, Francisco
    Asenjo, Rafael
    JOURNAL OF SUPERCOMPUTING, 2014, 70 (02): 756-771
  • [49] A Multi-GPU Framework for In-Memory Text Data Analytics
    Chong, Poh Kit
    Karuppiah, Ettikan K.
    Yong, Keh Kok
    2013 IEEE 27TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS WORKSHOPS (WAINA), 2013: 1411-1416
  • [50] Fast STA Graph Partitioning Framework for Multi-GPU Acceleration
    Guo, Guannan
    Huang, Tsung-Wei
    Wong, Martin
    2023 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE, 2023