Accelerating Training for Distributed Deep Neural Networks in MapReduce

Cited: 0
|
Authors
Xu, Jie [1 ]
Wang, Jingyu [1 ]
Qi, Qi [1 ]
Sun, Haifeng [1 ]
Liao, Jianxin [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, State Key Lab Networking & Switching Technol, Beijing 100876, Peoples R China
Source
WEB SERVICES - ICWS 2018 | 2018 / Vol. 10966
Funding
National Natural Science Foundation of China;
Keywords
Deep Neural Networks; Parallel training; MapReduce; Data transmission; Synchronization; DATA LOCALITY; PARALLEL;
DOI
10.1007/978-3-319-94289-6_12
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Parallel training is widely used to reduce the training time of Deep Neural Networks (DNNs): the training data sets and the layer-wise training processes of a DNN are distributed across multiple Graphics Processing Units (GPUs). However, deploying parallel training in GPU cloud services faces obstacles. A DNN has a tightly coupled layered structure in which each layer consumes the output of the preceding layer, so large intermediate outputs must inevitably be transmitted between the separated layer-training processes. Because cloud computing separates storage services from computing services, this data transmission over the network degrades training performance, making parallel training inefficient in a GPU cloud environment. In this paper, we construct a distributed DNN training architecture that implements parallel DNN training in MapReduce and provisions GPU cloud resources as a web service. We further address the data-transmission concern by proposing a distributed DNN scheduler that accelerates training: it assigns GPU resources with a minimum-cost-flow algorithm, factoring data locality and synchronization into the minimization of training time. Experimental results show that, compared with the original schedulers, the distributed DNN scheduler reduces training time by 50% while incurring the least data transmission and keeping parallel training synchronized.
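The abstract's core scheduling idea, assigning layer-training tasks to GPUs via a minimum-cost-flow formulation in which edge costs penalize non-local data placement, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the node layout (source, tasks, GPUs, sink), the `TRANSFER` cost, and the per-GPU slot counts are all hypothetical assumptions. The solver is a standard successive-shortest-path min-cost max-flow using SPFA/Bellman-Ford.

```python
from collections import deque

class MinCostFlow:
    """Successive-shortest-path min-cost max-flow (SPFA for shortest paths)."""

    def __init__(self, n):
        self.n = n
        # Each edge is stored as a mutable list: [to, capacity, cost, rev_index],
        # where rev_index locates the paired reverse edge in graph[to].
        self.graph = [[] for _ in range(n)]

    def add_edge(self, u, v, cap, cost):
        self.graph[u].append([v, cap, cost, len(self.graph[v])])
        self.graph[v].append([u, 0, -cost, len(self.graph[u]) - 1])

    def flow(self, s, t):
        sent, total_cost = 0, 0
        while True:
            # SPFA: cheapest augmenting path from s to t in the residual graph.
            dist = [float("inf")] * self.n
            prev_node = [-1] * self.n
            prev_edge = [-1] * self.n
            in_queue = [False] * self.n
            dist[s] = 0
            q = deque([s])
            in_queue[s] = True
            while q:
                u = q.popleft()
                in_queue[u] = False
                for i, (v, cap, cost, _) in enumerate(self.graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev_node[v], prev_edge[v] = u, i
                        if not in_queue[v]:
                            q.append(v)
                            in_queue[v] = True
            if dist[t] == float("inf"):
                return sent, total_cost
            # Bottleneck capacity along the path, then augment.
            d, v = float("inf"), t
            while v != s:
                d = min(d, self.graph[prev_node[v]][prev_edge[v]][1])
                v = prev_node[v]
            v = t
            while v != s:
                e = self.graph[prev_node[v]][prev_edge[v]]
                e[1] -= d
                self.graph[v][e[3]][1] += d
                v = prev_node[v]
            sent += d
            total_cost += d * dist[t]

# Hypothetical instance: 3 layer-training tasks, 2 GPUs with 2 slots each.
S, T = 0, 6
tasks = [1, 2, 3]             # task node ids
gpus = {4: 2, 5: 2}           # GPU node id -> free slots
local = {1: 4, 2: 5, 3: 4}    # task -> GPU already holding its input data
TRANSFER = 10                 # assumed cost of pulling data over the network

mcf = MinCostFlow(7)
for task in tasks:
    mcf.add_edge(S, task, 1, 0)            # each task scheduled once
    for g in gpus:
        # Data-local placement is free; remote placement pays a transfer cost.
        mcf.add_edge(task, g, 1, 0 if local[task] == g else TRANSFER)
for g, slots in gpus.items():
    mcf.add_edge(g, T, slots, 0)           # GPU capacity limits concurrency

assigned, cost = mcf.flow(S, T)
print(assigned, cost)  # 3 tasks placed at total transfer cost 0 (all data-local)
```

Here every task can be placed where its data already resides, so the optimal flow has zero transfer cost; raising contention (fewer slots) would force some tasks onto remote GPUs and the solver would pick the cheapest such assignment, which mirrors the data-locality trade-off the paper's scheduler optimizes.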
Pages: 181-195
Number of pages: 15