Communication efficient distributed learning of neural networks in Big Data environments using Spark

Cited by: 1
Authors
Alkhoury, Fouad [1 ]
Wegener, Dennis [2 ]
Sylla, Karl-Heinz [2 ]
Mock, Michael [2 ]
Affiliations
[1] Univ Bonn, Bonn, Germany
[2] Fraunhofer IAIS, St Augustin, Germany
Keywords
Federated Learning; Distributed Learning of Deep Neural Networks; Big Data Systems; Spark; Data Science Systems; Horizontal Scalability;
DOI
10.1109/BigData52589.2021.9671506
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Distributed (or federated) training of neural networks is an important approach to significantly reduce training time. Previous experiments on communication-efficient distributed learning have shown that model averaging, although provably correct only for convex loss functions, also works for training neural networks in some cases, albeit restricted to simple examples with relatively small standard data sets. In this paper, we investigate to what extent communication-efficient distributed learning scales to huge data sets and complex, deep neural networks. We show how to integrate communication-efficient distributed learning into the big data environment Spark and apply it to a complex real-world scenario, namely image segmentation on a large automotive data set (A2D2). We present evidence-based results showing that the distributed approach scales successfully with an increasing number of computing nodes in the case of fully convolutional networks.
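To make the model-averaging idea concrete, the following is a minimal, hypothetical sketch in Python (PySpark and NumPy) of a communication-efficient training round of the kind the abstract describes: the driver broadcasts the current model, each Spark partition acts as a worker and runs a few local gradient steps on its data shard, and only the resulting parameters are collected and averaged. The linear least-squares model, the shard construction, and the local_sgd helper are illustrative stand-ins and not the authors' implementation.

import numpy as np
from pyspark.sql import SparkSession

# Local Spark session; in the paper's setting this would be a real cluster.
spark = SparkSession.builder.master("local[2]").appName("model-averaging-sketch").getOrCreate()
sc = spark.sparkContext

# Synthetic regression data, split into one shard per simulated worker.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
X = rng.normal(size=(1000, 2))
y = X @ true_w + 0.1 * rng.normal(size=1000)
shards = [(X[i::2], y[i::2]) for i in range(2)]

def local_sgd(shard, w, steps=20, lr=0.05):
    # A few local gradient steps on one worker's shard, starting from the broadcast weights.
    Xs, ys = shard
    for _ in range(steps):
        grad = 2.0 * Xs.T @ (Xs @ w - ys) / len(ys)
        w = w - lr * grad
    return w

w = np.zeros(2)  # global model kept on the driver
for rnd in range(10):  # communication rounds
    w_bc = sc.broadcast(w)  # ship the current model to the workers once per round
    local_models = (sc.parallelize(shards, numSlices=2)
                      .map(lambda shard: local_sgd(shard, w_bc.value))
                      .collect())
    w = np.mean(local_models, axis=0)  # model averaging: only parameters cross the network
    print(f"round {rnd}: w = {w}")

spark.stop()

In the real scenario each partition would train a copy of the actual network (e.g. a fully convolutional network) on its partition of the training data; because only the averaged parameters, rather than the data or per-step gradients, are exchanged, the communication cost stays low as the number of nodes grows.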
Pages: 3871-3877
Page count: 7
Related Papers (50 in total)
  • [1] On Scalability of Distributed Machine Learning with Big Data on Apache Spark
    Hai, Ameen Abdel
    Forouraghi, Babak
    [J]. BIG DATA - BIGDATA 2018, 2018, 10968 : 209 - 219
  • [2] Reducing communication for distributed learning in neural networks
    Auer, P
    Burgsteiner, H
    Maass, W
    [J]. ARTIFICIAL NEURAL NETWORKS - ICANN 2002, 2002, 2415 : 123 - 128
  • [3] A communication efficient distributed learning framework for smart environments
    Valerio, Lorenzo
    Passarella, Andrea
    Conti, Marco
    [J]. PERVASIVE AND MOBILE COMPUTING, 2017, 41 : 46 - 68
  • [4] Distributed Neural Networks for Missing Big Data Imputation
    Petrozziello, Alessio
    Jordanov, Ivan
    Sommeregger, Christian
    [J]. 2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018, : 131 - 138
  • [5] Optimal Partitioning of Distributed Neural Networks for Various Communication Environments
    Jeong, Jonghun
    Yang, Hoeseok
    [J]. 3RD INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (IEEE ICAIIC 2021), 2021, : 269 - 272
  • [6] Spark Based Distributed Deep Learning Framework For Big Data Applications
    Khumoyun, Akhmedov
    Cui, Yun
    Hanku, Lee
    [J]. 2016 INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND COMMUNICATIONS TECHNOLOGIES (ICISCT), 2016,
  • [7] Optimal and Efficient Distributed Online Learning for Big Data
    Sayin, Muhammed O.
    Vanli, N. Denizcan
    Delibalta, Ibrahim
    Kozat, Suleyman S.
    [J]. 2015 IEEE INTERNATIONAL CONGRESS ON BIG DATA - BIGDATA CONGRESS 2015, 2015, : 126 - 133
  • [8] Distributed Learning of Neural Networks with One Round of Communication
    Izbicki, Mike
    Shelton, Christian R.
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT I, 2020, 1167 : 293 - 300
  • [9] Energy Efficient Neural Networks for Big Data Analytics
    Wang, Yu
    Li, Boxun
    Luo, Rong
    Chen, Yiran
    Xu, Ningyi
    Yang, Huazhong
    [J]. 2014 DESIGN, AUTOMATION AND TEST IN EUROPE CONFERENCE AND EXHIBITION (DATE), 2014,
  • [10] An Efficient Distributed SPARQL Query Processing Scheme Considering Communication Costs in Spark Environments
    Lim, Jongtae
    Kim, Byounghoon
    Lee, Hyeonbyeong
    Choi, Dojin
    Bok, Kyoungsoo
    Yoo, Jaesoo
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (01)