Communication efficient distributed learning of neural networks in Big Data environments using Spark

Cited by: 1
Authors
Alkhoury, Fouad [1 ]
Wegener, Dennis [2 ]
Sylla, Karl-Heinz [2 ]
Mock, Michael [2 ]
Affiliations
[1] Univ Bonn, Bonn, Germany
[2] Fraunhofer IAIS, St Augustin, Germany
Keywords
Federated Learning; Distributed Learning of Deep Neural Networks; Big Data Systems; Spark; Data Science Systems; Horizontal Scalability;
DOI
10.1109/BigData52589.2021.9671506
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Distributed (or federated) training of neural networks is an important approach to significantly reduce training time. Previous experiments on communication-efficient distributed learning have shown that model averaging, although provably correct only for convex loss functions, also works for training neural networks in some cases, albeit restricted to simple examples with relatively small standard data sets. In this paper, we investigate to what extent communication-efficient distributed learning scales to huge data sets and complex, deep neural networks. We show how to integrate communication-efficient distributed learning into the big data environment Spark and apply it to a complex real-world scenario, namely image segmentation on a large automotive data set (A2D2). We present evidence-based results showing that the distributed approach scales successfully with an increasing number of computing nodes in the case of fully convolutional networks.
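To make the model-averaging idea concrete, the following is a minimal, hypothetical sketch in Python (PySpark and NumPy) of a communication-efficient training round of the kind the abstract describes: the driver broadcasts the current model, each Spark partition acts as a worker and runs a few local gradient steps on its data shard, and only the resulting parameters are collected and averaged. The linear least-squares model, the shard construction, and the local_sgd helper are illustrative stand-ins and not the authors' implementation.

import numpy as np
from pyspark.sql import SparkSession

# Local Spark session; in the paper's setting this would be a real cluster.
spark = SparkSession.builder.master("local[2]").appName("model-averaging-sketch").getOrCreate()
sc = spark.sparkContext

# Synthetic regression data, split into one shard per simulated worker.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
X = rng.normal(size=(1000, 2))
y = X @ true_w + 0.1 * rng.normal(size=1000)
shards = [(X[i::2], y[i::2]) for i in range(2)]

def local_sgd(shard, w, steps=20, lr=0.05):
    # A few local gradient steps on one worker's shard, starting from the broadcast weights.
    Xs, ys = shard
    for _ in range(steps):
        grad = 2.0 * Xs.T @ (Xs @ w - ys) / len(ys)
        w = w - lr * grad
    return w

w = np.zeros(2)  # global model kept on the driver
for rnd in range(10):  # communication rounds
    w_bc = sc.broadcast(w)  # ship the current model to the workers once per round
    local_models = (sc.parallelize(shards, numSlices=2)
                      .map(lambda shard: local_sgd(shard, w_bc.value))
                      .collect())
    w = np.mean(local_models, axis=0)  # model averaging: only parameters cross the network
    print(f"round {rnd}: w = {w}")

spark.stop()

In the real scenario each partition would train a copy of the actual network (e.g. a fully convolutional network) on its partition of the training data; because only the averaged parameters, rather than the data or per-step gradients, are exchanged, the communication cost stays low as the number of nodes grows.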
Pages: 3871-3877
Page count: 7
Related Papers (50 in total)
  • [1] On Scalability of Distributed Machine Learning with Big Data on Apache Spark
    Hai, Ameen Abdel
    Forouraghi, Babak
    [J]. BIG DATA - BIGDATA 2018, 2018, 10968 : 209 - 219
  • [2] Reducing communication for distributed learning in neural networks
    Auer, P
    Burgsteiner, H
    Maass, W
    [J]. ARTIFICIAL NEURAL NETWORKS - ICANN 2002, 2002, 2415 : 123 - 128
  • [3] A communication efficient distributed learning framework for smart environments
    Valerio, Lorenzo
    Passarella, Andrea
    Conti, Marco
    [J]. PERVASIVE AND MOBILE COMPUTING, 2017, 41 : 46 - 68
  • [4] Distributed Neural Networks for Missing Big Data Imputation
    Petrozziello, Alessio
    Jordanov, Ivan
    Sommeregger, Christian
    [J]. 2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018, : 131 - 138
  • [5] Optimal Partitioning of Distributed Neural Networks for Various Communication Environments
    Jeong, Jonghun
    Yang, Hoeseok
    [J]. 3RD INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (IEEE ICAIIC 2021), 2021, : 269 - 272
  • [6] Spark Based Distributed Deep Learning Framework For Big Data Applications
    Khumoyun, Akhmedov
    Cui, Yun
    Hanku, Lee
    [J]. 2016 INTERNATIONAL CONFERENCE ON INFORMATION SCIENCE AND COMMUNICATIONS TECHNOLOGIES (ICISCT), 2016,
  • [7] Optimal and Efficient Distributed Online Learning for Big Data
    Sayin, Muhammed O.
    Vanli, N. Denizcan
    Delibalta, Ibrahim
    Kozat, Suleyman S.
    [J]. 2015 IEEE INTERNATIONAL CONGRESS ON BIG DATA - BIGDATA CONGRESS 2015, 2015, : 126 - 133
  • [8] Distributed Learning of Neural Networks with One Round of Communication
    Izbicki, Mike
    Shelton, Christian R.
    [J]. MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT I, 2020, 1167 : 293 - 300
  • [9] Energy Efficient Neural Networks for Big Data Analytics
    Wang, Yu
    Li, Boxun
    Luo, Rong
    Chen, Yiran
    Xu, Ningyi
    Yang, Huazhong
    [J]. 2014 DESIGN, AUTOMATION AND TEST IN EUROPE CONFERENCE AND EXHIBITION (DATE), 2014,
  • [10] An Efficient Distributed SPARQL Query Processing Scheme Considering Communication Costs in Spark Environments
    Lim, Jongtae
    Kim, Byounghoon
    Lee, Hyeonbyeong
    Choi, Dojin
    Bok, Kyoungsoo
    Yoo, Jaesoo
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (01)