LEARNAE: Distributed and Resilient Deep Neural Network Training for Heterogeneous Peer to Peer Topologies

Cited by: 3
Authors
Nikolaidis, Spyridon [1]
Refanidis, Ioannis [1]
Affiliation
[1] Univ Macedonia, Thessaloniki 54636, Greece
Keywords
Decentralized neural network training; Distributed asynchronous stochastic gradient descent; Model averaging; Peer-to-Peer topologies; Distributed Ledger Technology; IPFS; IOTA
DOI
10.1007/978-3-030-20257-6_24
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
LEARNAE is a proposed framework for decentralized training of Deep Neural Networks (DNNs). The main priority of LEARNAE is to maintain a fully distributed architecture in which no participant has any kind of coordinating role. This peer-to-peer principle covers all aspects: underlying network protocols, data acquisition/distribution, and model training. The result is a resilient DNN training system with no single point of failure. LEARNAE focuses on use cases where infrastructure heterogeneity and network unreliability result in a constantly changing environment of commodity-hardware nodes. To achieve this level of decentralization, new technologies had to be adopted. The main pillars of this implementation are two ongoing projects, IPFS and IOTA. IPFS is a platform for a purely decentralized filesystem, where each node contributes local data storage. IOTA aims to be the networking infrastructure of the emerging Internet of Things (IoT). On top of these, we propose a management algorithm for training a DNN model collaboratively through optimal exchange of data and model weights, always using distribution-friendly gossip protocols.
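The abstract does not spell out LEARNAE's management algorithm. As a generic illustration of the "model averaging" keyword and of exchanging weights over a gossip protocol without any coordinator, the following is a minimal sketch of plain randomized pairwise gossip averaging. This is a textbook technique swapped in for illustration, not the authors' method. The helper names (`pairwise_average`, `gossip_step`, `spread`, `global_mean`), the four-peer ring, and the toy weight vectors are all hypothetical, and the sketch omits local training and any IPFS/IOTA transport.

```python
import random
from typing import Dict, List

Weights = List[float]


def pairwise_average(a: Weights, b: Weights) -> Weights:
    """Element-wise mean of two weight vectors of equal length."""
    return [(x + y) / 2.0 for x, y in zip(a, b)]


def gossip_step(models: Dict[str, Weights],
                neighbors: Dict[str, List[str]],
                rng: random.Random) -> None:
    """One randomized gossip exchange (hypothetical helper, not LEARNAE's API).

    A uniformly chosen peer contacts one of its neighbors and both replace
    their weights with the pairwise mean. No node coordinates the exchange.
    """
    peer = rng.choice(sorted(models))
    partner = rng.choice(neighbors[peer])
    merged = pairwise_average(models[peer], models[partner])
    models[peer] = merged
    models[partner] = list(merged)  # separate copy so peers never share a list


def spread(models: Dict[str, Weights]) -> float:
    """Largest per-coordinate gap between any two peers (0.0 means consensus)."""
    return max(max(col) - min(col) for col in zip(*models.values()))


def global_mean(models: Dict[str, Weights]) -> Weights:
    """Coordinate-wise mean over all peers; pairwise averaging keeps it fixed."""
    n = len(models)
    return [sum(col) / n for col in zip(*models.values())]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical example: four peers in a ring, each holding different
    # locally trained weights.
    models = {
        "a": [1.0, 0.0, 4.0],
        "b": [3.0, 2.0, 0.0],
        "c": [5.0, 4.0, 2.0],
        "d": [7.0, 6.0, 6.0],
    }
    neighbors = {"a": ["b", "d"], "b": ["a", "c"],
                 "c": ["b", "d"], "d": ["c", "a"]}
    print("network-wide mean:", global_mean(models))
    for _ in range(500):
        gossip_step(models, neighbors, rng)
    print("spread after gossip:", spread(models))
    print("peer a:", models["a"])
```

Because each exchange replaces two vectors with their mean, the sum over all peers is unchanged, so on a connected topology every peer drifts toward the initial network-wide mean ([4.0, 3.0, 3.0] for these inputs) even though no node ever sees the whole network. A real system would interleave such exchanges with local gradient steps and would have to tolerate peers joining and leaving, which this sketch does not model.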
Pages: 286-298
Number of pages: 13