Byzantine fault tolerance in distributed machine learning: a survey

Cited by: 0
Authors
Bouhata, Djamila [1,2]
Moumen, Hamouma [1,2]
Mazari, Jocelyn Ahmed [3,4]
Bounceur, Ahcene [5]
Affiliations
[1] Univ Batna 2, Comp Sci Dept, 53 Constantine Rd, Batna 05078, Algeria
[2] Lab Applicat Math Comp & Elect, Comp Sci Dept, Batna, Algeria
[3] Sorbonne Univ, CNRS, ISIR, Paris, France
[4] Extrality, Paris, France
[5] Univ Sharjah, Informat Syst Dept, Sharjah, U Arab Emirates
Keywords
Byzantine fault tolerance; distributed machine learning; stochastic gradient descent; communication; optimisation; subgradient methods; coordinate descent; gradient descent; agreement; generals
DOI
10.1080/0952813X.2024.2391778
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Byzantine Fault Tolerance (BFT) is crucial for ensuring the resilience of Distributed Machine Learning (DML) systems during training under adversarial conditions. Despite the growing body of research on BFT in DML, there is no comprehensive classification of techniques or broad analysis of the different approaches. This paper provides an in-depth survey of recent advances in BFT for DML, focusing on first-order optimisation methods used during the training phase, particularly the popular Stochastic Gradient Descent (SGD). We offer a novel classification of BFT approaches based on characteristics such as the communication process, the optimisation method, and the topology setting. This classification aims to improve the understanding of the various BFT methods and to guide future research on the open challenges in the field. This work lays the foundations for developing robust BFT systems that use a variety of optimisation methods to strengthen resilience.
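As a toy illustration of the problem the abstract describes (not a method taken from the survey), the Python sketch below runs gradient descent on the quadratic loss 0.5 * ||w - target||^2 with four honest workers and one Byzantine worker. Everything in it is an assumption made for illustration: the loss, the `target` vector, the learning rate, and the hypothetical attack in which the Byzantine worker sends the true gradient sign-flipped and scaled by 10. It compares plain averaging with two standard robust aggregation rules: the coordinate-wise median, and a norm-based filter that discards the `f` largest-norm gradients before averaging. The filter is only similar in spirit to the norm-based comparative gradient elimination named in entry [10] of the related-papers list; it is not claimed to reproduce that paper's exact rule.

```python
import math
import statistics
from typing import Callable, List

Vector = List[float]


def mean_aggregate(grads: List[Vector]) -> Vector:
    """Plain averaging, as used by non-robust parallel SGD."""
    n = len(grads)
    return [sum(coords) / n for coords in zip(*grads)]


def coordinate_wise_median(grads: List[Vector]) -> Vector:
    """Median of each coordinate, taken independently across workers."""
    return [statistics.median(coords) for coords in zip(*grads)]


def norm_filtered_mean(grads: List[Vector], f: int) -> Vector:
    """Discard the f gradients with the largest Euclidean norm, average the rest.

    Illustrative variant only (assumption: the survivors are averaged).
    """
    if not 0 <= f < len(grads):
        raise ValueError("need 0 <= f < number of gradients")
    by_norm = sorted(grads, key=lambda g: math.sqrt(sum(x * x for x in g)))
    return mean_aggregate(by_norm[: len(grads) - f])


def train(aggregate: Callable[[List[Vector]], Vector], steps: int = 50,
          lr: float = 0.1, n_honest: int = 4) -> Vector:
    """Gradient descent on the toy loss 0.5 * ||w - target||^2 with one Byzantine worker."""
    target = [3.0, -1.0]  # hypothetical optimum of the toy loss
    w = [0.0, 0.0]
    for _ in range(steps):
        true_grad = [wi - ti for wi, ti in zip(w, target)]
        # Honest workers report the exact (noise-free) gradient.
        honest = [list(true_grad) for _ in range(n_honest)]
        # Hypothetical attack: a sign-flipped gradient scaled by 10.
        byzantine = [-10.0 * g for g in true_grad]
        g = aggregate(honest + [byzantine])
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


if __name__ == "__main__":
    print("mean:         ", train(mean_aggregate))
    print("median:       ", train(coordinate_wise_median))
    print("norm-filtered:", train(lambda gs: norm_filtered_mean(gs, f=1)))
```

With these settings, averaging lets the single Byzantine worker reverse the descent direction, so the iterate moves away from `target`, while both robust rules recover the honest gradient at every step and converge. The example uses identical, noise-free honest gradients; with genuinely stochastic gradients the honest vectors differ, and robust rules only approximate the honest mean.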
Pages: 59
Related papers
  • [1] Genuinely distributed Byzantine machine learning
    El-Mhamdi, El-Mahdi
    Guerraoui, Rachid
    Guirguis, Arsany
    Hoang, Le-Nguyen
    Rouault, Sebastien
    Distributed Computing, 2022, 35(4): 305-331
  • [3] A Study on Byzantine Fault Tolerance Methods in Distributed Networks
    Nasreen, M. A.
    Ganesh, Amal
    Sunitha, C.
    Fourth International Conference on Recent Trends in Computer Science & Engineering (ICRTCSE 2016), 2016, 87: 50-54
  • [4] Approximate Byzantine Fault-Tolerance in Distributed Optimization
    Liu, Shuo
    Gupta, Nirupam
    Vaidya, Nitin H.
    Proceedings of the 2021 ACM Symposium on Principles of Distributed Computing (PODC '21), 2021: 379-389
  • [5] Fault Tolerance in Distributed Systems: A Survey
    Ledmi, Abdeldjalil
    Bendjenna, Hakim
    Hemam, Sofiane Mounine
    2018 3rd International Conference on Pattern Analysis and Intelligent Systems (PAIS), 2018: 235-239
  • [6] Exact Regenerating Codes for Byzantine Fault Tolerance in Distributed Storage
    Han, Yunghsiang S.
    Zheng, Rong
    Mow, Wai Ho
    2012 Proceedings IEEE INFOCOM, 2012: 2498-2506
  • [7] Blockchain based Distributed Consensus for Byzantine Fault Tolerance in PMU Network
    Iyer, Sreerag
    Thakur, Snehal
    Dixit, Mihirraj
    Agrawal, Ashish
    Katkam, Rajneesh
    Kazi, Faruk
    2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT), 2019
  • [8] A Survey on Distributed Machine Learning
    Verbraeken, Joost
    Wolting, Matthijs
    Katzy, Jonathan
    Kloppenburg, Jeroen
    Verbelen, Tim
    Rellermeyer, Jan S.
    ACM Computing Surveys, 2020, 53(2)
  • [9] Trebiz: Byzantine Fault Tolerance with Byzantine Merchants
    Dai, Xiaohai
    Huang, Liping
    Xiao, Jiang
    Zhang, Zhaonan
    Xie, Xia
    Jin, Hai
    Proceedings of the 38th Annual Computer Security Applications Conference, ACSAC 2022, 2022: 923-935
  • [10] Byzantine Fault-Tolerant Distributed Machine Learning with Norm-Based Comparative Gradient Elimination
    Gupta, Nirupam
    Liu, Shuo
    Vaidya, Nitin
    51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN-W 2021), 2021: 175-181