Topologies in distributed machine learning: Comprehensive survey, recommendations and future directions

Cited by: 0
Authors
Liu, Ling [1]
Zhou, Pan [1]
Sun, Gang [2]
Chen, Xi [2, 3]
Wu, Tao [4]
Yu, Hongfang [2]
Guizani, Mohsen [5]
Affiliations
[1] Southwest Minzu Univ, Coll Elect Informat, Chengdu, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Informat & Commun Engn, Key Lab Opt Fiber Sensing & Commun, Minist Educ, Chengdu, Peoples R China
[3] Southwest Minzu Univ, Sch Comp Sci & Engn, Chengdu, Peoples R China
[4] Chengdu Univ Informat Technol, Sch Comp Sci, Chengdu, Peoples R China
[5] Mohamed Bin Zayed Univ Artificial Intelligence MBZ, Machine Learning Dept, Abu Dhabi, U Arab Emirates
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
Distributed Machine Learning (DML); Parameter Server (PS) architecture; Tree architecture; Ring architecture; Network topology; Training performance; DATA CENTER NETWORKS; ARCHITECTURE; INTERCONNECTION; DESIGN;
DOI
10.1016/j.neucom.2023.127009
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
With the widespread use of distributed machine learning (DML), many IT companies have established networks dedicated to DML. Different DML communication architectures have different traffic patterns and place different requirements on network performance, which is closely tied to network topology. However, traditional network topologies usually pursue general goals and are agnostic to the specific communication patterns of applications. A mismatch between the network topology and the application directly affects training performance. Although some studies have analyzed the effect of topology on training performance, the topologies and communication architectures they cover are not comprehensive, and it remains unknown which topology suits which communication architecture. This survey investigates typical topologies and analyzes whether they meet the requirements of three commonly used DML communication architectures (i.e., the Parameter Server (PS), Tree, and Ring architectures). Specifically, it first studies the topology requirements of each communication architecture and two common topology requirements for DML (i.e., high scalability and fault tolerance). Next, it analyzes whether these topologies meet those requirements. The paper then discusses potential technologies and approaches for constructing an appropriate scheme for each topology requirement and presents DMLNet, a novel network topology that suits the three communication architectures. Finally, several potential directions for future research are outlined.
Pages: 19
Related papers
50 in total
  • [21] On the role of machine learning in satellite internet of things: A survey of techniques, challenges, and future directions
    Choquenaira-Florez, Alexander Y.
    Fraire, Juan A.
    Pasandi, Hannah B.
    Rivano, Herve
    COMPUTER NETWORKS, 2025, 259
  • [22] A survey of machine learning applications in advanced transportation systems: Trends, techniques, and future directions
    Zhang, Yuzhong
    Zhang, Songyang
    Dinavahi, Venkata
    ETRANSPORTATION, 2025, 24
  • [23] Ontologies and Machine Learning Models to Enhance Health Informatics: A Survey, Challenges and Future Directions
    Department of Computer Science, Mohamed El Bachir El Ibrahimi University, Bordj Bou Arreridj, Algeria
    Unknown
    Unknown
    IAENG Int. J. Appl. Math., 2025, 55 (03): 475-499
  • [24] Machine learning in cybersecurity: a comprehensive survey
    Dasgupta, Dipankar
    Akhtar, Zahid
    Sen, Sajib
    JOURNAL OF DEFENSE MODELING AND SIMULATION-APPLICATIONS METHODOLOGY TECHNOLOGY-JDMS, 2022, 19 (01): 57-106
  • [25] A survey of methods for distributed machine learning
    Peteiro-Barral, Diego
    Guijarro-Berdinas, Bertha
    PROGRESS IN ARTIFICIAL INTELLIGENCE, 2013, 2 (01): 1-11
  • [26] A comprehensive overview of microbiome data in the light of machine learning applications: categorization, accessibility, and future directions
    Kumar, Bablu
    Lorusso, Erika
    Fosso, Bruno
    Pesole, Graziano
    FRONTIERS IN MICROBIOLOGY, 2024, 15
  • [27] Multi Task Learning: A Survey and Future Directions
    Lee, Taeho
    Seok, Junhee
    2023 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION, ICAIIC, 2023: 232-235
  • [28] Arabic Machine Translation: A Survey With Challenges and Future Directions
    Zakraoui, Jezia
    Saleh, Moutaz
    Al-Maadeed, Somaya
    Alja'am, Jihad Mohamed
    IEEE ACCESS, 2021, 9: 161445-161468
  • [29] A Comprehensive Survey on Web Recommendations Systems with Special Focus on Filtering Techniques and Usage of Machine Learning
    Asha, K. N.
    Rajkumar, R.
    COMPUTATIONAL VISION AND BIO-INSPIRED COMPUTING, 2020, 1108: 1009-1022
  • [30] Decentralized Machine Learning Training: A Survey on Synchronization, Consolidation, and Topologies
    Khan, Qazi Waqas
    Khan, Anam Nawaz
    Rizwan, Atif
    Ahmad, Rashid
    Khan, Salabat
    Kim, Do-Hyeun
    IEEE ACCESS, 2023, 11: 68031-68050