Topologies in distributed machine learning: Comprehensive survey, recommendations and future directions

Cited by: 0
Authors
Liu, Ling [1]
Zhou, Pan [1]
Sun, Gang [2]
Chen, Xi [2, 3]
Wu, Tao [4]
Yu, Hongfang [2]
Guizani, Mohsen [5]
Affiliations
[1] Southwest Minzu Univ, Coll Elect Informat, Chengdu, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Informat & Commun Engn, Key Lab Opt Fiber Sensing & Commun, Minist Educ, Chengdu, Peoples R China
[3] Southwest Minzu Univ, Sch Comp Sci & Engn, Chengdu, Peoples R China
[4] Chengdu Univ Informat Technol, Sch Comp Sci, Chengdu, Peoples R China
[5] Mohamed Bin Zayed Univ Artificial Intelligence MBZ, Machine Learning Dept, Abu Dhabi, U Arab Emirates
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Distributed Machine Learning (DML); Parameter Server (PS) architecture; Tree architecture; Ring architecture; Network topology; Training performance; Data center networks; Architecture; Interconnection; Design
DOI
10.1016/j.neucom.2023.127009
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the widespread use of distributed machine learning (DML), many IT companies have built networks dedicated to DML. The communication architectures of DML differ in their traffic patterns and in their network performance requirements, both of which are closely tied to network topology. However, traditional network topologies usually pursue general-purpose goals and are agnostic to the specific communication patterns of applications. A mismatch between the network topology and the application directly affects training performance. Although some studies have analyzed the effect of topology on training performance, the topologies and communication architectures they cover are not comprehensive, and it remains unknown which topology suits which communication architecture. This survey investigates typical topologies and analyzes whether they meet the requirements of three commonly used DML communication architectures: Parameter Server (PS), Tree, and Ring. Specifically, it first studies the topology requirements of each communication architecture, along with two topology requirements common to DML (high scalability and fault tolerance). Next, it analyzes whether the surveyed topologies meet these requirements. It then discusses potential technologies and approaches for constructing an appropriate scheme for each topology requirement and presents DMLNet, a novel network topology that suits the three communication architectures. Finally, it outlines several potential directions for future research.
Pages: 19
Related papers
50 in total
  • [1] Recommendations and future directions for supervised machine learning in psychiatry
    Micah Cearns
    Tim Hahn
    Bernhard T. Baune
    Translational Psychiatry, 9
  • [2] Recommendations and future directions for supervised machine learning in psychiatry
    Cearns, Micah
    Hahn, Tim
    Baune, Bernhard T.
    TRANSLATIONAL PSYCHIATRY, 2019, 9 (1)
  • [3] A comprehensive survey on Machine Learning techniques in opportunistic networks: Advances, challenges and future directions
    Gandhi, Jay
    Narmawala, Zunnun
    PERVASIVE AND MOBILE COMPUTING, 2024, 100
  • [4] From distributed machine to distributed deep learning: a comprehensive survey
    Dehghani, Mohammad
    Yazdanparast, Zahra
    JOURNAL OF BIG DATA, 2023, 10 (01)
  • [5] From distributed machine to distributed deep learning: a comprehensive survey
    Mohammad Dehghani
    Zahra Yazdanparast
    Journal of Big Data, 10
  • [6] A comprehensive survey and review of machine learning techniques in document processing : Industry applications and future directions
    Tiwari, Manisha
    Aital, Padmanabhan
    Joshi, Padmaja
    JOURNAL OF INFORMATION & OPTIMIZATION SCIENCES, 2024, 45 (04): 1177-1188
  • [7] Machine learning for autonomous vehicle's trajectory prediction: A comprehensive survey, challenges, and future research directions
    Bharilya, Vibha
    Kumar, Neetesh
    VEHICULAR COMMUNICATIONS, 2024, 46
  • [8] Machine-Learning-Based Positioning: A Survey and Future Directions
    Li, Ziwei
    Xu, Ke
    Wang, Haiyang
    Zhao, Yi
    Wang, Xiaoliang
    Shen, Meng
    IEEE NETWORK, 2019, 33 (03): 96-101
  • [9] Distributed Machine Learning in Edge Computing: Challenges, Solutions and Future Directions
    Tu, Jingke
    Yang, Lei
    Cao, Jiannong
    ACM COMPUTING SURVEYS, 2025, 57 (05)
  • [10] A Comprehensive Survey on Beamforming and Antenna Selection in MIMO Systems using Deep Learning and Machine Learning Techniques with Future Research Directions
    Kavitha, K. R.
    Sivakumar, T.
    2024 2ND WORLD CONFERENCE ON COMMUNICATION & COMPUTING, WCONF 2024, 2024