Topologies in distributed machine learning: Comprehensive survey, recommendations and future directions

Cited by: 0
Authors
Liu, Ling [1]
Zhou, Pan [1]
Sun, Gang [2]
Chen, Xi [2, 3]
Wu, Tao [4]
Yu, Hongfang [2]
Guizani, Mohsen [5]
Affiliations
[1] Southwest Minzu Univ, Coll Elect Informat, Chengdu, Peoples R China
[2] Univ Elect Sci & Technol China, Sch Informat & Commun Engn, Key Lab Opt Fiber Sensing & Commun, Minist Educ, Chengdu, Peoples R China
[3] Southwest Minzu Univ, Sch Comp Sci & Engn, Chengdu, Peoples R China
[4] Chengdu Univ Informat Technol, Sch Comp Sci, Chengdu, Peoples R China
[5] Mohamed Bin Zayed Univ Artificial Intelligence MBZ, Machine Learning Dept, Abu Dhabi, U Arab Emirates
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation
Keywords
Distributed Machine Learning (DML); Parameter Server (PS) architecture; Tree architecture; Ring architecture; Network topology; Training performance; DATA CENTER NETWORKS; ARCHITECTURE; INTERCONNECTION; DESIGN;
DOI
10.1016/j.neucom.2023.127009
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the widespread use of distributed machine learning (DML), many IT companies have established networks dedicated to DML. Different DML communication architectures have different traffic patterns and place different requirements on network performance, which is closely related to network topology. However, traditional network topologies usually pursue general goals and are agnostic to the specific communication patterns of applications. A mismatch between the network topology and the application directly affects training performance. Although some studies have analyzed the effect of topology on training performance, the topologies and communication architectures they cover are not comprehensive, and it is still not known which topology is appropriate for which communication architecture. This survey investigates typical topologies and analyzes whether they meet the requirements of three commonly used DML communication architectures (i.e., the Parameter Server (PS), Tree, and Ring architectures). Specifically, the topology requirements of each communication architecture and two common topology requirements for DML (i.e., high scalability and fault tolerance) are studied first. Next, whether these topologies meet the topology requirements is analyzed. The paper then discusses potential technologies and approaches for constructing an appropriate scheme for each topology requirement, and presents DMLNet, a novel network topology that suits all three communication architectures. Finally, several potential directions for future research are outlined.
Pages: 19
Related papers
50 in total
  • [41] Privacy preserving distributed knowledge discovery: Survey and future directions
    Al-Janabi, S.T.F. (sufyantaih@ieee.org), 1600, Inderscience Enterprises Ltd., 29, route de Pre-Bois, Case Postale 856, CH-1215 Geneva 15, CH-1215, Switzerland (04):
  • [42] RPL routing protocol over IoT: A comprehensive survey, recent advances, insights, bibliometric analysis, recommendations, and future directions
    Darabkh, Khalid A.
    Al-Akhras, Muna
    Zomot, Jumana N.
    Atiquzzaman, Mohammed
    JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 2022, 207
  • [43] A Comprehensive Review of Machine Learning Approaches for Anomaly Detection in Smart Homes: Experimental Analysis and Future Directions
    Rahman, Md Motiur
    Gupta, Deepti
    Bhatt, Smriti
    Shokouhmand, Shiva
    Faezipour, Miad
    FUTURE INTERNET, 2024, 16 (04)
  • [44] Application of machine learning and artificial intelligence on agriculture supply chain: a comprehensive review and future research directions
    Kumari, Sneha
    Venkatesh, V. G.
    Tan, Felix Ter Chian
    Bharathi, S. Vijayakumar
    Ramasubramanian, M.
    Shi, Yangyan
    ANNALS OF OPERATIONS RESEARCH, 2023
  • [45] A comprehensive survey on regularization strategies in machine learning
    Tian, Yingjie
    Zhang, Yuqi
    INFORMATION FUSION, 2022, 80 : 146 - 166
  • [46] Machine Learning and Radiogenomics: Lessons Learned and Future Directions
    Kang, John
    Rancati, Tiziana
    Lee, Sangkyu
    Oh, Jung Hun
    Kerns, Sarah L.
    Scott, Jacob G.
    Schwartz, Russell
    Kim, Seyoung
    Rosenstein, Barry S.
    FRONTIERS IN ONCOLOGY, 2018, 8
  • [47] Machine Learning for Smart Agriculture: A Comprehensive Survey
    Mahmood M.R.
    Matin M.A.
    Goudos S.K.
    Karagiannidis G.
    IEEE Transactions on Artificial Intelligence, 2024, 5 (06) : 2568 - 2588
  • [48] Parallel approaches to machine learning - A comprehensive survey
    Upadhyaya, Sujatha R.
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2013, 73 (03) : 284 - 292
  • [49] A Comprehensive Survey of Loss Functions in Machine Learning
    Wang Q.
    Ma Y.
    Zhao K.
    Tian Y.
    Annals of Data Science, 2022, 9 (02) : 187 - 212
  • [50] Challenges and future directions of secure federated learning: a survey
    Kaiyue Zhang
    Xuan Song
    Chenhan Zhang
    Shui Yu
    Frontiers of Computer Science, 2022, 16