Prophet: Fine-grained Load Balancing for Parallel Training of Large-scale MoE Models

Cited by: 1
Authors
Wang, Wei [1]
Lai, Zhiquan [1]
Li, Shengwei [1]
Liu, Weijie [1]
Ge, Keshi [1]
Liu, Yujie [1]
Shen, Ao [1]
Li, Dongsheng [1]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Natl Lab Parallel & Distributed Proc PDL, Changsha, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
mixture of experts; distributed training;
DOI
10.1109/CLUSTER52292.2023.00015
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Mixture of Experts (MoE) has received increasing attention as a way to scale DNN models to extra-large sizes with a negligible increase in computation. MoE models have achieved the highest accuracy in several domains. However, training an MoE model causes significant load imbalance across devices, which substantially reduces throughput. Previous work on load balancing either harms model convergence or suffers from high execution overhead. To address these issues, we present Prophet, a fine-grained load balancing method for parallel training of large-scale MoE models, which consists of a planner and a scheduler. The Prophet planner first uses a fine-grained resource allocation method to enumerate the possible expert placements, and then efficiently searches for a well-balanced expert placement that balances the load without introducing additional overhead. The Prophet scheduler exploits the locality of the token distribution to schedule the resource allocation operations with a layer-wise, fine-grained scheduling strategy that hides their overhead. We conduct extensive experiments on four clusters and five representative models. The results indicate that Prophet achieves up to a 2.3x speedup over state-of-the-art MoE frameworks, including Deepspeed-MoE and FasterMoE. Additionally, Prophet improves load balance by up to 12.06x compared to FasterMoE.
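To make the load imbalance problem in the abstract concrete, the following is a minimal sketch, not Prophet's planner. It assumes a per-expert token count is already known, measures imbalance as the busiest device's load divided by the mean load, and swaps in a generic longest-processing-time greedy heuristic for the placement search. All function names and the token counts are hypothetical and chosen only for illustration.

```python
from typing import List


def device_loads(tokens_per_expert: List[int], placement: List[int], num_devices: int) -> List[int]:
    """Sum the tokens routed to each device under an expert-to-device placement."""
    loads = [0] * num_devices
    for expert, device in enumerate(placement):
        loads[device] += tokens_per_expert[expert]
    return loads


def imbalance(loads: List[int]) -> float:
    """Busiest device's load divided by the mean load (1.0 means perfectly balanced)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean > 0 else 1.0


def greedy_placement(tokens_per_expert: List[int], num_devices: int) -> List[int]:
    """Generic greedy heuristic (an assumption, not the paper's method):
    place the heaviest expert first, always on the least-loaded device."""
    placement = [0] * len(tokens_per_expert)
    loads = [0] * num_devices
    order = sorted(range(len(tokens_per_expert)),
                   key=lambda e: tokens_per_expert[e], reverse=True)
    for expert in order:
        device = min(range(num_devices), key=lambda d: loads[d])
        placement[expert] = device
        loads[device] += tokens_per_expert[expert]
    return placement


# Hypothetical token counts for 8 experts on 2 devices.
tokens = [900, 400, 400, 100, 100, 100, 50, 50]
static = [0, 0, 0, 0, 1, 1, 1, 1]  # experts split evenly by index
balanced = greedy_placement(tokens, 2)

print(device_loads(tokens, static, 2), imbalance(device_loads(tokens, static, 2)))
print(device_loads(tokens, balanced, 2), imbalance(device_loads(tokens, balanced, 2)))
```

In this toy case the index-based placement leaves one device with six times the load of the other, while the greedy placement evens them out. The sketch ignores the cost of moving expert parameters between devices, which is the overhead the abstract says the scheduler is designed to hide.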
Pages: 82 - 94
Number of pages: 13
Related papers
50 in total
  • [1] Fine-Grained Parallel Optimization of Large-Scale Data for PMVS Algorithm
    Liu J.
    Li Y.
    Jiang Z.
    Deng J.
    Sui H.
    Pan J.
    Wuhan Daxue Xuebao (Xinxi Kexue Ban)/Geomatics and Information Science of Wuhan University, 2019, 44 (04) : 608 - 616
  • [2] ParaFlow: Fine-grained parallel SDN controller for large-scale networks
    Song, Ping
    Liu, Yi
    Liu, Chi
    Qian, Depei
    JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 2017, 87 : 46 - 59
  • [3] Benchmarking Large-Scale Fine-Grained Categorization
    Angelova, Anelia
    Long, Philip M.
    2014 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2014, : 532 - 539
  • [4] A Fine-Grained Large-Scale NAT Detection Method
    Yan, Bin
    Huang, Liang
    Gou, Gaopeng
    Guo, Yuanbo
    Bao, Yibao
    ADVANCED MULTIMEDIA AND UBIQUITOUS ENGINEERING: FUTURETECH & MUE, 2016, 393 : 493 - 499
  • [5] Large-scale instability of a fine-grained turbulent jet
    Chen, KP
    Crighton, DG
    EUROPEAN JOURNAL OF MECHANICS B-FLUIDS, 1999, 18 (01) : 13 - 34
  • [6] Fine-grained Transmission Optimization of Large-scale WebVR Scenes
    Yin, Changqing
    Chen, Zhaohui
    Hu, Yonghao
    Yu, Kexin
    PROCEEDINGS OF THE 2018 IEEE INTERNATIONAL CONFERENCE ON PROGRESS IN INFORMATICS AND COMPUTING (PIC), 2018, : 209 - 214
  • [7] Birdsnap: Large-scale Fine-grained Visual Categorization of Birds
    Berg, Thomas
    Liu, Jiongxin
    Lee, Seung Woo
    Alexander, Michelle L.
    Jacobs, David W.
    Belhumeur, Peter N.
    2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, : 2019 - 2026
  • [8] A Large-Scale Car Dataset for Fine-Grained Categorization and Verification
    Yang, Linjie
    Luo, Ping
    Loy, Chen Change
    Tang, Xiaoou
    2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2015, : 3973 - 3981
  • [9] Transforming Wikipedia into a Large-Scale Fine-Grained Entity Type Corpus
    Ghaddar, Abbas
    Langlais, Philippe
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 4413 - 4420
  • [10] Fine-Grained Local Dynamic Load Balancing in PDES
    Linden, Jonatan
    Bauer, Pavol
    Engblom, Stefan
    Jonsson, Bengt
    SIGSIM-PADS'18: PROCEEDINGS OF THE 2018 ACM SIGSIM CONFERENCE ON PRINCIPLES OF ADVANCED DISCRETE SIMULATION, 2018, : 201 - 212