Prophet: Fine-grained Load Balancing for Parallel Training of Large-scale MoE Models

Cited by: 1

Authors:

Wang, Wei [1]
Lai, Zhiquan [1]
Li, Shengwei [1]
Liu, Weijie [1]
Ge, Keshi [1]
Liu, Yujie [1]
Shen, Ao [1]
Li, Dongsheng [1]

Affiliations:

[1] Natl Univ Def Technol, Coll Comp, Natl Lab Parallel & Distributed Proc PDL, Changsha, Peoples R China

Source:

2023 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING, CLUSTER | 2023

Funding:

National Natural Science Foundation of China; National Key Research and Development Program of China;

Keywords:

mixture of experts; distributed training;

DOI:

10.1109/CLUSTER52292.2023.00015

Chinese Library Classification:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Mixture of Experts (MoE) has received increasing attention for scaling DNN models to extra-large sizes with negligible increases in computation. MoE models have achieved the highest accuracy in several domains. However, significant load imbalance occurs across devices during the training of an MoE model, substantially reducing throughput. Previous works on load balancing either harm model convergence or suffer from high execution overhead. To address these issues, we present Prophet, a fine-grained load balancing method for parallel training of large-scale MoE models, which consists of a planner and a scheduler. The Prophet planner first employs a fine-grained resource allocation method to determine the possible scenarios for expert placement, and then efficiently searches for a well-balanced expert placement to balance the load without introducing additional overhead. The Prophet scheduler exploits the locality of the token distribution to schedule the resource allocation operations using a layer-wise fine-grained schedule strategy to hide their overhead. We conduct extensive experiments on four clusters and five representative models. The results indicate that Prophet achieves up to 2.3x speedup compared to state-of-the-art MoE frameworks, including DeepSpeed-MoE and FasterMoE. Additionally, Prophet improves load balancing by up to 12.06x compared to FasterMoE.

Pages: 82-94

Page count: 13
