Quartet: A Holistic Hybrid Parallel Framework for Training Large Language Models

Cited by: 0

Authors:

Zhang, Weigang [1,2]

Zhou, Biyu [1,2]

Wu, Xing [1,2]

Gao, Chaochen [1,2]

Liu, Zhibing [1,2]

Tang, Xuehai [1,2]

Li, Ruixuan [1,2]

Han, Jizhong [1,2]

Hu, Songlin [1,2]

Affiliations:

[1] Institute of Information Engineering, Chinese Academy of Sciences, Beijing, People's Republic of China

[2] School of Cyber Security, University of Chinese Academy of Sciences, Beijing, People's Republic of China

Source:

EURO-PAR 2024: PARALLEL PROCESSING, PART II, EURO-PAR 2024 | 2024 / Vol. 14802

Keywords:

Hybrid Parallelism; Large Language Models; Distributed Training

DOI:

10.1007/978-3-031-69766-1_29

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Hybrid parallelism is popular in training large language models (LLMs). However, existing efforts have focused on optimizing individual strategies within hybrid parallelism, such as pipeline scheduling or device assignment, which limits the overall training efficiency. This paper explores the intricate dependencies among four pivotal strategies (model scaling, model splitting, pipeline scheduling, and device assignment) and proposes Quartet, a holistic hybrid parallel framework for their joint optimization. The novelty lies in the formulation of parameterized pipeline scheduling and device assignment, alongside a pioneering analysis of the impact of model scaling on throughput. These provide the basis for orchestrating the four strategies within a unified framework to efficiently maximize the overall training throughput. Evaluation results show that, for representative LLMs, Quartet improves the training throughput by up to 2.16x over state-of-the-art synchronous hybrid parallel approaches.


Pages: 424-438

Page count: 15
