Quartet: A Holistic Hybrid Parallel Framework for Training Large Language Models

Citations: 0
Authors
Zhang, Weigang [1 ,2 ]
Zhou, Biyu [1 ,2 ]
Wu, Xing [1 ,2 ]
Gao, Chaochen [1 ,2 ]
Liu, Zhibing [1 ,2 ]
Tang, Xuehai [1 ,2 ]
Li, Ruixuan [1 ,2 ]
Han, Jizhong [1 ,2 ]
Hu, Songlin [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
Keywords
Hybrid Parallelism; Large Language Models; Distributed Training;
DOI
10.1007/978-3-031-69766-1_29
CLC Classification
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Hybrid parallelism is widely used for training large language models (LLMs). However, existing efforts focus on optimizing individual strategies within hybrid parallelism, such as pipeline scheduling or device assignment, which limits overall training efficiency. This paper explores the intricate dependencies among four pivotal strategies (model scaling, model splitting, pipeline scheduling, and device assignment) and proposes Quartet, a holistic hybrid parallel framework for their joint optimization. The novelty lies in the formulation of parameterized pipeline scheduling and device assignment, alongside a pioneering analysis of model scaling's impact on throughput. Together, these provide the basis for orchestrating the four strategies within a unified framework that efficiently maximizes overall training throughput. Evaluation results show that, for representative LLMs, Quartet improves training throughput by up to 2.16x over state-of-the-art synchronous hybrid parallel approaches.
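The abstract's core idea, optimizing the four strategies jointly rather than one at a time, can be illustrated with a toy sketch. This is not Quartet's actual algorithm or cost model; the throughput function below is a hypothetical stand-in, and all names (`estimated_throughput`, `joint_search`, the schedule and placement labels) are illustrative assumptions.

```python
from itertools import product

def estimated_throughput(layers, pp_stages, schedule, placement):
    """Toy cost model (illustrative only): fewer layers per stage raises
    throughput, an interleaved schedule shrinks the pipeline bubble, and a
    topology-aware device placement avoids a communication penalty."""
    if layers % pp_stages != 0:
        return 0.0  # treat an uneven layer split as invalid in this sketch
    per_stage = layers // pp_stages
    base = 1000.0 / per_stage
    bubble = (1.0 if schedule == "interleaved" else 2.0) / pp_stages
    locality = 1.0 if placement == "topology_aware" else 0.8
    return base * (1.0 - bubble) * locality

def joint_search(layer_options, stage_options, schedules, placements):
    """Jointly enumerate all four strategy dimensions and keep the
    configuration with the highest estimated throughput."""
    best, best_cfg = 0.0, None
    for cfg in product(layer_options, stage_options, schedules, placements):
        t = estimated_throughput(*cfg)
        if t > best:
            best, best_cfg = t, cfg
    return best_cfg, best

cfg, thr = joint_search([24, 32], [2, 4, 8],
                        ["1f1b", "interleaved"],
                        ["naive", "topology_aware"])
```

The point of the sketch is that the best setting of one dimension (e.g. stage count) depends on the others (e.g. schedule and placement), so per-strategy tuning can miss the jointly optimal configuration.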
Pages: 424-438 (15 pages)