Quartet: A Holistic Hybrid Parallel Framework for Training Large Language Models

Citations: 0
Authors
Zhang, Weigang [1 ,2 ]
Zhou, Biyu [1 ,2 ]
Wu, Xing [1 ,2 ]
Gao, Chaochen [1 ,2 ]
Liu, Zhibing [1 ,2 ]
Tang, Xuehai [1 ,2 ]
Li, Ruixuan [1 ,2 ]
Han, Jizhong [1 ,2 ]
Hu, Songlin [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
Keywords
Hybrid Parallelism; Large Language Models; Distributed Training;
DOI
10.1007/978-3-031-69766-1_29
CLC Classification
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Hybrid parallelism is widely used for training large language models (LLMs). However, existing efforts focus on optimizing individual strategies within hybrid parallelism, such as pipeline scheduling or device assignment, which limits overall training efficiency. This paper explores the intricate dependencies among four pivotal strategies (model scaling, model splitting, pipeline scheduling, and device assignment) and proposes Quartet, a holistic hybrid parallel framework for their joint optimization. The novelty lies in the formulation of parameterized pipeline scheduling and device assignment, alongside a pioneering analysis of model scaling's impact on throughput. Together, these provide the basis for orchestrating the four strategies within a unified framework that efficiently maximizes overall training throughput. Evaluation results show that, for representative LLMs, Quartet improves training throughput by up to 2.16x over state-of-the-art synchronous hybrid parallel approaches.
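The abstract's core idea, optimizing the four strategies jointly rather than one at a time, can be illustrated with a toy sketch. This is not Quartet's actual algorithm or cost model; the throughput function below is a hypothetical stand-in, and all names (`estimated_throughput`, `joint_search`, the schedule and placement labels) are illustrative assumptions.

```python
from itertools import product

def estimated_throughput(layers, pp_stages, schedule, placement):
    """Toy cost model (illustrative only): fewer layers per stage raises
    throughput, an interleaved schedule shrinks the pipeline bubble, and a
    topology-aware device placement avoids a communication penalty."""
    if layers % pp_stages != 0:
        return 0.0  # treat an uneven layer split as invalid in this sketch
    per_stage = layers // pp_stages
    base = 1000.0 / per_stage
    bubble = (1.0 if schedule == "interleaved" else 2.0) / pp_stages
    locality = 1.0 if placement == "topology_aware" else 0.8
    return base * (1.0 - bubble) * locality

def joint_search(layer_options, stage_options, schedules, placements):
    """Jointly enumerate all four strategy dimensions and keep the
    configuration with the highest estimated throughput."""
    best, best_cfg = 0.0, None
    for cfg in product(layer_options, stage_options, schedules, placements):
        t = estimated_throughput(*cfg)
        if t > best:
            best, best_cfg = t, cfg
    return best_cfg, best

cfg, thr = joint_search([24, 32], [2, 4, 8],
                        ["1f1b", "interleaved"],
                        ["naive", "topology_aware"])
```

The point of the sketch is that the best setting of one dimension (e.g. stage count) depends on the others (e.g. schedule and placement), so per-strategy tuning can miss the jointly optimal configuration.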
Pages: 424-438 (15 pages)