Quartet: A Holistic Hybrid Parallel Framework for Training Large Language Models

Cited by: 0
Authors
Zhang, Weigang [1 ,2 ]
Zhou, Biyu [1 ,2 ]
Wu, Xing [1 ,2 ]
Gao, Chaochen [1 ,2 ]
Liu, Zhibing [1 ,2 ]
Tang, Xuehai [1 ,2 ]
Li, Ruixuan [1 ,2 ]
Han, Jizhong [1 ,2 ]
Hu, Songlin [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
Keywords
Hybrid Parallelism; Large Language Models; Distributed Training;
DOI
10.1007/978-3-031-69766-1_29
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Hybrid parallelism is widely used to train large language models (LLMs). However, existing efforts focus on optimizing individual strategies within hybrid parallelism, such as pipeline scheduling or device assignment, which limits overall training efficiency. This paper explores the intricate dependencies among four pivotal strategies (model scaling, model splitting, pipeline scheduling, and device assignment) and proposes Quartet, a holistic hybrid parallel framework for their joint optimization. The novelty lies in the formulation of parameterized pipeline scheduling and device assignment, alongside a pioneering analysis of model scaling's impact on throughput. These provide the basis for orchestrating the four strategies within a unified framework to maximize overall training throughput efficiently. Evaluation results show that, for representative LLMs, Quartet improves training throughput by up to 2.16x over state-of-the-art synchronous hybrid parallel approaches.
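The joint optimization the abstract describes can be sketched as a search over the four strategy dimensions under a throughput model. The sketch below is a hypothetical illustration, not Quartet's actual algorithm: the cost function `estimated_throughput` and all candidate values are made-up stand-ins for the paper's analytical model, shown only to make the idea of jointly (rather than individually) optimizing the strategies concrete.

```python
from itertools import product

def estimated_throughput(scale, split, schedule, assignment):
    """Toy throughput model (hypothetical): rewards splits balanced
    against the model scale, fewer pipeline bubbles, and
    topology-aware placement. Stands in for the paper's analysis."""
    balance = 1.0 / (1.0 + abs(split - scale / 2))
    bubble = 0.9 if schedule == "interleaved" else 0.7
    locality = 1.1 if assignment == "topology-aware" else 1.0
    return scale * balance * bubble * locality

def joint_optimize(scales, splits, schedules, assignments):
    """Evaluate every joint configuration of the four strategies and
    return the best one; per-strategy greedy tuning could miss it."""
    best = max(
        product(scales, splits, schedules, assignments),
        key=lambda cfg: estimated_throughput(*cfg),
    )
    return best, estimated_throughput(*best)

cfg, tput = joint_optimize(
    scales=[8, 16, 32],                    # model-scaling candidates
    splits=[4, 8, 16],                     # model-splitting candidates
    schedules=["1F1B", "interleaved"],     # pipeline schedules
    assignments=["naive", "topology-aware"],  # device assignments
)
```

Because the split choice only pays off relative to the chosen scale in this toy model, the optimum is found jointly; tuning each dimension in isolation against a fixed setting of the others can land elsewhere, which mirrors the dependency argument in the abstract.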
Pages: 424-438
Page count: 15
    [J]. INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 2663 - 2666