Quartet: A Holistic Hybrid Parallel Framework for Training Large Language Models

Cited by: 0
Authors
Zhang, Weigang [1 ,2 ]
Zhou, Biyu [1 ,2 ]
Wu, Xing [1 ,2 ]
Gao, Chaochen [1 ,2 ]
Liu, Zhibing [1 ,2 ]
Tang, Xuehai [1 ,2 ]
Li, Ruixuan [1 ,2 ]
Han, Jizhong [1 ,2 ]
Hu, Songlin [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
Keywords
Hybrid Parallelism; Large Language Models; Distributed Training;
DOI
10.1007/978-3-031-69766-1_29
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Hybrid parallelism is widely used to train large language models (LLMs). However, existing efforts optimize individual strategies in isolation, such as pipeline scheduling or device assignment, which limits overall training efficiency. This paper explores the intricate dependencies among four pivotal strategies, namely model scaling, model splitting, pipeline scheduling, and device assignment, and proposes Quartet, a holistic hybrid parallel framework that optimizes them jointly. The novelty lies in the formulation of parameterized pipeline scheduling and device assignment, alongside a pioneering analysis of model scaling's impact on throughput. These provide the basis for orchestrating the four strategies within a unified framework to efficiently maximize overall training throughput. Evaluation results show that, for representative LLMs, Quartet improves training throughput by up to 2.16x over state-of-the-art synchronous hybrid parallel approaches.
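The joint-optimization idea in the abstract can be illustrated with a toy configuration search: enumerate pipeline depth, tensor-parallel width, and micro-batch count together, rather than tuning each in isolation. This is a minimal sketch under assumed constraints; the cost model, the memory constraint, and all names (`estimated_throughput`, `best_config`) are illustrative assumptions, not Quartet's actual formulation.

```python
from itertools import product

# Illustrative sketch: jointly search hybrid-parallel configurations
# (pp = pipeline stages, tp = tensor-parallel width, m = micro-batches)
# instead of tuning each strategy in isolation. The cost model below is
# a deliberately crude assumption, not the paper's formulation.

def estimated_throughput(pp, tp, m, n_gpus=8, layers=32):
    """Toy throughput estimate (higher is better)."""
    dp = n_gpus // (pp * tp)                   # data-parallel replicas
    if dp == 0 or pp * tp * dp != n_gpus or layers % pp != 0:
        return 0.0                             # infeasible device layout
    if pp * tp < 4:                            # assumed memory constraint:
        return 0.0                             # model needs >= 4 shards
    bubble = (pp - 1) / (m + pp - 1)           # classic pipeline-bubble ratio
    tp_penalty = 1.0 / (1.0 + 0.1 * (tp - 1))  # assumed TP comm overhead
    return n_gpus * (1.0 - bubble) * tp_penalty

def best_config(n_gpus=8, layers=32):
    """Exhaustively search all (pp, tp, m) combinations jointly."""
    candidates = product([1, 2, 4, 8], [1, 2, 4, 8], [4, 8, 16, 32])
    return max(candidates,
               key=lambda c: estimated_throughput(*c, n_gpus, layers))

print(best_config())  # → (4, 1, 32) under this toy model
```

Even in this crude sketch, the winning layout (deep pipeline, no tensor parallelism, many micro-batches) emerges only from the interaction between the bubble term and the communication penalty, which is why tuning any one knob alone can miss it.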
Pages: 424 - 438
Page count: 15
Related Papers
50 records in total
  • [1] Parallel Context Windows for Large Language Models
    Ratner, Nir
    Levine, Yoav
    Belinkov, Yonatan
    Ram, Ori
    Magar, Inbal
    Abend, Omri
    Karpas, Ehud
    Shashua, Amnon
    Leyton-Brown, Kevin
    Shoham, Yoav
    [J]. PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 6383 - 6402
  • [2] Towards the holistic design of alloys with large language models
    Pei, Zongrui
    Yin, Junqi
    Neugebauer, Joerg
    Jain, Anubhav
    [J]. NATURE REVIEWS MATERIALS, 2024,
  • [3] Fast Parallel Training of Neural Language Models
    Xiao, Tong
    Zhu, Jingbo
    Liu, Tongran
    Zhang, Chunliang
    [J]. PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 4193 - 4199
  • [4] Augmenting interpretable models with large language models during training
    Singh, Chandan
    Askari, Armin
    Caruana, Rich
    Gao, Jianfeng
    [J]. Nature Communications, 14
  • [5] Augmenting interpretable models with large language models during training
    Singh, Chandan
    Askari, Armin
    Caruana, Rich
    Gao, Jianfeng
    [J]. NATURE COMMUNICATIONS, 2023, 14 (01)
  • [6] Holistic Evaluation of Language Models
    Bommasani, Rishi
    Liang, Percy
    Lee, Tony
    [J]. ANNALS OF THE NEW YORK ACADEMY OF SCIENCES, 2023, 1525 (01) : 140 - 146
  • [7] Accelerating the Training of Large Language Models using Efficient Activation Rematerialization and Optimal Hybrid Parallelism
    Yuan, Tailing
    Liu, Yuliang
    Ye, Xucheng
    Zhang, Shenglong
    Tan, Jianchao
    Chen, Bin
    Song, Chengru
    Zhang, Di
    [J]. PROCEEDINGS OF THE 2024 USENIX ANNUAL TECHNICAL CONFERENCE, ATC 2024, 2024, : 545 - 561
  • [8] GalaxyGPT: A Hybrid Framework for Large Language Model Safety
    Zhou, Hange
    Zheng, Jiabin
    Zhang, Longtu
    [J]. IEEE ACCESS, 2024, 12 : 94436 - 94451
  • [9] Training Hybrid Language Models by Marginalizing over Segmentations
    Grave, Edouard
    Sukhbaatar, Sainbayar
    Bojanowski, Piotr
    Joulin, Armand
    [J]. 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 1477 - 1482
  • [10] EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
    Zhou, Weikang
    Wang, Xiao
    Xiong, Limao
    Xia, Han
    Gu, Yingshuang
    Chai, Mingxu
    Zhu, Fukang
    Huang, Caishuang
    Dou, Shihan
    Xi, Zhiheng
    Zheng, Rui
    Gao, Songyang
    Zou, Yicheng
    Yan, Hang
    Le, Yifan
    Wang, Ruohui
    Li, Lijun
    Shao, Jing
    Gui, Tao
    Zhang, Qi
    Huang, Xuanjing
    [J]. arXiv