mCAP: Memory-Centric Partitioning for Large-Scale Pipeline-Parallel DNN Training

Cited by: 2
Authors
Dreuning, Henk [1,2]
Bal, Henri E. [2]
van Nieuwpoort, Rob V. [1,3]
Affiliations
[1] Univ Amsterdam, Amsterdam, Netherlands
[2] Vrije Univ Amsterdam, Amsterdam, Netherlands
[3] Netherlands eSci Ctr, Amsterdam, Netherlands
Funding
Dutch Research Council;
Keywords
Deep Learning; Pipeline Parallelism; HPC;
DOI
10.1007/978-3-031-12597-3_10
CLC classification number
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
Memory usage is becoming an increasingly pressing bottleneck in the training process of Deep Neural Networks (DNNs), especially when training on Graphics Processing Units (GPUs). Existing solutions for multi-GPU training setups partition the neural network over the GPUs in a way that favors training throughput over memory usage, and thus maximum trainable network size. We propose mCAP, a partitioning solution for pipeline-parallel DNN training that focuses specifically on memory usage. It evenly distributes Deep Learning models over the available resources with respect to per-device peak memory usage. Our partitioning approach uses a novel incremental profiling strategy to extract per-layer memory usage statistics. A model-based predictor uses the profiling data to recommend a partitioning that balances peak memory usage. Our approach is DL-framework agnostic and orthogonal to existing memory optimizations found in large-scale DNN training systems. Our results show that our approach enables training of neural networks that are 1.55 times larger than existing partitioning solutions in terms of the number of parameters.
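The balancing objective described in the abstract, distributing contiguous groups of layers over pipeline stages so that the largest per-stage peak memory is as small as possible, can be sketched with a generic min-max contiguous-partition routine. This is an illustrative sketch only, not mCAP's actual profiler or predictor, and the per-layer memory figures below are made up:

```python
def balance_partition(layer_mem, num_stages):
    """Split layers (kept contiguous, as required by pipeline parallelism)
    into num_stages stages so that the largest per-stage memory sum is
    minimized. Uses binary search on the peak plus a greedy feasibility check."""
    def stages_needed(cap):
        # Greedily pack layers into stages without exceeding cap;
        # return how many stages that takes.
        count, current = 1, 0
        for m in layer_mem:
            if m > cap:
                return float("inf")  # a single layer already exceeds the cap
            if current + m > cap:
                count += 1
                current = m
            else:
                current += m
        return count

    lo, hi = max(layer_mem), sum(layer_mem)
    while lo < hi:
        mid = (lo + hi) // 2
        if stages_needed(mid) <= num_stages:
            hi = mid  # feasible: try a lower peak
        else:
            lo = mid + 1
    return lo  # minimal achievable peak per-stage memory

# Hypothetical per-layer peak memory (MB) for an 8-layer model, 4 stages
mem = [300, 120, 500, 80, 240, 240, 160, 360]
print(balance_partition(mem, 4))  # → 560
```

A throughput-oriented partitioner would instead balance per-stage compute time, which is exactly the trade-off the paper targets: balancing memory rather than speed maximizes the trainable model size.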
Pages: 155-170
Page count: 16
Related papers
50 records
  • [1] Memory-Efficient Pipeline-Parallel DNN Training
    Narayanan, Deepak
    Phanishayee, Amar
    Shi, Kaiyu
    Chen, Xie
    Zaharia, Matei
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [2] CAPTURE: Memory-Centric Partitioning for Distributed DNN Training with Hybrid Parallelism
    Dreuning, Henk
    Verstoep, Kees
    Bal, Henri E.
    van Nieuwpoort, Rob V.
    [J]. 2023 IEEE 30TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS, HIPC 2023, 2023, : 76 - 86
  • [3] CAPSlog: Scalable Memory-Centric Partitioning for Pipeline Parallelism
    Dreuning, Henk
    Liokouras, Anna Badia
    Ouyang, Xiaowei
    Bal, Henri E.
    van Nieuwpoort, Rob V.
    [J]. 2024 32ND EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING, PDP 2024, 2024, : 17 - 25
  • [4] Visual Diagnostics of Parallel Performance in Training Large-Scale DNN Models
    Wei, Yating
    Wang, Zhiyong
    Wang, Zhongwei
    Dai, Yong
    Ou, Gongchang
    Gao, Han
    Yang, Haitao
    Wang, Yue
    Cao, Caleb Chen
    Weng, Luoxuan
    Lu, Jiaying
    Zhu, Rongchen
    Chen, Wei
    [J]. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2024, 30 (07) : 3915 - 3929
  • [5] Multi-dimensional Parallel Training of Winograd Layer on Memory-Centric Architecture
    Hong, Byungchul
    Ro, Yeonju
    Kim, John
    [J]. 2018 51ST ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO), 2018, : 682 - 695
  • [6] Swift: Expedited Failure Recovery for Large-Scale DNN Training
    Zhong, Yuchen
    Sheng, Guangming
    Liu, Juncheng
    Yuan, Jinhui
    Wu, Chuan
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2024, 35 (09) : 1644 - 1656
  • [7] A PARALLEL PARTITIONING METHOD FOR LARGE-SCALE CIRCUIT SIMULATION
    ZHANG, XD
    [J]. UNIVERSITY PROGRAMS IN COMPUTER-AIDED ENGINEERING, DESIGN, AND MANUFACTURING, 1989, : 134 - 141
  • [8] DistSim: A performance model of large-scale hybrid distributed DNN training
    Lu, Guandong
    Chen, Runzhe
    Wang, Yakai
    Zhou, Yangjie
    Zhang, Rui
    Hu, Zheng
    Miao, Yanming
    Cai, Zhifang
    Li, Li
    Leng, Jingwen
    Guo, Minyi
    [J]. PROCEEDINGS OF THE 20TH ACM INTERNATIONAL CONFERENCE ON COMPUTING FRONTIERS 2023, CF 2023, 2023, : 112 - 122
  • [9] GradientFlow: Optimizing Network Performance for Large-Scale Distributed DNN Training
    Sun, Peng
    Wen, Yonggang
    Han, Ruobing
    Feng, Wansen
    Yan, Shengen
    [J]. IEEE TRANSACTIONS ON BIG DATA, 2022, 8 (02) : 495 - 507
  • [10] Graph-Centric Performance Analysis for Large-Scale Parallel Applications
    Jin, Yuyang
    Wang, Haojie
    Zhong, Runxin
    Zhang, Chen
    Liao, Xia
    Zhang, Feng
    Zhai, Jidong
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2024, 35 (07) : 1221 - 1238