Grape: Practical and Efficient Graph-based Executions for Dynamic Deep Neural Networks on GPUs

Cited by: 0
Authors
Zheng, Bojian [1]
Yu, Cody Hao [2]
Wang, Jie [2]
Ding, Yaoyao [1]
Liu, Yizhi [2]
Wang, Yida [2]
Pekhimenko, Gennady [1,3]
Affiliations
[1] Univ Toronto, CentML, Toronto, ON, Canada
[2] Amazon, Santa Clara, CA, USA
[3] Vector Inst, Toronto, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada; Canada Foundation for Innovation;
Keywords
machine learning compilers; CUDA graphs; dynamic neural networks;
DOI
10.1145/3613424.3614248
Chinese Library Classification
TP3 [Computing technology, computer technology];
Subject Classification Code
0812;
Abstract
Achieving high performance in machine learning workloads is a crucial yet difficult task. To achieve high runtime performance on hardware platforms such as GPUs, graph-based executions such as CUDA graphs are often used to eliminate CPU runtime overheads by submitting jobs at the granularity of multiple kernels. However, many machine learning workloads, especially dynamic deep neural networks (DNNs) with varying-sized inputs or data-dependent control flows, face challenges when directly using CUDA graphs to achieve optimal performance. We observe that the use of graph-based executions poses three key challenges in terms of efficiency and even practicability: (1) Extra data movements when copying input values to graphs' placeholders. (2) High GPU memory consumption due to the numerous CUDA graphs created to efficiently support dynamic-shape workloads. (3) Inability to handle data-dependent control flows. To address these challenges, we propose Grape, a new graph compiler that enables practical and efficient graph-based executions for dynamic DNNs on GPUs. Grape comprises three key components: (1) an alias predictor that automatically removes extra data movements by leveraging code positions at the Python frontend, (2) a metadata compressor that efficiently utilizes the data redundancy in CUDA graphs' memory regions by compressing them, and (3) a predication rewriter that safely replaces control flows with predication contexts while preserving programs' semantics. The three components improve the efficiency and broaden the optimization scope of graph-based executions while allowing machine learning practitioners to program dynamic DNNs at the Python level with minimal source code changes. We evaluate Grape on state-of-the-art text generation (GPT-2, GPT-J) and speech recognition (Wav2Vec2) workloads, which include both training and inference, using real systems with modern GPUs.
Our evaluation shows that Grape achieves up to 36.43x less GPU memory consumption and up to 1.26x better performance than prior works on graph-based executions that directly use CUDA graphs. Furthermore, Grape can optimize workloads that are impractical for prior works due to the three key challenges, achieving 1.78x and 1.82x better performance on GPT-J and Wav2Vec2 respectively than the original implementations that do not use graph-based executions.
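The predication rewriter described in the abstract replaces data-dependent control flow with straight-line code that evaluates both branches and selects a result by a mask, so that a fixed kernel sequence can be captured into a graph once and replayed. Below is a minimal, framework-free Python sketch of the general idea; the function names and the toy branch are illustrative only, not Grape's implementation or API:

```python
def branchy(xs):
    # Data-dependent control flow: which path executes depends on each
    # value, so the operation sequence varies between inputs and cannot
    # be captured once into a static execution graph.
    out = []
    for x in xs:
        if x > 0:
            out.append(2 * x)   # "then" branch
        else:
            out.append(-x)      # "else" branch
    return out

def predicated(xs):
    # Predication: compute BOTH branches unconditionally, then blend the
    # results with a 0/1 mask. The operation sequence is now fixed and
    # identical for every input, which is what graph capture requires.
    return [(x > 0) * (2 * x) + (x <= 0) * (-x) for x in xs]
```

Both functions produce identical outputs; predication trades some redundant computation for a static, capturable operation sequence.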
Pages: 1364-1380
Page count: 17