Optimizing GPU Deep Learning Operators with Polyhedral Scheduling Constraint Injection

Cited by: 4
Authors
Bastoul, Cedric [1 ]
Zhang, Zhen [1 ]
Razanajato, Harenome [1 ]
Lossing, Nelson [1 ]
Susungi, Adilla [1 ]
de Juan, Javier [1 ]
Filhol, Etienne [1 ]
Jarry, Baptiste [1 ]
Consolaro, Gianpietro [1 ]
Zhang, Renwei [2 ]
Affiliations
[1] Huawei Technologies France, Paris, France
[2] Huawei Technologies Co., Ltd., Beijing, China
Keywords
Polyhedral model; scheduling; vectorization
DOI
10.1109/CGO53902.2022.9741260
CLC number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Automatic parallel code generation from high-level abstractions, such as those manipulated by artificial intelligence and deep learning (AI/DL) frameworks, relies heavily on compiler techniques for automatic parallelization and optimization. Many recent advances build on the polyhedral framework for this task because of its ability to model and apply a wide range of loop transformations. However, modeling the complexity of the target architecture, and the cost models needed to decide on the best transformation, is in general out of reach for a framework based on linear/affine constraints. In this work, we propose to decouple the polyhedral framework into linear and non-linear components. We introduce the constraint tree abstraction, which may be generated by a non-linear optimizer and injected into the polyhedral optimization process to build better solutions. We show how to exploit this mechanism to generate efficient GPU code in the context of AI/DL operators. Our constraint injection drives the polyhedral scheduler towards efficient solutions for load/store vectorization, relying on both memory coalescing and vector types. We implemented our scheduler with constraint-injection support, together with our constraint construction system, within a production AI/DL framework. Experiments on well-known neural networks show the efficiency of this approach with respect to state-of-the-art polyhedral scheduling for GPU.
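The abstract's core idea — a non-linear "constraint tree" injected into an otherwise linear scheduler to steer it toward coalesced, vectorizable loop orders — can be illustrated with a toy sketch. All names below are hypothetical and not the paper's API: a schedule is reduced to a permutation of loop names, the "linear" side enumerates candidates, and the injected tree prunes them with non-linear predicates for memory coalescing and vector-width divisibility.

```python
from itertools import permutations

VEC_WIDTH = 4  # e.g. float4 vector loads on GPU

def coalesced(sched, trips):
    """Memory coalescing: the fastest-varying subscript (j, for a
    row-major access A[i][j]) must be the innermost loop."""
    return sched[-1] == "j"

def vectorizable(sched, trips):
    """Vector types: the innermost trip count must be divisible
    by the vector width."""
    return trips[sched[-1]] % VEC_WIDTH == 0

# Constraint tree: an AND node over two leaf predicates; OR nodes
# and nested subtrees work the same way.
tree = ("and", [coalesced, vectorizable])

def satisfies(node, sched, trips):
    """Recursively evaluate a constraint tree against one schedule."""
    op, children = node
    vals = [c(sched, trips) if callable(c) else satisfies(c, sched, trips)
            for c in children]
    return all(vals) if op == "and" else any(vals)

def schedule(loops, trips, node):
    """Enumerate candidate schedules (here: loop permutations) and keep
    only those accepted by the injected constraint tree."""
    return [s for s in permutations(loops) if satisfies(node, s, trips)]

print(schedule(("i", "j"), {"i": 100, "j": 128}, tree))
# -> [('i', 'j')]  (j innermost gives coalescing, and 128 % 4 == 0)
```

The real scheduler works over affine schedules and ILP formulations rather than permutations, but the division of labor is the same: legality stays in the linear component, while architecture-specific preferences arrive as an injected tree of non-linear constraints.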
Pages: 313 - 324 (12 pages)
Related papers (50 total; first 10 shown)
  • [1] Optimizing Deep Learning Workloads on ARM GPU with TVM
    Zheng, Lianmin
    Chen, Tianqi
    1ST ACM REQUEST WORKSHOP/TOURNAMENT ON REPRODUCIBLE SOFTWARE/HARDWARE CO-DESIGN OF PARETO-EFFICIENT DEEP LEARNING, 2018,
  • [2] Deep Learning Workload Scheduling in GPU Datacenters: A Survey
    Ye, Zhisheng
    Gao, Wei
    Hu, Qinghao
    Sun, Peng
    Wang, Xiaolin
    Luo, Yingwei
    Zhang, Tianwei
    Wen, Yonggang
    ACM COMPUTING SURVEYS, 2024, 56 (06)
  • [3] Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning
    Kwon, Woosuk
    Yu, Gyeong-In
    Jeong, Eunji
    Chun, Byung-Gon
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [4] A Deep Q-Learning Approach for GPU Task Scheduling
    Luley, Ryan S.
    Qiu, Qinru
    2020 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2020,
  • [5] Scheduling CPU for GPU-based Deep Learning Jobs
    Xiao, Wencong
    Han, Zhenhua
    Zhao, Hanyu
    Peng, Xuan
    Zhang, Quanlu
    Yang, Fan
    Zhou, Lidong
    PROCEEDINGS OF THE 2018 ACM SYMPOSIUM ON CLOUD COMPUTING (SOCC '18), 2018, : 503 - 503
  • [6] Poster Abstract: Deep Learning Workloads Scheduling with Reinforcement Learning on GPU Clusters
    Chen, Zhaoyun
    Luo, Lei
    Quan, Wei
    Wen, Mei
    Zhang, Chunyuan
    IEEE CONFERENCE ON COMPUTER COMMUNICATIONS WORKSHOPS (IEEE INFOCOM 2019 WKSHPS), 2019, : 1023 - 1024
  • [7] ACETest: Automated Constraint Extraction for Testing Deep Learning Operators
    Shi, Jingyi
    Xiao, Yang
    Li, Yuekang
    Li, Yeting
    Yu, Dongsong
    Yu, Chendong
    Su, Hui
    Chen, Yufeng
    Huo, Wei
    PROCEEDINGS OF THE 32ND ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, ISSTA 2023, 2023, : 690 - 702
  • [8] Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training
    Pal, Saptadeep
    Ebrahimi, Eiman
    Zulfiqar, Arslan
    Fu, Yaosheng
    Zhang, Victor
    Migacz, Szymon
    Nellans, David
    Gupta, Puneet
    IEEE MICRO, 2019, 39 (05) : 91 - 101
  • [9] Crux: GPU-Efficient Communication Scheduling for Deep Learning Training
    Cao, Jiamin
    Guan, Yu
    Qian, Kun
    Gao, Jiaqi
    Xiao, Wencong
    Dong, Jianbo
    Fu, Binzhang
    Cai, Dennis
    Zhai, Ennan
    PROCEEDINGS OF THE 2024 ACM SIGCOMM 2024 CONFERENCE, ACM SIGCOMM 2024, 2024, : 1 - 15
  • [10] Voda: A GPU Scheduling Platform for Elastic Deep Learning in Kubernetes Clusters
    Hsieh, Tsung-Tso
    Lee, Che-Rung
    2023 IEEE INTERNATIONAL CONFERENCE ON CLOUD ENGINEERING, IC2E, 2023, : 131 - 140