Optimizing GPU Deep Learning Operators with Polyhedral Scheduling Constraint Injection

Cited by: 4

Authors:

Bastoul, Cedric [1]

Zhang, Zhen [1]

Razanajato, Harenome [1]

Lossing, Nelson [1]

Susungi, Adilla [1]

de Juan, Javier [1]

Filhol, Etienne [1]

Jarry, Baptiste [1]

Consolaro, Gianpietro [1]

Zhang, Renwei [2]

Affiliations:

[1] Huawei Technol France, Paris, France

[2] Huawei Technol Co Ltd, Beijing, Peoples R China

Source:

CGO '22: PROCEEDINGS OF THE 2022 IEEE/ACM INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION (CGO) | 2022

Keywords:

Polyhedral model; scheduling; vectorization

DOI:

10.1109/CGO53902.2022.9741260

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

Automatic parallel code generation from high-level abstractions, such as those manipulated by artificial intelligence and deep learning (AI/DL) frameworks, relies heavily on compiler techniques for automatic parallelization and optimization. Many recent advances rely on the polyhedral framework for this task because of its ability to model and apply a wide range of loop transformations. However, modeling the complexity of the target architecture, and the efficient cost models needed to choose the best transformation, is in general out of reach for a framework based on linear/affine constraints. In this work, we propose to decouple the polyhedral framework into linear and non-linear components. We introduce the constraint tree abstraction, which may be generated by a non-linear optimizer and injected into the polyhedral optimization process to build better solutions. We show how to benefit from such a mechanism to generate efficient GPU code in the context of AI/DL operators. Our constraint injection drives the polyhedral scheduler towards efficient solutions for load/store vectorization that rely on both memory coalescing and vector types. We implemented our scheduler supporting constraint injection, together with our constraint construction system, within a production AI/DL framework. Experiments on well-known neural networks show the efficiency of this approach with respect to state-of-the-art polyhedral scheduling for GPUs.


Pages: 313-324

Page count: 12

50 items in total

[21] BatOpt: Optimizing GPU-Based Deep Learning Inference Using Dynamic Batch Processing
Zhang, Deyu
Luo, Yunzhen
Wang, Yaobo
Kui, Xiaoyan
Ren, Ju
IEEE TRANSACTIONS ON CLOUD COMPUTING, 2024, 12 (01) : 174 - 185
[22] Optimizing execution for pipelined-based distributed deep learning in a heterogeneously networked GPU cluster
Zhang, Jinghui
Zhan, Jun
Li, Jiange
Jin, Jiahui
Qian, Lei
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2020, 32 (23)
[23] Scheduling Deep Learning Training in GPU Cluster Using the Model-Similarity-Based Policy
Thanapol, Panissara
Lavangnananda, Kittichai
Leprevost, Franck
Schleich, Julien
Bouvry, Pascal
INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2023, PT II, 2023, 13996 : 363 - 374
[24] DASH: Scheduling Deep Learning Workloads on Multi-Generational GPU-Accelerated Clusters
Li, Baolin
Patel, Tirthak
Gadepally, Vijay
Gettings, Karen
Samsi, Siddharth
Tiwari, Devesh
2022 IEEE HIGH PERFORMANCE EXTREME COMPUTING VIRTUAL CONFERENCE (HPEC), 2022
[25] Efficient NPU–GPU scheduling for real-time deep learning inference on mobile devices
Chengwu Yu
Meng Wang
Shan Chen
Wanqi Wang
Weiwei Fang
Yanming Chen
Neal N. Xiong
Journal of Real-Time Image Processing, 2025, 22 (2)
[26] Optimizing quay crane scheduling using deep reinforcement learning with hybrid metaheuristic algorithm
Long, Le Ngoc Bao
You, Sam-Sang
Cuong, Truong Ngoc
Kim, Hwan-Seong
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2025, 143
[27] DRL-SRS: A Deep Reinforcement Learning Approach for Optimizing Spaced Repetition Scheduling
Xiao, Qinfeng
Wang, Jing
APPLIED SCIENCES-BASEL, 2024, 14 (13)
[28] Sparse GPU Kernels for Deep Learning
Gale, Trevor
Zaharia, Matei
Young, Cliff
Elsen, Erich
PROCEEDINGS OF SC20: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC20), 2020
[29] Scheduling Deep Learning Jobs in Multi-Tenant GPU Clusters via Wise Resource Sharing
Luo, Yizhou
Wang, Qiang
Shi, Shaohuai
Lai, Jiaxin
Qi, Shuhan
Zhang, Jiajia
Wang, Xuan
2024 IEEE/ACM 32ND INTERNATIONAL SYMPOSIUM ON QUALITY OF SERVICE, IWQOS, 2024
[30] LEARNING TO IMPROVE CONSTRAINT-BASED SCHEDULING
ZWEBEN, M
DAVIS, E
DAUN, B
DRASCHER, E
DEALE, M
ESKEY, M
ARTIFICIAL INTELLIGENCE, 1992, 58 (1-3) : 271 - 296
