Optimizing GPU Deep Learning Operators with Polyhedral Scheduling Constraint Injection

Cited by: 4
Authors
Bastoul, Cedric [1 ]
Zhang, Zhen [1 ]
Razanajato, Harenome [1 ]
Lossing, Nelson [1 ]
Susungi, Adilla [1 ]
de Juan, Javier [1 ]
Filhol, Etienne [1 ]
Jarry, Baptiste [1 ]
Consolaro, Gianpietro [1 ]
Zhang, Renwei [2 ]
Affiliations
[1] Huawei Technologies France, Paris, France
[2] Huawei Technologies Co., Ltd., Beijing, China
Keywords
Polyhedral model; scheduling; vectorization
DOI
10.1109/CGO53902.2022.9741260
CLC number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Automatic parallel code generation from high-level abstractions, such as those manipulated by artificial intelligence and deep learning (AI/DL) frameworks, relies heavily on compiler techniques for automatic parallelization and optimization. Many recent advances build on the polyhedral framework for this task because of its ability to model and apply a wide range of loop transformations. However, modeling the complexity of the target architecture, and the cost models needed to decide on the best transformation, is in general out of reach for a framework based on linear/affine constraints. In this work, we propose to decouple the polyhedral framework into linear and non-linear components. We introduce the constraint tree abstraction, which may be generated by a non-linear optimizer and injected into the polyhedral optimization process to build better solutions. We show how to exploit this mechanism to generate efficient GPU code in the context of AI/DL operators. Our constraint injection drives the polyhedral scheduler towards efficient solutions for load/store vectorization, relying on both memory coalescing and vector types. We implemented our scheduler with constraint-injection support, together with our constraint construction system, within a production AI/DL framework. Experiments on well-known neural networks show the efficiency of this approach with respect to state-of-the-art polyhedral scheduling for GPU.
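The abstract's core idea — a non-linear "constraint tree" injected into an otherwise linear scheduler to steer it toward coalesced, vectorizable loop orders — can be illustrated with a toy sketch. All names below are hypothetical and not the paper's API: a schedule is reduced to a permutation of loop names, the "linear" side enumerates candidates, and the injected tree prunes them with non-linear predicates for memory coalescing and vector-width divisibility.

```python
from itertools import permutations

VEC_WIDTH = 4  # e.g. float4 vector loads on GPU

def coalesced(sched, trips):
    """Memory coalescing: the fastest-varying subscript (j, for a
    row-major access A[i][j]) must be the innermost loop."""
    return sched[-1] == "j"

def vectorizable(sched, trips):
    """Vector types: the innermost trip count must be divisible
    by the vector width."""
    return trips[sched[-1]] % VEC_WIDTH == 0

# Constraint tree: an AND node over two leaf predicates; OR nodes
# and nested subtrees work the same way.
tree = ("and", [coalesced, vectorizable])

def satisfies(node, sched, trips):
    """Recursively evaluate a constraint tree against one schedule."""
    op, children = node
    vals = [c(sched, trips) if callable(c) else satisfies(c, sched, trips)
            for c in children]
    return all(vals) if op == "and" else any(vals)

def schedule(loops, trips, node):
    """Enumerate candidate schedules (here: loop permutations) and keep
    only those accepted by the injected constraint tree."""
    return [s for s in permutations(loops) if satisfies(node, s, trips)]

print(schedule(("i", "j"), {"i": 100, "j": 128}, tree))
# -> [('i', 'j')]  (j innermost gives coalescing, and 128 % 4 == 0)
```

The real scheduler works over affine schedules and ILP formulations rather than permutations, but the division of labor is the same: legality stays in the linear component, while architecture-specific preferences arrive as an injected tree of non-linear constraints.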
Pages: 313 - 324 (12 pages)
Related papers (50 total; first 10 shown)
  • [1] Optimizing Deep Learning Workloads on ARM GPU with TVM
    Zheng, Lianmin
    Chen, Tianqi
    1ST ACM REQUEST WORKSHOP/TOURNAMENT ON REPRODUCIBLE SOFTWARE/HARDWARE CO-DESIGN OF PARETO-EFFICIENT DEEP LEARNING, 2018,
  • [2] Deep Learning Workload Scheduling in GPU Datacenters: A Survey
    Ye, Zhisheng
    Gao, Wei
    Hu, Qinghao
    Sun, Peng
    Wang, Xiaolin
    Luo, Yingwei
    Zhang, Tianwei
    Wen, Yonggang
    ACM COMPUTING SURVEYS, 2024, 56 (06)
  • [3] Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning
    Kwon, Woosuk
    Yu, Gyeong-In
    Jeong, Eunji
    Chun, Byung-Gon
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [4] A Deep Q-Learning Approach for GPU Task Scheduling
    Luley, Ryan S.
    Qiu, Qinru
    2020 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2020,
  • [5] Scheduling CPU for GPU-based Deep Learning Jobs
    Xiao, Wencong
    Han, Zhenhua
    Zhao, Hanyu
    Peng, Xuan
    Zhang, Quanlu
    Yang, Fan
    Zhou, Lidong
    PROCEEDINGS OF THE 2018 ACM SYMPOSIUM ON CLOUD COMPUTING (SOCC '18), 2018, : 503 - 503
  • [6] Poster Abstract: Deep Learning Workloads Scheduling with Reinforcement Learning on GPU Clusters
    Chen, Zhaoyun
    Luo, Lei
    Quan, Wei
    Wen, Mei
    Zhang, Chunyuan
    IEEE CONFERENCE ON COMPUTER COMMUNICATIONS WORKSHOPS (IEEE INFOCOM 2019 WKSHPS), 2019, : 1023 - 1024
  • [7] ACETest: Automated Constraint Extraction for Testing Deep Learning Operators
    Shi, Jingyi
    Xiao, Yang
    Li, Yuekang
    Li, Yeting
    Yu, Dongsong
    Yu, Chendong
    Su, Hui
    Chen, Yufeng
    Huo, Wei
    PROCEEDINGS OF THE 32ND ACM SIGSOFT INTERNATIONAL SYMPOSIUM ON SOFTWARE TESTING AND ANALYSIS, ISSTA 2023, 2023, : 690 - 702
  • [8] Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training
    Pal, Saptadeep
    Ebrahimi, Eiman
    Zulfiqar, Arslan
    Fu, Yaosheng
    Zhang, Victor
    Migacz, Szymon
    Nellans, David
    Gupta, Puneet
    IEEE MICRO, 2019, 39 (05) : 91 - 101
  • [9] Crux: GPU-Efficient Communication Scheduling for Deep Learning Training
    Cao, Jiamin
    Guan, Yu
    Qian, Kun
    Gao, Jiaqi
    Xiao, Wencong
    Dong, Jianbo
    Fu, Binzhang
    Cai, Dennis
    Zhai, Ennan
    PROCEEDINGS OF THE 2024 ACM SIGCOMM 2024 CONFERENCE, ACM SIGCOMM 2024, 2024, : 1 - 15
  • [10] Voda: A GPU Scheduling Platform for Elastic Deep Learning in Kubernetes Clusters
    Hsieh, Tsung-Tso
    Lee, Che-Rung
    2023 IEEE INTERNATIONAL CONFERENCE ON CLOUD ENGINEERING, IC2E, 2023, : 131 - 140