Optimizing GPU Deep Learning Operators with Polyhedral Scheduling Constraint Injection

Cited by: 4
Authors
Bastoul, Cedric [1]
Zhang, Zhen [1]
Razanajato, Harenome [1]
Lossing, Nelson [1]
Susungi, Adilla [1]
de Juan, Javier [1]
Filhol, Etienne [1]
Jarry, Baptiste [1]
Consolaro, Gianpietro [1]
Zhang, Renwei [2]
Affiliations
[1] Huawei Technol France, Paris, France
[2] Huawei Technol Co Ltd, Beijing, Peoples R China
Keywords
Polyhedral model; scheduling; vectorization;
DOI
10.1109/CGO53902.2022.9741260
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Automatic parallel code generation from high-level abstractions, such as those manipulated by artificial intelligence and deep learning (AI/DL) frameworks, relies heavily on compiler techniques for automatic parallelization and optimization. Many recent advances rely on the polyhedral framework for this task because of its ability to model and apply a wide range of loop transformations. However, modeling the complexity of both the target architecture and efficient cost models for deciding on the best transformation is generally out of reach for a framework based on linear/affine constraints. In this work, we propose to decouple the polyhedral framework into linear and non-linear components. We introduce the constraint tree abstraction, which may be generated by a non-linear optimizer and injected into the polyhedral optimization process to build better solutions. We show how to benefit from such a mechanism to generate efficient code for GPUs in the context of AI/DL operators. Our constraint injection drives the polyhedral scheduler towards efficient solutions for load/store vectorization that rely on both memory coalescing and vector types. We implemented our scheduler supporting constraint injection and our constraint construction system within a production AI/DL framework. Experiments on well-known neural networks show the efficiency of this approach with respect to state-of-the-art polyhedral scheduling for GPUs.
Pages: 313-324
Page count: 12