GPU Auto-tuning Framework for Optimal Performance and Power Consumption

Authors
Cheema, Sunbal [1]
Khan, Gul N. [1]
Affiliation
[1] Toronto Metropolitan Univ, Dept Elect Comp & Biomed Engn, Toronto, ON, Canada
Keywords
Auto-tuning; Code transformation; Multi-objective optimization; GPU code regeneration; Performance power optimization; EFFICIENCY;
DOI
10.1145/3589236.3589241
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
An auto-tuning framework for GPU devices is presented for tuning OpenCL application kernels. The GPU tuner employs a multi-objective optimization methodology to improve the performance and power consumption of applications. It efficiently explores a user-defined solution space comprising possible tunable algorithmic and hardware counter variations through code transformations. The methodology targets GPU code tuning situations where performance and energy consumption are critical. The proposed framework is evaluated on 2D convolution kernels. It uses a non-dominated sorting genetic algorithm with hardware power sensor data to transform application code through code rewriting and validation. Various algorithmic variations, such as loop unrolling, caching, workgroup size, and memory utilization, are applied. The final Pareto-optimal configurations used around 30% less power and achieved 4% faster execution time. The analysis shows convergence of the optimization and a 45% improvement in standard deviation.
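The abstract's notion of Pareto-optimal configurations can be illustrated with a minimal Python sketch. It is not the authors' implementation: it shows only the non-dominated filtering step that underlies non-dominated sorting (extracting the first front), without the genetic operators, code rewriting, or sensor readout. The `Config` type, the configuration names, and all time and power values are hypothetical and chosen purely for illustration.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    """A hypothetical tuned kernel variant with two objectives to minimize."""
    name: str
    time_ms: float   # measured execution time (assumed values)
    power_w: float   # measured power draw (assumed values)


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b in both objectives and better in one."""
    no_worse = a.time_ms <= b.time_ms and a.power_w <= b.power_w
    strictly_better = a.time_ms < b.time_ms or a.power_w < b.power_w
    return no_worse and strictly_better


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep every configuration that no other configuration dominates."""
    return [c for c in configs
            if not any(dominates(other, c) for other in configs)]


# Made-up measurements for four kernel variants.
candidates = [
    Config("unroll=1,wg=64", 10.0, 50.0),
    Config("unroll=4,wg=64", 9.0, 55.0),
    Config("unroll=4,wg=128", 9.5, 45.0),
    Config("unroll=8,wg=256", 11.0, 60.0),
]

front = pareto_front(candidates)
for c in front:
    print(c.name, c.time_ms, c.power_w)
```

With these made-up numbers, two variants remain on the front: one is fastest and the other draws the least power, so neither dominates the other. A full non-dominated sorting genetic algorithm would repeat this ranking over successive fronts and combine it with selection, crossover, and mutation.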
Pages: 1-6
Page count: 6