Automatic Optimization Heuristics Method for OpenCL Program Based on Graph Neural Network

Authors
Ye G. [1]
Zhang Y. [1]
Zhang C. [1]
Zhao J. [1]
Wang H. [1]
Affiliations
[1] School of Information Science and Technology, Northwest University, Xi'an 710127; Shaanxi International Joint Research Centre for the Battery-free Internet of Things (Northwest University), Xi'an 710127
Funding
National Natural Science Foundation of China
Keywords
deep learning; graph network; heterogeneous device; heuristic optimization; OpenCL
DOI
10.7544/issn1000-1239.202110943
Abstract
The last decade has witnessed the rapid development of heterogeneous computer architectures, driven by the spread of the Internet of Things. As the first cross-platform heterogeneous parallel computing framework, OpenCL (Open Computing Language) offers standardization and portability. However, OpenCL falls short on performance portability because of the complexity and diversity of software and hardware platforms. To address this problem, prior methods use deep learning to build an optimization model, but they yield only limited optimization gains because they capture just the sequential dependencies of the program while ignoring its syntactic and semantic information. To this end, we propose ACCEPT, an automated heuristic optimization method for OpenCL programs built on a multi-relational graph neural network. Unlike existing methods, ACCEPT first extracts the deep structural and semantic features of the OpenCL program by constructing a multi-relational code graph, then applies an improved graph neural network to extract a high-dimensional feature representation of that graph. Finally, a decision neural network outputs the optimization parameters. We evaluate ACCEPT on two tasks: heterogeneous device mapping and thread coarsening factor prediction. The experimental results show that ACCEPT outperforms state-of-the-art (SOTA) methods. On the heterogeneous device mapping task, accuracy reaches 88.7% and the speedup is 7.6% higher than that of the SOTA methods. On the thread coarsening task, the speedup is 5.2% higher than that of the SOTA methods. © 2023 Science Press. All rights reserved.
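The pipeline described in the abstract (multi-relational code graph, then graph neural network, then a graph-level representation for a decision network) can be illustrated with a minimal sketch. This is not ACCEPT's implementation: the relation names (`ast_child`, `control_flow`, `data_flow`), the scalar per-relation weights standing in for learned matrices, the mean aggregation, and the function names `propagate` and `readout` are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import defaultdict

# Hypothetical edge types; the abstract does not list the paper's actual relations.
RELATIONS = ("ast_child", "control_flow", "data_flow")


def propagate(features, edges, weights):
    """One round of relation-aware message passing.

    features: dict mapping node -> list of floats
    edges:    list of (src, dst, relation) triples
    weights:  dict mapping relation -> float (a scalar stand-in for a
              learned per-relation weight matrix; an assumption)
    """
    dim = len(next(iter(features.values())))
    incoming = defaultdict(lambda: [0.0] * dim)
    counts = defaultdict(int)
    for src, dst, rel in edges:
        w = weights[rel]
        msg = incoming[dst]
        for i, x in enumerate(features[src]):
            msg[i] += w * x  # message scaled by its relation's weight
        counts[dst] += 1
    updated = {}
    for node, feat in features.items():
        n = counts[node]
        agg = [m / n for m in incoming[node]] if n else [0.0] * dim
        # ReLU(self feature + mean of incoming messages)
        updated[node] = [max(0.0, f + a) for f, a in zip(feat, agg)]
    return updated


def readout(features):
    """Mean-pool node features into a single graph-level vector."""
    dim = len(next(iter(features.values())))
    n = len(features)
    return [sum(f[i] for f in features.values()) / n for i in range(dim)]


# Toy graph with three nodes and two relation types in use.
features = {"kernel": [1.0, 0.0], "load": [0.0, 1.0], "add": [1.0, 1.0]}
edges = [
    ("kernel", "load", "ast_child"),
    ("kernel", "add", "ast_child"),
    ("load", "add", "data_flow"),
]
weights = {"ast_child": 0.5, "control_flow": 1.0, "data_flow": 2.0}

updated = propagate(features, edges, weights)
graph_vector = readout(updated)
```

In a full system the graph-level vector would feed a classifier or regressor that picks, for example, a target device or a coarsening factor. Here it only shows how edges of different types contribute differently to a node's updated representation.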
Pages: 1121-1135
Number of pages: 14