UPPA: Unified Parallel Programming Architecture for Heterogeneous Systems

Authors: Wu S.-S. [1], Dong X.-S. [1], Wang Y.-F. [1], Wang L.-X. [1], Zhu Z.-D. [1]
Affiliation: [1] School of Computer Science and Engineering, Xi'an Jiaotong University, Xi'an
Funding: National Natural Science Foundation of China
Keywords: Data associated computation; Heterogeneous parallel programming; OpenCL; Parallel programming model; Unified programming architecture
DOI: 10.11897/SP.J.1016.2020.00990
Pages: 990-1009 (19 pages)
Abstract
Mainstream heterogeneous parallel programming methods such as CUDA and OpenCL provide close-to-metal programming interfaces and present a low-level programming abstraction. They do little to shield developers from the underlying hardware and runtime details, which complicates the programming logic and makes heterogeneous programming difficult and error-prone. Moreover, the performance of an application developed with these low-level methods is bound to a specific runtime environment: high-level applications either cannot execute on different hardware at all or perform poorly after being ported to other heterogeneous systems. Whenever the hardware architecture changes, the application must be manually modified and re-optimized for the features of the new hardware. High-level applications therefore cannot remain unified and lack cross-platform portability.

To simplify heterogeneous parallel programming, improve programming productivity, and produce unified, cross-platform high-level applications, this paper presents UPPA, a Unified Parallel Programming Architecture for heterogeneous many-core systems. First, UPPA proposes the data associated computation (DAC) programming model, which gives a unified description of parallelism at different levels and of different patterns. The DAC model provides a high-level, unified parallel programming abstraction and simplifies heterogeneous parallel programming logic.

Second, UPPA offers developers a unified programming interface through the DAC description language, which implements the DAC model as language extensions. Its high-level semantic structures preserve the parallel features of the application and guide the compilation and runtime system in automatically mapping high-level applications onto different hardware architectures, saving programming effort while keeping high-level applications unified. Moreover, the DAC description language adopts a C-like syntax for the extensions that implement these semantic structures, keeping the programming interface easy to learn and easy to use.

Finally, a prototype system consisting of a source-to-source compiler and runtime support is implemented on top of OpenCL. The runtime system encapsulates the OpenCL runtime APIs in runtime library functions; building on these, the source-to-source compiler generates standard OpenCL code from applications written in the DAC description language. With OpenCL as the intermediate language, the compiler and runtime system achieve efficient execution of high-level applications on different heterogeneous systems and provide good cross-platform portability.
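The abstract does not reproduce any DAC source, but the gap it describes can be illustrated with a generic vector-addition example written for this summary (none of the code below is from the paper). A DAC-level description is meant to stay close in size and shape to the serial loop:

/* Serial form: roughly the size and structure a high-level
 * DAC description is meant to preserve (illustrative only). */
void vadd_serial(const float *a, const float *b, float *c, unsigned n) {
    for (unsigned i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

while the compiler targets standard OpenCL device code such as:

/* OpenCL C kernel for the same data-parallel computation:
 * one work-item per element of the loop. */
__kernel void vadd(__global const float *a,
                   __global const float *b,
                   __global float *c,
                   const unsigned int n) {
    size_t i = get_global_id(0);
    if (i < n)
        c[i] = a[i] + b[i];
}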
We rebuilt benchmarks selected from the Parboil and Rodinia benchmark suites in the DAC description language and ran experiments on both an NVIDIA GPU platform and an Intel MIC platform. The code size of each rebuilt benchmark is roughly equivalent to that of the serial code provided by the corresponding suite, and only 13% to 64% of that of the original OpenCL benchmark code, which significantly reduces the heterogeneous programming workload. With the support of the compiler and runtime system, the rebuilt benchmarks run on both platforms without modification. They achieve 91% to 100% of the performance of the handcrafted, optimized OpenCL benchmark code on the GPU platform and 76% to 98% on the MIC platform, demonstrating the effectiveness of the UPPA approach and the efficiency of the compiler and runtime system. © 2020, Science Press. All rights reserved.
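To make the reported code-size gap concrete, below is a minimal, generic OpenCL 1.2 host-side sequence of the kind the UPPA runtime library is said to encapsulate. It is a sketch written for this summary under standard OpenCL assumptions, not code from the paper, and it omits all error handling.

#include <CL/cl.h>
#include <stddef.h>

/* Generic OpenCL host boilerplate (illustrative sketch): select a
 * device, build the kernel, move data, launch, and clean up. */
void vadd_opencl(const char *src, const float *a, const float *b,
                 float *c, size_t n) {
    cl_platform_id plat;
    cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

    /* Compile the kernel source at runtime. */
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "vadd", NULL);

    /* Allocate device buffers and copy the inputs. */
    size_t bytes = n * sizeof(float);
    cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY, bytes, NULL, NULL);
    cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY, bytes, NULL, NULL);
    cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, bytes, NULL, NULL);
    clEnqueueWriteBuffer(q, da, CL_TRUE, 0, bytes, a, 0, NULL, NULL);
    clEnqueueWriteBuffer(q, db, CL_TRUE, 0, bytes, b, 0, NULL, NULL);

    /* Bind arguments and launch one work-item per element. */
    cl_uint un = (cl_uint)n;
    clSetKernelArg(k, 0, sizeof(cl_mem), &da);
    clSetKernelArg(k, 1, sizeof(cl_mem), &db);
    clSetKernelArg(k, 2, sizeof(cl_mem), &dc);
    clSetKernelArg(k, 3, sizeof(cl_uint), &un);
    clEnqueueNDRangeKernel(q, k, 1, NULL, &n, NULL, 0, NULL, NULL);

    /* Copy the result back (blocking) and release all resources. */
    clEnqueueReadBuffer(q, dc, CL_TRUE, 0, bytes, c, 0, NULL, NULL);
    clReleaseMemObject(da); clReleaseMemObject(db); clReleaseMemObject(dc);
    clReleaseKernel(k); clReleaseProgram(prog);
    clReleaseCommandQueue(q); clReleaseContext(ctx);
}

Every step here is pure mechanism (device discovery, runtime compilation, data movement, launch, cleanup); absorbing it into runtime library functions is what lets the compiler-generated code, and the DAC source above it, stay close to serial size.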