Automatic Code Generation and Optimization of Large-scale Stencil Computation on Many-core Processors

Cited by: 10
Authors
Li, Mingzhen [1, 2]
Liu, Yi [2]
Yang, Hailong [1, 2]
Hu, Yongmin [2]
Sun, Qingxiao [2]
Chen, Bangduo [2]
You, Xin [2]
Liu, Xiaoyan [2]
Luan, Zhongzhi [2]
Qian, Depei [2]
Affiliations
[1] State Key Laboratory of Software Development Environment, Beijing, China
[2] Beihang University, Beijing, China
Funding
National Natural Science Foundation of China
Keywords
Stencil; Domain Specific Language; Performance Optimization; Manycore Architecture
DOI
10.1145/3472456.3473517
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Stencil computation is an indispensable building block of many scientific applications and is widely used by the numerical solvers of partial differential equations (PDEs). Due to the complex computation patterns of different stencils and the variety of hardware targets (e.g., many-core processors), many domain-specific languages (DSLs) have been proposed to optimize stencil computation. However, existing stencil DSLs mostly focus on performance optimization for homogeneous many-core processors such as CPUs and GPUs, and fail to embrace emerging heterogeneous many-core processors such as Sunway. In addition, few of them support expressing stencils with multiple time dependencies or optimizations along both the spatial and temporal dimensions. Moreover, most stencil DSLs cannot generate code that runs efficiently at large scale, which limits their practical applicability. In this paper, we propose MSC, a new stencil DSL designed to express stencil computation in both spatial and temporal dimensions. It can generate high-performance stencil code for large-scale execution on emerging many-core processors. Specifically, we design several optimization primitives for improving parallelism and data locality, and a communication library for efficient halo exchange in large-scale execution. The experimental results show that MSC achieves better performance than state-of-the-art stencil DSLs.
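To make two terms from the abstract concrete, the following minimal Python sketch shows a stencil with multiple time dependencies and a halo exchange between subdomains. It is not MSC's DSL, API, or generated code. The function names (`step`, `run`, `run_split`), the coefficients, and the 1D 3-point shape are all hypothetical choices made for illustration. The update reads two earlier time levels, and the split version swaps one ghost cell per side after each step, which should reproduce the single-domain result.

```python
def step(prev, prev2, c0=0.5, c1=0.25, c2=0.1):
    """One time step of a 1D 3-point stencil that reads time levels t-1 and t-2.

    Coefficients are arbitrary illustrative values, not taken from the paper.
    """
    out = list(prev)  # the first and last cells are copied through unchanged
    for i in range(1, len(prev) - 1):
        spatial = c1 * (prev[i - 1] + prev[i + 1]) + c0 * prev[i]
        out[i] = spatial + c2 * prev2[i]  # dependency on the older time level
    return out


def run(u0, u1, steps):
    """Advance the whole grid on a single domain."""
    prev2, prev = list(u0), list(u1)
    for _ in range(steps):
        prev2, prev = prev, step(prev, prev2)
    return prev


def run_split(u0, u1, steps):
    """Same computation on two subdomains with a one-cell halo on each side.

    Assumes each half owns at least two cells (len(u0) >= 4).
    """
    mid = len(u0) // 2
    # Left owns global cells [0, mid) plus a ghost copy of cell mid.
    l_prev2, l_prev = list(u0[:mid + 1]), list(u1[:mid + 1])
    # Right owns global cells [mid, n) plus a ghost copy of cell mid - 1.
    r_prev2, r_prev = list(u0[mid - 1:]), list(u1[mid - 1:])
    for _ in range(steps):
        l_new = step(l_prev, l_prev2)
        r_new = step(r_prev, r_prev2)
        # Halo exchange: refresh each ghost cell from its owning neighbour.
        l_new[-1] = r_new[1]
        r_new[0] = l_new[-2]
        l_prev2, l_prev = l_prev, l_new
        r_prev2, r_prev = r_prev, r_new
    return l_prev[:-1] + r_prev[1:]  # drop the ghost cells and concatenate


if __name__ == "__main__":
    u0 = [0.0] * 8
    u1 = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
    print(run(u0, u1, 3) == run_split(u0, u1, 3))
```

In a distributed setting the two assignments in the halo exchange would become messages between processes; here they are plain list writes so the sketch stays self-contained.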
Pages: 12