Automatic Code Generation and Optimization of Large-scale Stencil Computation on Many-core Processors

Cited by: 10
Authors
Li, Mingzhen [1, 2]
Liu, Yi [2]
Yang, Hailong [1, 2]
Hu, Yongmin [2]
Sun, Qingxiao [2]
Chen, Bangduo [2]
You, Xin [2]
Liu, Xiaoyan [2]
Luan, Zhongzhi [2]
Qian, Depei [2]
Affiliations
[1] State Key Laboratory of Software Development Environment, Beijing, People's Republic of China
[2] Beihang University, Beijing, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Stencil; Domain Specific Language; Performance Optimization; Manycore Architecture;
DOI
10.1145/3472456.3473517
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Stencil computation is an indispensable building block of many scientific applications and is widely used in numerical solvers for partial differential equations (PDEs). Because stencils exhibit complex computation patterns and must run on a variety of hardware targets (e.g., many-core processors), many domain-specific languages (DSLs) have been proposed to optimize stencil computation. However, existing stencil DSLs mostly focus on performance optimization for homogeneous many-core processors such as CPUs and GPUs, and do not support emerging heterogeneous many-core processors such as Sunway. In addition, few of them can express stencils with multiple time dependencies or apply optimizations along both the spatial and temporal dimensions. Moreover, most stencil DSLs cannot generate code that runs efficiently at large scale, which limits their practical applicability. In this paper, we propose MSC, a new stencil DSL designed to express stencil computation in both the spatial and temporal dimensions. It can generate high-performance stencil code for large-scale execution on emerging many-core processors. Specifically, we design several optimization primitives for improving parallelism and data locality, and a communication library for efficient halo exchange in large-scale execution. The experimental results show that MSC achieves better performance than state-of-the-art stencil DSLs.
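For readers unfamiliar with the terms used in the abstract, the sketch below illustrates what a stencil with multiple time dependencies looks like. It is not MSC syntax and not code generated by MSC; it is plain Python written only for illustration. The function names (step, run), the coefficient c, and the 3-point update rule are assumptions chosen for the example. Each new value depends on neighbouring cells at the current time level and on the same cell one level earlier, and the first and last cells act as a one-cell halo, which in a distributed run would hold copies of a neighbouring process's boundary data.

```python
def step(prev, curr, c=0.25):
    """Advance one time step of a 1D 3-point stencil.

    Reads two earlier time levels (prev and curr), i.e. the stencil has
    two time dependencies. The first and last cells are treated as halo
    cells and are carried over unchanged (an assumption of this sketch).
    """
    n = len(curr)
    nxt = curr[:]  # start from a copy so halo cells keep their values
    for i in range(1, n - 1):
        # Spatial dependency: left and right neighbours at the current level.
        lap = curr[i - 1] - 2.0 * curr[i] + curr[i + 1]
        # Temporal dependency: the same cell at the two earlier levels.
        nxt[i] = 2.0 * curr[i] - prev[i] + c * lap
    return nxt


def run(u0, steps, c=0.25):
    """Apply `steps` time steps, starting with both time levels equal to u0."""
    prev, curr = u0[:], u0[:]
    for _ in range(steps):
        prev, curr = curr, step(prev, curr, c)
    return curr


if __name__ == "__main__":
    print(run([0.0, 0.0, 1.0, 0.0, 0.0], 1))
```

A stencil DSL would typically let the user state only the update rule inside the loop and leave tiling, parallelisation, and halo communication to the code generator.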
Pages: 12
Related papers
50 in total
  • [31] Large-scale Parallel Design for Cryo-EM Structure Determination on Heterogeneous Many-core Architectures
    Qiao, Liang
    Yu, Hongkun
    Wang, Kunpeng
    Sun, Ruixin
    Zhao, Wenlai
    Fu, Haohuan
    Yang, Guangwen
    2019 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2019: 711 - 716
  • [32] Performance Assessment of Hybrid Parallelism for Large-Scale Reservoir Simulation on Multi- and Many-core Architectures
    AlOnazi, Amani
    Rogowski, Marcin
    Al-Zawawi, Ahmed
    Keyes, David
    2018 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2018,
  • [33] A reconfigurable distributed architecture for clock generation in large many-core SoC
    Shan, Chuan
    Galayko, Dimitri
    Anceau, Francois
    Zianbetov, Eldar
    2014 9TH INTERNATIONAL SYMPOSIUM ON RECONFIGURABLE AND COMMUNICATION-CENTRIC SYSTEMS-ON-CHIP (RECOSOC), 2014,
  • [34] Optimization of Burn-in Test for Many-core processors through Adaptive Spatiotemporal Power Migration
    Cho, Minki
    Sathe, Nikhil
    Raychowdhury, Arijit
    Mukhopadhyay, Saibal
    INTERNATIONAL TEST CONFERENCE 2010, 2010,
  • [35] Arbitrarily Parallelizable Code: A Model of Computation Evaluated on a Message-Passing Many-Core System
    Cook, Sebastien
    Garcia, Paulo
    COMPUTERS, 2022, 11 (11)
  • [36] Design and Optimization of Parallel Algorithm for Kalman Filter on SW26010 Many-Core Processors
    Yang, Aiqiang
    JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2022, 31 (04)
  • [37] ASPaS: A Framework for Automatic SIMDization of Parallel Sorting on x86-based Many-core Processors
    Hou, Kaixi
    Wang, Hao
    Feng, Wu-chun
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING (ICS'15), 2015: 383 - 392
  • [38] Implementation and optimization of a data protecting model on the Sunway TaihuLight supercomputer with heterogeneous many-core processors
    Chen, Yuedan
    Li, Kenli
    Fei, Xiongwei
    Quan, Zhe
    Li, Keqin
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2019, 31 (21)
  • [39] Automatic Index Selection for Large-Scale Datalog Computation
    Subotic, Pavle
    Jordan, Herbert
    Chang, Lijun
    Fekete, Alan
    Scholz, Bernhard
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2018, 12 (02): 141 - 153
  • [40] Towards optimized tensor code generation for deep learning on sunway many-core processor
    Li, Mingzhen
    Liu, Changxi
    Liao, Jianjin
    Zheng, Xuegui
    Yang, Hailong
    Sun, Rujun
    Xu, Jun
    Gan, Lin
    Yang, Guangwen
    Luan, Zhongzhi
    Qian, Depei
    FRONTIERS OF COMPUTER SCIENCE, 2024, 18 (02)