Automatically Optimizing Stencil Computations on Many-Core NUMA Architectures

Cited by: 1
Authors
Lin, Pei-Hung [1]
Yi, Qing [2]
Quinlan, Daniel [1]
Liao, Chunhua [1]
Yan, Yongqing [2]
Affiliations
[1] Lawrence Livermore Natl Lab, Livermore, CA 94550 USA
[2] Univ Colorado, Colorado Springs, CO 80918 USA
DOI
10.1007/978-3-319-52709-3_12
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
This paper presents a system that automatically supports the optimization of stencil kernels on emerging Non-Uniform Memory Access (NUMA) many-core architectures through a combined compiler and runtime approach. In particular, we use a pragma-driven compiler to recognize the special structures and optimization needs of stencil computations and to automatically generate low-level code that efficiently utilizes the data placement and management support of a C++ runtime built on top of the NUMA API, a programming interface to the NUMA policy supported by the Linux kernel. Our results show that, through automated specialization of code generation, this approach provides a combined benefit of performance, portability, and productivity for developers.
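To make the data-placement idea in the abstract concrete, the following is a minimal sketch, not the paper's compiler output or runtime. It assumes a 1-D three-point Jacobi stencil and models each NUMA node's local memory as a separate Python list; it makes no calls to the real NUMA API. The names `split_blocks`, `jacobi_blocked`, and `jacobi_reference` are hypothetical and chosen only for illustration. The sketch shows the structure that NUMA-aware stencil code generally needs: one contiguous block per node, plus a halo exchange of edge cells before each sweep.

```python
def jacobi_reference(grid, steps):
    """Plain 1-D 3-point Jacobi sweep; the two boundary cells stay fixed."""
    cur = list(grid)
    for _ in range(steps):
        nxt = cur[:]
        for i in range(1, len(cur) - 1):
            nxt[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3.0
        cur = nxt
    return cur


def split_blocks(n, num_nodes):
    """Return one contiguous (start, end) index range per hypothetical node."""
    base, extra = divmod(n, num_nodes)
    bounds, start = [], 0
    for node in range(num_nodes):
        end = start + base + (1 if node < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def jacobi_blocked(grid, steps, num_nodes):
    """Same sweep, but with the grid partitioned into per-node blocks.

    Each block is a separate list standing in for node-local memory
    (an assumption of this sketch; no real NUMA placement happens here).
    """
    n = len(grid)
    if n < num_nodes:
        raise ValueError("need at least one cell per node")
    blocks = [list(grid[s:e]) for s, e in split_blocks(n, num_nodes)]
    last = len(blocks) - 1
    for _ in range(steps):
        # Halo exchange: copy each neighbor's edge cell before updating.
        halos = [
            (blocks[b - 1][-1] if b > 0 else None,
             blocks[b + 1][0] if b < last else None)
            for b in range(len(blocks))
        ]
        new_blocks = []
        for block, (left, right) in zip(blocks, halos):
            padded = ([] if left is None else [left]) + block \
                   + ([] if right is None else [right])
            offset = 0 if left is None else 1
            out = block[:]
            for j in range(len(block)):
                p = j + offset
                # Skip the global boundary cells, which have no halo.
                if 0 < p < len(padded) - 1:
                    out[j] = (padded[p - 1] + padded[p] + padded[p + 1]) / 3.0
            new_blocks.append(out)
        blocks = new_blocks
    return [x for block in blocks for x in block]
```

In this model, only the halo cells cross block boundaries in each sweep, which is the traffic that a NUMA-aware placement aims to keep small relative to node-local accesses.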
Pages: 137-152
Page count: 16