Automatically Optimizing Stencil Computations on Many-Core NUMA Architectures

Cited by: 1

Authors:

Lin, Pei-Hung [1]

Yi, Qing [2]

Quinlan, Daniel [1]

Liao, Chunhua [1]

Yan, Yongqing [2]

Affiliations:

[1] Lawrence Livermore Natl Lab, Livermore, CA 94550 USA

[2] Univ Colorado, Colorado Springs, CO 80918 USA

Source:

LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING, LCPC 2016 | 2017 / Vol. 10136

Keywords:

DOI:

10.1007/978-3-319-52709-3_12

Chinese Library Classification (CLC) number:

TP31 [Computer software]

Discipline classification code:

081202; 0835

Abstract:

This paper presents a system for automatically supporting the optimization of stencil kernels on emerging Non-Uniform Memory Access (NUMA) many-core architectures, through a combined compiler and runtime approach. In particular, we use a pragma-driven compiler to recognize the special structures and optimization needs of stencil computations and thereby to automatically generate low-level code that efficiently utilizes the data placement and management support of a C++ runtime built on top of the NUMA API, a programming interface to the NUMA policy supported by the Linux kernel. Our results show that through automated specialization of code generation, this approach provides a combined benefit of performance, portability, and productivity for developers.
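
To make the idea of NUMA-aware data placement for stencils more concrete, here is a minimal sketch. It is not the paper's generated code and does not call the NUMA API. It assumes a 1-D three-point Jacobi stencil and a contiguous block partition of the grid across a hypothetical number of NUMA nodes; the names `partition`, `jacobi_step`, `jacobi_step_blocked`, and `num_nodes` are illustrative only. Each block stands in for memory that a runtime might place on one node, so that only the single neighbouring cell on each side of a block is read from another node.

```python
def partition(n, num_nodes):
    """Split indices 0..n-1 into num_nodes contiguous [start, end) blocks."""
    base, extra = divmod(n, num_nodes)
    blocks = []
    start = 0
    for node in range(num_nodes):
        # The first `extra` blocks get one additional element.
        size = base + (1 if node < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


def jacobi_step(grid):
    """Reference three-point stencil; the two boundary cells stay fixed."""
    out = list(grid)
    for i in range(1, len(grid) - 1):
        out[i] = (grid[i - 1] + grid[i] + grid[i + 1]) / 3.0
    return out


def jacobi_step_blocked(grid, num_nodes):
    """Same stencil, computed block by block.

    Each block models data owned by one NUMA node (an assumption of this
    sketch). Reads of grid[start - 1] and grid[end] cross into a
    neighbouring block, which models remote-node accesses.
    """
    n = len(grid)
    out = list(grid)
    for start, end in partition(n, num_nodes):
        lo = max(start, 1)        # skip the fixed left boundary cell
        hi = min(end, n - 1)      # skip the fixed right boundary cell
        for i in range(lo, hi):
            out[i] = (grid[i - 1] + grid[i] + grid[i + 1]) / 3.0
    return out
```

In a real system the per-block loop would run on threads bound to the owning node, and the blocks would be allocated on that node; both are omitted here to keep the sketch self-contained.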

Pages: 137-152

Page count: 16

