Islands-of-Cores Approach for Harnessing SMP/NUMA Architectures in Heterogeneous Stencil Computations

Times cited: 4

Authors:

Szustak, Lukasz [1]

Wyrzykowski, Roman [1]

Jakl, Ondrej [2]

Affiliations:

[1] Czestochowa Tech Univ, Dabrowskiego 69, PL-42201 Czestochowa, Poland

[2] Czech Acad Sci, Inst Geon, Studentska 1768, Ostrava 70800, Czech Republic

Source:

PARALLEL COMPUTING TECHNOLOGIES (PACT 2017) | 2017 / Vol. 10421

Keywords:

MODEL; EULAG; PHI;

DOI:

10.1007/978-3-319-62932-2_34

Chinese Library Classification (CLC) number:

TP301 [Theory and Methods];

Subject classification code:

081202;

Abstract:

SMP/NUMA systems are powerful HPC platforms that can be applied to a wide range of real-life applications. These systems provide a large capacity of shared memory and allow the use of the shared-variable programming model, which takes advantage of shared memory for inter-process communication and synchronization. However, because data can be physically dispersed over many nodes, access times to different data items may vary significantly. In this paper, we address the challenge of harnessing the heterogeneous nature of SMP/NUMA communication for a complex scientific application that implements the Multidimensional Positive Definite Advection Transport Algorithm (MPDATA), which consists of a set of heterogeneous stencil computations. Our previous method of MPDATA workload distribution was successfully applied to small-scale shared-memory systems with several CPUs and/or accelerators, but it suffers significant performance losses on larger SMP/NUMA systems, such as the SGI UV 2000 server used in this work. To overcome this shortcoming, we propose a new islands-of-cores approach. It exposes the correlation between computation and communication for heterogeneous stencils and enables efficient management of the trade-off between computation and communication costs in accordance with the features of SMP/NUMA systems. As a result, when using the maximum configuration of 112 cores from 14 Intel Xeon E5-4627v2 3.3 GHz processors, the proposed approach is more than 10 times faster than the previous method, achieving about 390 Gflop/s, or approximately 30% of the theoretical peak performance.
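To make the computation/communication trade-off above more concrete, here is a minimal toy cost model. It assumes (this is not taken from the paper) that the grid is split along one axis into equal slabs, one per island of cores, and that islands avoid exchanging data after every stencil stage by redundantly recomputing the halo cells that their later stages need. The names `StencilChain`, `extra_cells_per_side`, and `redundant_fraction`, as well as the halo widths, are hypothetical and chosen for illustration only.

```python
# Toy cost model of the computation side of an islands-of-cores trade-off.
# ASSUMPTIONS (illustrative, not from the paper):
#  * the grid is split along one axis into equal slabs, one per island;
#  * islands redundantly recompute halo cells instead of exchanging data
#    between stencil stages;
#  * every stage costs the same per cell.
from dataclasses import dataclass
from typing import List


@dataclass
class StencilChain:
    # stage_halos[j]: halo width (cells per side, along the split axis)
    # that stage j reads from the output of stage j-1. Hypothetical values.
    stage_halos: List[int]

    def extra_cells_per_side(self) -> List[int]:
        """Extra cells stage i must compute on each side of its slab so
        that the last stage needs no data from neighbouring islands."""
        extra = [0] * len(self.stage_halos)
        needed = 0
        # Walk backwards: stage i must cover the halos read by all later stages.
        for i in range(len(self.stage_halos) - 1, -1, -1):
            extra[i] = needed
            needed += self.stage_halos[i]
        return extra


def redundant_fraction(chain: StencilChain, n_cells: int, islands: int) -> float:
    """Redundant work relative to useful work for `islands` equal slabs
    of a grid with `n_cells` cells along the split axis."""
    if islands < 1 or n_cells < islands:
        raise ValueError("need 1 <= islands <= n_cells")
    # Each of the (islands - 1) internal boundaries is recomputed on both sides.
    extra = sum(2 * (islands - 1) * e for e in chain.extra_cells_per_side())
    useful = len(chain.stage_halos) * n_cells
    return extra / useful


if __name__ == "__main__":
    chain = StencilChain([1, 1, 1])  # hypothetical 3-stage chain
    for p in (1, 2, 4, 8, 14):
        print(p, redundant_fraction(chain, 64, p))
```

Under these assumptions, redundant computation grows linearly with the number of islands. That cost has to be weighed against the inter-island communication and synchronization it removes. The communication side depends on the NUMA topology and is deliberately not modelled in this sketch.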

Pages: 351-364

Number of pages: 14