Case Studies on the Impact and Challenges of Heterogeneous NUMA Architectures for HPC

Times cited: 0

Authors:

Zaourar, Lilia [1]

Benazouz, Mohamed [1]

Mouhagir, Ayoub [1]

Falquez, Carlos [2]

Portero, Antoni [2]

Ho, Nam [2]

Suarez, Estela [2]

Petrakis, Polydoros [3]

Marazakis, Manolis [3]

Sgherzi, Francesco [4]

Fernandez, Ivan [4]

Dolbeau, Romain [5]

Pleiter, Dirk [6]

Affiliations:

[1] Univ Paris Saclay, List, CEA, F-91120 Palaiseau, France

[2] Forschungszentrum Julich, Inst Adv Simulat, Julich Supercomp Ctr, Julich, Germany

[3] Fdn Res & Technol Hellas FORTH, Inst Comp Sci, Iraklion, Greece

[4] Barcelona Supercomp Ctr BSC, Barcelona, Spain

[5] SiPearl, Rennes, France

[6] KTH Royal Inst Technol, Stockholm, Sweden

Source:

ARCHITECTURE OF COMPUTING SYSTEMS, ARCS 2024 | 2024 / Vol. 14842

Keywords:

Non-Uniform Memory Access (NUMA); co-design; simulation; High Performance Computing (HPC); benchmarking

DOI:

10.1007/978-3-031-66146-4_17

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

The memory systems of High-Performance Computing (HPC) systems commonly feature non-uniform data paths to memory, i.e., they are non-uniform memory access (NUMA) architectures. Memory is divided into multiple regions, and each processing unit has its own local memory. Consequently, a processing unit can access its local memory region faster than non-local (remote) regions. Architectures that combine different memory technologies introduce further non-uniformity. This paper presents case studies of the performance potential and data placement implications of non-uniform and heterogeneous memory in HPC systems. Using the gem5 and VPSim simulation platforms, we model NUMA systems with processors based on the ARMv8 Neoverse V1 Reference Design. The gem5 simulator provides a cycle-accurate view, whereas VPSim offers higher simulation speed with a higher-level view of the simulated system. We highlight the performance impact of design trade-offs in NUMA node organization, System Level Cache (SLC) group assignment, and Network-on-Chip (NoC) configuration. Our case studies provide essential input to a co-design process involving HPC processor architects and system integrators. A comparison of system configurations with different NoC bandwidths shows reduced NoC latency and a significant memory bandwidth improvement when NUMA control is enabled. Furthermore, a configuration with HBM2 memory organized as four NUMA nodes highlights the memory bandwidth gap and the impact of NoC queuing latency when local and remote memory accesses are compared. On the other hand, NUMA can result in an unbalanced distribution of memory accesses and reduced SLC hit ratios, as shown with DDR4 memory organized as four NUMA nodes.
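The abstract contrasts local and remote accesses and notes that NUMA can distribute memory accesses unevenly. A toy model helps make this concrete. The Python sketch below is not the authors' gem5/VPSim setup. It assumes four NUMA nodes, a hypothetical 1024-page working set, and one worker thread per node. It compares a Linux-style first-touch placement, where a page lands on the node of the thread that first writes it, against round-robin interleaving. For each scenario it reports how many accesses each node's memory serves and what fraction of accesses are local.

```python
from collections import Counter

NUM_NODES = 4               # assumption: four NUMA nodes, as in the DDR4/HBM2 setups
PAGES = 1024                # hypothetical working-set size, in pages
BLOCK = PAGES // NUM_NODES  # each worker thread owns one contiguous block of pages


def owner_node(page):
    """NUMA node of the (hypothetical) worker thread that computes on this page."""
    return page // BLOCK


def place(policy, init_node):
    """Map every page to a NUMA node under a simplified placement policy.

    "first_touch": the page lands on the node of the thread that first writes it,
    like Linux's default local allocation. "interleave": pages are spread
    round-robin across nodes, in the spirit of MPOL_INTERLEAVE.
    init_node(page) returns the node of the thread that initializes the page.
    """
    if policy == "first_touch":
        return [init_node(p) for p in range(PAGES)]
    if policy == "interleave":
        return [p % NUM_NODES for p in range(PAGES)]
    raise ValueError(f"unknown policy: {policy}")


def evaluate(placement):
    """Each worker reads its own block once; return per-node load and local fraction."""
    load = Counter(placement[p] for p in range(PAGES))
    local = sum(placement[p] == owner_node(p) for p in range(PAGES))
    return [load[n] for n in range(NUM_NODES)], local / PAGES


scenarios = {
    # one thread on node 0 initializes all data, so first-touch puts every page there
    "serial init, first-touch": place("first_touch", lambda p: 0),
    # each worker initializes its own block, so pages land next to their users
    "parallel init, first-touch": place("first_touch", owner_node),
    "interleave": place("interleave", owner_node),
}

for name, placement in scenarios.items():
    load, frac = evaluate(placement)
    print(f"{name:28s} per-node accesses={load} local fraction={frac:.2f}")
```

In the serial-initialization case, node 0 serves all 1024 accesses and only 25% of them are local. Parallel initialization yields a balanced load with every access local. Interleaving balances the load across the four nodes, but only 25% of accesses are local. The model ignores caches, the SLC, and NoC contention, so it illustrates only the placement trade-off, not the simulated results.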

Pages: 251-265

Number of pages: 15
