Case Studies on the Impact and Challenges of Heterogeneous NUMA Architectures for HPC

Authors
Zaourar, Lilia [1 ]
Benazouz, Mohamed [1 ]
Mouhagir, Ayoub [1 ]
Falquez, Carlos [2 ]
Portero, Antoni [2 ]
Ho, Nam [2 ]
Suarez, Estela [2 ]
Petrakis, Polydoros [3 ]
Marazakis, Manolis [3 ]
Sgherzi, Francesco [4 ]
Fernandez, Ivan [4 ]
Dolbeau, Romain [5 ]
Pleiter, Dirk [6 ]
Affiliations
[1] Univ Paris Saclay, List, CEA, F-91120 Palaiseau, France
[2] Forschungszentrum Julich, Inst Adv Simulat, Julich Supercomp Ctr, Julich, Germany
[3] Fdn Res & Technol Hellas FORTH, Inst Comp Sci, Iraklion, Greece
[4] Barcelona Supercomp Ctr BSC, Barcelona, Spain
[5] SiPearl, Rennes, France
[6] KTH Royal Inst Technol, Stockholm, Sweden
Keywords
Non-Uniform Memory Access (NUMA); co-design; simulation; High Performance Computing (HPC); benchmarking
DOI
10.1007/978-3-031-66146-4_17
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The memory systems of High-Performance Computing (HPC) systems commonly feature non-uniform data paths to memory, i.e., they are non-uniform memory access (NUMA) architectures. Memory is divided into multiple regions, with each processing unit having its own local memory. Therefore, each processing unit accesses its local memory regions faster than non-local regions. Architectures with hybrid memory technologies result in further non-uniformity. This paper presents case studies of the performance potential and data placement implications of non-uniform and heterogeneous memory in HPC systems. Using the gem5 and VPSim simulation platforms, we model NUMA systems with processors based on the ARMv8 Neoverse V1 Reference Design. The gem5 simulator provides a cycle-accurate view, while VPSim offers greater simulation speed, with a high-level view of the simulated system. We highlight the performance impact of design trade-offs regarding NUMA node organization and System Level Cache (SLC) group assignment, as well as Network-on-Chip (NoC) configuration. Our case studies provide essential input to a co-design process involving HPC processor architects and system integrators. A comparison of system configurations for different NoC bandwidths shows reduced NoC latency and a high memory bandwidth improvement when NUMA control is enabled. Furthermore, a configuration with HBM2 memory organized as four NUMA nodes highlights the memory bandwidth performance gap and NoC queuing latency impact when comparing local vs. remote memory accesses. On the other hand, NUMA can result in an unbalanced distribution of memory accesses and reduced SLC hit ratios, as shown with DDR4 memory organized as four NUMA nodes.
Pages: 251-265
Number of pages: 15