Mitigating the NUMA effect on task-based runtime systems

Cited by: 1
Authors
Maroñas, Marcos [1, 2]
Navarro, Antoni [1, 2]
Ayguadé, Eduard [1, 2]
Beltran, Vicenç [1]
Affiliations
[1] Barcelona Supercomp Ctr, Barcelona, Spain
[2] Univ Politecn Cataluna, Barcelona, Spain
Source
JOURNAL OF SUPERCOMPUTING | 2023 / Volume 79 / Issue 13
Keywords
NUMA-awareness; OmpSs-2; Parallel programming model; Scheduling; Task-aware
DOI
10.1007/s11227-023-05164-9
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Processors with multiple sockets or chiplets are becoming increasingly common. These processors usually expose a single shared address space. However, due to hardware restrictions, they adopt a NUMA approach, where each processor accesses local memory faster than remote memory. Reducing data motion is crucial to improving overall performance, so computations must run as close as possible to where the data resides. We propose a new approach that mitigates the NUMA effect on such systems. Our solution is based on OmpSs-2, a task-based parallel programming model similar to OpenMP. We first provide a simple API to allocate memory in NUMA systems using different policies. Then, combining user-given information that specifies dependences between tasks with information collected in a global directory when allocating data, we extend our runtime library to perform NUMA-aware work scheduling. Our heuristic considers data location, the distance between NUMA nodes, and the load of each NUMA node to seamlessly minimize data motion costs and load imbalance. Our evaluation shows that our NUMA support can significantly mitigate the NUMA effect by reducing the number of remote accesses, thereby improving performance on most benchmarks, reaching up to 2x speedup on a 2-NUMA machine and up to 7.1x on an 8-NUMA machine.
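The abstract names three inputs to the scheduling heuristic (data location, distance between NUMA nodes, and per-node load) but does not give its formula. The following is a minimal illustrative sketch of how such inputs could be combined, not the heuristic used in the paper or in the OmpSs-2 runtime. The function name `pick_numa_node`, the cost formula, and the `load_weight` parameter are all assumptions made for illustration; the distance values follow the common convention of 10 for local access and a larger number for remote access.

```python
from typing import Sequence


def pick_numa_node(
    bytes_per_node: Sequence[int],
    distance: Sequence[Sequence[float]],
    load: Sequence[int],
    load_weight: float = 10.0,
) -> int:
    """Return the NUMA node with the lowest estimated cost for one task.

    Hypothetical cost model (an assumption, not taken from the paper):
      cost(node) = average distance to the task's data, weighted by bytes
                   + load_weight * (node load / highest node load)
    """
    total_bytes = sum(bytes_per_node) or 1  # avoid division by zero
    max_load = max(load) or 1               # avoid division by zero
    best_node, best_cost = 0, float("inf")
    for candidate in range(len(distance)):
        # Data-location and distance terms: bytes on each owner node,
        # scaled by how far that owner is from the candidate node.
        traffic = sum(
            nbytes * distance[candidate][owner]
            for owner, nbytes in enumerate(bytes_per_node)
        ) / total_bytes
        # Load term: penalize nodes that already have many queued tasks.
        penalty = load_weight * load[candidate] / max_load
        cost = traffic + penalty
        if cost < best_cost:
            best_node, best_cost = candidate, cost
    return best_node


# Example inputs for a hypothetical 2-node machine.
DISTANCE = [[10, 21], [21, 10]]
chosen = pick_numa_node(bytes_per_node=[4096, 0], distance=DISTANCE, load=[0, 0])
```

In this sketch, raising `load_weight` makes the scheduler more willing to run a task away from its data when the local node is busy, which is one simple way to trade data motion against load imbalance.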
Pages: 14287-14312
Page count: 26