Accelerating LULESH using HPX - the C++ Standard Library for Parallelism and Concurrency

Authors
Singanaboina, Srinivas Yadav [1]
Wei, Weile [2]
Seiras, Isidoros Tsaousis [3]
Syskakis, Panagiotis [3]
Richardson, Bradley [4]
Cook, Brandon [4]
Kaiser, Hartmut [1]
Affiliations
[1] Louisiana State Univ, Baton Rouge, LA 70803 USA
[2] STE ARGrp, Sunnyvale, CA USA
[3] Aristotle Univ Thessaloniki, Thessaloniki, Greece
[4] Natl Energy Res Sci Comp Ctr, Berkeley, CA USA
Funding
U.S. National Science Foundation
Keywords
C++; HPX; LULESH; OpenMP
DOI
10.1145/3626203.3670529
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The rapid increase in computer hardware capabilities demands greater parallel efficiency. While existing CPU parallelization techniques using OpenMP provide adequate performance when parallelizing compute-intensive scientific workloads, they give users little flexibility in controlling parallelism, and such control can enhance performance. In this paper we describe a new port of Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH), a widely studied benchmark in the Department of Energy co-design efforts for exascale computing, to HPX and HPXMP (a new OpenMP backend of HPX), and we evaluate its performance on Intel, AMD, ARM, and RISC-V architectures. We explain how we progressively incorporated HPX's parallel features to boost the performance of an HPX-enabled LULESH, including the parallel execution policy, hpx::for_loop, and HPX's fork-join executor. We compare the shared-memory parallel performance of five LULESH versions: an NVC++ implementation using C++ algorithms, an OpenMP pragma-based implementation, an HPX implementation using C++ algorithms, an HPX implementation using fork-join executors, and HPXMP. Using HPXMP, we observed average speedups of 1.4×, 1.5×, 1.2×, and 1.3× relative to OpenMP on Intel, AMD, ARM, and RISC-V processors, respectively, across 3 different workload sizes. Using fork-join executors, we observed speedups of 1.1× and 2.2× relative to OpenMP on Intel and AMD, respectively. These findings suggest that HPXMP can offer a direct enhancement to existing OpenMP codebases across a wide range of architectures without any modifications to the original source code, whereas HPX's fork-join executors provide finer control of parallelism and increased performance. HPXMP also provides a viable first step on a migration path for OpenMP applications to HPX.
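As a rough illustration of the pragma-based style the abstract compares against, the sketch below shows a generic OpenMP loop in C++. The kernel name `scaled_add` and its arguments are hypothetical and are not taken from LULESH or from the paper. According to the abstract, a loop of this kind is what HPXMP is meant to run without source changes, while the HPX ports express the same loop through a parallel execution policy, hpx::for_loop, or the fork-join executor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel for illustration only (not a LULESH routine):
// computes y[i] += a * x[i] for every element.
void scaled_add(double a, const std::vector<double>& x, std::vector<double>& y) {
    assert(x.size() == y.size());
    const long n = static_cast<long>(x.size());

    // When compiled with OpenMP enabled (e.g. -fopenmp), iterations are
    // divided among threads. Without OpenMP the pragma is ignored and the
    // loop runs serially with the same result.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

Each iteration writes only its own `y[i]`, so the loop has no cross-iteration dependence and is safe to parallelize as written.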
Pages: 8