Accelerating LULESH using HPX: the C++ Standard Library for Parallelism and Concurrency

Times cited: 0

Authors:

Singanaboina, Srinivas Yadav [1]

Wei, Weile [2]

Seiras, Isidoros Tsaousis [3]

Syskakis, Panagiotis [3]

Richardson, Bradley [4]

Cook, Brandon [4]

Kaiser, Hartmut [1]

Affiliations:

[1] Louisiana State Univ, Baton Rouge, LA 70803 USA

[2] STE ARGrp, Sunnyvale, CA USA

[3] Aristotle Univ Thessaloniki, Thessaloniki, Greece

[4] Natl Energy Res Sci Comp Ctr, Berkeley, CA USA

Source:

PRACTICE AND EXPERIENCE IN ADVANCED RESEARCH COMPUTING 2024, PEARC 2024 | 2024

Funding:

National Science Foundation (USA);

Keywords:

C++; HPX; LULESH; OpenMP;

DOI:

10.1145/3626203.3670529

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

The rapid increase in computer hardware capabilities creates a need for greater parallel efficiency. While existing CPU parallelization techniques based on OpenMP provide adequate performance when parallelizing compute-intensive scientific workloads, they give users little flexibility in controlling parallelism, and such control can improve performance. In this paper we describe a new port of Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH), a widely studied benchmark in the Department of Energy co-design efforts for exascale computing, to HPX and HPXMP (a new OpenMP backend of HPX), and we evaluate its performance on Intel, AMD, ARM, and RISC-V architectures. We explain how we progressively incorporated HPX's parallel features to boost the performance of an HPX-enabled LULESH, including the parallel execution policy, hpx::for_loop, and HPX's fork-join executor. We compare the shared-memory parallel performance of five LULESH versions: an NVC++ implementation using C++ algorithms, an OpenMP pragma-based implementation, an HPX implementation using C++ algorithms, an HPX implementation using fork-join executors, and HPXMP. Using HPXMP, we observed average speedups of 1.4×, 1.5×, 1.2×, and 1.3× relative to OpenMP on Intel, AMD, ARM, and RISC-V processors, respectively, across three workload sizes. Using fork-join executors, we observed speedups of 1.1× and 2.2× relative to OpenMP on Intel and AMD, respectively. These findings suggest that HPXMP can directly improve existing OpenMP codebases across a wide range of architectures without any modification to the original source code, whereas HPX's fork-join executors provide finer control of parallelism and higher performance. HPXMP also provides a viable first step on a migration path from OpenMP to HPX for existing applications.
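The fork-join pattern the abstract refers to (the structure behind an OpenMP parallel loop and, per the abstract, HPX's fork-join executor) can be illustrated with a minimal sketch in standard C++. This is not the HPX API and not the authors' code: `fork_join_for` and `calc_position_for_nodes` are hypothetical names, plain `std::thread` stands in for a parallel runtime, and threads are created on every call, whereas production runtimes typically reuse a persistent pool of workers. The loop body is a simplified stand-in for a LULESH-style node update (position advanced by velocity times the time step).

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical fork-join parallel loop: split [first, last) into one
// contiguous chunk per worker, run the chunks concurrently, then join.
// The calling thread processes chunk 0 itself, so at most
// num_workers - 1 extra threads are created.
template <typename Body>
void fork_join_for(std::size_t first, std::size_t last, unsigned num_workers, Body body)
{
    if (last <= first) return;
    if (num_workers == 0) num_workers = 1;
    const std::size_t n = last - first;
    const std::size_t workers = std::min<std::size_t>(num_workers, n);
    const std::size_t chunk = n / workers;
    const std::size_t rem = n % workers;

    // Chunk w covers [begin, end); the first `rem` chunks get one extra index.
    auto run_chunk = [&](std::size_t w) {
        const std::size_t begin = first + w * chunk + std::min(w, rem);
        const std::size_t end = begin + chunk + (w < rem ? 1 : 0);
        for (std::size_t i = begin; i < end; ++i) body(i);
    };

    std::vector<std::thread> team;
    team.reserve(workers - 1);
    for (std::size_t w = 1; w < workers; ++w)  // fork
        team.emplace_back(run_chunk, w);
    run_chunk(0);                               // caller works on chunk 0
    for (auto& t : team) t.join();              // join: all chunks finished
}

// Hypothetical, simplified LULESH-style node loop: advance each coordinate
// by velocity * dt. Iterations are independent, so no synchronization is
// needed inside the body.
void calc_position_for_nodes(std::vector<double>& x, const std::vector<double>& xd,
                             double dt, unsigned num_workers)
{
    fork_join_for(0, x.size(), num_workers,
                  [&](std::size_t i) { x[i] += xd[i] * dt; });
}
```

For example, `calc_position_for_nodes(x, xd, dt, 4)` splits the node range into four contiguous chunks, which resembles OpenMP's `schedule(static)` partitioning. The body must be safe to call concurrently for distinct indices; in an HPX port, the same loop body would instead be handed to `hpx::for_loop` with a parallel execution policy or to a fork-join executor.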


Pages: 8
