Fine-Grained MPI+OpenMP Plasma Simulations: Communication Overlap with Dependent Tasks

Cited by: 3
Authors
Richard, Jerome [1, 2]
Latu, Guillaume [1]
Bigot, Julien [3]
Gautier, Thierry [4]
Affiliations
[1] CEA, IRFM, F-13108 St Paul Les Durance, France
[2] Zebrys, Toulouse, France
[3] Univ Paris Saclay, UVSQ, Univ Paris Sud, Maison Simulat, CEA, CNRS, Gif Sur Yvette, France
[4] Univ Lyon, INRIA, CNRS, ENS Lyon, Univ Claude Bernard Lyon 1, LIP, Lyon, France
Keywords
Dependent tasks; OpenMP 4.5; MPI; Many-core
DOI
10.1007/978-3-030-29400-7_30
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
This paper demonstrates how OpenMP 4.5 tasks can be used to efficiently overlap computations and MPI communications, based on a case study conducted on multi-core and many-core architectures. It focuses on task granularity, dependencies, and priorities, and also identifies some limitations of OpenMP. Results on 64 Skylake nodes show that while 64% of the wall-clock time is spent in MPI communications, 60% of the cores are busy with computations. This is a good result, because the chosen dataset is small enough to be a challenging case for overlap and is thus useful for assessing worst-case scenarios in future simulations. Two key features were identified: using task priorities improved performance by 5.7% (mainly due to improved overlap), and using recursive tasks shortened the execution time by 9.7%. We also illustrate the need for task tracing and task visualization tools, which allowed a fine understanding and a performance increase for this task-based OpenMP+MPI code.
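The overlap idea summarized in the abstract can be illustrated with a minimal sketch in C using OpenMP 4.5 `depend` and `priority` clauses. This is not the authors' code: the block layout, the function names (`compute_block`, `fake_exchange`, `finalize_block`, `run_overlap_sketch`), and the sizes are all hypothetical, and the communication step is a local stub standing in for an MPI call so that the example builds without an MPI installation. Each block gets a compute task, a communication task that depends on it, and a finalization task that depends on both, so the runtime is free to compute block `b + 1` while the exchange for block `b` is pending.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_BLOCKS 64     /* hypothetical upper bound on the number of blocks */
#define BLOCK_SIZE 1024   /* hypothetical number of values per block */

static double data[MAX_BLOCKS][BLOCK_SIZE];
static double halo[MAX_BLOCKS];
static double result[MAX_BLOCKS];

/* Computation step: fill one block with a value derived from its index. */
static void compute_block(double *block, int id)
{
    for (int i = 0; i < BLOCK_SIZE; ++i)
        block[i] = (double)(id + 1);
}

/* Assumption: stand-in for a communication step such as a halo exchange.
   A real hybrid code would call MPI here; this stub copies a boundary value. */
static void fake_exchange(const double *block, double *halo_value)
{
    *halo_value = block[BLOCK_SIZE - 1];
}

/* Post-processing step that needs both the block and the received value. */
static double finalize_block(const double *block, double halo_value)
{
    double sum = halo_value;
    for (int i = 0; i < BLOCK_SIZE; ++i)
        sum += block[i];
    return sum;
}

/* Builds a three-stage task graph per block and returns the sum of results. */
double run_overlap_sketch(int n_blocks)
{
    assert(n_blocks > 0 && n_blocks <= MAX_BLOCKS);

    #pragma omp parallel
    #pragma omp single
    {
        for (int b = 0; b < n_blocks; ++b) {
            /* Stage 1: computation producing the block. */
            #pragma omp task firstprivate(b) depend(out: data[b][0])
            compute_block(data[b], b);

            /* Stage 2: communication, started as soon as its block is ready.
               priority() is only a hint to the runtime and has an effect
               only if OMP_MAX_TASK_PRIORITY is set to a positive value. */
            #pragma omp task firstprivate(b) depend(in: data[b][0]) \
                             depend(out: halo[b]) priority(1)
            fake_exchange(data[b], &halo[b]);

            /* Stage 3: runs once both the block and its halo are available. */
            #pragma omp task firstprivate(b) depend(in: data[b][0], halo[b]) \
                             depend(out: result[b])
            result[b] = finalize_block(data[b], halo[b]);
        }
    } /* implicit barrier: all tasks have completed here */

    double total = 0.0;
    for (int b = 0; b < n_blocks; ++b)
        total += result[b];
    return total;
}
```

Calling `run_overlap_sketch(4)` from a `main` function and compiling with an OpenMP flag such as `-fopenmp` runs the task graph on the available threads; without that flag the pragmas are ignored and the same code runs sequentially with the same result. In a real MPI+OpenMP code, the communication tasks would additionally require an MPI library initialized with a suitable thread-support level.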
Pages: 419-433
Page count: 15