Logically Parallel Communication for Fast MPI+Threads Applications

Cited by: 2
Authors
Zambre, Rohit [1 ]
Sahasrabudhe, Damodar [2 ]
Zhou, Hui [3 ]
Berzins, Martin [2 ]
Chandramowlishwaran, Aparna [1 ]
Balaji, Pavan [3 ]
Affiliations
[1] Univ Calif Irvine, Irvine, CA 92697 USA
[2] Univ Utah, Sci Comput & Imaging Inst, Salt Lake City, UT 84112 USA
[3] Argonne Natl Lab, Math & Comp Sci Div, Argonne, IL 60439 USA
Funding
U.S. National Science Foundation
Keywords
Parallel processing; Libraries; Programming; Standards; Semantics; Boilers; Upper bound; MPI+threads; MPI+OpenMP; exascale MPI; MPI_THREAD_MULTIPLE; MPI endpoints; Uintah; HYPRE; wombat; Legion; UINTAH FRAMEWORK; SUPPORT;
DOI
10.1109/TPDS.2021.3075157
CLC number
TP301 [Theory and Methods]
Discipline code
081202
Abstract
Supercomputing applications are increasingly adopting the MPI+threads programming model over the traditional "MPI everywhere" approach to better handle the disproportionate increase in the number of cores compared with other on-node resources. In practice, however, most applications observe slower performance with MPI+threads, primarily because of poor communication performance. Recent research efforts on MPI libraries address this bottleneck by mapping logically parallel communication, that is, operations that are not subject to MPI's ordering constraints, to the underlying network parallelism. Domain scientists, however, typically do not expose such communication independence information because the existing MPI-3.1 standard's semantics can be limiting. Researchers had initially proposed user-visible endpoints to combat this issue, but such a solution requires intrusive changes to the standard (new APIs). The upcoming MPI-4.0 standard, on the other hand, allows applications to relax unneeded semantics and provides them with many opportunities to express logical communication parallelism. In this article, we show how MPI+threads applications can achieve high performance with logically parallel communication. Through application case studies, we compare the capabilities of the new MPI-4.0 standard with those of the existing one and of user-visible endpoints (upper bound). Logical communication parallelism can boost the overall performance of an application by over 2x.
Pages: 3038-3052 (15 pages)