Verifying Causality between Distant Performance Phenomena in Large-Scale MPI Applications

Cited by: 2
Authors
Hermanns, Marc-Andre [1]
Geimer, Markus [1]
Wolf, Felix [1]
Wylie, Brian J. N. [1]
Affiliations
[1] Forschungszentrum Julich, Julich Supercomp Ctr, D-52425 Julich, Germany
Keywords
PARALLEL;
DOI
10.1109/.49
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
In message-passing applications, the temporal or spatial distance between cause and symptom of a performance problem constitutes a major difficulty in deriving helpful conclusions from performance data. Just knowing the locations of wait states in the program is often insufficient to understand the reason for their occurrence. We present a method for verifying hypotheses on causality between temporally or spatially distant performance phenomena in message-passing applications without altering the application itself. The verification is accomplished by modifying MPI event traces and using them to simulate the hypothetical message-passing behavior. By performing a parallel real-time reenactment of the communication to be simulated using the original execution configuration, we can achieve high scalability and good predictive accuracy in relation to the measured behavior. Not relying on a potentially complex model of the message-passing subsystem, our method is also platform independent.
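The idea of editing a trace and replaying it to test a causal hypothesis can be illustrated with a toy sketch. This is not the authors' tool or method: the abstract describes a parallel real-time reenactment on the original machine that avoids any model of the message-passing subsystem, whereas the sketch below replays sequentially and assumes a constant-latency model. The event format, `replay`, and `scale_compute` are hypothetical names introduced only for illustration.

```python
from collections import deque


def replay(trace, latency=0.0):
    """Replay a toy trace and return the accumulated wait time per rank.

    trace maps rank -> list of (kind, arg) events, where kind is
    "compute" (arg = duration), "send" (arg = destination rank), or
    "recv" (arg = source rank). Sends are assumed non-blocking and
    latency is a constant: both are simplifying assumptions.
    """
    clock = {rank: 0.0 for rank in trace}
    wait = {rank: 0.0 for rank in trace}
    pos = {rank: 0 for rank in trace}
    in_flight = {}  # (src, dst) -> deque of message arrival times
    progressed = True
    while progressed:
        progressed = False
        for rank, events in trace.items():
            while pos[rank] < len(events):
                kind, arg = events[pos[rank]]
                if kind == "compute":
                    clock[rank] += arg
                elif kind == "send":
                    queue = in_flight.setdefault((rank, arg), deque())
                    queue.append(clock[rank] + latency)
                else:  # "recv"
                    queue = in_flight.get((arg, rank))
                    if not queue:
                        break  # matching send has not been replayed yet
                    arrival = queue.popleft()
                    # Time spent blocked is the late-sender wait state.
                    wait[rank] += max(0.0, arrival - clock[rank])
                    clock[rank] = max(clock[rank], arrival)
                pos[rank] += 1
                progressed = True
    return wait


def scale_compute(trace, rank, factor):
    """Return a modified copy of the trace with one rank's compute scaled."""
    return {
        r: [
            (kind, arg * factor) if (r == rank and kind == "compute") else (kind, arg)
            for kind, arg in events
        ]
        for r, events in trace.items()
    }


# Measured behavior: rank 1 waits because rank 0 computes longer before sending.
measured = {
    0: [("compute", 4.0), ("send", 1)],
    1: [("compute", 2.0), ("recv", 0)],
}
# Hypothesis: the wait on rank 1 is caused by the extra computation on rank 0.
hypothetical = scale_compute(measured, rank=0, factor=0.5)

wait_measured = replay(measured)
wait_hypothetical = replay(hypothetical)
```

If the wait on rank 1 disappears in the replay of the modified trace, the hypothesis that the imbalance on rank 0 causes it is supported within this toy model; the application itself is never changed, only its recorded trace.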
Pages: 78-84
Page count: 7