Performance Prediction for Large-Scale Parallel Applications Using Representative Replay

Cited by: 12
Authors
Zhai, Jidong [1]
Chen, Wenguang [1]
Zheng, Weimin [1]
Li, Keqin [2]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[2] SUNY Coll New Paltz, Dept Comp Sci, New Paltz, NY 12561 USA
Keywords
Deterministic replay; high performance computing; MPI; parallel applications; performance prediction; trace-driven simulation; MODEL;
DOI
10.1109/TC.2015.2479630
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Automatically predicting the performance of parallel applications has been a long-standing goal in high performance computing. However, accurate performance prediction is challenging, since the execution time of a parallel application is determined by several factors, such as sequential computation time, communication time, and their complex interactions. Despite previous efforts, accurately estimating the sequential computation time of each process in large-scale parallel applications remains an open problem. In this paper, we propose a novel approach to acquiring accurate sequential computation time using a parallel debugging technique called deterministic replay. The main advantage of our approach is that it requires only a single node of the target platform; the whole target platform does not need to be available. With this approach we can therefore simply measure the real sequential computation time on a target node for each process, one by one. Moreover, we observe that there is great computation similarity in parallel applications, not only within each process but also among different processes. Based on this observation, we further propose representative replay, which can significantly reduce replay overhead because we only need to replay partial iterations for representative processes instead of all of them. Finally, we implement a complete performance prediction system, called PHANTOM, which combines the above computation-time acquisition approach with a trace-driven simulator. We validate our approach on both traditional HPC platforms and the latest Amazon EC2 cloud platform. On both types of platforms, the average prediction error of our approach is less than 7 percent for up to 2,500 processes.
Pages: 2184-2198
Number of pages: 15
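
The representative-replay idea described in the abstract can be illustrated with a minimal sketch. This is not PHANTOM's implementation: the function names, the notion of a per-process "signature", and the `measure` callback are all hypothetical, introduced only for illustration. The sketch assumes that processes with equal signatures have similar computation behavior, so that only one process per group needs to be measured (in the paper's setting, by replaying it on a single target node).

```python
from collections import defaultdict


def pick_representatives(signatures):
    """Group process ranks by signature (hypothetical similarity key).

    Returns {representative_rank: [all ranks in that group]}, using the
    lowest rank in each group as its representative.
    """
    groups = defaultdict(list)
    for rank, sig in enumerate(signatures):
        groups[sig].append(rank)
    return {ranks[0]: ranks for ranks in groups.values()}


def predict_compute_times(signatures, measure):
    """Assign each rank the time measured for its group's representative.

    `measure(rank)` is an assumed callback returning seconds; it stands in
    for replaying one process on a target node and is called only once
    per group, not once per process.
    """
    times = [0.0] * len(signatures)
    for rep, ranks in pick_representatives(signatures).items():
        t = measure(rep)
        for r in ranks:
            times[r] = t
    return times
```

With four processes in two similarity groups, `measure` would be invoked twice rather than four times; the resulting per-process computation times could then be fed to a trace-driven simulator together with the communication trace.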