Toward a Performance/Resilience Tool for Hardware/Software Co-Design of High-Performance Computing Systems

Cited by: 15
Authors
Engelmann, Christian [1]
Naughton, Thomas [1]
Affiliations
[1] Oak Ridge Natl Lab, Comp Sci & Math Div, Oak Ridge, TN 37831 USA
Keywords
Fault Injection; Message Passing Interface; Parallel Discrete Event Simulation; High-performance Computing;
DOI
10.1109/ICPP.2013.114
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
xSim is a simulation-based performance investigation toolkit that permits running high-performance computing (HPC) applications in a controlled environment with millions of concurrent execution threads, while observing application performance in a simulated extreme-scale system for hardware/software co-design. The presented work details newly developed features for xSim that permit the injection of MPI process failures, the propagation/detection/notification of such failures within the simulation, and their handling using application-level checkpoint/restart. These new capabilities enable the observation of application behavior and performance under failure within a simulated future-generation HPC system using the most common fault handling technique.
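The failure-injection and checkpoint/restart cycle described in the abstract can be illustrated with a minimal sketch. This is not xSim code and does not use MPI: the function `run_with_checkpoint_restart`, the `InjectedFailure` exception, and all parameters are hypothetical names chosen for illustration. It assumes a single process whose state is a running sum, a checkpoint taken every `checkpoint_interval` steps, and failures injected at chosen points of the execution.

```python
class InjectedFailure(Exception):
    """Hypothetical stand-in for a simulated process failure."""


def run_with_checkpoint_restart(total_steps, checkpoint_interval, fail_at):
    """Run `total_steps` iterations, restarting from the last checkpoint
    whenever a failure is injected.

    `fail_at` holds 1-based counts of attempted step executions at which
    a failure is injected (an illustrative assumption, not an xSim API).
    Returns (final_state, attempted_executions, restarts).
    """
    checkpoint = (0, 0)          # (next step to run, accumulated state)
    pending = sorted(fail_at)    # failures not yet injected
    executed = 0                 # attempted step executions, including failed ones
    restarts = 0
    while True:
        step, state = checkpoint  # restore from the last checkpoint
        try:
            while step < total_steps:
                executed += 1
                if pending and executed == pending[0]:
                    pending.pop(0)
                    raise InjectedFailure(step)
                state += step     # the "application work" for this step
                step += 1
                if step % checkpoint_interval == 0:
                    checkpoint = (step, state)  # save progress
            return state, executed, restarts
        except InjectedFailure:
            restarts += 1         # work since the last checkpoint is lost


if __name__ == "__main__":
    print(run_with_checkpoint_restart(10, 4, {7}))
```

In this sketch, comparing the attempted-execution count with and without injected failures gives a rough view of the rework cost caused by a failure, which is the kind of behavior the abstract says the new xSim features make observable at simulated scale.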
Pages: 960-969
Page count: 10
Related papers
50 in total
  • [1] AGGREGATION OF PARALLEL COMPUTING AND HARDWARE/SOFTWARE CO-DESIGN TECHNIQUES FOR HIGH-PERFORMANCE REMOTE SENSING APPLICATIONS
    Castillo Atoche, A.
    Palma Marrufo, O.
    Ricalde Castellanos, L.
    [J]. 2011 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS), 2011, : 217 - 220
  • [2] Viable Protection of High-Performance Networks through Hardware/Software Co-Design
    Amann, Johanna
    Sommer, Robin
    [J]. SDN-NFVSEC'17: PROCEEDINGS OF THE ACM INTERNATIONAL WORKSHOP ON SECURITY IN SOFTWARE DEFINED NETWORKS & NETWORK FUNCTION VIRTUALIZATION, 2017, : 19 - 24
  • [3] Co-design In High Performance Computing Systems
    Moreno, Jaime H.
    Wen, Sophia
    [J]. 2021 IEEE INTERNATIONAL ELECTRON DEVICES MEETING (IEDM), 2021,
  • [4] Towards High-Performance Graph Processing: From a Hardware/Software Co-Design Perspective
    Liao, Xiao-Fei
    Zhao, Wen-Ju
    Jin, Hai
    Yao, Peng-Cheng
    Huang, Yu
    Wang, Qing-Gang
    Zhao, Jin
    Zheng, Long
    Zhang, Yu
    Shao, Zhi-Yuan
    [J]. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2024, 39 (02): : 245 - 266
  • [5] Automatic software hardware co-design for reconfigurable computing systems
    Saha, Proshanta
    [J]. 2007 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS, PROCEEDINGS, VOLS 1 AND 2, 2007, : 507 - 508
  • [6] Co-design for High Performance Computing
    Rodrigues, Arun
    Dosanjh, Sudip
    Hemmert, Scott
    [J]. NUMERICAL ANALYSIS AND APPLIED MATHEMATICS, VOLS I-III, 2010, 1281 : 1309 - 1312
  • [7] Hardware/software co-design for a high-performance Java Card interpreter in low-end embedded systems
    Zilli, Massimiliano
    Raschke, Wolfgang
    Weiss, Reinhold
    Loinig, Johannes
    Steger, Christian
    [J]. MICROPROCESSORS AND MICROSYSTEMS, 2015, 39 (08) : 1076 - 1086
  • [8] Hardware-Software Co-Design for Network Performance Measurement
    Narayana, Srinivas
    Sivaraman, Anirudh
    Nathan, Vikram
    Alizadeh, Mohammad
    Walker, David
    Rexford, Jennifer
    Jeyakumar, Vimalkumar
    Kim, Changhoon
    [J]. PROCEEDINGS OF THE 15TH ACM WORKSHOP ON HOT TOPICS IN NETWORKS (HOTNETS '16), 2016, : 190 - 196
  • [9] A High-Performance ORB Accelerator with Algorithm and Hardware Co-design for Visual Localization
    Qi, Xiuyuan
    Liu, Ye
    Hao, Shuang
    Liu, Zherong
    Huang, Kun
    Yang, Minghui
    Zhou, Liang
    Zhou, Jun
    [J]. 2024 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, ISCAS 2024, 2024,
  • [10] Tool lets you co-design hardware, software
    Moretti, G
    [J]. EDN, 2001, 46 (01) : 18 - 18