Reliability-Aware Scalability Models for High Performance Computing

Cited by: 0
Authors
Zheng, Ziming [1]
Lan, Zhiling [1]
Affiliations
[1] IIT, Dept Comp Sci, Chicago, IL 60616 USA
Keywords
OPTIMUM CHECKPOINT INTERVAL; SYSTEMS; LAW;
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Scalability models are powerful analytical tools for evaluating and predicting the performance of parallel applications. Unfortunately, existing scalability models do not quantify failure impact and therefore cannot accurately account for application performance in the presence of failures. In this study, we extend two well-known models, namely Amdahl's law and Gustafson's law, by considering the impact of failures and the effect of fault tolerance techniques on applications. The derived reliability-aware models can be used to predict application scalability in failure-present environments and evaluate fault tolerance techniques. Trace-based simulations via real failure logs demonstrate that the newly developed models provide a better understanding of application performance and scalability in the presence of failures.
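For orientation, the two baseline laws named in the abstract can be sketched as below. This is a minimal illustration of the classical, failure-free forms only; the reliability-aware extensions derived in the paper are not given in this record and are not reproduced here. The function names and the sample serial fraction of 0.05 are assumptions chosen for illustration.

```python
def _check(serial_fraction: float, processors: int) -> None:
    """Validate inputs shared by both laws."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")


def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Classical Amdahl's law (fixed problem size): 1 / (s + (1 - s) / p)."""
    _check(serial_fraction, processors)
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def gustafson_speedup(serial_fraction: float, processors: int) -> float:
    """Classical Gustafson's law (scaled problem size): s + (1 - s) * p."""
    _check(serial_fraction, processors)
    return serial_fraction + (1.0 - serial_fraction) * processors


if __name__ == "__main__":
    s = 0.05  # hypothetical serial fraction, for illustration only
    for p in (1, 16, 256, 4096):
        print(p, amdahl_speedup(s, p), gustafson_speedup(s, p))
```

Neither function models failures, checkpointing, or recovery time, which is the gap the abstract says the derived models address.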
Pages: 172 - 180
Page count: 9
Related papers
50 in total
  • [1] A Case for Lifetime Reliability-Aware Neuromorphic Computing
    Song, Shihao
    Das, Anup
    2020 IEEE 63RD INTERNATIONAL MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS (MWSCAS), 2020, : 596 - 598
  • [2] Reliability-Aware Distributed Computing Scheduling Policy
    Abawajy, Jemal
    Hassan, Mohammad Mehedi
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2015, 2015, 9532 : 627 - 632
  • [3] Reliability-Aware Task Replication for Mobile Edge Computing
    Yang, Lipei
    Zhou, Ao
    Ma, Xiao
    Zhang, Yiran
    Wang, Shangguang
    IEEE INTERNET OF THINGS JOURNAL, 2024, 11 (14): : 24846 - 24857
  • [4] Reliability-Aware Task Allocation Latency Optimization in Edge Computing
    Kouloumpris, Andreas
    Michael, Maria K.
    Theocharides, Theocharis
    2019 IEEE 25TH INTERNATIONAL SYMPOSIUM ON ON-LINE TESTING AND ROBUST SYSTEM DESIGN (IOLTS 2019), 2019, : 200 - 203
  • [5] Reliability-Aware Joint Optimization for Cooperative Vehicular Communication and Computing
    Han, Xu
    Tian, Daxin
    Sheng, Zhengguo
    Duan, Xuting
    Zhou, Jianshan
    Hao, Wei
    Long, Kejun
    Chen, Min
    Leung, Victor C. M.
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2021, 22 (08) : 5437 - 5446
  • [6] Reliability-Aware Offloading and Allocation in Multilevel Edge Computing System
    Dong, Luobing
    Wu, Weili
    Guo, Qiumin
    Satpute, Meghana N.
    Znati, Taieb
    Du, Ding Zhu
    IEEE TRANSACTIONS ON RELIABILITY, 2021, 70 (01) : 200 - 211
  • [7] Reliability-aware scheduling strategy for heterogeneous distributed computing systems
    Tang, Xiaoyong
    Li, Kenli
    Li, Renfa
    Veeravalli, Bharadwaj
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2010, 70 (09) : 941 - 952
  • [8] Reliability-Aware Runahead
    Naithani, Ajeya
    Eeckhout, Lieven
    2022 IEEE INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE (HPCA 2022), 2022, : 786 - 799
  • [9] Reliability-aware DAG scheduling with primary-backup in cloud computing
    Jing, Weipeng
    Liu, Yaqiu
    Shao, Hongrun
    INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS IN TECHNOLOGY, 2015, 52 (01) : 86 - 93
  • [10] Reliability-aware Operation Chaining in High Level Synthesis
    Chen, Liang
    Ebrahimi, Mojtaba
    Tahoori, Mehdi B.
    2015 20TH IEEE EUROPEAN TEST SYMPOSIUM (ETS), 2015,