Performance-Aware Reliability Assessment of Heterogeneous Chips

Authors
Chatzidimitriou, Athanasios [1]
Kaliorakis, Manolis [1]
Tselonis, Sotiris [1]
Gizopoulos, Dimitris [1]
Affiliations
[1] Univ Athens, Dept Informat & Telecommun, Athens, Greece
Keywords
vulnerability evaluation; reliability; performance; fault injection; microarchitectural; simulators; CPU; GPU
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Technology evolution has raised serious reliability concerns: as transistor dimensions shrink, modern microprocessors become denser and more vulnerable to faults. Reliability studies have proposed a plethora of methodologies for assessing system vulnerability; these, however, rely heavily on traditional reliability metrics that express only the failure rate over time. Although Failures In Time (FIT) is a very strong and representative reliability metric, it may fail to offer an objective comparison of highly diverse systems, such as CPUs against GPUs or other accelerators, which are often employed to execute the same algorithms implemented for each platform. In this paper, we propose a reliability evaluation methodology that takes into account the probability of a workload execution failure in order to compare heterogeneous systems, while also capturing the differences in the performance of these systems. We demonstrate the usefulness of the methodology with a test case scenario that compares the reliability and performance of three different commercial CPUs (different ISAs and microarchitectures) and one GPU. We use statistical fault injection to assess the vulnerability of the register file for the four computing systems of our study. The evaluation was performed using a comprehensive set of benchmarks with the same algorithms implemented for each individual system (serial code for the CPUs and parallel code for the GPU). Our findings show that, even though the GPU proves to be three orders of magnitude more vulnerable than the CPUs under traditional reliability metrics, our performance-aware evaluation methodology shrinks this gap by 1-2 orders of magnitude, providing more informative and realistic measurements to guide designers' or programmers' decisions.
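The contrast the abstract draws between a rate-based metric (FIT) and a per-execution failure probability can be illustrated with a small sketch. This is not the paper's actual metric or data: it assumes a constant failure rate (exponential model), and the FIT values and execution times below are hypothetical numbers chosen only to show how a faster platform can narrow a large FIT gap.

```python
import math

# FIT = expected failures per 10^9 device-hours (standard definition).
HOURS_PER_FIT_UNIT = 1e9
SECONDS_PER_HOUR = 3600.0


def failure_probability_per_execution(fit, exec_time_seconds):
    """Probability of at least one failure during a single run.

    Assumes a constant failure rate (exponential model); this is an
    illustrative assumption, not the formula used in the paper.
    """
    rate_per_second = fit / HOURS_PER_FIT_UNIT / SECONDS_PER_HOUR
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-rate_per_second * exec_time_seconds)


# Hypothetical platforms: the GPU has a 1000x higher FIT,
# but finishes the same workload 100x faster.
cpu_fit, cpu_time_s = 10.0, 100.0
gpu_fit, gpu_time_s = 10_000.0, 1.0

fit_gap = gpu_fit / cpu_fit
p_cpu = failure_probability_per_execution(cpu_fit, cpu_time_s)
p_gpu = failure_probability_per_execution(gpu_fit, gpu_time_s)
per_execution_gap = p_gpu / p_cpu

print(f"FIT gap:           {fit_gap:.0f}x")
print(f"Per-execution gap: {per_execution_gap:.2f}x")
```

With these made-up inputs, the gap seen through FIT alone is three orders of magnitude, while the gap in per-execution failure probability is about one order of magnitude, because the shorter run time exposes the GPU to faults for less time.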
Pages: 6