Performance-Aware Reliability Assessment of Heterogeneous Chips

Authors
Chatzidimitriou, Athanasios [1]
Kaliorakis, Manolis [1]
Tselonis, Sotiris [1]
Gizopoulos, Dimitris [1]
Affiliations
[1] Univ Athens, Dept Informat & Telecommun, Athens, Greece
Keywords
vulnerability evaluation; reliability; performance; fault injection; microarchitectural simulators; CPU; GPU
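DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract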
Technology evolution has raised serious reliability concerns: as transistor dimensions shrink, modern microprocessors become denser and more vulnerable to faults. Reliability studies have proposed a plethora of methodologies for assessing system vulnerability; these, however, rely heavily on traditional reliability metrics that express only the failure rate over time. Although Failures In Time (FIT) is a strong and representative reliability metric, it may fail to offer an objective comparison of highly diverse systems, such as CPUs against GPUs or other accelerators, which are often employed to execute the same algorithms implemented for each platform. In this paper, we propose a reliability evaluation methodology that compares heterogeneous systems by taking into account the probability of a workload execution failure, while also capturing the differences in the performance of these systems. We demonstrate the usefulness of the methodology with a test case scenario that compares the reliability and performance of three different commercial CPUs (different ISAs and microarchitectures) and one GPU. We use statistical fault injection to assess the vulnerability of the register file for the four computing systems of our study. The evaluation was performed using a comprehensive set of benchmarks with the same algorithms implemented for each individual system (serial code for the CPUs and parallel code for the GPU). Our findings show that, even though the GPU proves to be three orders of magnitude more vulnerable than the CPUs under traditional reliability metrics, our performance-aware evaluation methodology shrinks this gap by 1-2 orders of magnitude, providing more informative and realistic measurements to guide designers' or programmers' decisions.
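The contrast between a rate-based metric and a per-execution view can be illustrated with a minimal sketch. This is not the paper's metric, whose exact definition is not given in the abstract. It assumes a constant failure rate (exponential model), takes FIT as failures per 10^9 device-hours, and uses made-up FIT values and runtimes chosen only to show how a faster device can narrow a large FIT gap. The function name `failure_probability_per_run` is hypothetical.

```python
import math

HOURS_PER_BILLION = 1e9  # FIT counts failures per 10^9 device-hours


def failure_probability_per_run(fit: float, runtime_seconds: float) -> float:
    """Probability of at least one failure during a single workload run.

    Assumes a constant failure rate, so P = 1 - exp(-rate * time).
    """
    rate_per_hour = fit / HOURS_PER_BILLION
    runtime_hours = runtime_seconds / 3600.0
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for very small x
    return -math.expm1(-rate_per_hour * runtime_hours)


# Hypothetical inputs (not from the paper): the GPU has a 1000x higher FIT
# but finishes the same workload 100x faster than the CPU.
cpu_fit, cpu_runtime_s = 10.0, 100.0
gpu_fit, gpu_runtime_s = 10_000.0, 1.0

cpu_p = failure_probability_per_run(cpu_fit, cpu_runtime_s)
gpu_p = failure_probability_per_run(gpu_fit, gpu_runtime_s)

fit_gap = gpu_fit / cpu_fit   # gap seen by the rate-based metric
per_run_gap = gpu_p / cpu_p   # gap seen per workload execution

print(f"FIT gap: {fit_gap:.0f}x, per-execution gap: {per_run_gap:.1f}x")
```

With these illustrative inputs, the three-orders-of-magnitude FIT gap shrinks to roughly one order of magnitude per execution, because the shorter runtime exposes the faster device to faults for less time.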
Pages: 6