An Adaptive Approach for Online Fault Management in Many-Core Architectures

Times cited: 0
Authors
Bolchini, Cristiana [1]
Miele, Antonio [1]
Sciuto, Donatella [1]
Affiliation
[1] Politecnico di Milano, Dipartimento di Elettronica e Informazione, Piazza L. da Vinci 32, I-20133 Milan, Italy
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
This paper presents a dynamic scheduling solution that achieves fault tolerance in many-core architectures. Triple Modular Redundancy is applied to the multi-threaded application to dynamically mitigate the effects of both permanent and transient faults, and to identify and isolate damaged units. The approach targets the best performance while balancing the use of the healthy resources to limit wear-out and aging effects, which cause permanent damage. Experimental results on synthetic case studies are reported to validate the ability to tolerate faults while optimizing performance and resource usage.
Pages: 1429-1432
Page count: 4