A Survey of Fault-Tolerance Techniques for Embedded Systems From the Perspective of Power, Energy, and Thermal Issues

Cited by: 25
Authors
Safari, Sepideh [1 ,2 ]
Ansari, Mohsen [1 ,2 ]
Khdr, Heba [2 ]
Gohari-Nazari, Pourya [1 ]
Yari-Karin, Sina [1 ]
Yeganeh-Khaksar, Amir [1 ]
Hessabi, Shaahin [1 ]
Ejlali, Alireza [1 ]
Henkel, Jorg [2 ]
Affiliations
[1] Sharif Univ Technol, Dept Comp Sci & Engn, Tehran 1458889694, Iran
[2] Karlsruhe Inst Technol KIT, Dept Comp Sci, D-76131 Karlsruhe, Germany
Keywords
Fault tolerant systems; Task analysis; Reliability; Real-time systems; Embedded systems; Redundancy; Computational modeling; Fault-tolerance; embedded systems; real-time computing; scheduling; power; energy minimization; thermal-aware design; CO-DESIGN APPROACH; ERROR-DETECTION; TIME; EFFICIENT; MANAGEMENT; RELIABILITY; MITIGATION; SOFTWARE; TASKS;
DOI
10.1109/ACCESS.2022.3144217
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Relentless technology scaling has significantly increased processor performance, but it has also adversely affected system reliability. In particular, technology scaling increases processor susceptibility to radiation-induced transient faults. Moreover, technology scaling after the end of Dennard scaling increases on-chip power densities and, in turn, temperatures. High temperature accelerates transistor aging mechanisms, which may ultimately lead to permanent faults on the chip. Fault-tolerance techniques have emerged to ensure reliable system operation despite these concerns. Specifically, they employ some form of redundancy to satisfy given reliability requirements. However, integrating fault-tolerance techniques into real-time embedded systems makes it harder to preserve timing constraints. As a remedy, many task mapping/scheduling policies have been proposed that account for the integrated fault-tolerance techniques and enforce both timing and reliability guarantees for real-time embedded systems. More advanced techniques additionally aim to minimize power and energy while still satisfying timing and reliability constraints. Recently, some scheduling techniques have started to tackle a new challenge: the temperature increase induced by employing fault-tolerance techniques. These emerging techniques aim to satisfy temperature constraints in addition to timing and reliability constraints. This paper provides an in-depth survey of the emerging research efforts that exploit fault-tolerance techniques while considering timing, power/energy, and temperature from the perspective of real-time embedded system design. In particular, the task mapping/scheduling policies for fault-tolerant real-time embedded systems are reviewed and classified according to their goals and constraints. Moreover, the employed fault-tolerance techniques, application models, and hardware models are considered as additional dimensions of the presented classification. Lastly, this survey offers insights into the main achievements and shortcomings of the existing approaches and highlights the most promising ones.
Pages: 12229-12251
Page count: 23