Software fault detection and recovery in critical real-time systems: An approach based on loose coupling

Cited by: 6
Authors
Alho, Pekka [1 ]
Mattila, Jouni [1 ]
Affiliations
[1] Tampere Univ Technol, Dept Intelligent Hydraul & Automat, FIN-33101 Tampere, Finland
Keywords
ITER; Remote handling; Software; Fault tolerance; Dependability; Real-time; TOLERANCE;
DOI
10.1016/j.fusengdes.2014.04.050
Chinese Library Classification (CLC) number
TL [Atomic energy technology]; O571 [Nuclear physics];
Discipline classification code
0827 ; 082701 ;
Abstract
Remote handling (RH) systems are used to inspect, make changes to, and maintain components in the ITER machine, and as such are an example of a mission-critical system. Failure in a critical system may cause damage, significant financial losses, and loss of experiment runtime, making dependability one of the most important properties of such systems. However, even if the software for RH control systems has been developed using best practices, the system might still fail due to undetected faults (bugs), hardware failures, etc. Critical systems therefore need the capability to tolerate faults and resume operation after they occur. However, designing effective fault detection and recovery mechanisms is challenging because of timeliness requirements, growth in scale, and complex interactions. In this paper we evaluate the effectiveness of a service-oriented architectural approach to fault tolerance in mission-critical real-time systems. We use a prototype implementation for service management with an experimental RH control system and an industrial manipulator. The fault tolerance relies on the high level of decoupling between services to recover from transient faults by restarting services. If the recovery process is not successful, the system can still be used, provided the fault was not in a critical software module. (C) 2014 Elsevier B.V. All rights reserved.
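The recovery idea in the abstract (restart a faulty service, and keep operating if an unrecoverable fault is in a non-critical module) can be illustrated with a minimal sketch. This is not the paper's prototype: the names `Service`, `ServiceManager`, `report_fault`, and `max_restarts` are hypothetical, and a real-time system would also need fault detection (for example heartbeats or deadline monitoring), which is omitted here.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Service:
    """Hypothetical loosely coupled service; not taken from the paper."""
    name: str
    critical: bool
    start: Callable[[], bool]  # returns True if the service came up healthy
    healthy: bool = False
    restarts: int = 0          # total restart attempts so far


class ServiceManager:
    """Assumed manager that recovers from transient faults by restarting."""

    def __init__(self, max_restarts: int = 3) -> None:
        self.max_restarts = max_restarts
        self.services: Dict[str, Service] = {}

    def register(self, service: Service) -> None:
        self.services[service.name] = service
        service.healthy = service.start()

    def report_fault(self, name: str) -> str:
        """Try to restart the faulty service; report the outcome."""
        svc = self.services[name]
        svc.healthy = False
        for _ in range(self.max_restarts):
            svc.restarts += 1
            if svc.start():
                svc.healthy = True
                return "recovered"
        # Recovery failed: only a critical service takes the system down.
        return "halted" if svc.critical else "degraded"

    def usable(self) -> bool:
        """The system stays usable while every critical service is healthy."""
        return all(s.healthy for s in self.services.values() if s.critical)
```

Because services only interact through the manager in this sketch, restarting one does not require touching the others, which is the property the loose coupling is meant to provide.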
Pages: 2272 - 2277
Page count: 6
Related papers
50 in total
  • [21] Towards energy-aware software-based fault tolerance in real-time systems
    Unsal, OS
    Koren, I
    Krishna, CM
    ISLPED'02: PROCEEDINGS OF THE 2002 INTERNATIONAL SYMPOSIUM ON LOW POWER ELECTRONICS AND DESIGN, 2002, : 124 - 129
  • [22] A software fault detection and recovery in CDMA systems
    Lim, YS
    Kim, MH
    Cho, CH
    1997 IEEE INTERNATIONAL CONFERENCE ON PERSONAL WIRELESS COMMUNICATIONS, 1997, : 527 - 531
  • [23] Wireless Sensor Network Based Real-Time Monitoring and Fault Detection for Photovoltaic Systems
    Al-Kashashneh, Hudefah Z.
    Al-Aubidy, Kasim M.
    2019 16TH INTERNATIONAL MULTI-CONFERENCE ON SYSTEMS, SIGNALS & DEVICES (SSD), 2019, : 315 - 321
  • [24] Knowledge-based real-time fault detection and supervision of urban drainage systems
    Szafnicki, K
    Graillot, D
    AUTOMATICA, 1996, 32 (07) : 1043 - 1047
  • [25] Schedulability analysis for fault-tolerant hard real-time systems based on rollback recovery
    Ding W.-F.
    Guo R.-F.
    Zhao J.
    Liu X.
    Li J.
    Xi Tong Gong Cheng Yu Dian Zi Ji Shu/Systems Engineering and Electronics, 2011, 33 (07) : 1673 - 1679
  • [26] Real-time deadlock detection and recovery for automated manufacturing systems
    Yeh, WC
    INTERNATIONAL JOURNAL OF ADVANCED MANUFACTURING TECHNOLOGY, 2002, 20 (10) : 780 - 786
  • [27] Real-Time Deadlock Detection and Recovery for Automated Manufacturing Systems
    W.-C. Yeh
    The International Journal of Advanced Manufacturing Technology, 2002, 20 : 780 - 786
  • [28] Proportionate Fair based Multicore Scheduling for Fault Tolerant Multicore Real-Time Systems by Tight Coupling of Error Detection and Scheduling
    Kraemer, Stefan
    Mottok, Juergen
    Racek, Stanislav
    PROCEEDINGS OF 2015 INTERNATIONAL CONFERENCE ON ELECTRICAL AND INFORMATION TECHNOLOGIES (ICEIT 2015), 2015, : 88 - 93
  • [29] Dynamic software reconfiguration for fault-tolerant real-time avionic systems
    Ellis, SM
    MICROPROCESSORS AND MICROSYSTEMS, 1997, 21 (01) : 29 - 39
  • [30] Can software implemented fault-injection be used on real-time systems?
    Cunha, JC
    Rela, MZ
    Silva, JG
    DEPENDABLE COMPUTING - EDCC-3, 1999, 1667 : 209 - 226