Runtime Reliability Monitoring for Complex Fault-Tolerance Policies

Authors
Fantechi, Alessandro [1]
Gori, Gloria [1]
Papini, Marco [1]
Affiliations
[1] Univ Florence, DINFO, Florence, Italy
Keywords
software reliability; reliability modeling; fault-tolerant systems; cyber-physical systems; reliability based monitoring; prognostics; rejuvenation
DOI
10.1109/ICSRS56243.2022.10067561
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
Reliability of complex Cyber-Physical Systems is necessary to guarantee the availability and/or safety of the services they provide. To enhance reliability, such systems adopt diverse and complex fault-tolerance policies, which combine redundancy and dynamic reconfiguration to address hardware reliability with software-specific techniques such as diversity or software rejuvenation. These policies call for flexible runtime health checks of system executions that go beyond conventional runtime monitoring of pre-programmed health conditions, in part to minimize maintenance costs. Defining a suitable monitoring model for complex systems according to these principles is still a challenge. In this paper we propose a novel approach, Reliability Based Monitoring (RBM), for flexible runtime monitoring of reliability in complex systems. RBM exploits a hierarchical reliability model that is periodically applied to runtime diagnostics data, which makes it possible to dynamically plan maintenance activities aimed at preventing failures. As a proof of concept, we show how to apply RBM to a 2oo3 (two-out-of-three) software system implementing different fault-tolerance policies.
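To make the 2oo3 proof of concept more concrete, the following minimal Python sketch shows how a two-level reliability evaluation could drive a maintenance decision. It is not the authors' RBM model: the independence of the three channels, the constant failure rates (exponential model), and the names `component_reliability`, `two_out_of_three`, and `needs_maintenance` are all assumptions made purely for illustration.

```python
import math


def component_reliability(failure_rate: float, hours: float) -> float:
    """Reliability of one channel, assuming a constant failure rate."""
    return math.exp(-failure_rate * hours)


def two_out_of_three(r_a: float, r_b: float, r_c: float) -> float:
    """Probability that at least two of three independent channels work."""
    all_three = r_a * r_b * r_c
    exactly_two = (
        r_a * r_b * (1 - r_c)
        + r_a * (1 - r_b) * r_c
        + (1 - r_a) * r_b * r_c
    )
    return all_three + exactly_two


def needs_maintenance(failure_rates, horizon_hours, threshold):
    """Hypothetical check: flag maintenance when the predicted system
    reliability over the horizon drops below the given threshold."""
    channel_r = [component_reliability(rate, horizon_hours) for rate in failure_rates]
    return two_out_of_three(*channel_r) < threshold
```

In a periodic monitoring loop, the failure rates would be re-estimated from runtime diagnostics data before each call to `needs_maintenance`; how that estimation is done is outside the scope of this sketch.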
Pages: 110-119
Page count: 10