A Fault Avoidance Strategy Improving the Reliability of the EGI Production Grid Infrastructure

Times cited: 0
Authors
Palmieri, Francesco [1]
Pardi, Silvio [2]
Veronesi, Paolo [3]
Affiliations
[1] Univ Naples Federico II, Via Cinthia 5, I-80126 Naples, Italy
[2] INFN Istituto Nazionale Di Fisica Nucleare, INDAM, I-80126 Naples, Italy
[3] INFN CNAF, I-40127 Bologna, Italy
Source
Keywords
Reliability; Fault Avoidance; Monitoring; Resource Management; Computing Systems
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Reliability is a crucial issue for the development of stable and effective production grid infrastructures: grid users must be able to rely on the runtime services they request and receive from the underlying grid. Many runtime services and capabilities offered by modern grid infrastructures are not available in advance to application developers and are bound dynamically only at execution time, which increases the incidence of interaction faults. In this work we propose, implement and evaluate a novel low-impact fault-avoidance scheme, specifically conceived to improve grid reliability from the user/application point of view by providing proper service status information to the workload management system. In particular, starting from the EGEE experience, we designed a strategy that inhibits the use of specific runtime capabilities on the available resources as soon as the monitoring system detects any anomalous behavior associated with these capabilities, and re-integrates them once they work correctly again. The results of a significant set of tests run on the production EGEE infrastructure are presented to show the effectiveness of our approach.
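The inhibit/re-integrate idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class `CapabilityRegistry`, its methods `report` and `eligible`, and the resource and capability names are all hypothetical, and the actual monitoring system and workload management system are not modeled. The sketch only assumes that monitoring emits a healthy/unhealthy verdict per (resource, capability) pair and that the scheduler filters resources by the capabilities a job requires.

```python
from dataclasses import dataclass, field


@dataclass
class CapabilityRegistry:
    """Hypothetical status store: which (resource, capability) pairs are inhibited."""

    inhibited: set = field(default_factory=set)

    def report(self, resource: str, capability: str, healthy: bool) -> None:
        """Record a monitoring verdict for one capability on one resource."""
        key = (resource, capability)
        if healthy:
            # The capability works again: re-integrate it.
            self.inhibited.discard(key)
        else:
            # Anomalous behavior detected: inhibit only this capability.
            self.inhibited.add(key)

    def eligible(self, resources, required):
        """Return the resources on which none of the required capabilities is inhibited."""
        return [
            r for r in resources
            if not any((r, c) in self.inhibited for c in required)
        ]


# Illustrative usage with made-up resource and capability names.
registry = CapabilityRegistry()
registry.report("ce01", "mpi", healthy=False)
print(registry.eligible(["ce01", "ce02"], ["mpi"]))
print(registry.eligible(["ce01", "ce02"], ["storage"]))
```

In this sketch a resource with a faulty capability stays usable for jobs that do not need that capability, which is one way to read the "low-impact" goal stated in the abstract.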
Pages: 159 / +
Page count: 3