Availability in parallel systems: Automatic process restart

Cited by: 3
Authors
Bowen, NS [1]
Antognini, J [1]
Regan, RD [1]
Matsakis, NC [1]
Affiliation
[1] IBM Corp., S/390 Division, Poughkeepsie, NY 12601
Keywords
Availability - Computer architecture - Computer operating systems - Computer system recovery - Data processing - Information retrieval systems
DOI
10.1147/sj.362.0284
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Parallel and clustered architectures are increasingly being used as a foundation for high-capacity servers. At the same time, availability expectations are also rising rapidly, since the effects of downtime become more apparent and have higher economic consequences for larger systems. The use of parallel structures generally implies more hardware and software components. The presence of more and larger components increases the chances that an individual component will fail, and that failure has the potential to hurt the overall availability of the system. This paper discusses the use of "restart techniques" as an important strategy in providing increased availability in a parallel structure. The paper covers a set of functions that have been developed for the S/390® Parallel Sysplex™.
Pages: 284-300
Page count: 17
Related papers
50 in total
  • [41] PROCESS MANAGEMENT FOR HIGHLY PARALLEL UNIX SYSTEMS
    EDLER, J
    LIPKIS, J
    SCHONBERG, E
    PROCEEDINGS : WORKSHOP ON UNIX AND SUPERCOMPUTERS, 1988, : 1 - 18
  • [42] Automatic Protocol Conformance Checking of Recursive and Parallel BPEL Systems
    Both, Andreas
    Zimmermann, Wolf
    PROCEEDINGS OF THE SIXTH IEEE EUROPEAN CONFERENCE ON WEB SERVICES, 2008, : 81 - 91
  • [43] A PVM tool for automatic test generation on parallel and distributed systems
    Corno, F
    Prinetto, P
    Rebaudengo, M
    Reorda, MS
    Veiluva, E
    HIGH-PERFORMANCE COMPUTING AND NETWORKING, 1995, 919 : 39 - 44
  • [44] On the efficiency of RESTART for multidimensional state systems
    Villen-Altamirano, Manuel
    Villen-Altamirano, Jose
    ACM TRANSACTIONS ON MODELING AND COMPUTER SIMULATION, 2006, 16 (03): : 251 - 279
  • [45] Analysis of restart mechanisms in software systems
    van Moorsel, Aad P. A.
    Wolter, Katinka
    IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2006, 32 (08) : 547 - 558
  • [46] Problems of Information Security and Availability of Automated Process Control Systems
    Chernov, Denis
    Sychugov, Alexey
    2019 INTERNATIONAL CONFERENCE ON INDUSTRIAL ENGINEERING, APPLICATIONS AND MANUFACTURING (ICIEAM), 2019,
  • [47] Line II restart process at Alumar - Brazil
    Borim, A
    Batista, E
    Bessa, E
    Matos, S
    LIGHT METALS 2005, 2005, : 337 - 340
  • [48] Constructing efficient strategies for the process optimization by restart
    Nikitin, Ilia
    Belan, Sergey
    PHYSICAL REVIEW E, 2024, 109 (05)
  • [49] ACR: Automatic Checkpoint/Restart for Soft and Hard Error Protection
    Ni, Xiang
    Meneses, Esteban
    Jain, Nikhil
    Kale, Laxmikant V.
    2013 INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC), 2013,
  • [50] The average availability of parallel checkpointing systems and its importance in selecting runtime parameters
    Plank, JS
    Thomason, MG
    TWENTY-NINTH ANNUAL INTERNATIONAL SYMPOSIUM ON FAULT-TOLERANT COMPUTING, DIGEST OF PAPERS, 1999, : 250 - 257