Availability in parallel systems: Automatic process restart

Cited by: 3
Authors
Bowen, NS [1]
Antognini, J [1]
Regan, RD [1]
Matsakis, NC [1]
Affiliation
[1] IBM Corp., S/390 Division, Poughkeepsie, NY 12601
Keywords
Availability - Computer architecture - Computer operating systems - Computer system recovery - Data processing - Information retrieval systems
DOI
10.1147/sj.362.0284
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Parallel and clustered architectures are increasingly being used as a foundation for high-capacity servers. At the same time, the availability expectations are also rising rapidly, since the effects of downtime become more apparent and have higher economic consequences for larger systems. The use of parallel structures generally implies more hardware and software components. The presence of more and larger components increases the chances that an individual component will fail, and that failure has the potential to hurt the overall availability of the system. This paper discusses the use of "restart techniques" as an important strategy in providing increased availability in a parallel structure. The paper covers a set of functions that have been developed for the S/390® Parallel Sysplex™.
Pages: 284-300
Page count: 17