Architectural support for system software on large-scale clusters

Cited by: 0
Authors
Fernández, J [1]
Frachtenberg, E [1]
Petrini, F [1]
Davis, K [1]
Sancho, JC [1]
Affiliation
[1] Univ Murcia, Dept Ingn & Tecnol Comp, E-30071 Murcia, Spain
Keywords
cluster computing; cluster operating system; network hardware; debuggability; resource management; fault tolerance;
DOI
Not available
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Scalable management of distributed resources is one of the major challenges in deployment of large-scale clusters. Management includes transparent fault tolerance, efficient allocation of resources, and support for all the needs of parallel computing: parallel I/O, deterministic behavior and responsiveness. Meeting these requirements with commodity hardware and operating systems is difficult because they were not designed to support global management of a large-scale system. In this paper we propose a small set of hardware mechanisms in the cluster interconnect to facilitate the implementation of a simple yet powerful global operating system. This system, inspired by concepts from the BSP and SIMD computational models, allows commodity clusters to grow to thousands of nodes while still retaining the usability and responsiveness of the single-node workstation. Our results on a software prototype show that it is possible to implement efficient and scalable system software using the proposed set of mechanisms.
Pages: 519-528
Number of pages: 10
Related papers
  • [1] An abstract interface for system software on large-scale clusters
    Fernandez, Juan
    Frachtenberg, Eitan
    Petrini, Fabrizio
    Sancho, Jose-Carlos
    [J]. COMPUTER JOURNAL, 2006, 49 (04): : 454 - 469
  • [2] A DESIGN AND MAINTENANCE SUPPORT SYSTEM FOR LARGE-SCALE SOFTWARE
    ODA, Y
    SATO, A
    OKUZAWA, O
    [J]. REVIEW OF THE ELECTRICAL COMMUNICATIONS LABORATORIES, 1984, 32 (01): : 31 - 40
  • [3] Architectural Complexity of Large-Scale Software Systems
    Lilienthal, Carola
    [J]. 13TH EUROPEAN CONFERENCE ON SOFTWARE MAINTENANCE AND REENGINEERING: CSMR 2009, PROCEEDINGS, 2009, : 17 - 26
  • [4] SOFTWARE AS A LARGE-SCALE SYSTEM
    SAGE, AP
    [J]. LARGE SCALE SYSTEMS IN INFORMATION AND DECISION TECHNOLOGIES, 1987, 12 (03): : 185 - 188
  • [5] Architectural Support for Efficient Large-Scale Automata Processing
    Liu, Hongyuan
    Ibrahim, Mohamed
    Kayiran, Onur
    Pai, Sreepathi
    Jog, Adwait
    [J]. 2018 51ST ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO), 2018, : 908 - 920
  • [6] Architectural software support for processing clusters
    Gutleber, J
    Cano, E
    Cittolin, S
    Meijers, F
    Orsini, L
    Samyn, D
    [J]. CLUSTER 2000: IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING, PROCEEDINGS, 2000, : 153 - 161
  • [7] Architectural integration styles for large-scale enterprise software systems
    Andersson, J
    Johnson, P
    [J]. FIFTH IEEE INTERNATIONAL ENTERPRISE DISTRIBUTED OBJECT COMPUTING CONFERENCE, PROCEEDINGS, 2001, : 224 - 236
  • [8] A software architecture to support a large-scale, multi-tier clinical information system
    Yungton, JA
    Sittig, DF
    Reilly, P
    Pappas, J
    Flammini, S
    Chueh, HC
    Teich, JM
    [J]. JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 1998, : 210 - 214
  • [9] ORGANIZATIONAL DECISION SUPPORT AS A LARGE-SCALE SYSTEM
    SAGE, AP
    [J]. LARGE SCALE SYSTEMS IN INFORMATION AND DECISION TECHNOLOGIES, 1987, 13 (01): : 1 - 3
  • [10] A large-scale study of architectural evolution in open-source software systems
    Pooyan Behnamghader
    Duc Minh Le
    Joshua Garcia
    Daniel Link
    Arman Shahbazian
    Nenad Medvidovic
    [J]. Empirical Software Engineering, 2017, 22 : 1146 - 1193