A MASSIVELY-PARALLEL FAULT-TOLERANT ARCHITECTURE FOR TIME-CRITICAL COMPUTING

Cited by: 0
Author
AHMAD, I
Affiliation
[1] Department of Computer Science, The Hong Kong University of Science and Technology, Kowloon, Clear Water Bay
Source
JOURNAL OF SUPERCOMPUTING | 1995 / Vol. 9 / Issue 1-2
Keywords
MASSIVELY PARALLEL SYSTEMS; REAL-TIME SYSTEMS; FAULT TOLERANCE; TASK SCHEDULING; PERFORMANCE EVALUATION;
DOI
10.1007/BF01245401
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Building large-scale parallel computer systems for time-critical applications is challenging because designers must consider a number of related factors, such as proper support for fault tolerance, efficient task allocation and reallocation strategies, and scalability. In this paper we propose a massively parallel fault-tolerant architecture that uses hundreds or thousands of processors for critical applications with timing constraints. The proposed architecture is based on an interconnection network called the bisectional network. A bisectional network is isomorphic to a hypercube in that a binary hypercube network can be easily extended into a bisectional network by adding links. These additional links give the network rich topological properties such as node symmetry, small diameter, small internode distance, and partitionability. The partitioning property is exploited to propose a redundant task allocation strategy and a task redistribution strategy under real-time constraints. The system is partitioned into symmetric regions (spheres) such that each sphere has a central control point. These central points, called fault control points (FCPs), are distributed throughout the entire system in an optimal fashion; they provide two-level task redundancy and efficiently redistribute the loads of failed nodes. FCPs are assigned to the processing nodes such that each node is assigned two types of FCPs, which store two redundant copies of every task present at the node. Similarly, the number of nodes assigned to each FCP is the same. The performance of the proposed system has been evaluated in a failure-repair environment and compared with that of a hypercube-based system. Simulation results indicate that the proposed system can yield improved performance in the presence of a high number of node failures.
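The abstract's claim that adding links to a binary hypercube reduces its diameter can be illustrated with a small sketch. The abstract does not define the bisectional network's extra links, so the code below assumes, purely for illustration, that each node gains one link to its bitwise complement (the construction known from the folded hypercube). This may differ from the paper's actual definition. The function names `hypercube_neighbors`, `augmented_neighbors`, `eccentricity`, and `diameter` are hypothetical.

```python
from collections import deque


def hypercube_neighbors(node, dim):
    """Neighbors in a binary hypercube: flip exactly one of the dim bits."""
    return [node ^ (1 << bit) for bit in range(dim)]


def augmented_neighbors(node, dim):
    """Hypercube neighbors plus one extra link to the bitwise complement.

    ASSUMPTION: the complement link is an illustrative choice (as in a
    folded hypercube); the abstract does not specify the bisectional
    network's additional links.
    """
    complement = node ^ ((1 << dim) - 1)
    return hypercube_neighbors(node, dim) + [complement]


def eccentricity(start, dim, neighbors):
    """Largest hop count from start to any node, found by breadth-first search."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for nxt in neighbors(cur, dim):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return max(dist.values())


def diameter(dim, neighbors):
    """Network diameter: the maximum eccentricity over all 2**dim nodes."""
    return max(eccentricity(n, dim, neighbors) for n in range(1 << dim))


if __name__ == "__main__":
    for d in (4, 5, 6):
        print(d, diameter(d, hypercube_neighbors), diameter(d, augmented_neighbors))
```

Under this assumption, the extra link lets a message cross to the opposite corner of the cube in one hop, so the worst-case distance drops from `dim` to roughly half of it. The sketch covers only the topological point; it does not model the FCP placement or the task redistribution strategy described in the abstract.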
Pages: 135-162
Page count: 28