A MASSIVELY-PARALLEL FAULT-TOLERANT ARCHITECTURE FOR TIME-CRITICAL COMPUTING

Cited by: 0

Author:

AHMAD, I

Affiliation:

[1] Department of Computer Science, The Hong Kong University of Science and Technology, Kowloon, Clear Water Bay

Source:

JOURNAL OF SUPERCOMPUTING | 1995 / Vol. 9 / Issue 1-2

Keywords:

MASSIVELY PARALLEL SYSTEMS; REAL-TIME SYSTEMS; FAULT TOLERANCE; TASK SCHEDULING; PERFORMANCE EVALUATION;

DOI:

10.1007/BF01245401

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Building large-scale parallel computer systems for time-critical applications is a challenging task, since the designers of such systems need to consider a number of related factors such as proper support for fault tolerance, efficient task allocation and reallocation strategies, and scalability. In this paper we propose a massively parallel fault-tolerant architecture using hundreds or thousands of processors for critical applications with timing constraints. The proposed architecture is based on an interconnection network called the bisectional network. A bisectional network is isomorphic to a hypercube in that a binary hypercube network can be easily extended into a bisectional network by adding links. These additional links give the network rich topological properties such as node symmetry, small diameter, small internode distance, and partitionability. The partitionability property is exploited to propose a redundant task allocation strategy and a task redistribution strategy under real-time constraints. The system is partitioned into symmetric regions (spheres) such that each sphere has a central control point. The central points, called fault control points (FCPs), are distributed throughout the entire system in an optimal fashion; they provide two-level task redundancy and efficiently redistribute the loads of failed nodes. FCPs are assigned to the processing nodes such that each node is assigned two types of FCPs for storing two redundant copies of every task present at the node. Similarly, the number of nodes assigned to each FCP is the same. The performance of the proposed system has been evaluated in a failure-repair system environment and compared with that of a hypercube-based system. Simulation results indicate that the proposed system can yield improved performance in the presence of a high number of node failures.

Pages: 135 - 162

Page count: 28