A Cluster-Based Implementation of a Fault Tolerant Parallel Reduction Algorithm Using Swarm-Array Computing

Cited by: 2
Authors
Varghese, Blesson [1]
McKee, Gerard [1]
Alexandrov, Vassil [1]
Affiliations
[1] Univ Reading, Sch Syst Engn, Reading RG6 6AY, Berks, England
Keywords
swarm-array computing; intelligent agents; fault-tolerant system; cluster-based implementation
DOI
10.1109/ICAS.2010.13
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Recent research in multi-agent systems incorporates fault tolerance concepts. However, that research does not explore the extension and implementation of such ideas for large-scale parallel computing systems. The work reported in this paper investigates a swarm-array computing approach, namely 'Intelligent Agents'. In the approach considered, a task to be executed on a parallel computing system is decomposed into sub-tasks and mapped onto agents that traverse an abstracted hardware layer. The agents intercommunicate across processors to share information in the event of a predicted core/processor failure and to complete the task successfully. The agents hence contribute towards fault tolerance and towards building reliable systems. The feasibility of the approach is validated by simulations on an FPGA using a multi-agent simulator and by the implementation of a parallel reduction algorithm on a computer cluster using the Message Passing Interface.
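The idea in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation, which runs on a cluster with the Message Passing Interface; it only simulates a pairwise tree reduction (here, a sum) in which agents carrying partial results move off a core that is predicted to fail. The names `Agent`, `migrate`, `tree_reduce`, and the "next healthy core" migration policy are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical agent carrying one sub-task's partial result."""
    value: float
    core: int  # index of the simulated core currently hosting the agent


def migrate(agent, failing, n_cores):
    """Move an agent off a core predicted to fail.

    Assumed policy: scan forward (with wrap-around) to the next healthy core.
    """
    if agent.core in failing:
        for step in range(1, n_cores + 1):
            candidate = (agent.core + step) % n_cores
            if candidate not in failing:
                agent.core = candidate
                break
        else:
            raise RuntimeError("no healthy core left to migrate to")
    return agent


def tree_reduce(values, failing_by_round=None):
    """Sum `values` by pairwise tree reduction, one agent per value.

    `failing_by_round` maps a round number to the set of cores predicted
    to fail at the start of that round (an assumed input format).
    Returns (total, core hosting the final agent).
    """
    if not values:
        raise ValueError("values must not be empty")
    n_cores = len(values)
    failing_by_round = failing_by_round or {}
    agents = [Agent(v, i) for i, v in enumerate(values)]
    failing = set()
    round_no = 0
    while len(agents) > 1:
        # Cores predicted to fail stay excluded for all later rounds.
        failing |= set(failing_by_round.get(round_no, ()))
        agents = [migrate(a, failing, n_cores) for a in agents]
        merged = []
        # Combine neighbouring agents; the left agent keeps the result.
        for i in range(0, len(agents) - 1, 2):
            left, right = agents[i], agents[i + 1]
            merged.append(Agent(left.value + right.value, left.core))
        if len(agents) % 2:
            merged.append(agents[-1])  # odd agent carries over unchanged
        agents = merged
        round_no += 1
    return agents[0].value, agents[0].core


total, core = tree_reduce([1, 2, 3, 4, 5, 6, 7, 8], {1: {0}})
print(total, core)
```

In this run, core 0 is predicted to fail before the second round, so the agent holding the partial sum on core 0 relocates before combining, and the reduction still completes with the correct total.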
Pages: 30-36
Number of pages: 7
Related papers
  • [1] Building Reliable Systems for Space Applications using Swarm-Array Computing
    Varghese, Blesson
    McKee, Gerard
    2009 COMPUTATION WORLD: FUTURE COMPUTING, SERVICE COMPUTATION, COGNITIVE, ADAPTIVE, CONTENT, PATTERNS, 2009, : 527 - 532
  • [2] Explorations of the implementation of a parallel IDW interpolation algorithm in a Linux cluster-based parallel GIS
    Huang, Fang
    Liu, Dingsheng
    Tan, Xicheng
    Wang, Jian
    Chen, Yunping
    He, Binbin
    COMPUTERS & GEOSCIENCES, 2011, 37 (04) : 426 - 434
  • [3] Demystifying Cluster-Based Fault-Tolerant Firewalls
    Neira Ayuso, Pablo
    Gasca, Rafael M.
    Lefevre, Laurent
    IEEE INTERNET COMPUTING, 2009, 13 (06) : 31 - 38
  • [4] FTCM: Fault-tolerant cluster management for cluster-based DBMS
    Chang, Jae-Woo
    Kim, Young-Chang
    AUTONOMIC AND TRUSTED COMPUTING, PROCEEDINGS, 2006, 4158 : 561 - 570
  • [5] Cluster-Based Implementation of a Morphological Watershed Algorithm for Parallel Classification of Multichannel Images
    Plaza, Antonio J.
    NINTH INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND NUMERIC ALGORITHMS FOR SCIENTIFIC COMPUTING, PROCEEDINGS, 2007, : 298 - 303
  • [6] ON DESIRABLE FAULT-TOLERANT TOPOLOGY FOR CLUSTER-BASED NETWORK
    ISHIDA, K
    KIKUNO, T
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 1994, E77A (10) : 1617 - 1622
  • [7] Cluster-based architecture for fault-tolerant quantum computation
    Fujii, Keisuke
    Yamamoto, Katsuji
    PHYSICAL REVIEW A, 2010, 81 (04):
  • [8] A fault-tolerant computing method for Xdraw parallel algorithm
    Wanfeng Dou
    Yanan Li
    The Journal of Supercomputing, 2018, 74 : 2776 - 2800
  • [9] A fault-tolerant computing method for Xdraw parallel algorithm
    Dou, Wanfeng
    Li, Yanan
    JOURNAL OF SUPERCOMPUTING, 2018, 74 (06): : 2776 - 2800
  • [10] A Fault-Tolerant and Energy-Aware Mechanism for Cluster-based Routing Algorithm of WSNs
    Hezaveh, Maryam
    Shirmohammdi, Zahra
    Rohbani, Nezam
    Miremadi, Seyed Ghassem
    PROCEEDINGS OF THE 2015 IFIP/IEEE INTERNATIONAL SYMPOSIUM ON INTEGRATED NETWORK MANAGEMENT (IM), 2015, : 659 - 664