A Cluster-Based Implementation of a Fault Tolerant Parallel Reduction Algorithm Using Swarm-Array Computing

Cited by: 2
Authors
Varghese, Blesson [1]
McKee, Gerard [1]
Alexandrov, Vassil [1]
Affiliation
[1] Univ Reading, Sch Syst Engn, Reading RG6 6AY, Berks, England
Keywords
swarm-array computing; intelligent agents; fault-tolerant system; cluster-based implementation
DOI
10.1109/ICAS.2010.13
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Recent research in multi-agent systems incorporates fault-tolerance concepts. However, this research does not explore the extension and implementation of such ideas for large-scale parallel computing systems. The work reported in this paper investigates a swarm-array computing approach, namely 'Intelligent Agents'. In this approach, a task to be executed on a parallel computing system is decomposed into sub-tasks, which are mapped onto agents that traverse an abstracted hardware layer. The agents intercommunicate across processors to share information in the event of a predicted core or processor failure, and to complete the task successfully. The agents hence contribute towards fault tolerance and towards building reliable systems. The feasibility of the approach is validated by simulations on an FPGA using a multi-agent simulator, and by an implementation of a parallel reduction algorithm on a computer cluster using the Message Passing Interface.
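The abstract gives no code, so the following is only a minimal single-process sketch of the idea it describes: a reduction task is decomposed into sub-tasks carried by agents, and an agent whose core is predicted to fail moves its sub-task to a healthy core before the reduction completes. It does not reproduce the paper's MPI cluster implementation or its FPGA simulation. The names `Agent`, `decompose`, `migrate`, `fault_tolerant_reduce` and the `predicted_failures` set (standing in for whatever failure predictor the system uses) are all hypothetical, and the choice of a summation with pairwise tree combining is an assumption for illustration.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical mobile agent carrying one sub-task of the reduction."""
    agent_id: int
    data: list
    core: int
    partial: int = 0


def decompose(values, n_agents):
    """Split the task into n_agents contiguous sub-tasks (some may be empty)."""
    size = -(-len(values) // n_agents)  # ceiling division
    return [values[i * size:(i + 1) * size] for i in range(n_agents)]


def migrate(agent, failing, n_cores):
    """Move the agent to the next core (cyclically) not predicted to fail."""
    for step in range(1, n_cores):
        target = (agent.core + step) % n_cores
        if target not in failing:
            agent.core = target
            return
    raise RuntimeError("no healthy core available")


def fault_tolerant_reduce(values, n_cores, predicted_failures):
    """Sum `values` using one agent per core; returns (total, agent -> core)."""
    agents = [Agent(i, chunk, core=i)
              for i, chunk in enumerate(decompose(values, n_cores))]

    # Local phase: an agent on a core predicted to fail migrates first,
    # then computes the partial result of its own sub-task.
    for agent in agents:
        if agent.core in predicted_failures:
            migrate(agent, predicted_failures, n_cores)
        agent.partial = sum(agent.data)

    # Combine phase: pairwise (tree-shaped) merging of partial results.
    live = agents
    while len(live) > 1:
        merged = []
        for i in range(0, len(live), 2):
            if i + 1 < len(live):
                live[i].partial += live[i + 1].partial
            merged.append(live[i])
        live = merged

    return live[0].partial, {a.agent_id: a.core for a in agents}


if __name__ == "__main__":
    total, placement = fault_tolerant_reduce(list(range(1, 17)), 4, {2})
    print(total, placement)
```

With 16 values on 4 cores and core 2 predicted to fail, the agent originally on core 2 moves to core 3 and the sum is still 136. A real system would move state between processes (for example with MPI messages) rather than reassigning a field in memory.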
Pages: 30-36
Number of pages: 7