Fault-Tolerant Global Load Balancing in X10

Cited by: 1
Authors
Bungart, Marco [1]
Fohry, Claudia [1]
Posner, Jonas [1]
Affiliations
[1] Univ Kassel, Res Grp Programming Languages Methodol, Kassel, Germany
Keywords
Resilient X10; task pool; GLB; algorithmic resilience
DOI
10.1109/SYNASC.2014.69
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Scalability postulates fault tolerance to be effective. We consider a user-level fault tolerance technique to cope with permanent node failures. It is supported by X10, one of the major Partitioned Global Address Space (PGAS) languages. In Resilient X10, an exception is thrown when a place (node) fails. This paper investigates task pools, which are often used by irregular applications to balance their load. We consider global load balancing with one worker per place. Each worker maintains a private task pool and supports cooperative work stealing. Tasks may generate new tasks dynamically, are free of side effects, and their results are combined by reduction. Our first contribution is a task pool algorithm that can handle permanent place failures. It is based on snapshots that are regularly written to other workers and are updated in the event of stealing. Second, we implemented the algorithm in the Global Load Balancing framework GLB, which is part of the standard library of X10. We ran experiments with the Unbalanced Tree Search (UTS) and Betweenness Centrality (BC) benchmarks. With 64 places on 4 nodes, for instance, we observed an overhead of about 4% for using fault-tolerant GLB instead of GLB. The protocol overhead for a place failure was negligible.
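The snapshot idea in the abstract can be illustrated with a small sketch. The following is not the paper's protocol and not the GLB API: it is a sequential, single-process Python simulation under simplifying assumptions (at most one failure, the snapshot of a worker is held by the next live place, and steal plus snapshot update happen atomically). All names (`Worker`, `children`, `run`, `snapshot_every`, `fail`) are hypothetical. It shows why side-effect-free tasks and a sum reduction allow work since the last snapshot to be simply redone after a place failure.

```python
from dataclasses import dataclass, field

@dataclass
class Worker:
    pool: list = field(default_factory=list)  # private task pool
    result: int = 0                           # partial reduction result (node count)
    since_snapshot: int = 0                   # tasks processed since last snapshot

def children(task):
    """Hypothetical side-effect-free task: expand node (depth, key) of an irregular tree."""
    depth, key = task
    fanout = (key * 7 + depth) % 4 if depth < 8 else 0
    return [(depth + 1, key * 4 + i + 1) for i in range(fanout)]

def count_sequential():
    """Reference result: count all tree nodes without any load balancing."""
    stack, n = [(0, 1)], 0
    while stack:
        n += 1
        stack.extend(children(stack.pop()))
    return n

def run(num_places, fail=None, snapshot_every=8):
    """Simulate the task pool; fail=(step, place) kills one place at that step."""
    workers = {p: Worker() for p in range(num_places)}
    snapshots = {}  # owner place -> (holder place, copy of pool, result)

    def write_snapshot(p):
        alive = sorted(workers)
        holder = alive[(alive.index(p) + 1) % len(alive)]  # next live place
        snapshots[p] = (holder, list(workers[p].pool), workers[p].result)
        workers[p].since_snapshot = 0

    def fail_place(p):
        del workers[p]                                # the place's state is lost
        holder, pool, result = snapshots.pop(p)
        workers[holder].pool.extend(pool)             # holder adopts the saved tasks
        workers[holder].result += result              # ... and the saved partial result
        for q in workers:                             # snapshots stored on p are gone too
            write_snapshot(q)

    workers[0].pool.append((0, 1))                    # root task
    for p in workers:
        write_snapshot(p)

    steps = 0
    while any(w.pool for w in workers.values()):
        for p in sorted(workers):
            steps += 1
            if fail and steps == fail[0] and fail[1] in workers and len(workers) > 1:
                fail_place(fail[1])
            if p not in workers:
                continue
            w = workers[p]
            if not w.pool:                            # idle: try to steal half a pool
                victim = max(workers, key=lambda q: len(workers[q].pool))
                v = workers[victim]
                if len(v.pool) < 2:
                    continue
                half = len(v.pool) // 2
                w.pool, v.pool = v.pool[:half], v.pool[half:]
                write_snapshot(victim)                # snapshots are updated on stealing
                write_snapshot(p)
                continue
            task = w.pool.pop()
            w.result += 1
            w.pool.extend(children(task))
            w.since_snapshot += 1
            if w.since_snapshot >= snapshot_every:    # regular snapshot
                write_snapshot(p)
    return sum(w.result for w in workers.values())    # final reduction
```

In this sketch, calling `run(4, fail=(20, 2))` should return the same count as `count_sequential()`: tasks processed by the failed place after its last snapshot are lost together with their partial result, and both are recomputed from the restored snapshot. A real distributed implementation would additionally have to make the steal and the two snapshot updates consistent across places, which the simulation sidesteps by running sequentially.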
Pages: 471-478
Page count: 8