Fault-Tolerant Global Load Balancing in X10

Cited by: 1
Authors
Bungart, Marco [1]
Fohry, Claudia [1]
Posner, Jonas [1]
Affiliations
[1] Univ Kassel, Res Grp Programming Languages Methodol, Kassel, Germany
Keywords
Resilient X10; task pool; GLB; algorithmic resilience
DOI
10.1109/SYNASC.2014.69
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
To be effective, scalability requires fault tolerance. We consider a user-level fault tolerance technique for coping with permanent node failures. It is supported by X10, one of the major Partitioned Global Address Space (PGAS) languages. In Resilient X10, an exception is thrown when a place (node) fails. This paper investigates task pools, which irregular applications often use to balance their load. We consider global load balancing with one worker per place. Each worker maintains a private task pool and supports cooperative work stealing. Tasks may generate new tasks dynamically and are free of side effects, and their results are combined by reduction. Our first contribution is a task pool algorithm that can handle permanent place failures. It is based on snapshots that are regularly written to other workers and are updated in the event of stealing. Second, we implemented the algorithm in the Global Load Balancing framework GLB, which is part of the standard library of X10. We ran experiments with the Unbalanced Tree Search (UTS) and Betweenness Centrality (BC) benchmarks. With 64 places on 4 nodes, for instance, we observed an overhead of about 4% for using fault-tolerant GLB instead of GLB. The protocol overhead for a place failure was negligible.
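The snapshot idea in the abstract can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the authors' algorithm and not the GLB API: it is written in Python rather than X10, every name in it (`Worker`, `count_nodes`, `snapshot`, `steal`, `fail`) is hypothetical, the `snapshots` dictionary stands in for snapshot storage on other workers, and a failure is injected at a fixed step instead of being signalled by an exception. Tasks are subtree depths of a binary tree, and the reduction is a node count.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    ident: int
    pool: list = field(default_factory=list)  # private task pool (subtree depths)
    result: int = 0                           # partial result, reduced by summation
    since_snapshot: int = 0                   # tasks processed since last snapshot
    alive: bool = True


def count_nodes(depth, n_workers=4, interval=4, fail_worker=None, fail_at_step=None):
    """Count the nodes of a full binary tree, optionally surviving one failure."""
    workers = [Worker(i) for i in range(n_workers)]
    # Assumption: this dict models snapshots kept on *other* workers,
    # so it survives the failure of the snapshot's owner.
    snapshots = {}

    def snapshot(w):
        snapshots[w.ident] = (list(w.pool), w.result)
        w.since_snapshot = 0

    def steal(thief):
        # Take half of the first sufficiently loaded victim's tasks.
        for victim in workers:
            if victim.alive and victim is not thief and len(victim.pool) >= 2:
                k = len(victim.pool) // 2
                thief.pool.extend(victim.pool[:k])
                del victim.pool[:k]
                # Update both snapshots so no task is lost or duplicated.
                snapshot(victim)
                snapshot(thief)
                return

    def fail(w):
        # Work done since the last snapshot is lost and will be redone.
        w.alive = False
        saved_pool, saved_result = snapshots.pop(w.ident)
        heir = next(x for x in workers if x.alive)
        heir.pool.extend(saved_pool)
        heir.result += saved_result
        snapshot(heir)

    workers[0].pool.append(depth)
    for w in workers:
        snapshot(w)

    step = 0
    while any(w.alive and w.pool for w in workers):
        step += 1
        if fail_worker is not None and step == fail_at_step:
            fail(workers[fail_worker])
        for w in workers:
            if not w.alive:
                continue
            if not w.pool:
                steal(w)
            if w.pool:
                d = w.pool.pop()
                w.result += 1                      # count this node
                if d > 0:
                    w.pool.extend([d - 1, d - 1])  # spawn two child tasks
                w.since_snapshot += 1
                if w.since_snapshot >= interval:
                    snapshot(w)
    return sum(w.result for w in workers if w.alive)
```

Because tasks are free of side effects, redoing the work a failed worker performed after its last snapshot does not change the outcome: `count_nodes(10)` and `count_nodes(10, fail_worker=1, fail_at_step=20)` both return 2047, the node count of a full binary tree of depth 10.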
Pages: 471-478
Page count: 8