Fault-Tolerant Global Load Balancing in X10

Cited by: 1
Authors
Bungart, Marco [1]
Fohry, Claudia [1]
Posner, Jonas [1]
Affiliations
[1] Univ Kassel, Res Grp Programming Languages Methodol, Kassel, Germany
Keywords
Resilient X10; task pool; GLB; algorithmic resilience
DOI
10.1109/SYNASC.2014.69
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
To be effective, scalability requires fault tolerance. We consider a user-level fault tolerance technique to cope with permanent node failures. It is supported by X10, one of the major Partitioned Global Address Space (PGAS) languages. In Resilient X10, an exception is thrown when a place (node) fails. This paper investigates task pools, which are often used by irregular applications to balance their load. We consider global load balancing with one worker per place. Each worker maintains a private task pool and supports cooperative work stealing. Tasks may generate new tasks dynamically, are free of side-effects, and their results are combined by reduction. Our first contribution is a task pool algorithm that can handle permanent place failures. It is based on snapshots that are regularly written to other workers and are updated in the event of stealing. Second, we implemented the algorithm in the Global Load Balancing framework GLB, which is part of the standard library of X10. We ran experiments with the Unbalanced Tree Search (UTS) and Betweenness Centrality (BC) benchmarks. With 64 places on 4 nodes, for instance, we observed an overhead of about 4% for using fault-tolerant GLB instead of GLB. The protocol overhead for a place failure was negligible.
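
The snapshot idea described in the abstract can be illustrated with a small single-threaded simulation. This is only a sketch under strong assumptions and not the authors' algorithm or the GLB API: all names (`Worker`, `count_tree_nodes`, `fail_at`, `snapshot_every`) are hypothetical, the workload is a full binary tree rather than UTS or BC, snapshots are kept in a plain dictionary standing in for copies held at other places, and a steal updates the snapshots of victim and thief in one atomic step. It also assumes that at least one worker survives.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    pool: list = field(default_factory=list)  # private task pool; a task is a tree depth
    result: int = 0                           # partial reduction result (nodes counted)
    alive: bool = True


def count_tree_nodes(num_workers=4, max_depth=10, snapshot_every=5, fail_at=None):
    """Count the nodes of a full binary tree with work stealing.

    fail_at=(step, worker_index) injects one permanent worker failure
    (hypothetical parameter, for illustration only).
    """
    workers = [Worker() for _ in range(num_workers)]
    workers[0].pool.append(0)                 # root task at depth 0
    snapshots = {}                            # assumption: stands in for remote copies

    def snapshot(i):
        snapshots[i] = (list(workers[i].pool), workers[i].result)

    for i in range(num_workers):
        snapshot(i)

    step = 0
    while any(w.alive and w.pool for w in workers):
        step += 1
        if fail_at is not None and fail_at[0] == step:
            failed = fail_at[1]
            workers[failed].alive = False     # live state of the failed worker is lost
            ring = (j % num_workers for j in range(failed + 1, failed + num_workers))
            backup = next(j for j in ring if workers[j].alive)
            pool, result = snapshots.pop(failed)
            workers[backup].pool.extend(pool)  # redo work since the last snapshot
            workers[backup].result += result
            snapshot(backup)
        for i, w in enumerate(workers):
            if not w.alive:
                continue
            if not w.pool:
                # Cooperative steal from the alive worker with the largest pool.
                others = [j for j, v in enumerate(workers) if v.alive and j != i]
                victim = max(others, key=lambda j: len(workers[j].pool), default=None)
                if victim is not None and len(workers[victim].pool) >= 2:
                    v = workers[victim]
                    half = len(v.pool) // 2
                    w.pool, v.pool = v.pool[:half], v.pool[half:]
                    snapshot(i)               # snapshots are updated on stealing so
                    snapshot(victim)          # no task is counted twice or lost
                continue
            depth = w.pool.pop()
            w.result += 1                     # side-effect-free task: count one node
            if depth < max_depth:
                w.pool += [depth + 1, depth + 1]  # tasks generate new tasks
            if step % snapshot_every == 0:
                snapshot(i)                   # regular snapshot
    return sum(w.result for w in workers if w.alive)
```

Calling `count_tree_nodes()` and `count_tree_nodes(fail_at=(7, 0))` should both give `2 ** 11 - 1`, since work lost with the failed worker is redone from its last snapshot. Recomputation is safe here only because tasks are free of side effects and results are combined by reduction, as the abstract states.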
Pages: 471-478
Page count: 8