Scalable Work-Stealing Load-Balancer for HPC Distributed Memory Systems

Cited by: 0
Authors
Fontenaille, Clement [1, 2]
Petit, Eric [3]
Castro, Pablo de Oliveira [1]
Uemura, Seijilo [1]
Sohier, Devan [1]
Lesnicki, Piotr [2]
Lartigue, Ghislain [4]
Moureau, Vincent [4]
Affiliations
[1] Univ Versailles, Li PaRAD, Versailles, France
[2] Atos Bull, Paris, France
[3] Intel Corp, Santa Clara, CA USA
[4] Univ Normandie, CNRS, CORIA, St Etienne Du Rouvray, France
DOI
10.1007/978-3-030-10549-5_12
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Work-stealing schedulers are common in shared memory environments. However, their use on large-scale distributed memory systems has been limited to specific ad-hoc implementations, which prevents broader adoption. In this paper, we introduce a new scalable work-stealing algorithm for distributed memory systems, together with our implementation of it in the TITUS DLB library. It is based on Kleinberg's small-world graph. It allows control of the communication patterns and associated runtime overheads while providing efficient heuristics for victim selection and result routing. To validate our approach, we present the DLB Bench benchmark, which emulates arbitrary workload distributions and imbalance characteristics. Finally, we compare TITUS DLB to the ad-hoc solution developed for the YALES2 computational fluid dynamics and combustion solver. We achieve a performance gain of up to 54% on thousands of cores.
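The idea of choosing steal victims from a small-world neighbourhood can be illustrated with a minimal Python sketch. This is not the TITUS DLB implementation: the abstract gives no API or parameter details, so the function names (`build_contacts`, `pick_victim`), the one-dimensional ring layout, the exponent `r`, and the uniform-random choice among contacts are all assumptions made for illustration. Each rank keeps its two ring neighbours plus a few long-range contacts drawn with probability proportional to distance to the power -r, in the spirit of Kleinberg's model.

```python
import random


def ring_distance(a, b, n):
    """Shortest hop count between ranks a and b on a ring of n ranks."""
    d = abs(a - b)
    return min(d, n - d)


def build_contacts(n, r=1.0, long_links=1, seed=0):
    """Return {rank: sorted contact list} for a hypothetical 1-D small-world overlay.

    Every rank is linked to its two ring neighbours. It also gets up to
    `long_links` long-range contacts, each drawn with probability
    proportional to ring_distance ** -r (assumed Kleinberg-style rule).
    """
    if n < 3:
        raise ValueError("need at least 3 ranks to form a ring")
    rng = random.Random(seed)
    contacts = {}
    for u in range(n):
        local = {(u - 1) % n, (u + 1) % n}
        # Candidates for long-range links: everyone except u and its neighbours.
        others = [v for v in range(n) if v != u and v not in local]
        weights = [ring_distance(u, v, n) ** -r for v in others]
        wanted = min(long_links, len(others))
        far = set()
        while len(far) < wanted:
            far.add(rng.choices(others, weights=weights, k=1)[0])
        contacts[u] = sorted(local | far)
    return contacts


def pick_victim(rank, contacts, rng):
    """Pick a steal victim uniformly among the rank's contacts (assumed heuristic)."""
    return rng.choice(contacts[rank])


if __name__ == "__main__":
    overlay = build_contacts(n=16, r=1.0, long_links=1, seed=1)
    victim = pick_victim(0, overlay, random.Random(2))
    print("rank 0 contacts:", overlay[0], "-> victim:", victim)
```

Restricting steal requests to a fixed, small contact list is what bounds the communication pattern in this sketch: an idle rank only ever messages the ranks in its own list, while the long-range links keep the overlay's diameter small.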
Pages: 146-158
Number of pages: 13