Improving Cache Utilization of Nested Parallel Programs by Almost Deterministic Work Stealing

Cited by: 0
Authors
Shiina, Shumpei [1]
Taura, Kenjiro [1]
Affiliations
[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Dept Informat & Commun Engn, Bunkyo Ku, Tokyo 1138654, Japan
Keywords
Task analysis; Parallel processing; Dynamic scheduling; Decision trees; Program processors; Processor scheduling; Runtime; Dynamic load balancing; locality; nested parallelism; task parallelism; task scheduling; work stealing; CILK
DOI
10.1109/TPDS.2022.3196192
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Nested (fork-join) parallelism eases parallel programming by enabling high-level expression of parallelism and leaving the mapping between parallel tasks and hardware to the runtime scheduler. A challenge in dynamic scheduling of nested parallelism is how to exploit data locality, which has become more demanding in the deep cache hierarchies of modern processors with a large number of cores. This paper introduces almost deterministic work stealing (ADWS), which efficiently exploits data locality by deterministically planning a cache-hierarchy-aware schedule, while allowing a little scheduling variety to facilitate dynamic load balancing. Furthermore, we propose an extension of our prior work on ADWS to achieve better shared cache utilization. The improved version of the scheduler is called multi-level ADWS. The idea is that only part of a computation whose working set size is small enough to fit into a shared cache is scheduled by ADWS within the cache recursively, thus avoiding excessive capacity misses. Our evaluation on a benchmark of parallel decision tree construction demonstrated that multi-level ADWS outperformed the conventional random work stealing of Cilk Plus by 61% and it showed a 40% performance improvement over the previous ADWS design.
Pages: 4530-4546
Page count: 17
Related papers
9 in total
  • [1] Almost Deterministic Work Stealing
    Shiina, Shumpei
    Taura, Kenjiro
    PROCEEDINGS OF SC19: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2019
  • [2] Analysis of Work-Stealing and Parallel Cache Complexity
    Gu, Yan
    Napier, Zachary
    Sun, Yihan
    SYMPOSIUM ON ALGORITHMIC PRINCIPLES OF COMPUTER SYSTEMS, APOCS, 2022, : 46 - 60
  • [3] A Work Stealing Scheduler for Parallel Loops on Shared Cache Multicores
    Tchiboukdjian, Marc
    Danjean, Vincent
    Gautier, Thierry
    Le Mentec, Fabien
    Raffin, Bruno
    EURO-PAR 2010 PARALLEL PROCESSING WORKSHOPS, 2011, 6586 : 99 - 107
  • [4] Scheduling Parallel Programs by Work Stealing with Private Deques
    Acar, Umut A.
    Chargueraud, Arthur
    Rainey, Mike
    ACM SIGPLAN NOTICES, 2013, 48 (08) : 219 - 228
  • [5] Beyond Nested Parallelism: Tight Bounds on Work-Stealing Overheads for Parallel Futures
    Spoonhower, Daniel
    Blelloch, Guy E.
    Gibbons, Phillip B.
    Harper, Robert
    SPAA'09: PROCEEDINGS OF THE TWENTY-FIRST ANNUAL SYMPOSIUM ON PARALLELISM IN ALGORITHMS AND ARCHITECTURES, 2009, : 91 - 100
  • [6] Task-Level Checkpointing for Nested Fork-Join Programs Using Work Stealing
    Reitz, Lukas
    Fohry, Claudia
    EURO-PAR 2023: PARALLEL PROCESSING WORKSHOPS, PT II, EURO-PAR 2023, 2024, 14352 : 102 - 114
  • [7] Work stealing for GPU-accelerated parallel programs in a global address space framework
    Arafat, Humayun
    Dinan, James
    Krishnamoorthy, Sriram
    Balaji, Pavan
    Sadayappan, P.
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2016, 28 (13) : 3637 - 3654
  • [8] Performance evaluation on work-stealing featured parallel programs on asymmetric performance multicore processors
    Adnan
    ARRAY, 2023, 19
  • [9] Evolving Cut-Off Mechanisms and Other Work-Stealing Parameters for Parallel Programs
    Fonseca, Alcides
    Lourenco, Nuno
    Cabral, Bruno
    APPLICATIONS OF EVOLUTIONARY COMPUTATION, EVOAPPLICATIONS 2017, PT I, 2017, 10199 : 757 - 772