Improving Cache Utilization of Nested Parallel Programs by Almost Deterministic Work Stealing

Cited by: 0

Authors:

Shiina, Shumpei [1]

Taura, Kenjiro [1]

Affiliations:

[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Dept Informat & Commun Engn, Bunkyo Ku, Tokyo 1138654, Japan

Source:

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS | 2022, Vol. 33, No. 12

Keywords:

Task analysis; Parallel processing; Dynamic scheduling; Decision trees; Program processors; Processor scheduling; Runtime; Dynamic load balancing; locality; nested parallelism; task parallelism; task scheduling; work stealing; CILK;

DOI:

10.1109/TPDS.2022.3196192

Chinese Library Classification (CLC) number:

TP301 [Theory, methods];

Subject classification code:

081202;

Abstract:

Nested (fork-join) parallelism eases parallel programming by enabling high-level expression of parallelism and leaving the mapping between parallel tasks and hardware to the runtime scheduler. A challenge in dynamic scheduling of nested parallelism is how to exploit data locality, which has become more demanding in the deep cache hierarchies of modern processors with a large number of cores. This paper introduces almost deterministic work stealing (ADWS), which efficiently exploits data locality by deterministically planning a cache-hierarchy-aware schedule, while allowing a little scheduling variety to facilitate dynamic load balancing. Furthermore, we propose an extension of our prior work on ADWS to achieve better shared cache utilization. The improved version of the scheduler is called multi-level ADWS. The idea is that only part of a computation whose working set size is small enough to fit into a shared cache is scheduled by ADWS within the cache recursively, thus avoiding excessive capacity misses. Our evaluation on a benchmark of parallel decision tree construction demonstrated that multi-level ADWS outperformed the conventional random work stealing of Cilk Plus by 61% and it showed a 40% performance improvement over the previous ADWS design.

Pages: 4530 - 4546

Page count: 17

9 items in total

[1] Almost Deterministic Work Stealing
Shiina, Shumpei
Taura, Kenjiro
PROCEEDINGS OF SC19: THE INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2019
[2] Analysis of Work-Stealing and Parallel Cache Complexity
Gu, Yan
Napier, Zachary
Sun, Yihan
SYMPOSIUM ON ALGORITHMIC PRINCIPLES OF COMPUTER SYSTEMS, APOCS, 2022, : 46 - 60
[3] A Work Stealing Scheduler for Parallel Loops on Shared Cache Multicores
Tchiboukdjian, Marc
Danjean, Vincent
Gautier, Thierry
Le Mentec, Fabien
Raffin, Bruno
EURO-PAR 2010 PARALLEL PROCESSING WORKSHOPS, 2011, 6586 : 99 - 107
[4] Scheduling Parallel Programs by Work Stealing with Private Deques
Acar, Umut A.
Chargueraud, Arthur
Rainey, Mike
ACM SIGPLAN NOTICES, 2013, 48 (08) : 219 - 228
[5] Beyond Nested Parallelism: Tight Bounds on Work-Stealing Overheads for Parallel Futures
Spoonhower, Daniel
Blelloch, Guy E.
Gibbons, Phillip B.
Harper, Robert
SPAA'09: PROCEEDINGS OF THE TWENTY-FIRST ANNUAL SYMPOSIUM ON PARALLELISM IN ALGORITHMS AND ARCHITECTURES, 2009, : 91 - 100
[6] Task-Level Checkpointing for Nested Fork-Join Programs Using Work Stealing
Reitz, Lukas
Fohry, Claudia
EURO-PAR 2023: PARALLEL PROCESSING WORKSHOPS, PT II, EURO-PAR 2023, 2024, 14352 : 102 - 114
[7] Work stealing for GPU-accelerated parallel programs in a global address space framework
Arafat, Humayun
Dinan, James
Krishnamoorthy, Sriram
Balaji, Pavan
Sadayappan, P.
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2016, 28 (13) : 3637 - 3654
[8] Performance evaluation on work-stealing featured parallel programs on asymmetric performance multicore processors
Adnan
ARRAY, 2023, 19
[9] Evolving Cut-Off Mechanisms and Other Work-Stealing Parameters for Parallel Programs
Fonseca, Alcides
Lourenco, Nuno
Cabral, Bruno
APPLICATIONS OF EVOLUTIONARY COMPUTATION, EVOAPPLICATIONS 2017, PT I, 2017, 10199 : 757 - 772
