Reducing the burden of parallel loop schedulers for many-core processors

Cited by: 1

Authors:

Arif, Mahwish [1]

Vandierendonck, Hans [2]

Affiliations:

[1] Univ Cambridge, Comp Sci Lab, Cambridge, England

[2] Queens Univ Belfast, Sch Elect Elect Engn & Comp Sci, Belfast, Antrim, North Ireland

Source:

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2021 / Vol. 33 / No. 13

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC); EU Horizon 2020;

Keywords:

parallel computing; shared-memory synchronization; ALGORITHMS;

DOI:

10.1002/cpe.6241

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

As core counts in processors increase, it becomes harder to schedule and distribute work in a timely and scalable manner. This article enhances the scalability of parallel loop schedulers by specializing schedulers for fine-grain loops. We propose a low-overhead work distribution mechanism for a static scheduler that uses no atomic operations. We integrate our static scheduler with the Intel OpenMP and Cilkplus parallel task schedulers to build hybrid schedulers. Compiler support enables efficient reductions for Cilk without changing the programming interface of Cilk reducers. Detailed, quantitative measurements demonstrate that our techniques achieve scalable performance on a 48-core machine and that the scheduling overhead is 43% lower than Intel OpenMP and 12.1x lower than Cilk. We demonstrate consistent performance improvements on a range of HPC and data analytics codes. Performance gains become more important as loops become finer-grain and thread counts increase. We consistently observe a 16%-30% speedup on 48 threads, with a peak speedup of 2.8x.
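The abstract does not describe the authors' mechanism in detail, so the following is only a generic sketch of static loop scheduling, not the scheduler from the article. It illustrates the general idea that each thread can compute its own iteration range from its thread index alone, so handing out work needs no atomic operations. The names `static_chunk` and `static_parallel_sum` are hypothetical, and the use of `std::thread` (rather than the OpenMP or Cilk runtimes) is an assumption made to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: half-open range [begin, end) of iterations owned by
// thread `tid` when `n` iterations are split across `num_threads` threads.
// The first (n % num_threads) threads each receive one extra iteration.
std::pair<std::size_t, std::size_t> static_chunk(std::size_t n,
                                                 std::size_t num_threads,
                                                 std::size_t tid) {
    assert(num_threads > 0 && tid < num_threads);
    const std::size_t base = n / num_threads;
    const std::size_t extra = n % num_threads;
    const std::size_t begin = tid * base + (tid < extra ? tid : extra);
    const std::size_t end = begin + base + (tid < extra ? 1 : 0);
    return {begin, end};
}

// Hypothetical driver: sums `data` with a statically partitioned loop.
// Each thread derives its range from `tid` and writes only its own slot of
// `partial`, so no shared counter, atomic, or lock is used to distribute
// work. join() orders the writes before the sequential combine.
long long static_parallel_sum(const std::vector<int>& data,
                              std::size_t num_threads) {
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> workers;
    for (std::size_t tid = 0; tid < num_threads; ++tid) {
        workers.emplace_back([&data, &partial, num_threads, tid] {
            const auto range = static_chunk(data.size(), num_threads, tid);
            long long local = 0;
            for (std::size_t i = range.first; i < range.second; ++i) {
                local += data[i];
            }
            partial[tid] = local;  // one write per thread, to its own slot
        });
    }
    for (auto& w : workers) w.join();
    long long total = 0;
    for (long long p : partial) total += p;
    return total;
}
```

For example, `static_chunk(10, 4, tid)` assigns iterations 0-2, 3-5, 6-7, and 8-9 to threads 0 through 3. A production scheduler would reuse a persistent thread pool instead of creating threads per loop; this sketch omits that to stay short.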


Pages: 17
