Reducing the burden of parallel loop schedulers for many-core processors

Cited by: 1

Authors:

Arif, Mahwish ^{[1]}

Vandierendonck, Hans ^{[2]}

Affiliations:

[1] Univ Cambridge, Comp Sci Lab, Cambridge, England

[2] Queens Univ Belfast, Sch Elect Elect Engn & Comp Sci, Belfast, Antrim, North Ireland

Source:

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2021 / Vol. 33 / Issue 13

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC); EU Horizon 2020;

Keywords:

parallel computing; shared-memory synchronization; ALGORITHMS;

DOI:

10.1002/cpe.6241

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

As core counts in processors increase, it becomes harder to schedule and distribute work in a timely and scalable manner. This article enhances the scalability of parallel loop schedulers by specializing schedulers for fine-grain loops. We propose a low-overhead work distribution mechanism for a static scheduler that uses no atomic operations. We integrate our static scheduler with the Intel OpenMP and Cilkplus parallel task schedulers to build hybrid schedulers. Compiler support enables efficient reductions for Cilk, without changing the programming interface of Cilk reducers. Detailed, quantitative measurements demonstrate that our techniques achieve scalable performance on a 48-core machine and that the scheduling overhead is 43% lower than Intel OpenMP and 12.1x lower than Cilk. We demonstrate consistent performance improvements on a range of HPC and data analytics codes. Performance gains become more important as loops become finer-grain and thread counts increase. We consistently observe a 16%-30% speedup on 48 threads, with a peak speedup of 2.8x.


Pages: 17
