Reducing the burden of parallel loop schedulers for many-core processors

Cited by: 1
Authors
Arif, Mahwish [1]
Vandierendonck, Hans [2]
Affiliations
[1] Univ Cambridge, Comp Sci Lab, Cambridge, England
[2] Queens Univ Belfast, Sch Elect Elect Engn & Comp Sci, Belfast, Antrim, Northern Ireland
Source
Funding
UK Engineering and Physical Sciences Research Council (EPSRC); EU Horizon 2020
Keywords
parallel computing; shared-memory synchronization; ALGORITHMS
DOI
10.1002/cpe.6241
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202 ; 0835 ;
Abstract
As core counts in processors increase, it becomes harder to schedule and distribute work in a timely and scalable manner. This article enhances the scalability of parallel loop schedulers by specializing schedulers for fine-grain loops. We propose a low-overhead work distribution mechanism for a static scheduler that uses no atomic operations. We integrate our static scheduler with the Intel OpenMP and Cilk Plus parallel task schedulers to build hybrid schedulers. Compiler support enables efficient reductions for Cilk without changing the programming interface of Cilk reducers. Detailed, quantitative measurements demonstrate that our techniques achieve scalable performance on a 48-core machine and that the scheduling overhead is 43% lower than Intel OpenMP and 12.1x lower than Cilk. We demonstrate consistent performance improvements on a range of HPC and data analytics codes. Performance gains become more important as loops become finer-grain and thread counts increase. We consistently observe a 16%-30% speedup on 48 threads, with a peak speedup of 2.8x.
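The idea of distributing loop iterations statically, without a shared counter, can be illustrated with a minimal sketch. This is not the authors' scheduler: the names `static_chunk` and `parallel_for_static` are hypothetical, and `std::thread` is used only to keep the example self-contained. It assumes a plain block partitioning in which each thread derives its own iteration range from its thread index, so assigning work needs no atomic operations (thread start and join still synchronize internally).

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper (not from the paper): half-open iteration range
// [first, second) owned by thread `tid` out of `nthreads` for a loop of
// `n` iterations. The first (n % nthreads) threads get one extra iteration.
inline std::pair<std::size_t, std::size_t>
static_chunk(std::size_t n, std::size_t nthreads, std::size_t tid) {
    assert(nthreads > 0 && tid < nthreads);
    const std::size_t base = n / nthreads;
    const std::size_t extra = n % nthreads;
    const std::size_t begin = tid * base + (tid < extra ? tid : extra);
    const std::size_t end = begin + base + (tid < extra ? 1 : 0);
    return {begin, end};
}

// Hypothetical driver: runs body(i) for every i in [0, n) on `nthreads`
// threads. Each thread computes its range locally from its own index;
// no shared loop counter is read or updated while distributing work.
template <typename Body>
void parallel_for_static(std::size_t n, std::size_t nthreads, Body body) {
    std::vector<std::thread> workers;
    workers.reserve(nthreads);
    for (std::size_t tid = 0; tid < nthreads; ++tid) {
        workers.emplace_back([=]() {
            const auto [begin, end] = static_chunk(n, nthreads, tid);
            for (std::size_t i = begin; i < end; ++i) {
                body(i);
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
}
```

By contrast, a dynamic scheduler would typically have threads claim chunks from a shared counter with an atomic fetch-and-add, which is the kind of per-chunk cost a static partitioning avoids. The sketch compiles as C++17 and needs the platform's thread library at link time.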
Pages: 17