Reducing the burden of parallel loop schedulers for many-core processors

Cited by: 1
Authors
Arif, Mahwish [1]
Vandierendonck, Hans [2]
Affiliations
[1] Univ Cambridge, Comp Sci Lab, Cambridge, England
[2] Queens Univ Belfast, Sch Elect Elect Engn & Comp Sci, Belfast, Antrim, North Ireland
Source
Funding
UK Engineering and Physical Sciences Research Council (EPSRC); EU Horizon 2020
Keywords
parallel computing; shared-memory synchronization; algorithms
DOI
10.1002/cpe.6241
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
As core counts in processors increase, it becomes harder to schedule and distribute work in a timely and scalable manner. This article enhances the scalability of parallel loop schedulers by specializing schedulers for fine-grain loops. We propose a low-overhead work distribution mechanism for a static scheduler that uses no atomic operations. We integrate our static scheduler with the Intel OpenMP and Cilk Plus parallel task schedulers to build hybrid schedulers. Compiler support enables efficient reductions for Cilk without changing the programming interface of Cilk reducers. Detailed, quantitative measurements demonstrate that our techniques achieve scalable performance on a 48-core machine and that the scheduling overhead is 43% lower than Intel OpenMP and 12.1x lower than Cilk. We demonstrate consistent performance improvements on a range of HPC and data analytics codes. Performance gains are more important as loops become finer-grain and thread counts increase. We consistently observe 16%-30% speedup on 48 threads, with a peak of 2.8x speedup.
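The abstract does not describe the authors' mechanism in detail, so the following is only a generic sketch of why a static loop schedule can avoid atomic operations: each thread derives its own iteration range from its thread id, so no shared counter is ever updated. The names `static_chunk` and `parallel_sum` are hypothetical and are not taken from the paper, Intel OpenMP, or Cilk.

```python
import threading

def static_chunk(n_iters, n_threads, tid):
    """Return the half-open range [begin, end) owned by thread `tid`.

    Hypothetical helper: the range depends only on (n_iters, n_threads, tid),
    so threads never need to coordinate through a shared counter.
    """
    base, extra = divmod(n_iters, n_threads)
    # The first `extra` threads each take one additional iteration.
    begin = tid * base + min(tid, extra)
    end = begin + base + (1 if tid < extra else 0)
    return begin, end

def parallel_sum(data, n_threads):
    """Sum `data` using a static partition and one result slot per thread."""
    partial = [0] * n_threads  # each thread writes only its own slot

    def worker(tid):
        begin, end = static_chunk(len(data), n_threads, tid)
        partial[tid] = sum(data[begin:end])

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The final reduction is done by the calling thread after all joins.
    return sum(partial)

chunks = [static_chunk(10, 4, t) for t in range(4)]
total = parallel_sum(list(range(100)), 4)
```

This sketch illustrates the partitioning logic only. It says nothing about the overheads measured in the article, and Python threads are not a vehicle for the fine-grain loop performance the abstract discusses.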
Pages: 17