Reducing the burden of parallel loop schedulers for many-core processors

Cited by: 1
Authors
Arif, Mahwish [1]
Vandierendonck, Hans [2]
Affiliations
[1] Univ Cambridge, Comp Sci Lab, Cambridge, England
[2] Queens Univ Belfast, Sch Elect Elect Engn & Comp Sci, Belfast, Antrim, North Ireland
Source
Funding
UK Engineering and Physical Sciences Research Council; EU Horizon 2020
Keywords
parallel computing; shared-memory synchronization; ALGORITHMS
DOI
10.1002/cpe.6241
Chinese Library Classification number
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
As core counts in processors increase, it becomes harder to schedule and distribute work in a timely and scalable manner. This article enhances the scalability of parallel loop schedulers by specializing schedulers for fine-grain loops. We propose a low-overhead work distribution mechanism for a static scheduler that uses no atomic operations. We integrate our static scheduler with the Intel OpenMP and Cilkplus parallel task schedulers to build hybrid schedulers. Compiler support enables efficient reductions for Cilk without changing the programming interface of Cilk reducers. Detailed, quantitative measurements demonstrate that our techniques achieve scalable performance on a 48-core machine and that the scheduling overhead is 43% lower than that of Intel OpenMP and 12.1x lower than that of Cilk. We demonstrate consistent performance improvements on a range of HPC and data analytics codes. Performance gains become more important as loops become finer-grain and thread counts increase. We consistently observe a 16%-30% speedup on 48 threads, with a peak speedup of 2.8x.
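The idea of static work distribution without atomic operations can be illustrated with a generic sketch. This is not the authors' mechanism or code; it only shows the standard block-partitioning arithmetic in which each worker derives its own iteration range from its index, so no shared counter needs to be updated. The names `static_chunk` and `parallel_sum_of_squares` are hypothetical, and Python threads are used purely to show the index arithmetic, not to obtain a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def static_chunk(n_iters, n_threads, tid):
    """Return the half-open range [begin, end) of iterations owned by worker `tid`.

    Hypothetical helper: every worker computes its range locally from
    (n_iters, n_threads, tid), so no shared counter or atomic update is needed.
    """
    base, extra = divmod(n_iters, n_threads)
    # The first `extra` workers take one additional iteration each.
    begin = tid * base + min(tid, extra)
    end = begin + base + (1 if tid < extra else 0)
    return begin, end


def parallel_sum_of_squares(n_iters, n_threads):
    """Sum i*i for i in [0, n_iters) using a static partition of the loop."""
    # One private slot per worker; slots are combined only after all workers finish.
    partial = [0] * n_threads

    def worker(tid):
        begin, end = static_chunk(n_iters, n_threads, tid)
        acc = 0
        for i in range(begin, end):
            acc += i * i
        partial[tid] = acc  # each worker writes only its own slot

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Consume the iterator so any worker exception is raised here.
        list(pool.map(worker, range(n_threads)))
    return sum(partial)
```

The per-worker slots stand in for a reduction: each worker accumulates privately and the results are merged once at the end, which is the general pattern behind reductions in parallel loops. How the article's compiler support achieves this for Cilk reducers is not described in the abstract.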
Pages: 17