Reducing the burden of parallel loop schedulers for many-core processors

Cited by: 1

Authors:

Arif, Mahwish ^{[1]}

Vandierendonck, Hans ^{[2]}

Affiliations:

[1] Univ Cambridge, Comp Sci Lab, Cambridge, England

[2] Queens Univ Belfast, Sch Elect Elect Engn & Comp Sci, Belfast, Antrim, North Ireland

Source:

CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE | 2021 / Vol. 33 / Issue 13

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC); EU Horizon 2020;

Keywords:

parallel computing; shared-memory synchronization; ALGORITHMS;

DOI:

10.1002/cpe.6241

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Subject classification codes:

081202 ; 0835 ;

Abstract:

As core counts in processors increase, it becomes harder to schedule and distribute work in a timely and scalable manner. This article enhances the scalability of parallel loop schedulers by specializing schedulers for fine-grain loops. We propose a low-overhead work distribution mechanism for a static scheduler that uses no atomic operations. We integrate our static scheduler with the Intel OpenMP and Cilkplus parallel task schedulers to build hybrid schedulers. Compiler support enables efficient reductions for Cilk, without changing the programming interface of Cilk reducers. Detailed, quantitative measurements demonstrate that our techniques achieve scalable performance on a 48-core machine and that the scheduling overhead is 43% lower than Intel OpenMP and 12.1x lower than Cilk. We demonstrate consistent performance improvements on a range of HPC and data analytics codes. Performance gains are more important as loops become finer-grain and thread counts increase. We consistently observe a 16%-30% speedup on 48 threads, with a peak of 2.8x speedup.
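For readers unfamiliar with static loop scheduling, the following is a minimal sketch of the general idea, not the mechanism proposed in the article. It assumes plain block partitioning: each thread derives its own iteration range from its thread id alone, so no shared counter (and therefore no atomic operation) is needed to hand out work. The function name `static_chunk` and its parameters are hypothetical and chosen for illustration only.

```python
def static_chunk(thread_id: int, num_threads: int, n: int) -> range:
    """Return the contiguous block of loop iterations owned by one thread.

    Illustrative assumption: plain block partitioning of n iterations.
    The first (n % num_threads) threads receive one extra iteration,
    so block sizes differ by at most one.
    """
    base, extra = divmod(n, num_threads)
    # Threads before this one contribute `base` iterations each, plus one
    # extra iteration for each of them that falls among the first `extra`.
    start = thread_id * base + min(thread_id, extra)
    stop = start + base + (1 if thread_id < extra else 0)
    return range(start, stop)


if __name__ == "__main__":
    # Split 10 iterations over 4 hypothetical threads and show each block.
    for t in range(4):
        print(t, list(static_chunk(t, 4, 10)))
```

Because every range is a pure function of `thread_id`, `num_threads`, and `n`, threads can compute their blocks independently. The trade-off is that a static split cannot rebalance work when iterations have uneven cost, which is one reason a static scheduler might be combined with a dynamic one.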


Pages: 17

50 items in total

[21] A Scalable Parallel Partition Tridiagonal Solver for Many-Core and Low B/F Processors
Mitsuda, Tatsuya
Ono, Kenji
2022 IEEE 36TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS (IPDPSW 2022), 2022: 860 - 869
[22] Parallel Many-Core Avionics Systems
Panic, Milos
Quinones, Eduardo
Zaykov, Pavel G.
Hernandez, Carles
Abella, Jaume
Cazorla, Francisco J.
2014 INTERNATIONAL CONFERENCE ON EMBEDDED SOFTWARE (EMSOFT), 2014,
[23] Many-core processors and GPU opportunities in Particle Detectors
Neufeld, Niko
Vilasis-Cardona, Xavier
2012 13TH INTERNATIONAL WORKSHOP ON CELLULAR NANOSCALE NETWORKS AND THEIR APPLICATIONS (CNNA), 2012,
[24] Queuing Ports for Mesh Based Many-Core Processors
Villaescusa D.G.
Rivas M.A.
Harbour M.G.
Ada User Journal, 2021, 42 (3-4): 189 - 192
[25] A Study of an Infrastructure for Research and Development of Many-Core Processors
Uehara, Koh
Sato, Shimpei
Miyoshi, Takefumi
Kise, Kenji
2009 INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES (PDCAT 2009), 2009: 414 - 419
[26] Threaded Dynamic Memory Management in Many-Core Processors
Herrmann, Edward C.
Wilsey, Philip A.
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPLEX, INTELLIGENT AND SOFTWARE INTENSIVE SYSTEMS (CISIS 2010), 2010: 931 - 936
[27] Applications of the Virtual Cellular Machine to Many-core Processors
Roska, Tamas
Zarandy, Akos
Pazienza, Giovanni E.
2011 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2011: 1536 - 1539
[28] Special Issue on Design Challenges for Many-Core Processors
Daneshtalab, Masoud
Palesi, Maurizio
Plosila, Juha
ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2014, 13
[29] Performance of Graph Analytics Applications on Many-Core Processors
Wise, Jenna
Lederman, Emily
Kumar, Manoj
Pattnaik, Pratap
2018 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2018,
[30] Analysis of Memory System of Tiled Many-Core Processors
Liu, Ye
Kato, Shinpei
Edahiro, Masato
IEEE ACCESS, 2019, 7 : 18964 - 18977
