Unleashing Fine-Grained Parallelism on Embedded Many-Core Accelerators with Lightweight OpenMP Tasking

Cited by: 22

Authors:

Tagliavini, Giuseppe [1]

Cesarini, Daniele [1]

Marongiu, Andrea [2]

Affiliations:

[1] Univ Bologna, Dept Elect Elect & Informat Engn DEI, I-40126 Bologna, BO, Italy

[2] Univ Bologna, Dept Comp Sci & Engn DISI, I-40126 Bologna, BO, Italy

Source:

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS | 2018 / Vol. 29 / No. 09

Funding:

EU Horizon 2020;

Keywords:

Heterogeneous embedded systems on chip; programmable many-core accelerators; tasking; OpenMP; SYSTEMS; SUPPORT;

DOI:

10.1109/TPDS.2018.2814602

Chinese Library Classification:

TP301 [Theory and Methods];

Discipline classification code:

081202 ;

Abstract:

In recent years, programmable many-core accelerators (PMCAs) have been introduced in embedded systems to satisfy stringent performance/Watt requirements. This has increased the urge for programming models capable of effectively leveraging hundreds to thousands of processors. Task-based parallelism has the potential to provide such capabilities, offering high-level abstractions to outline abundant and irregular parallelism in embedded applications. However, efficiently supporting this programming paradigm on embedded PMCAs is challenging, due to the large time and space overheads it introduces. In this paper we describe a lightweight OpenMP tasking runtime environment (RTE) design for a state-of-the-art embedded PMCA, the Kalray MPPA 256. We provide an exhaustive characterization of the costs of our RTE, considering both synthetic workloads and real programs, and we compare it to several other tasking RTEs. Experimental results confirm that our solution achieves near-ideal parallelization speedups for tasks as small as 5K cycles, and an average speedup of 12× for real benchmarks, which is approximately 60% higher than what we observe with the original Kalray OpenMP implementation.


Pages: 2150-2163

Page count: 14

36 entries in total

[1] Thermal Management of a Many-Core Processor under Fine-Grained Parallelism
Keceli, Fuat
Moreshet, Tali
Vishkin, Uzi
EURO-PAR 2011: PARALLEL PROCESSING WORKSHOPS, PT I, 2012, 7155 : 249 - 259
[2] Study on Fine-grained Synchronization in Many-Core Architecture
Yu, Lei
Liu, Zhiyong
Fan, Dongrui
Song, Fenglong
Zhang, Junchao
Yuan, Nan
SNPD 2009: 10TH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCES, NETWORKING AND PARALLEL DISTRIBUTED COMPUTING, PROCEEDINGS, 2009, : 524 - 529
[3] Enabling Extremely Fine-grained Parallelism via Scalable Concurrent Queues on Modern Many-core Architectures
Nookala, Poornima
Dinda, Peter
Hale, Kyle C.
Chard, Kyle
Raicu, Ioan
29TH INTERNATIONAL SYMPOSIUM ON THE MODELING, ANALYSIS, AND SIMULATION OF COMPUTER AND TELECOMMUNICATION SYSTEMS (MASCOTS 2021), 2021, : 65 - 72
[4] AsAP: A fine-grained many-core platform for DSP applications
Baas, Bevan
Yu, Zhiyi
Meeuwsen, Michael
Sattari, Omar
Apperson, Ryan
Work, Eric
Webb, Jeremy
Lai, Michael
Mohsenin, Tinoosh
Truong, Dean
Cheung, Jason
IEEE MICRO, 2007, 27 (02) : 34 - 45
[5] Lightweight Virtual Memory Support for Many-Core Accelerators in Heterogeneous Embedded SoCs
Vogel, Pirmin
Marongiu, Andrea
Benini, Luca
2015 INTERNATIONAL CONFERENCE ON HARDWARE/SOFTWARE CODESIGN AND SYSTEM SYNTHESIS (CODES+ISSS), 2015, : 45 - 54
[6] Enabling Scalable and Fine-Grained Nested Parallelism on Embedded Many-Cores
Capotondi, Alessandro
Marongiu, Andrea
Benini, Luca
2015 IEEE 9TH INTERNATIONAL SYMPOSIUM ON EMBEDDED MULTICORE/MANYCORE SYSTEMS-ON-CHIP (MCSOC), 2015, : 297 - 304
[7] Display Stream Compression Decoders for Fine-Grained Many-Core Processor Arrays
Wu, Shifu
Baas, Bevan M.
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2021, 68 (05) : 1730 - 1734
[8] Fine-Grained Energy-Efficient Sorting on a Many-Core Processor Array
Stillmaker, Aaron
Stillmaker, Lucas
Baas, Bevan
PROCEEDINGS OF THE 2012 IEEE 18TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS 2012), 2012, : 652 - 659
[9] FINGERS: Exploiting Fine-Grained Parallelism in Graph Mining Accelerators
Chen, Qihang
Tian, Boyu
Gao, Mingyu
ASPLOS '22: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS, 2022, : 43 - 55
[10] On the Effectiveness of OpenMP Teams for Cluster-Based Many-Core Accelerators
Capotondi, Alessandro
Marongiu, Andrea
2016 INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING & SIMULATION (HPCS 2016), 2016, : 667 - 674
