Unleashing Fine-Grained Parallelism on Embedded Many-Core Accelerators with Lightweight OpenMP Tasking

Cited by: 22
Authors
Tagliavini, Giuseppe [1]
Cesarini, Daniele [1]
Marongiu, Andrea [2]
Affiliations
[1] Univ Bologna, Dept Elect Elect & Informat Engn DEI, I-40126 Bologna, BO, Italy
[2] Univ Bologna, Dept Comp Sci & Engn DISI, I-40126 Bologna, BO, Italy
Funding
European Union Horizon 2020
Keywords
Heterogeneous embedded systems on chip; programmable many-core accelerators; tasking; OpenMP; SYSTEMS; SUPPORT
DOI
10.1109/TPDS.2018.2814602
Chinese Library Classification
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In recent years, programmable many-core accelerators (PMCAs) have been introduced in embedded systems to satisfy stringent performance/Watt requirements. This has increased the need for programming models capable of effectively leveraging hundreds to thousands of processors. Task-based parallelism has the potential to provide such capabilities, offering high-level abstractions to outline abundant and irregular parallelism in embedded applications. However, efficiently supporting this programming paradigm on embedded PMCAs is challenging, due to the large time and space overheads it introduces. In this paper we describe a lightweight OpenMP tasking runtime environment (RTE) design for a state-of-the-art embedded PMCA, the Kalray MPPA 256. We provide an exhaustive characterization of the costs of our RTE, considering both synthetic workloads and real programs, and we compare it to several other tasking RTEs. Experimental results confirm that our solution achieves near-ideal parallelization speedups for tasks as small as 5K cycles, and an average speedup of 12x for real benchmarks, which is approximately 60% higher than what we observe with the original Kalray OpenMP implementation.
Pages: 2150-2163
Page count: 14