TaskStream: Accelerating Task-Parallel Workloads by Recovering Program Structure

Times cited: 10

Authors:

Dadu, Vidushi [1]

Nowatzki, Tony [1]

Affiliation:

[1] Univ Calif Los Angeles, Los Angeles, CA 90095 USA

Source:

ASPLOS '22: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON ARCHITECTURAL SUPPORT FOR PROGRAMMING LANGUAGES AND OPERATING SYSTEMS | 2022

Keywords:

Irregularity; tasks; load-balance; accelerators; generality; dataflow; reconfigurable; streaming; LOCALITY;

DOI:

10.1145/3503222.3507706

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Subject classification code:

0812;

Abstract:

Reconfigurable accelerators, like CGRAs and dataflow architectures, have come to prominence for addressing data-processing problems. However, they are largely limited to workloads with regular parallelism, precluding their applicability to prevalent task-parallel workloads. Reconfigurable architectures and task parallelism seem to be at odds, as the former requires repetitive and simple program structure, and the latter breaks program structure to create small, individually scheduled program units. Our insight is that if tasks and their potential for communication structure are first-class primitives in the hardware, it is possible to recover program structure with extremely low overhead. We propose a task execution model for accelerators called TaskStream, which annotates task dependences with information sufficient to recover inter-task structure. TaskStream enables work-aware load balancing, recovery of pipelined inter-task dependences, and recovery of inter-task read sharing through multicasting. We apply TaskStream to a reconfigurable dataflow architecture, creating a seamless hierarchical dataflow model for task-parallel workloads. We compare our accelerator, Delta, with an equivalent static-parallel design. Overall, we find that our execution model can improve performance by 2.2x with only 3.6% area overhead, while alleviating the programming burden of managing task distribution.
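The difference between static task distribution and the work-aware load balancing mentioned in the abstract can be illustrated in software. The following is a minimal Python sketch, not the Delta hardware scheduler. It assumes that each task carries a hypothetical `work` estimate (for example, the length of the data it streams over) and compares static round-robin assignment with a greedy least-loaded assignment. The names `Task`, `round_robin`, and `work_aware` are illustrative only.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    work: int  # hypothetical annotation: estimated amount of work for this task


def round_robin(tasks, num_cores):
    """Static distribution: task i goes to core i % num_cores, ignoring work."""
    loads = [0] * num_cores
    for i, t in enumerate(tasks):
        loads[i % num_cores] += t.work
    return loads


def work_aware(tasks, num_cores):
    """Greedy work-aware distribution: each task goes to the least-loaded core."""
    heap = [(0, core) for core in range(num_cores)]  # (accumulated work, core id)
    heapq.heapify(heap)
    loads = [0] * num_cores
    for t in tasks:
        load, core = heapq.heappop(heap)  # core with the smallest load so far
        load += t.work
        loads[core] = load
        heapq.heappush(heap, (load, core))
    return loads


tasks = [Task("t0", 8), Task("t1", 1), Task("t2", 8), Task("t3", 1)]
# The slowest core determines completion time, so compare the maximum loads.
print(max(round_robin(tasks, 2)), max(work_aware(tasks, 2)))  # prints "16 9"
```

With tasks of work 8, 1, 8, and 1 on two cores, round-robin places both heavy tasks on core 0 (loads 16 and 2), while the greedy policy yields loads of 9 and 9, so the slowest core finishes sooner. This only illustrates why per-task work information helps balancing. It does not model how TaskStream encodes that information or recovers pipelining and multicast.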

Pages: 1-13

Page count: 13