TaskStream: Accelerating Task-Parallel Workloads by Recovering Program Structure

Cited by: 10
Authors: Dadu, Vidushi [1]; Nowatzki, Tony [1]
Affiliation: [1] Univ Calif Los Angeles, Los Angeles, CA 90095 USA
Keywords: irregularity; tasks; load-balance; accelerators; generality; dataflow; reconfigurable; streaming; locality
DOI: 10.1145/3503222.3507706
CLC classification: TP3 (computing technology, computer technology)
Discipline code: 0812
Abstract
Reconfigurable accelerators, like CGRAs and dataflow architectures, have come to prominence for addressing data-processing problems. However, they are largely limited to workloads with regular parallelism, precluding their applicability to prevalent task-parallel workloads. Reconfigurable architectures and task parallelism seem to be at odds, as the former requires repetitive and simple program structure, while the latter breaks program structure to create small, individually scheduled program units. Our insight is that if tasks and their potential communication structure are first-class primitives in the hardware, it is possible to recover program structure with extremely low overhead. We propose a task execution model for accelerators called TaskStream, which annotates task dependences with information sufficient to recover inter-task structure. TaskStream enables work-aware load balancing, recovery of pipelined inter-task dependences, and recovery of inter-task read sharing through multicasting. We apply TaskStream to a reconfigurable dataflow architecture, creating a seamless hierarchical dataflow model for task-parallel workloads. We compare our accelerator, Delta, with an equivalent static-parallel design. Overall, we find that our execution model can improve performance by 2.2x with only 3.6% area overhead, while alleviating the programming burden of managing task distribution.
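The abstract's notion of work-aware load balancing over annotated tasks can be illustrated with a small software sketch. This is a minimal, hypothetical model rather than the paper's actual hardware interface: task annotations are reduced to a single estimated work count, "lanes" stand in for the accelerator's compute units, and all names are illustrative.

```python
# Hypothetical sketch of TaskStream-style work-aware load balancing:
# each task carries an annotation (here, an estimated work count) that
# the scheduler uses to keep lanes balanced, instead of distributing
# tasks blindly round-robin.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Lane:
    load: int            # total annotated work assigned so far
    lane_id: int
    tasks: list = field(default_factory=list, compare=False)

def schedule(tasks, num_lanes):
    """Assign each (name, est_work) task to the currently least-loaded lane."""
    lanes = [Lane(0, i) for i in range(num_lanes)]
    heapq.heapify(lanes)
    for name, est_work in tasks:
        lane = heapq.heappop(lanes)   # lane with the least annotated work
        lane.tasks.append(name)
        lane.load += est_work
        heapq.heappush(lanes, lane)
    return sorted(lanes, key=lambda l: l.lane_id)

# One heavy task and three light ones: the annotations let the scheduler
# place the light tasks together opposite the heavy one.
lanes = schedule([("a", 8), ("b", 2), ("c", 2), ("d", 2)], 2)
```

A static-parallel design without such annotations would have to split these tasks evenly by count, leaving one lane idle while the heavy task finishes; the work annotation is what makes balanced placement possible.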
Pages: 1-13 (13 pages)
Related Papers (showing 10 of 50)
  • [1] Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL
    Aji, Ashwin M.; Pena, Antonio J.; Balaji, Pavan; Feng, Wu-chun
    2015 IEEE International Conference on Cluster Computing (CLUSTER 2015), 2015: 42-51
  • [2] MultiCL: Enabling automatic scheduling for task-parallel workloads in OpenCL
    Aji, Ashwin M.; Pena, Antonio J.; Balaji, Pavan; Feng, Wu-chun
    Parallel Computing, 2016, 58: 37-55
  • [3] Performance modelling for task-parallel programs
    Kühnemann, M.; Rauber, T.; Rünger, G.
    Performance Analysis and Grid Computing, 2004: 77-91
  • [4] Task-Parallel Reductions in OpenMP and OmpSs
    Ciesko, Jan; Mateo, Sergi; Teruel, Xavier; Beltran, Vicenc; Martorell, Xavier; Badia, Rosa M.; Ayguade, Eduard; Labarta, Jesus
    Using and Improving OpenMP for Devices, Tasks, and More, 2014, 8766: 1-15
  • [7] Towards Task-Parallel Reductions in OpenMP
    Ciesko, Jan; Mateo, Sergi; Teruel, Xavier; Martorell, Xavier; Ayguade, Eduard; Labarta, Jesus; Duran, Alex; de Supinski, Bronis R.; Olivier, Stephen; Li, Kelvin; Eichenberger, Alexandre E.
    OpenMP: Heterogenous Execution and Data Movements (IWOMP 2015), 2015, 9342: 189-201
  • [8] Task-Parallel Programming on NUMA Architectures
    Terboven, Christian; Schmidl, Dirk; Cramer, Tim; Mey, Dieter An
    Euro-Par 2012 Parallel Processing, 2012, 7484: 638-649
  • [9] Task-Parallel Programming with Constrained Parallelism
    Huang, Tsung-Wei; Hwang, Leslie
    2022 IEEE High Performance Extreme Computing Virtual Conference (HPEC), 2022
  • [10] A GPU Task-Parallel Model with Dependency Resolution
    Tzeng, Stanley; Lloyd, Brandon; Owens, John D.
    Computer, 2012, 45(8): 34-41