Loop Parallelization Techniques for FPGA Accelerator Synthesis

Cited by: 3
Authors
Reiche, Oliver [1]
Ozkan, M. Akif [1]
Hannig, Frank [1]
Teich, Juergen [1]
Schmid, Moritz [2]
Affiliations
[1] Friedrich Alexander Univ, Hardware Software Codesign, Dept Comp Sci, Erlangen Nurnberg FAU, Cauerstr 11, D-91054 Erlangen, Germany
[2] Siemens Healthcare GmbH, Adv Therapies Business Unit, R&D, Forchheim, Germany
Keywords
Altera OpenCL; Vivado HLS; Vectorization; Loop coarsening; Loop tiling; FLOW
DOI
10.1007/s11265-017-1229-7
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Current tools for High-Level Synthesis (HLS) excel at exploiting Instruction-Level Parallelism (ILP). In contrast, support for Data-Level Parallelism (DLP), one of the key advantages of Field-Programmable Gate Arrays (FPGAs), is very limited. This work examines the exploitation of DLP on FPGAs using code generation for C-based HLS of image filters and streaming pipelines. In addition to well-known loop tiling techniques, we propose loop coarsening, which delivers superior performance and scalability. Loop tiling corresponds to splitting an image into separate regions, which are then processed in parallel by replicated accelerators. For data streaming, this also requires the generation of glue logic for the distribution of image data. Conversely, loop coarsening allows processing multiple pixels in parallel, whereby only the kernel operator is replicated within a single accelerator. We present concrete implementations of tiling and coarsening for Vivado HLS and Altera OpenCL. Furthermore, we present a comparison of our implementations to the keyword-driven parallelization support provided by the Altera Offline Compiler. We augment the FPGA back end of the heterogeneous Domain-Specific Language (DSL) framework HIPAcc to generate loop coarsening implementations for Vivado HLS and Altera OpenCL. Moreover, we compare the resulting FPGA accelerators to highly optimized software implementations for Graphics Processing Units (GPUs), all generated from exactly the same code base.
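The difference between the two loop transformations named in the abstract can be illustrated with a minimal plain-C sketch. This is not the code generated by HIPAcc, nor the authors' Vivado HLS or Altera OpenCL implementation; the function names, the point operator `kernel_op`, and the constants `COARSEN` and `TILES` are all hypothetical, chosen only for illustration. In software the loops below run sequentially; in an HLS flow, the innermost coarsened loop would typically be unrolled so the operator is replicated inside one accelerator, while each tile would map to a separate accelerator instance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical point operator standing in for the "kernel operator". */
static inline uint8_t kernel_op(uint8_t p) { return (uint8_t)(255u - p); }

/* Baseline: one pixel per loop iteration. */
void filter_baseline(const uint8_t *in, uint8_t *out,
                     size_t width, size_t height) {
    for (size_t i = 0; i < width * height; ++i)
        out[i] = kernel_op(in[i]);
}

#define COARSEN 4 /* assumed coarsening factor (pixels per iteration) */

/* Loop coarsening: each iteration of the x loop handles COARSEN adjacent
 * pixels, so only the kernel operator is replicated; there is still a
 * single loop nest (a single accelerator). */
void filter_coarsened(const uint8_t *in, uint8_t *out,
                      size_t width, size_t height) {
    assert(width % COARSEN == 0); /* sketch assumes no remainder pixels */
    for (size_t y = 0; y < height; ++y)
        for (size_t x = 0; x < width; x += COARSEN)
            for (size_t v = 0; v < COARSEN; ++v) /* candidate for unrolling */
                out[y * width + x + v] = kernel_op(in[y * width + x + v]);
}

#define TILES 2 /* assumed number of image regions */

/* Loop tiling: the image is split into TILES vertical regions; each value
 * of t corresponds to one complete loop nest over its own region, which
 * is what a replicated accelerator would execute. */
void filter_tiled(const uint8_t *in, uint8_t *out,
                  size_t width, size_t height) {
    assert(width % TILES == 0); /* sketch assumes equally sized tiles */
    size_t tile_w = width / TILES;
    for (size_t t = 0; t < TILES; ++t)
        for (size_t y = 0; y < height; ++y)
            for (size_t x = 0; x < tile_w; ++x) {
                size_t idx = y * width + t * tile_w + x;
                out[idx] = kernel_op(in[idx]);
            }
}
```

All three functions compute the same output; they differ only in how the iteration space is partitioned. The sketch uses a point operator for brevity, so it leaves out the border handling and data distribution logic that local operators and streaming tiles would need.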
Pages: 3-27
Page count: 25
Related papers
50 in total
  • [41] An Object-Oriented Framework for Loop Parallelization
    Youichi Omori
    Akira Fukuda
    Kazuki Joe
    The Journal of Supercomputing, 1999, 13 : 57 - 69
  • [43] Automatic loop parallelization: An abstract interpretation approach
    Ricci, L
    PAR ELEC 2002: INTERNATIONAL CONFERENCE ON PARALLEL COMPUTING IN ELECTRICAL ENGINEERING, 2002, : 112 - 118
  • [44] GPU Parallelization of HEVC In-Loop Filters
    Wang, Biao
    de Souza, Diego F.
    Alvarez-Mesa, Mauricio
    Chi, Chi Ching
    Juurlink, Ben
    Ilic, Aleksandar
    Roma, Nuno
    Sousa, Leonel
    INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2017, 45 (06) : 1515 - 1535
  • [45] Parallelization Approaches for Hardware Accelerators - Loop Unrolling Versus Loop Partitioning
    Hannig, Frank
    Dutta, Hritam
    Teich, Juergen
    ARCHITECTURE OF COMPUTING SYSTEMS-ARCS 2009, 22ND INTERNATIONAL CONFERENCE, 2009, 5455 : 16 - 27
  • [46] PERFECT PIPELINING - A NEW LOOP PARALLELIZATION TECHNIQUE
    AIKEN, A
    NICOLAU, A
    LECTURE NOTES IN COMPUTER SCIENCE, 1988, 300 : 221 - 235
  • [47] Symbolic analysis techniques for program parallelization
    Fahringer, T
    FUTURE GENERATION COMPUTER SYSTEMS, 1998, 13 (4-5) : 385 - 396
  • [48] Loop Parallelization using Dynamic Commutativity Analysis
    Vasiladiotis, Christos
    Lozano, Roberto Castaneda
    Cole, Murray
    Franke, Bjorn
    CGO '21: PROCEEDINGS OF THE 2021 IEEE/ACM INTERNATIONAL SYMPOSIUM ON CODE GENERATION AND OPTIMIZATION (CGO), 2021, : 150 - 161
  • [49] Algorithmic concept recognition support for automatic parallelization: A case study on loop optimization and parallelization
    Di Martino, B
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 1998, 14 (01) : 191 - 203
  • [50] Automatic Parallelization and Accelerator Offloading for Embedded Applications on Heterogeneous MPSoCs
    Aguilar, Miguel Angel
    Leupers, Rainer
    Ascheid, Gerd
    Murillo, Luis Gabriel
    2016 ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2016,