Loop Parallelization Techniques for FPGA Accelerator Synthesis

Times cited: 3
Authors
Reiche, Oliver [1]
Ozkan, M. Akif [1]
Hannig, Frank [1]
Teich, Juergen [1]
Schmid, Moritz [2]
Affiliations
[1] Friedrich-Alexander University Erlangen-Nürnberg (FAU), Department of Computer Science, Hardware/Software Co-Design, Cauerstr. 11, D-91054 Erlangen, Germany
[2] Siemens Healthcare GmbH, Advanced Therapies Business Unit, R&D, Forchheim, Germany
Keywords
Altera OpenCL; Vivado HLS; Vectorization; Loop coarsening; Loop tiling; FLOW
DOI
10.1007/s11265-017-1229-7
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Current tools for High-Level Synthesis (HLS) excel at exploiting Instruction-Level Parallelism (ILP). By contrast, their support for Data-Level Parallelism (DLP), one of the key advantages of Field-Programmable Gate Arrays (FPGAs), is very limited. This work examines the exploitation of DLP on FPGAs using code generation for C-based HLS of image filters and streaming pipelines. In addition to well-known loop tiling techniques, we propose loop coarsening, which delivers superior performance and scalability. Loop tiling corresponds to splitting an image into separate regions, which are then processed in parallel by replicated accelerators. For data streaming, this also requires generating glue logic to distribute the image data. Loop coarsening, in contrast, processes multiple pixels in parallel by replicating only the kernel operator within a single accelerator. We present concrete implementations of tiling and coarsening for Vivado HLS and Altera OpenCL, and we compare them with the keyword-driven parallelization support provided by the Altera Offline Compiler. We augment the FPGA back end of the heterogeneous Domain-Specific Language (DSL) framework HIPAcc to generate loop coarsening implementations for Vivado HLS and Altera OpenCL. Finally, we compare the resulting FPGA accelerators to highly optimized software implementations for Graphics Processing Units (GPUs), all generated from exactly the same code base.
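The abstract contrasts two ways of exposing DLP for a local image operator. The C sketch below is a hypothetical illustration, not the code HIPAcc generates: it assumes a 3x3 box-sum filter with clamp-to-edge border handling, tiling into equal horizontal stripes, and a coarsening factor of 4. All function and macro names (`box3x3`, `filter_baseline`, `filter_tiled`, `filter_coarsened`, `COARSEN_FACTOR`, `filters_agree`) are invented for this example. `#pragma HLS unroll` is the Vivado HLS loop-unrolling directive; an ordinary C compiler ignores it (possibly with a warning).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed coarsening factor: output pixels computed per loop iteration. */
#define COARSEN_FACTOR 4

/* Clamp an index into [lo, hi] (clamp-to-edge border handling). */
static int clampi(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/* The kernel operator: one output pixel of a 3x3 box-sum filter. */
static int box3x3(const int *in, int w, int h, int x, int y) {
    int sum = 0;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            sum += in[clampi(y + dy, 0, h - 1) * w + clampi(x + dx, 0, w - 1)];
    return sum;
}

/* Baseline: one pixel per iteration. */
void filter_baseline(const int *in, int *out, int w, int h) {
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            out[y * w + x] = box3x3(in, w, h, x, y);
}

/* Loop tiling: the image is split into `tiles` horizontal stripes. On an
   FPGA each stripe would be handled by its own replicated accelerator;
   here the stripes simply run one after another. The first and last rows
   of a stripe read a halo row that belongs to the neighbouring stripe. */
void filter_tiled(const int *in, int *out, int w, int h, int tiles) {
    assert(tiles > 0 && h % tiles == 0);
    int rows = h / tiles;
    for (int t = 0; t < tiles; ++t)
        for (int y = t * rows; y < (t + 1) * rows; ++y)
            for (int x = 0; x < w; ++x)
                out[y * w + x] = box3x3(in, w, h, x, y);
}

/* Loop coarsening: a single accelerator computes COARSEN_FACTOR adjacent
   pixels per iteration; unrolling the inner loop replicates only the
   kernel operator. */
void filter_coarsened(const int *in, int *out, int w, int h) {
    assert(w % COARSEN_FACTOR == 0);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; x += COARSEN_FACTOR)
            for (int v = 0; v < COARSEN_FACTOR; ++v) {
#pragma HLS unroll
                out[y * w + x + v] = box3x3(in, w, h, x + v, y);
            }
}

/* Test helper: returns 1 if both variants reproduce the baseline output. */
int filters_agree(int w, int h, int tiles) {
    size_t n = (size_t)w * (size_t)h;
    int *in = malloc(n * sizeof *in);
    int *ref = malloc(n * sizeof *ref);
    int *tiled = malloc(n * sizeof *tiled);
    int *coarse = malloc(n * sizeof *coarse);
    int ok = 0;
    if (in && ref && tiled && coarse) {
        for (size_t i = 0; i < n; ++i)
            in[i] = (int)(i * 7 % 13);  /* arbitrary deterministic input */
        filter_baseline(in, ref, w, h);
        filter_tiled(in, tiled, w, h, tiles);
        filter_coarsened(in, coarse, w, h);
        ok = memcmp(ref, tiled, n * sizeof *ref) == 0 &&
             memcmp(ref, coarse, n * sizeof *ref) == 0;
    }
    free(in); free(ref); free(tiled); free(coarse);
    return ok;
}
```

For example, `filters_agree(64, 48, 4)` checks that both variants match the baseline on a 64x48 image split into four stripes. The tiled variant only becomes parallel once each stripe is mapped to its own accelerator instance, and in a streaming design the halo rows are part of the image data that the glue logic must distribute. The coarsened variant keeps one accelerator and widens only its datapath.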
Pages: 3–27
Number of pages: 25
Published in: Journal of Signal Processing Systems, 2018, Vol. 90