Hexagonal Tiling based Multiple FPGAs Stencil Computation Acceleration and Optimization Methodology

Authors
Wang, Jinyu [1]
Kang, Yifei [1]
Li, Yiwen [1]
Wu, Weiguo [1]
Liu, Song [1]
Wang, Longxiang [1]
Affiliation
[1] Xi'an Jiaotong University, School of Computer Science and Technology, Xi'an, People's Republic of China
Keywords
hexagonal tiling; stencil computation; Field Programmable Gate Array; multiple FPGAs; acceleration; systems
DOI
10.1109/ISPA-BDCloud-SocialCom-SustainCom52081.2021.00101
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Multi-FPGA (Field Programmable Gate Array) accelerators are now widely used for stencil computation. However, the state-of-the-art hexagonal tiling algorithm, which effectively improves stencil computation performance, was designed mainly for CPUs and GPUs; it is not suited to direct deployment on FPGAs and performs poorly there. To address this, this paper proposes a hexagonal tiling based multi-FPGA stencil computation architecture together with a corresponding optimization algorithm. The architecture uses on-chip registers to store and transfer the cell data of a hexagonal tile, which dramatically increases the scale and size of the tiles as well as the intra-FPGA computation performance. Then, to take full advantage of the processing capability of multiple FPGAs, a shared-memory, high-speed inter-FPGA data transfer structure is devised. Finally, Mixed-Integer Linear Programming (MILP) is used to optimize an objective function that accounts for the costs of the candidate FPGAs, the computation latency, and resource utilization, yielding a suitable tile size and layout. The proposed method has been validated on an FPGA cluster consisting of two Xilinx Alveo U50 devices and one Alveo U250 device. Experimental results show performance of up to 580 Gflop/s on a single U50 and 2261 Gflop/s on all three FPGAs. When the proposed optimizer is applied to a state-of-the-art multi-FPGA stencil computation acceleration method, performance increases by up to 21.8% and by 20.04% on average.
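The abstract does not spell out the tile geometry, so the following is only a rough illustration of what hexagonal tiling means for a stencil. This minimal Python sketch tiles the (time, space) iteration space of a 1D 3-point Jacobi stencil with hexagons, checks that every dependence either stays inside a hexagon or comes from an earlier row of hexagons, and confirms that the tiled execution reproduces a naive sweep. It is a generic CPU-side sketch of the classic technique, not the authors' FPGA architecture: the register-based cell storage, the inter-FPGA transfer structure and the MILP tile-size optimizer are not modeled. The function names and the parameters `h` (time steps per band; a hexagon spans two bands) and `b` (width of the hexagon's flat top and bottom edges) are hypothetical.

```python
"""Hexagonal tiling of a 1D 3-point Jacobi stencil: a hypothetical CPU sketch.

Illustrates the generic technique only; it does not model the paper's
register-based FPGA datapath, inter-FPGA links, or MILP tile optimizer.
"""
from collections import defaultdict


def jacobi_naive(u0, steps):
    """Reference: sweep the whole grid once per time step."""
    u = list(u0)
    for _ in range(steps):
        nxt = list(u)  # boundary cells u[0] and u[-1] stay fixed
        for i in range(1, len(u) - 1):
            nxt[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
        u = nxt
    return u


def hex_tile_of(t, i, h, b):
    """Map a computed point (t >= 1, 1 <= i <= n - 2) to a hexagon key (row, index).

    h (hypothetical): time steps per band; a full hexagon spans two bands.
    b (hypothetical): width of the hexagon's flat top and bottom edges.
    """
    L = b + 2 * h - 2        # bottom width of a shrinking (upper-half) trapezoid
    P = 2 * (b + h - 1)      # spatial period of the tiling
    Q = P // 2               # odd bands are shifted by half a period
    k, s = divmod(t - 1, h)  # band index and local time inside the band
    j, m = divmod(i - (k % 2) * Q, P)
    if s <= m <= L - 1 - s:
        # Upper (shrinking) half of a hexagon whose lower half lies in band k - 1.
        return (k - 1, j if k % 2 == 1 else j - 1)
    # Lower (expanding) half of a hexagon that continues into band k + 1.
    return (k, j if m >= L - s else j - 1)


def jacobi_hex_tiled(u0, steps, h, b):
    """Same computation, executed hexagon by hexagon, one row of hexagons at a time."""
    n = len(u0)
    tiles = defaultdict(list)
    for t in range(1, steps + 1):
        for i in range(1, n - 1):
            tiles[hex_tile_of(t, i, h, b)].append((t, i))

    # Legality check: each dependence stays inside the same hexagon or comes
    # from an earlier row, so hexagons sharing a row could run concurrently.
    for key, pts in tiles.items():
        for t, i in pts:
            for d in (-1, 0, 1):
                if t > 1 and 1 <= i + d <= n - 2:
                    dep = hex_tile_of(t - 1, i + d, h, b)
                    assert dep == key or dep[0] < key[0], (key, t, i, dep)

    # grid[t] holds time level t; boundary cells are copied from u0.
    grid = [list(u0)] + [[u0[0]] + [0.0] * (n - 2) + [u0[-1]] for _ in range(steps)]
    for key in sorted(tiles):              # rows in increasing order
        for t, i in sorted(tiles[key]):    # time-major order inside a hexagon
            prev = grid[t - 1]
            grid[t][i] = (prev[i - 1] + prev[i] + prev[i + 1]) / 3.0
    return grid[steps], len(tiles)


if __name__ == "__main__":
    u0 = [float(i % 5) for i in range(20)]
    tiled, n_tiles = jacobi_hex_tiled(u0, steps=7, h=2, b=3)
    assert tiled == jacobi_naive(u0, steps=7)
    print(n_tiles, "hexagons; result matches the naive sweep")
```

Sorting the tile keys executes one row of hexagons at a time; within a row the order is arbitrary, which is the independence a multi-pipeline or multi-FPGA design can exploit. The values of `h` and `b` here are arbitrary choices for the demo; in the paper, tile size and layout are chosen by the MILP-based optimizer.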
Pages: 697-705
Number of pages: 9