Hexagonal Tiling based Multiple FPGAs Stencil Computation Acceleration and Optimization Methodology

Cited by: 0

Authors:

Wang, Jinyu [1]

Kang, Yifei [1]

Li, Yiwen [1]

Wu, Weiguo [1]

Liu, Song [1]

Wang, Longxiang [1]

Affiliations:

[1] Xi'an Jiaotong University, School of Computer Science and Technology, Xi'an, People's Republic of China

Source:

19TH IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL AND DISTRIBUTED PROCESSING WITH APPLICATIONS (ISPA/BDCLOUD/SOCIALCOM/SUSTAINCOM 2021) | 2021

Keywords:

hexagonal tiling; stencil computation; Field Programmable Gate Array; multiple FPGAs; acceleration; SYSTEMS;

DOI:

10.1109/ISPA-BDCloud-SocialCom-SustainCom52081.2021.00101

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Multi-FPGA (Field Programmable Gate Array) accelerators are now widely used for stencil computation. However, the state-of-the-art hexagonal tiling algorithm, which efficiently improves stencil computation performance, is mainly designed for CPUs or GPUs; it is not suitable for running directly on FPGAs and leads to lower performance there. To address this, this paper proposes a hexagonal tiling based multi-FPGA stencil computation architecture and a corresponding optimization algorithm. The architecture uses on-chip registers to store and carry the cell data of a hexagonal tile. In this way, the scale and size of tiles are dramatically increased, as is the intra-FPGA calculation performance. Then, to take full advantage of the processing ability of multiple FPGAs, a shared-memory, high-speed inter-FPGA data transfer structure is devised. Finally, Mixed-Integer Linear Programming (MILP) is used to optimize an objective function that considers the costs of the candidate FPGAs, computation latency, and resource utilization, in order to obtain a desirable tile size and layout. The proposed method has been validated on an FPGA cluster consisting of two Xilinx Alveo U50 devices and one Alveo U250 device. Experimental results show performance of up to 580 Gflop/s using one U50 device and 2261 Gflop/s using three FPGAs. The proposed optimizer is also tested with a state-of-the-art multi-FPGA stencil computation acceleration method, where it increases performance by up to 21.8% and by 20.04% on average.

Pages: 697-705

Number of pages: 9

38 records in total

[1] Hexagonal Loop Tiling for Jacobi Computation Optimization Method
Qu, Bin
Liu, Song
Zhang, Zeng-Yuan
Ma, Jie
Wu, Wei-Guo
Ruan Jian Xue Bao/Journal of Software, 2024, 35 (08) : 3721 - 3738
[2] OpenCL-Based FPGA-Platform for Stencil Computation and Its Optimization Methodology
Waidyasooriya, Hasitha Muthumala
Takei, Yasuhiro
Tatsumi, Shunsuke
Hariyama, Masanori
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2017, 28 (05) : 1390 - 1402
[3] HLS-Based FPGA Acceleration of Building-Cube Stencil Computation
Soejima, Rie
Shibata, Yuichiro
Oguri, Kiyoshi
COMPLEX, INTELLIGENT, AND SOFTWARE INTENSIVE SYSTEMS, CISIS-2017, 2018, 611 : 463 - 474
[4] An Optimal Microarchitecture for Stencil Computation Acceleration Based on Nonuniform Partitioning of Data Reuse Buffers
Cong, Jason
Li, Peng
Xiao, Bingjun
Zhang, Peng
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2016, 35 (03) : 407 - 418
[5] A partition based methodology for simulation acceleration of digital VLSI circuits using FPGAs
Hashmi, A
Biswas, S
Pal, DR
Mukhopadhyay, S
Proceedings of the IEEE INDICON 2004, 2004 : 31 - 34
[6] SASA: A Scalable and Automatic Stencil Acceleration Framework for Optimized Hybrid Spatial and Temporal Parallelism on HBM-based FPGAs
Tian, Xingyu
Ye, Zhifan
Lu, Alec
Guo, Licheng
Chi, Yuze
Fang, Zhenman
ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS, 2023, 16 (02)
[7] An Optimal Microarchitecture for Stencil Computation Acceleration Based on Non-Uniform Partitioning of Data Reuse Buffers
Cong, Jason
Li, Peng
Xiao, Bingjun
Zhang, Peng
2014 51ST ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2014
[8] Performance modeling and optimization of 3-D stencil computation on a stream-based FPGA accelerator
Dohi, Keisuke
Fukumoto, Kota
Shibata, Yuichiro
Oguri, Kiyoshi
2013 INTERNATIONAL CONFERENCE ON RECONFIGURABLE COMPUTING AND FPGAS (RECONFIG), 2013
[9] A compression-based memory-efficient optimization for out-of-core GPU stencil computation
Shen, Jingcheng
Long, Linbo
Deng, Xin
Okita, Masao
Ino, Fumihiko
JOURNAL OF SUPERCOMPUTING, 2023, 79 (10) : 11055 - 11077
[10] A compression-based memory-efficient optimization for out-of-core GPU stencil computation
Jingcheng Shen
Linbo Long
Xin Deng
Masao Okita
Fumihiko Ino
The Journal of Supercomputing, 2023, 79 : 11055 - 11077
