Multi-FPGA Accelerator for Scalable Stencil Computation with Constant Memory Bandwidth

Cited by: 75

Authors:

Sano, Kentaro [1]

Hatsuda, Yoshiaki [2]

Yamamoto, Satoru [1]

Affiliations:

[1] Tohoku Univ, Grad Sch Informat Sci, Sendai, Miyagi 980, Japan

[2] Kobo Co Ltd, Kawaguchi, Saitama, Japan

Source:

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS | 2014, Vol. 25, No. 3

Keywords:

Scalable streaming-array; stencil computation; custom computing machine; FPGA; high-performance computation; MODEL;

DOI:

10.1109/TPDS.2013.51

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

Stencil computation is one of the important kernels in scientific computations. However, sustained performance is limited owing to restriction on memory bandwidth, especially on multicore microprocessors and graphics processing units (GPUs) because of their small operational intensity. In this paper, we present a custom computing machine (CCM), called a scalable streaming-array (SSA), for high-performance stencil computations with multiple field-programmable gate arrays (FPGAs). We design SSA based on a domain-specific programmable concept, where CCMs are programmable with the minimum functionality required for an algorithm domain. We employ a deep pipelining approach over successive iterations to achieve linear scalability for multiple devices with a constant memory bandwidth. Prototype implementation using nine FPGAs demonstrates good agreement with a performance model, and achieves 260 and 236 GFlop/s for 2D and 3D Jacobi computation, which are 87.4 and 83.9 percent of the peak, respectively, with a memory bandwidth of only 2.0 GB/s. We also evaluate the performance of SSA for state-of-the-art FPGAs.
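As a rough illustration of the kernel and the figures quoted in the abstract, the sketch below shows one sweep of a textbook 2D Jacobi stencil and derives two numbers from the reported results. The four-neighbour averaging form and the helper names `jacobi_step`, `implied_peak`, and `flops_per_byte` are assumptions made for illustration; the abstract does not state the exact stencil coefficients, and the derived values are simple ratios of the reported figures, not numbers given in the record.

```python
def jacobi_step(grid):
    """One 2D Jacobi sweep (assumed textbook form): each interior cell
    becomes the mean of its four neighbours; boundary cells are unchanged."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # copy so every update reads the old grid
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new


def implied_peak(sustained_gflops, fraction_of_peak):
    """Peak performance implied by a sustained rate and its share of peak."""
    return sustained_gflops / fraction_of_peak


def flops_per_byte(sustained_gflops, bandwidth_gb_per_s):
    """Sustained floating-point operations per byte of memory traffic."""
    return sustained_gflops / bandwidth_gb_per_s


if __name__ == "__main__":
    # Figures from the abstract: 260 GFlop/s (2D) at 87.4 percent of peak,
    # 236 GFlop/s (3D) at 83.9 percent of peak, both at 2.0 GB/s.
    print(round(implied_peak(260, 0.874), 1))  # implied 2D peak in GFlop/s
    print(round(implied_peak(236, 0.839), 1))  # implied 3D peak in GFlop/s
    print(flops_per_byte(260, 2.0))            # 2D flop per byte streamed
```

Each Jacobi sweep reads the whole grid and writes it back, which is why a conventional processor that performs one sweep per pass through memory is bandwidth-bound. Pipelining many successive sweeps on chip, as the abstract describes, raises the work done per byte streamed, which is what the ratio above expresses.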

Pages: 695-705

Page count: 11

38 records in total

[21] Scalable Multi-FPGA Design of a Discontinuous Galerkin Shallow-Water Model on Unstructured Meshes
Faj, Jennifer
Kenter, Tobias
Faghih-Naini, Sara
Plessl, Christian
Aizinger, Vadym
PROCEEDINGS OF THE PLATFORM FOR ADVANCED SCIENTIFIC COMPUTING CONFERENCE, PASC 2023, 2023,
[22] Analysis of a Dynamically Reconfigurable Dataflow Architecture and its Scalable Parallel Extension for Multi-FPGA Platforms
Voigt, Sven-Ole
Teufel, Thomas
PROCEEDINGS OF THE SIXTEENTH IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, 2008, : 261 - 262
[23] A High Memory Bandwidth FPGA Accelerator for Sparse Matrix-Vector Multiplication
Fowers, Jeremy
Ovtcharov, Kalin
Strauss, Karin
Chung, Eric S.
Stitt, Greg
2014 IEEE 22ND ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2014), 2014, : 36 - 43
[24] M5: Multi-modal Multi-task Model Mapping on Multi-FPGA with Accelerator Configuration Search
Kamath, Akshay Karkal
Abi-Karam, Stefan
Bhat, Ashwin
Hao, Cong
2023 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE, 2023,
[25] MuDBN: An Energy-Efficient and High-Performance Multi-FPGA Accelerator for Deep Belief Networks
Cheng, Yuming
Wang, Chao
Zhao, Yangyang
Chen, Xianglan
Zhou, Xuehai
Li, Xi
PROCEEDINGS OF THE 2018 GREAT LAKES SYMPOSIUM ON VLSI (GLSVLSI'18), 2018, : 435 - 438
[26] A Memory-Bandwidth-Efficient Word2vec Accelerator Using OpenCL for FPGA
Shoji, Tomoki
Waidyasooriya, Hasitha Muthumala
Ono, Taisuke
Hariyama, Masanori
Aoki, Yuichiro
Kondoh, Yuki
Nakagawa, Yaoko
2019 SEVENTH INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING WORKSHOPS (CANDARW 2019), 2019, : 103 - 108
[27] A Scalable Distributed Radix Sorter for FPGA Clusters using High-Bandwidth Memory Networks
Urino, Yutaka
Shimizu, Takanori
Yamaguchi, Hiroshi
Mizutani, Kenji
Nakamura, Shigeru
Usuki, Tatsuya
Koibuchi, Michihiro
2022 IEEE 30TH INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2022), 2022, : 230 - 230
[28] Direct Device-to-Device Physical Page Migrations in Multi-FPGA Shared Virtual Memory Systems
Kalkhof, Torben
Koch, Andreas
2022 32ND INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE LOGIC AND APPLICATIONS, FPL, 2022, : 225 - 234
[29] Bounding Memory Access Times in Multi-Accelerator Architectures on FPGA SoCs
Restuccia, Francesco
Pagani, Marco
Biondi, Alessandro
Marinoni, Mauro
Buttazzo, Giorgio
IEEE TRANSACTIONS ON COMPUTERS, 2023, 72 (01) : 154 - 167
[30] Integration of a Highly Scalable, Multi-FPGA-Based Hardware Accelerator in Common Cluster Infrastructures
Knodel, Oliver
Georgi, Andy
Lehmann, Patrick
Nagel, Wolfgang E.
Spallek, Rainer G.
2013 42ND ANNUAL INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP), 2013, : 893 - 900