Multi-FPGA Accelerator for Scalable Stencil Computation with Constant Memory Bandwidth

Cited by: 75
Authors
Sano, Kentaro [1]
Hatsuda, Yoshiaki [2]
Yamamoto, Satoru [1]
Affiliations
[1] Tohoku Univ, Grad Sch Informat Sci, Sendai, Miyagi 980, Japan
[2] Kobo Co Ltd, Kawaguchi, Saitama, Japan
Keywords
Scalable streaming-array; stencil computation; custom computing machine; FPGA; high-performance computation; MODEL;
DOI
10.1109/TPDS.2013.51
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Stencil computation is one of the important kernels in scientific computations. However, sustained performance is limited owing to restriction on memory bandwidth, especially on multicore microprocessors and graphics processing units (GPUs) because of their small operational intensity. In this paper, we present a custom computing machine (CCM), called a scalable streaming-array (SSA), for high-performance stencil computations with multiple field-programmable gate arrays (FPGAs). We design SSA based on a domain-specific programmable concept, where CCMs are programmable with the minimum functionality required for an algorithm domain. We employ a deep pipelining approach over successive iterations to achieve linear scalability for multiple devices with a constant memory bandwidth. Prototype implementation using nine FPGAs demonstrates good agreement with a performance model, and achieves 260 and 236 GFlop/s for 2D and 3D Jacobi computation, which are 87.4 and 83.9 percent of the peak, respectively, with a memory bandwidth of only 2.0 GB/s. We also evaluate the performance of SSA for state-of-the-art FPGAs.
Pages: 695-705
Page count: 11
Related papers
38 in total
  • [21] Scalable Multi-FPGA Design of a Discontinuous Galerkin Shallow-Water Model on Unstructured Meshes
    Faj, Jennifer
    Kenter, Tobias
    Faghih-Naini, Sara
    Plessl, Christian
    Aizinger, Vadym
    PROCEEDINGS OF THE PLATFORM FOR ADVANCED SCIENTIFIC COMPUTING CONFERENCE, PASC 2023, 2023,
  • [22] Analysis of a Dynamically Reconfigurable Dataflow Architecture and its Scalable Parallel Extension for Multi-FPGA Platforms
    Voigt, Sven-Ole
    Teufel, Thomas
    PROCEEDINGS OF THE SIXTEENTH IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, 2008, : 261 - 262
  • [23] A High Memory Bandwidth FPGA Accelerator for Sparse Matrix-Vector Multiplication
    Fowers, Jeremy
    Ovtcharov, Kalin
    Strauss, Karin
    Chung, Eric S.
    Stitt, Greg
    2014 IEEE 22ND ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2014), 2014, : 36 - 43
  • [24] M5: Multi-modal Multi-task Model Mapping on Multi-FPGA with Accelerator Configuration Search
    Kamath, Akshay Karkal
    Abi-Karam, Stefan
    Bhat, Ashwin
    Hao, Cong
    2023 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION, DATE, 2023,
  • [25] MuDBN: An Energy-Efficient and High-Performance Multi-FPGA Accelerator for Deep Belief Networks
    Cheng, Yuming
    Wang, Chao
    Zhao, Yangyang
    Chen, Xianglan
    Zhou, Xuehai
    Li, Xi
    PROCEEDINGS OF THE 2018 GREAT LAKES SYMPOSIUM ON VLSI (GLSVLSI'18), 2018, : 435 - 438
  • [26] A Memory-Bandwidth-Efficient Word2vec Accelerator Using OpenCL for FPGA
    Shoji, Tomoki
    Waidyasooriya, Hasitha Muthumala
    Ono, Taisuke
    Hariyama, Masanori
    Aoki, Yuichiro
    Kondoh, Yuki
    Nakagawa, Yaoko
    2019 SEVENTH INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING WORKSHOPS (CANDARW 2019), 2019, : 103 - 108
  • [27] A Scalable Distributed Radix Sorter for FPGA Clusters using High-Bandwidth Memory Networks
    Urino, Yutaka
    Shimizu, Takanori
    Yamaguchi, Hiroshi
    Mizutani, Kenji
    Nakamura, Shigeru
    Usuki, Tatsuya
    Koibuchi, Michihiro
    2022 IEEE 30TH INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2022), 2022, : 230 - 230
  • [28] Direct Device-to-Device Physical Page Migrations in Multi-FPGA Shared Virtual Memory Systems
    Kalkhof, Torben
    Koch, Andreas
    2022 32ND INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE LOGIC AND APPLICATIONS, FPL, 2022, : 225 - 234
  • [29] Bounding Memory Access Times in Multi-Accelerator Architectures on FPGA SoCs
    Restuccia, Francesco
    Pagani, Marco
    Biondi, Alessandro
    Marinoni, Mauro
    Buttazzo, Giorgio
    IEEE TRANSACTIONS ON COMPUTERS, 2023, 72 (01) : 154 - 167
  • [30] Integration of a Highly Scalable, Multi-FPGA-Based Hardware Accelerator in Common Cluster Infrastructures
    Knodel, Oliver
    Georgi, Andy
    Lehmann, Patrick
    Nagel, Wolfgang E.
    Spallek, Rainer G.
    2013 42ND ANNUAL INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP), 2013, : 893 - 900