FPGA-Based Scalable and Power-Efficient Fluid Simulation using Floating-Point DSP Blocks

Cited by: 17

Authors:

Sano, Kentaro [1]

Yamamoto, Satoru [1]

Affiliations:

[1] Tohoku Univ, Grad Sch Informat Sci, Sendai, Miyagi 9808577, Japan

Source:

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS | 2017 / Vol. 28 / Issue 10

Keywords:

FPGA; fluid simulation; custom computing machine; stream computing; floating-point; high-performance computing; LATTICE BOLTZMANN METHOD; IMPLEMENTATION

DOI:

10.1109/TPDS.2017.2691770

Chinese Library Classification:

TP301 [Theory and Methods]

Subject classification code:

081202

Abstract:

High-performance and low-power computation is required for large-scale fluid dynamics simulation. Because the architectures of CPUs and GPUs are inefficient for this target application, it is now difficult to improve their power efficiency. Although FPGAs have become promising alternatives for power-efficient, high-performance computation thanks to their new architecture with floating-point (FP) DSP blocks, their relatively narrow memory bandwidth requires an appropriate way to fully exploit this advantage. This paper presents an architecture and design for scalable fluid simulation based on data-flow computing with a state-of-the-art FPGA. To exploit available hardware resources, including FP DSPs, we introduce spatial and temporal parallelism to further scale the performance by adding more stream processing elements (SPEs) in an array. Performance modeling and prototype implementation allow us to explore the design space for both the existing Altera Arria10 and the upcoming Intel Stratix10 FPGAs. We demonstrate that an Arria10 10AX115 FPGA achieves 519 GFlops at 9.67 GFlops/W with a stream bandwidth of only 9.0 GB/s, which is 97.9 percent of the peak performance of the 18 implemented SPEs. We also estimate that a Stratix10 FPGA can scale up to 6844 GFlops by adequately combining spatial and temporal parallelism.
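The figures quoted in the abstract imply a few further quantities that the record does not state directly. The following is a minimal sketch of that arithmetic, assuming the GFlops/W figure and the 519 GFlops figure refer to the same measurement; the function name `derive_metrics` and the dictionary keys are hypothetical and do not come from the paper.

```python
def derive_metrics(sustained_gflops=519.0,
                   gflops_per_watt=9.67,
                   utilization=0.979,
                   num_spes=18,
                   stream_bw_gb_s=9.0):
    """Derive implied quantities from the numbers quoted in the abstract.

    All outputs are back-of-the-envelope estimates, not values reported
    by the authors.
    """
    # Implied power draw: sustained performance divided by power efficiency.
    power_w = sustained_gflops / gflops_per_watt
    # Implied peak: the sustained figure is stated to be 97.9% of peak.
    peak_gflops = sustained_gflops / utilization
    # Implied peak per stream processing element (SPE).
    peak_per_spe = peak_gflops / num_spes
    # Floating-point operations performed per byte of stream bandwidth.
    flops_per_byte = sustained_gflops / stream_bw_gb_s
    return {
        "power_w": power_w,
        "peak_gflops": peak_gflops,
        "peak_per_spe_gflops": peak_per_spe,
        "flops_per_byte": flops_per_byte,
    }


if __name__ == "__main__":
    for name, value in derive_metrics().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the implied power draw is roughly 54 W and the implied peak is roughly 530 GFlops. The high ratio of operations to streamed bytes is consistent with the abstract's point that high performance is reached despite a narrow stream bandwidth.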

Pages: 2823-2837

Page count: 15
