FPGA-Based Scalable and Power-Efficient Fluid Simulation using Floating-Point DSP Blocks

Cited by: 17
Authors
Sano, Kentaro [1]
Yamamoto, Satoru [1]
Affiliations
[1] Tohoku Univ, Grad Sch Informat Sci, Sendai, Miyagi 9808577, Japan
Keywords
FPGA; fluid simulation; custom computing machine; stream computing; floating-point; high-performance computing; lattice Boltzmann method; implementation
DOI
10.1109/TPDS.2017.2691770
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Large-scale fluid dynamics simulation requires high-performance, low-power computation. Because of their inefficient architecture and structure, CPUs and GPUs now have difficulty improving power efficiency for this target application. Although FPGAs have become promising alternatives for power-efficient, high-performance computation thanks to their new architecture with floating-point (FP) DSP blocks, their relatively narrow memory bandwidth requires an appropriate approach to fully exploit this advantage. This paper presents an architecture and design for scalable fluid simulation based on data-flow computing with a state-of-the-art FPGA. To exploit available hardware resources including FP DSPs, we introduce spatial and temporal parallelism to further scale the performance by adding more stream processing elements (SPEs) in an array. Performance modeling and prototype implementation allow us to explore the design space for both the existing Altera Arria10 and the upcoming Intel Stratix10 FPGAs. We demonstrate that the Arria10 10AX115 FPGA achieves 519 GFlops at 9.67 GFlops/W with a stream bandwidth of only 9.0 GB/s, which is 97.9 percent of the peak performance of 18 implemented SPEs. We also estimate that a Stratix10 FPGA can scale up to 6844 GFlops by combining spatial and temporal parallelism adequately.
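The headline figures in the abstract imply a few quantities that it does not state directly. The short Python sketch below derives them by plain arithmetic. It assumes that the 9.67 GFlops/W efficiency and the 97.9 percent utilization both refer to the same 519 GFlops sustained figure; the function and key names are illustrative only and do not come from the paper.

```python
# Figures quoted in the abstract for the Arria10 10AX115 prototype.
SUSTAINED_GFLOPS = 519.0   # sustained performance
GFLOPS_PER_WATT = 9.67     # reported power efficiency
PEAK_FRACTION = 0.979      # sustained / peak for the implemented SPEs
NUM_SPES = 18              # implemented stream processing elements
STREAM_BW_GBS = 9.0        # stream bandwidth in GB/s


def derived_figures():
    """Return quantities implied by the abstract's numbers.

    These are back-of-the-envelope values, not results reported by the
    authors. They assume all quoted figures describe the same run.
    """
    power_w = SUSTAINED_GFLOPS / GFLOPS_PER_WATT       # implied power draw
    peak_gflops = SUSTAINED_GFLOPS / PEAK_FRACTION     # implied peak of 18 SPEs
    peak_per_spe = peak_gflops / NUM_SPES              # implied peak per SPE
    bytes_per_flop = STREAM_BW_GBS / SUSTAINED_GFLOPS  # stream bytes per flop
    return {
        "power_w": power_w,
        "peak_gflops": peak_gflops,
        "peak_per_spe_gflops": peak_per_spe,
        "bytes_per_flop": bytes_per_flop,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions the implied power draw is roughly 53.7 W, the implied peak is about 530 GFlops (about 29.5 GFlops per SPE), and the design needs only about 0.017 bytes of stream bandwidth per floating-point operation, which is consistent with the abstract's emphasis on working within a narrow memory bandwidth.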
Pages: 2823-2837
Page count: 15
Related papers
50 in total
  • [21] High-Level Languages and Floating-Point Arithmetic for FPGA-Based CFD Simulations
    Sanchez-Roman, Diego
    Sutter, Gustavo
    Lopez-Buedo, Sergio
    Gonzalez, Ivan
    Gomez-Arribas, Francisco J.
    Aracil, Javier
    Palacios, Francisco
    [J]. IEEE DESIGN & TEST OF COMPUTERS, 2011, 28 (04): : 28 - 36
  • [22] FPGA-based Lossless Compressors of Floating-Point Data Streams to Enhance Memory Bandwidth
    Katahira, Kazuya
    Sano, Kentaro
    Yamamoto, Satoru
    [J]. 21ST IEEE INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS, 2010
  • [23] LDPC decoder with a limited-precision FPGA-based floating-point multiplication coprocessor
    Moberly, Raymond
    O'Sullivan, Michael
    Waheed, Khurram
    [J]. ADVANCED SIGNAL PROCESSING ALGORITHMS, ARCHITECTURES, AND IMPLEMENTATIONS XVII, 2007, 6697
  • [24] FPGA-Based Training of Convolutional Neural Networks With a Reduced Precision Floating-Point Library
    DiCecco, Roberto
    Sun, Lin
    Chow, Paul
    [J]. 2017 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE TECHNOLOGY (ICFPT), 2017, : 239 - 242
  • [25] An FPGA-based low-cost VLIW floating-point processor for CNC applications
    Dong, Jingchuan
    Wang, Taiyong
    Li, Bo
    Liu, Zhe
    Yu, Zhiqiang
    [J]. MICROPROCESSORS AND MICROSYSTEMS, 2017, 50 : 14 - 25
  • [26] Logarithm-approximate floating-point multiplier is applicable to power-efficient neural network training
    Cheng, TaiYu
    Masuda, Yutaka
    Chen, Jun
    Yu, Jaehoon
    Hashimoto, Masanori
    [J]. INTEGRATION-THE VLSI JOURNAL, 2020, 74 : 19 - 31
  • [27] An FPGA-based floating-point processor array supporting a high-precision dot product
    Mayer-Lindenberg, Fritz
    Beller, Valerij
    [J]. 2006 IEEE INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE TECHNOLOGY, PROCEEDINGS, 2006, : 317 - +
  • [28] A Power-Efficient FPGA-Based Self-Adaptive Software Defined Radio
    Dobson, Chris
    Rooks, Kurt
    Athanas, Peter
    [J]. 2014 24TH INTERNATIONAL WORKSHOP ON POWER AND TIMING MODELING, OPTIMIZATION AND SIMULATION (PATMOS), 2014
  • [29] An FPGA-based application-specific processor for efficient reduction of multiple variable-length floating-point data sets
    Morris, Gerald R.
    Prasanna, Viktor K.
    Anderson, Richard D.
    [J]. IEEE 17TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURES AND PROCESSORS, PROCEEDINGS, 2006, : 323 - +
  • [30] Improving Power of DSP and CNN Hardware Accelerators Using Approximate Floating-point Multipliers
    Leon, Vasileios
    Paparouni, Theodora
    Petrongonas, Evangelos
    Soudris, Dimitrios
    Pekmestzi, Kiamal
    [J]. ACM TRANSACTIONS ON EMBEDDED COMPUTING SYSTEMS, 2021, 20 (05)