FPGA architecture and implementation of sparse matrix-vector multiplication for the finite element method

Cited by: 17
Authors
Elkurdi, Yousef [1 ]
Fernandez, David [1 ]
Souleimanov, Evgueni [1 ]
Giannacopoulos, Dennis [1 ]
Gross, Warren J. [1 ]
Affiliation
[1] McGill Univ, Dept Elect & Comp Engn, Montreal, PQ H3A 2A7, Canada
Keywords
FPGA; SMVM; FEM;
DOI
10.1016/j.cpc.2007.11.014
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
The Finite Element Method (FEM) is a computationally intensive scientific and engineering analysis tool with diverse applications ranging from structural engineering to electromagnetic simulation. Trends in floating-point performance are moving in favor of Field-Programmable Gate Arrays (FPGAs), and interest in exploiting this technology has grown in the scientific community. We present an architecture and implementation of an FPGA-based sparse matrix-vector multiplier (SMVM) for use in the iterative solution of large, sparse systems of equations arising from FEM applications. FEM matrices display specific sparsity patterns that can be exploited to improve the efficiency of hardware designs. Our architecture exploits FEM matrix sparsity structure to achieve a balance between performance and hardware resource requirements by relying on external SDRAM for data storage while utilizing the FPGA's computational resources in a stream-through systolic approach. The architecture is based on a pipelined linear array of processing elements (PEs) coupled with a hardware-oriented matrix striping algorithm and a partitioning scheme that enables it to process arbitrarily large matrices without changing the number of PEs in the architecture. Therefore, this architecture is limited only by the amount of external RAM available to the FPGA. The implemented SMVM-pipeline prototype contains 8 PEs and is clocked at 110 MHz, obtaining a peak performance of 1.76 GFLOPS. For 8 GB/s of memory bandwidth, typical of recent FPGA systems, this architecture can achieve 1.5 GFLOPS sustained performance. Using multiple instances of the pipeline, linear scaling of the peak and sustained performance can be achieved.
Our stream-through architecture provides the added advantage of enabling an iterative implementation of the SMVM computation required by iterative solution techniques such as the conjugate gradient method, avoiding initialization time due to data loading and setup inside the FPGA internal memory. (c) 2007 Elsevier B.V. All rights reserved.
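For readers unfamiliar with the computation the abstract refers to, the following is a minimal software sketch, not the authors' hardware design. It shows a sparse matrix-vector product being called once per iteration inside a conjugate gradient solver, and reproduces the quoted peak figure. The compressed sparse row (CSR) storage, the function names, and the assumption that each PE performs one multiply and one add per clock cycle are illustrative choices that do not come from the record; the last one is merely consistent with 8 PEs at 110 MHz giving 1.76 GFLOPS.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form (assumed format)."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        # Only the stored nonzeros of row i contribute to y[i].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(values, col_idx, row_ptr, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A.

    Each iteration needs exactly one sparse matrix-vector product,
    which is the kernel an SMVM accelerator would take over.
    """
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x with x = 0
    p = list(r)          # initial search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = csr_matvec(values, col_idx, row_ptr, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def peak_gflops(num_pes, clock_hz, flops_per_pe_per_cycle=2):
    """Peak rate, ASSUMING one multiply and one add per PE per cycle."""
    return num_pes * clock_hz * flops_per_pe_per_cycle / 1e9


# Small tridiagonal test matrix [[2,-1,0],[-1,2,-1],[0,-1,2]] in CSR form.
values = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
col_idx = [0, 1, 0, 1, 2, 1, 2]
row_ptr = [0, 2, 5, 7]
b = [1.0, 0.0, 1.0]

solution = conjugate_gradient(values, col_idx, row_ptr, b)
print(solution)
print(peak_gflops(8, 110e6))
```

The solver touches the matrix only through `csr_matvec`, so the matrix data can be streamed once per iteration, which is the access pattern the stream-through description in the abstract relies on.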
Pages: 558-570
Number of pages: 13