Towards a Universal FPGA Matrix-Vector Multiplication Architecture

Cited by: 44
Authors
Kestur, Srinidhi [1]
Davis, John D. [2]
Chung, Eric S. [2]
Affiliations
[1] Penn State Univ, Dept Comp Sci & Engn, University Pk, PA 16802 USA
[2] Microsoft Res Silicon Valley, Mountain View, CA 94043 USA
Keywords
FPGA; dense matrix; sparse matrix; SpMV; reconfigurable computing
DOI
10.1109/FCCM.2012.12
CLC Classification
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
We present the design and implementation of a universal, single-bitstream library for accelerating matrix-vector multiplication on FPGAs. Our library handles multiple matrix encodings, ranging from dense to several sparse formats. A key novelty of our approach is a hardware-optimized sparse matrix representation called Compressed Variable-Length Bit Vector (CVBV), which reduces storage and bandwidth requirements by up to 43% (25% on average) compared to compressed sparse row (CSR) across all matrices in the University of Florida Sparse Matrix Collection. Our hardware incorporates a runtime-programmable decoder that decodes various formats, such as Dense, COO, CSR, DIA, and ELL, on the fly. We demonstrate the flexibility and scalability of our design on two FPGA platforms: (1) the BEE3 (Virtex-5 LX155T with 16 GB of DRAM) and (2) the ML605 (Virtex-6 LX240T with 2 GB of DRAM). For dense matrices, our approach scales to large data sets with over 1 billion elements and achieves robust performance independent of the matrix aspect ratio. For sparse matrices, our compressed representation reduces overall bandwidth while achieving efficiency comparable to state-of-the-art approaches.
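As a rough software analogue of the format-flexible decoder described in the abstract, the sketch below encodes a small matrix in each of the five formats it names (Dense, COO, CSR, ELL, DIA) and computes y = Ax with a single routine that picks its decode path from a runtime format tag. The function names `encode` and `spmv`, the dict-based layouts, and the use of column -1 for ELL padding are hypothetical choices made for illustration; this is not the paper's hardware design, and CVBV is not modeled here.

```python
"""Sketch: one y = A @ x routine that decodes several matrix formats at runtime.

Hypothetical software analogue of a format-flexible decoder; `encode`, `spmv`,
and the dict layouts are illustrative assumptions, not the paper's hardware.
"""
from typing import Any, Dict, List

Matrix = List[List[float]]


def encode(dense: Matrix, fmt: str) -> Dict[str, Any]:
    """Encode a row-major dense matrix as Dense, COO, CSR, ELL, or DIA."""
    rows, cols = len(dense), len(dense[0])
    # Nonzeros in row-major order as (row, col, value) triples.
    nz = [(i, j, v) for i, r in enumerate(dense) for j, v in enumerate(r) if v != 0]
    enc: Dict[str, Any] = {"fmt": fmt, "shape": (rows, cols)}
    if fmt == "dense":
        enc["data"] = [list(r) for r in dense]
    elif fmt == "coo":
        enc["entries"] = nz
    elif fmt == "csr":
        enc["vals"] = [v for _, _, v in nz]
        enc["col_idx"] = [j for _, j, _ in nz]
        row_ptr = [0] * (rows + 1)
        for i, _, _ in nz:  # count nonzeros per row...
            row_ptr[i + 1] += 1
        for i in range(rows):  # ...then prefix-sum into row offsets
            row_ptr[i + 1] += row_ptr[i]
        enc["row_ptr"] = row_ptr
    elif fmt == "ell":
        # Pad every row to the longest row's nonzero count; padding uses column -1.
        per_row = [[(j, v) for j, v in enumerate(r) if v != 0] for r in dense]
        width = max((len(p) for p in per_row), default=0)
        padded = [p + [(-1, 0.0)] * (width - len(p)) for p in per_row]
        enc["col_idx"] = [[j for j, _ in p] for p in padded]
        enc["vals"] = [[v for _, v in p] for p in padded]
    elif fmt == "dia":
        # One length-`rows` array per occupied diagonal offset k = col - row.
        diags: Dict[int, List[float]] = {}
        for i, j, v in nz:
            diags.setdefault(j - i, [0.0] * rows)[i] = v
        enc["diags"] = diags
    else:
        raise ValueError(f"unknown format: {fmt}")
    return enc


def spmv(enc: Dict[str, Any], x: List[float]) -> List[float]:
    """Compute y = A @ x, choosing the decode path from the runtime format tag."""
    rows, cols = enc["shape"]
    y = [0.0] * rows
    fmt = enc["fmt"]
    if fmt == "dense":
        for i, r in enumerate(enc["data"]):
            y[i] = sum(a * b for a, b in zip(r, x))
    elif fmt == "coo":
        for i, j, v in enc["entries"]:
            y[i] += v * x[j]
    elif fmt == "csr":
        rp, ci, vals = enc["row_ptr"], enc["col_idx"], enc["vals"]
        for i in range(rows):
            for k in range(rp[i], rp[i + 1]):
                y[i] += vals[k] * x[ci[k]]
    elif fmt == "ell":
        for i in range(rows):
            for j, v in zip(enc["col_idx"][i], enc["vals"][i]):
                if j >= 0:  # skip padding slots
                    y[i] += v * x[j]
    elif fmt == "dia":
        for k, diag in enc["diags"].items():
            for i in range(rows):
                if 0 <= i + k < cols:  # stay inside the matrix
                    y[i] += diag[i] * x[i + k]
    else:
        raise ValueError(f"unknown format: {fmt}")
    return y


if __name__ == "__main__":
    A = [[4.0, 0.0, 0.0, 1.0],
         [0.0, 2.0, 0.0, 0.0],
         [3.0, 0.0, 5.0, 0.0]]
    x = [1.0, 2.0, 3.0, 4.0]
    for fmt in ("dense", "coo", "csr", "ell", "dia"):
        print(fmt, spmv(encode(A, fmt), x))  # each prints [8.0, 4.0, 18.0]
```

Every format yields the same result because only the decode path changes, not the arithmetic. In this sketch the choice is a plain Python branch taken per call; the abstract describes the corresponding selection as a runtime-programmable decoder within a single bitstream.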
Pages: 9-16
Page count: 8
Related Papers (50 in total, first 10 listed)
  • [1] FPGA architecture and implementation of sparse matrix-vector multiplication for the finite element method
    Elkurdi, Yousef
    Fernandez, David
    Souleimanov, Evgueni
    Giannacopoulos, Dennis
    Gross, Warren J.
    COMPUTER PHYSICS COMMUNICATIONS, 2008, 178 (08): 558-570
  • [2] DENSE MATRIX-VECTOR MULTIPLICATION ON THE CUDA ARCHITECTURE
    Fujimoto, Noriyuki
    PARALLEL PROCESSING LETTERS, 2008, 18 (04): 511-530
  • [3] High performance sparse matrix-vector multiplication on FPGA
    Zou, Dan
    Dou, Yong
    Guo, Song
    Ni, Shice
    IEICE ELECTRONICS EXPRESS, 2013, 10 (17)
  • [4] On sparse matrix-vector multiplication with FPGA-based system
    ElGindy, H
    Shue, YL
    10TH ANNUAL IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, PROCEEDINGS, 2002: 273-274
  • [5] Charge-mode parallel architecture for matrix-vector multiplication
    Genov, R
    Cauwenberghs, G
    PROCEEDINGS OF THE 43RD IEEE MIDWEST SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS I-III, 2000: 506-509
  • [6] Efficient Sparse Matrix-Vector Multiplication on Intel PIUMA Architecture
    Aananthakrishnan, Sriram
    Pawlowski, Robert
    Fryman, Joshua
    Hur, Ibrahim
    2020 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2020
  • [7] ACOUSTOOPTIC MATRIX-VECTOR MULTIPLICATION
    CAULFIELD, HJ
    RHODES, WT
    JOURNAL OF THE OPTICAL SOCIETY OF AMERICA, 1981, 71 (12): 1626
  • [8] FPGA Implementation of Matrix-Vector Multiplication Using Xilinx System Generator
    Sayahi, Intissar
    Machhout, Mohsen
    Tourki, Rached
    2018 INTERNATIONAL CONFERENCE ON ADVANCED SYSTEMS AND ELECTRICAL TECHNOLOGIES (IC_ASET), 2017: 290-295
  • [9] FPGA Implementation of a Unidirectional Systolic Array Generator for Matrix-Vector Multiplication
    Karra, M. Ch.
    Bekakos, M. P.
    Milovanovic, I. Z.
    Milovanovic, E. I.
    ICSPC: 2007 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS, VOLS 1-3, PROCEEDINGS, 2007: 153+
  • [10] A High Memory Bandwidth FPGA Accelerator for Sparse Matrix-Vector Multiplication
    Fowers, Jeremy
    Ovtcharov, Kalin
    Strauss, Karin
    Chung, Eric S.
    Stitt, Greg
    2014 IEEE 22ND ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2014), 2014: 36-43