Towards a Universal FPGA Matrix-Vector Multiplication Architecture

Cited by: 44
Authors
Kestur, Srinidhi [1 ]
Davis, John D. [2 ]
Chung, Eric S. [2 ]
Affiliations
[1] Penn State Univ, Dept Comp Sci & Engn, University Pk, PA 16802 USA
[2] Microsoft Res Silicon Valley, Mountain View, CA 94043 USA
Keywords
FPGA; dense matrix; sparse matrix; SpMV; reconfigurable computing
DOI
10.1109/FCCM.2012.12
Chinese Library Classification (CLC)
TP3 [Computing technology, computer technology];
Discipline Classification Code
0812;
Abstract
We present the design and implementation of a universal, single-bitstream library for accelerating matrix-vector multiplication using FPGAs. Our library handles multiple matrix encodings ranging from dense to multiple sparse formats. A key novelty in our approach is the introduction of a hardware-optimized sparse matrix representation called Compressed Variable-Length Bit Vector (CVBV), which reduces the storage and bandwidth requirements by up to 43% (25% on average) compared to compressed sparse row (CSR) across all the matrices from the University of Florida Sparse Matrix Collection. Our hardware incorporates a runtime-programmable decoder that performs on-the-fly decoding of various formats such as Dense, COO, CSR, DIA, and ELL. The flexibility and scalability of our design are demonstrated across two FPGA platforms: (1) the BEE3 (Virtex-5 LX155T with 16GB of DRAM) and (2) the ML605 (Virtex-6 LX240T with 2GB of DRAM). For dense matrices, our approach scales to large data sets with over 1 billion elements and achieves robust performance independent of the matrix aspect ratio. For sparse matrices, our compressed representation reduces the overall bandwidth requirement while achieving efficiency comparable to state-of-the-art approaches.
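For readers unfamiliar with the CSR baseline that CVBV is measured against, the sketch below is a minimal, illustrative Python/NumPy implementation of sparse matrix-vector multiplication over the three CSR arrays (row pointers, column indices, values). It is not the paper's CVBV decoder or its hardware pipeline; all names here are illustrative.

```python
import numpy as np

def csr_spmv(row_ptr, col_idx, values, x):
    """Multiply a CSR-encoded sparse matrix by a dense vector x.

    For row i, the nonzeros live in values[row_ptr[i]:row_ptr[i+1]],
    with their column positions in col_idx over the same range.
    (Illustrative baseline only; not the paper's CVBV format.)
    """
    n_rows = len(row_ptr) - 1
    y = np.zeros(n_rows, dtype=values.dtype)
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# 3x4 example matrix:
# [[5, 0, 0, 1],
#  [0, 2, 0, 0],
#  [0, 0, 3, 4]]
row_ptr = np.array([0, 2, 3, 5])
col_idx = np.array([0, 3, 1, 2, 3])
values  = np.array([5.0, 1.0, 2.0, 3.0, 4.0])
x = np.array([1.0, 2.0, 3.0, 4.0])

print(csr_spmv(row_ptr, col_idx, values, x))  # -> [ 9.  4. 25.]
```

Judging from its name, CVBV presumably replaces CSR's explicit column-index array with a compressed, variable-length bit-vector encoding of nonzero positions, which would account for the reported 25-43% storage and bandwidth savings; the precise encoding is defined in the paper itself, not in the abstract above.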
Pages: 9-16
Number of pages: 8