An improved architecture for bit-level matrix multiplication

Cited by: 0
Authors
Grover, RS [1]
Shang, WJ [1]
Li, Q [1]
Affiliations
[1] Santa Clara Univ, Dept Comp Engn, Santa Clara, CA 95053 USA
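Keywords
bit-level matrix multiplication; FPGA array; mapping algorithms to hardware; reconfigurable computing
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract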
We present a novel bit-level architecture in which each processing element performs a simple operation: it adds three to six bits to generate one partial-sum bit and one or two carry-out bits. The speedup over word-level architectures comes from the fact that, in a bit-level architecture, the individual bits of a word do not have to be processed as a unit. In [1], two bit-level architectures for fixed-point matrix multiplication are proposed that are O(log p) times faster than the fastest word-level architecture, where p is the word length. The architecture presented in this paper is faster than both architectures in [1] because it cuts the critical path in the dependence graph in half. We present the basic ideas behind the speedup, show how to establish the dependence structure, and show how to derive the final design. We also show that our design is time-optimal for our dependence structure and achieves a speedup of 50% or more over the designs presented in [1]. We are implementing the design on a Xilinx FPGA chip, where it shows a potential speedup over the Xilinx multiplier macro. Our approach can be used to map algorithms to hardware.
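To make the abstract's description of a processing element concrete, the following is a minimal software sketch, not the authors' hardware design. It assumes unsigned fixed-point operands of word length p, and the names `pe_add_bits` and `bit_level_matmul` are hypothetical. The first function models one processing element that adds three to six bits; the second expands each word-level product into single-bit products, which is the bit-level view of matrix multiplication. The loops run sequentially here, whereas the architecture in the paper evaluates such bit operations in parallel across an array.

```python
def pe_add_bits(bits):
    """Hypothetical processing element: add three to six input bits.

    Returns (sum_bit, carry1, carry2) with weights 1, 2 and 4.
    With three inputs the total is at most 3, so carry2 is always 0
    (one carry-out bit); with four to six inputs both carries can be set.
    """
    if not 3 <= len(bits) <= 6 or any(b not in (0, 1) for b in bits):
        raise ValueError("expected three to six bits, each 0 or 1")
    total = sum(bits)              # value in the range 0..6
    sum_bit = total & 1            # partial-sum bit (weight 1)
    carry1 = (total >> 1) & 1      # first carry-out bit (weight 2)
    carry2 = (total >> 2) & 1      # second carry-out bit (weight 4)
    return sum_bit, carry1, carry2


def bit_level_matmul(A, B, p):
    """Multiply matrices of unsigned p-bit integers using only bit products.

    Assumption: every entry of A and B fits in p bits. Each word product
    a * b is expanded as the sum over s, t of a_s * b_t * 2**(s + t),
    where a_s and b_t are individual bits.
    """
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                for s in range(p):
                    for t in range(p):
                        a_bit = (A[i][k] >> s) & 1
                        b_bit = (B[k][j] >> t) & 1
                        # A single-bit product is a logical AND.
                        C[i][j] += (a_bit & b_bit) << (s + t)
    return C
```

In this sketch the word length p only bounds the two inner loops; a hardware mapping would instead assign the bit products and their additions to processing elements according to the dependence structure the paper derives.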
Pages: 2257-2264
Page count: 8