An improved architecture for bit-level matrix multiplication

Cited by: 0

Authors:

Grover, RS [1]

Shang, WJ [1]

Li, Q [1]

Affiliation:

[1] Santa Clara Univ, Dept Comp Engn, Santa Clara, CA 95053 USA

Source:

PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED PROCESSING TECHNIQUES AND APPLICATIONS, VOLS I-V | 2000

Keywords:

bit-level matrix multiplication; FPGA array; mapping algorithms to hardware; reconfigurable computing

DOI:

Not available

Chinese Library Classification:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

We present a novel bit-level architecture in which each processing element performs a simple operation: adding three to six bits to generate one partial-sum bit and one or two carry-out bits. We gain speedup over word-level architectures because, in a bit-level architecture, the individual bits of a word do not have to be processed as a unit. In [1], two bit-level architectures for fixed-point matrix multiplication are proposed that are O(log p) times faster than the fastest word-level architecture, where p is the word length. The architecture presented in this paper is faster still than the two in [1] because it cuts the critical path in the dependence graph in half. We present the basic ideas behind the speedup in our design, how to establish the dependence structure, and how to derive the final design. We also show that our design is time-optimal for our dependence structure and has a speedup of 50% or more over the designs presented in [1]. We are implementing the design on a Xilinx FPGA chip; the implementation shows a potential speedup over the Xilinx multiplier macro. Our approach can be used to map algorithms to hardware.
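The arithmetic behind the processing element described in the abstract can be illustrated with a minimal software sketch. This is not the authors' hardware design or their dependence structure; the names `pe_add` and `bit_level_dot` are hypothetical, and the code only shows two facts that follow from the abstract: adding up to six bits gives a total of at most 6, which fits in one sum bit plus two carry bits (three inputs need only one carry), and a fixed-point product can be accumulated from independent single-bit products rather than whole words.

```python
def pe_add(bits):
    """Hypothetical processing element: add three to six input bits.

    Returns (sum_bit, carries), where carries holds one bit for three
    inputs (total <= 3) and two bits for four to six inputs (total <= 6).
    """
    if not 3 <= len(bits) <= 6:
        raise ValueError("expected three to six input bits")
    if any(b not in (0, 1) for b in bits):
        raise ValueError("inputs must be 0 or 1")
    total = sum(bits)
    sum_bit = total & 1            # weight 1
    carry1 = (total >> 1) & 1      # weight 2
    carry2 = (total >> 2) & 1      # weight 4
    carries = (carry1,) if len(bits) == 3 else (carry1, carry2)
    return sum_bit, carries


def bit_level_dot(a, b, p):
    """Dot product of two vectors of unsigned p-bit integers,
    accumulated from single-bit products a_k[i] AND b_k[j],
    each weighted by 2**(i + j).

    An illustration of bit-level decomposition only; it does not model
    the scheduling or critical path of the paper's architecture.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    acc = 0
    for x, y in zip(a, b):
        for i in range(p):
            for j in range(p):
                bit_product = ((x >> i) & 1) & ((y >> j) & 1)
                acc += bit_product << (i + j)
    return acc
```

Each term in the inner loop of `bit_level_dot` depends only on one bit of each operand, which is the property that lets a bit-level array start combining low-order bits before the high-order bits of a word are available.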


Pages: 2257 - 2264

Number of pages: 8

Related records: 50 in total

[21] A Bit-Level Matrix Transpose for Bitmap-Index-Based Data Analytics
Xuan-Thuan Nguyen
Hong-Thu Nguyen
Cong-Kha Pham
2016 IEEE SIXTH INTERNATIONAL CONFERENCE ON COMMUNICATIONS AND ELECTRONICS (ICCE), 2016, : 217 - 220
[22] Bit-level Locking for Concurrency Control
Abbass, Jad F.
Haraty, Ramzi A.
2009 IEEE/ACS INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS, VOLS 1 AND 2, 2009, : 168 - 173
[23] Unconditional bases and bit-level compression
Donoho, DL
APPLIED AND COMPUTATIONAL HARMONIC ANALYSIS, 1996, 3 (04) : 388 - 392
[24] Bit-level stopping in turbo decoding
Kim, DH
Kim, SW
57TH IEEE VEHICULAR TECHNOLOGY CONFERENCE, VTC 2003-SPRING, VOLS 1-4, PROCEEDINGS, 2003, : 2134 - 2138
[25] BIT-LEVEL SYNCHRONIZATION IN MICROPROCESSOR NETWORKS
SINTONEN, L
UOTILA, P
IEE PROCEEDINGS-E COMPUTERS AND DIGITAL TECHNIQUES, 1981, 128 (03) : 103 - 106
[26] A MODULO BIT-LEVEL SYSTOLIC COMPILER
JULLIEN, GA
BANDYOPADHYAY, S
MILLER, WC
FROST, R
1989 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-3, 1989, : 457 - 460
[27] Bit-level stopping of turbo decoding
Kim, Dong Ho
Kim, Sang Wu
IEEE COMMUNICATIONS LETTERS, 2006, 10 (03) : 183 - 185
[28] Exploiting Bit-Level Write Patterns to Reduce Energy Consumption in Hybrid Cache Architecture
Choi, Juhee
Park, Heemin
IEICE ELECTRONICS EXPRESS, 2021,
[29] Exploiting bit-level write patterns to reduce energy consumption in hybrid cache architecture
Choi, Juhee
Park, Heemin
IEICE ELECTRONICS EXPRESS, 2021, 18 (22)
[30] Accelerating matrix-centric graph processing on GPUs through bit-level optimizations
Chen, Jou-An
Sung, Hsin-Hsuan
Shen, Xipeng
Tallent, Nathan
Barker, Kevin
Li, Ang
JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2023, 177 : 53 - 67
