H-SIMD machine: Configurable parallel computing for matrix multiplication

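Authors
Xu, XZ [1]
Ziavras, SG [1]
Affiliations
[1] New Jersey Inst Technol, Dept Elect & Comp Engn, Newark, NJ 07102 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract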
FPGAs (Field-Programmable Gate Arrays) are often used as coprocessors to boost the performance of data-intensive applications [1, 2]. However, mapping algorithms onto multimillion-gate FPGAs is time-consuming and remains a challenge in configurable system design. The communication overhead between the host workstation and the FPGAs is also significant. To address these problems, we propose the FPGA-based Hierarchical-SIMD (H-SIMD) machine, codesigned with the Hierarchical Instruction Set Architecture (HISA). At each level, HISA instructions are classified as either communication instructions or computation instructions. The former are executed by the local controller, while the latter are issued to the lower level for execution. Additionally, by using a memory switching scheme and the high-level HISA set to partition the application into coarse-grain tasks, the host-FPGA communication overhead can be hidden. We use matrix multiplication (MM) to test the effectiveness of H-SIMD. The test results show sustained high performance.
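The two-way instruction classification described in the abstract can be illustrated with a small software sketch. It is not the authors' hardware design or the actual HISA encoding: the `Controller` class, the `"comm"`/`"comp"` tags, the row-strip partitioning, and the two-level hierarchy built by `build_hierarchy` are all assumptions made for illustration. At each level, a communication instruction is handled by the local controller, while a computation instruction is split and issued to the level below; only the lowest level multiplies.

```python
from dataclasses import dataclass, field


def matmul(a, b):
    """Plain triple-loop product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


@dataclass
class Controller:
    """One level of a hypothetical hierarchy (names are illustrative only)."""
    name: str
    children: list = field(default_factory=list)
    comm_log: list = field(default_factory=list)

    def execute(self, kind, payload):
        if kind == "comm":
            # Communication instruction: executed by the local controller.
            self.comm_log.append(payload)
            return None
        if kind != "comp":
            raise ValueError(f"unknown instruction kind: {kind!r}")

        a, b = payload
        if not self.children:
            # Lowest level: perform the computation itself.
            return matmul(a, b)

        # Computation instruction: split A into contiguous row strips
        # (an assumed partitioning) and issue one strip per child.
        step = -(-len(a) // len(self.children))  # ceiling division
        result = []
        for idx, child in enumerate(self.children):
            strip = a[idx * step:(idx + 1) * step]
            if strip:
                result.extend(child.execute("comp", (strip, b)))
        return result


def build_hierarchy():
    """Host -> two mid-level controllers -> two leaf units each (assumed shape)."""
    mids = [Controller(f"mid{i}",
                       [Controller(f"leaf{i}{j}") for j in range(2)])
            for i in range(2)]
    return Controller("host", mids)


if __name__ == "__main__":
    host = build_hierarchy()
    host.execute("comm", "load operands")  # stays at the host level
    print(host.execute("comp", ([[1, 2], [3, 4], [5, 6]],
                                [[7, 8], [9, 10]])))
```

The sketch runs the strips sequentially and does not model the memory switching scheme, so it shows only how work is routed through the levels, not how communication is overlapped with computation.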
Pages: 671-676
Page count: 6