FPGA Processor In Memory Architectures (PIMs): Overlay or Overhaul?

Cited by: 0
Authors
Kabir, Md Arafat [1]
Kabir, Ehsan [1]
Hollis, Joshua [1]
Levy-Mackay, Eli [1]
Panahi, Atiyehsadat [2]
Bakos, Jason [3]
Huang, Miaoqing [1]
Andrews, David [1]
Affiliations
[1] Univ Arkansas, Dept Comp Sci & Comp Engn, Fayetteville, AR 72701 USA
[2] Univ South Carolina, Dept Comp Sci & Comp Engn, Columbia, SC 29208 USA
[3] Cadence Design Syst, Dept Comp Sci & Comp Engn, San Jose, CA USA
Funding
National Science Foundation (USA);
Keywords
Processing-in-Memory; Bit-serial; Overlay; FPGA; Machine Learning; SIMD;
DOI
10.1109/FPL60245.2023.00023
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
The dominance of machine learning and the end of Moore's law have renewed interest in Processor in Memory (PIM) architectures. This interest has produced several recent proposals to modify an FPGA's BRAM architecture to form a next-generation PIM reconfigurable fabric [1], [2]. PIM architectures can also be realized within today's FPGAs as overlays, without modifying the underlying FPGA architecture. To date, no study has compared the advantages of the two approaches. In this paper, we present a study that compares two proposed custom architectures with a PIM overlay running on a commodity FPGA. We created PiCaSO, a Processor in/near Memory Scalable and Fast Overlay architecture, as a representative PIM overlay. The results show that the PiCaSO overlay achieves up to 80% of the peak throughput of the custom designs with 2.56x shorter latency and 25% to 43% better BRAM memory utilization efficiency. We then show how several key features of the PiCaSO overlay can be integrated into the custom PIM designs to further improve their throughput by 18%, latency by 19.5%, and memory efficiency by 6.2%.
Pages: 109-115
Page count: 7
Related papers
50 items in total
  • [21] FET-OPU: A Flexible and Efficient FPGA-based Overlay Processor for Transformer Networks
    Bai, Yueyin
    Zhou, Hao
    Zhao, Keqing
    Wang, Hongji
    Chen, Jianli
    Yu, Jun
    Wang, Kun
    2023 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER AIDED DESIGN, ICCAD, 2023,
  • [22] An FPGA-based Multi-Core Overlay Processor for Transformer-based Models
    Lu, Shaoqiang
    Zhao, Tiandong
    Zhang, Rumin
    Lin, Ting-Jung
    Wu, Chen
    He, Lei
    2024 INTERNATIONAL SYMPOSIUM OF ELECTRONICS DESIGN AUTOMATION, ISEDA 2024, 2024, : 697 - 702
  • [23] Light-OPU: An FPGA-based Overlay Processor for Lightweight Convolutional Neural Networks
    Yu, Yunxuan
    Zhao, Tiandong
    Wang, Kun
    He, Lei
    2020 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS (FPGA '20), 2020, : 122 - 132
  • [24] New non-volatile memory structures for FPGA architectures
    Choi, David
    Choi, Kyu
    Villasenor, John D.
    IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2008, 16 (07) : 874 - 881
  • [25] A practical multiple processor programming model for various distributed memory architectures
    Howard, S
    Alexander, WE
    INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED PROCESSING TECHNIQUES AND APPLICATIONS, VOLS I-IV, PROCEEDINGS, 1998, : 151 - 158
  • [26] Memory-Aware Circuit Overlay NoCs for Latency Optimized GPGPU Architectures
    Raparti, Venkata Yaswanth
    Pasricha, Sudeep
    PROCEEDINGS OF THE SEVENTEENTH INTERNATIONAL SYMPOSIUM ON QUALITY ELECTRONIC DESIGN ISQED 2016, 2016, : 63 - 68
  • [27] Supporting Concurrent Memory Access in TCF-aware Processor Architectures
    Forsell, Martti
    Roivainen, Jussi
    Leppanen, Ville
    Traff, Jesper Larsson
    2017 IEEE NORDIC CIRCUITS AND SYSTEMS CONFERENCE (NORCAS): NORCHIP AND INTERNATIONAL SYMPOSIUM OF SYSTEM-ON-CHIP (SOC), 2017,
  • [28] The impact of modern FPGA architectures on neural hardware: A case study of the TOTEM neural processor
    McBader, S
    Lee, P
    Sartori, A
    2004 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-4, PROCEEDINGS, 2004, : 3149 - 3154
  • [29] FPGA wavelet processor design using language for instruction-set Architectures (LISA)
    Meyer-Baese, Uwe
    Vera, Alonzo
    Rao, Suhasini
    Lenk, Karl
    Pattichis, Marios
    INDEPENDENT COMPONENT ANALYSES, WAVELETS, UNSUPERVISED NANO-BIOMIMETIC SENSORS, AND NEURAL NETWORKS V, 2007, 6576
  • [30] A Preprocessing Algorithm to Increase OCR Performance on Application Processor-Centric FPGA Architectures
    Crovato, Cesar
    Torok, Delfim
    Heidrich, Regina
    de Cerqueira, Bernardo
    Velho, Eduardo
    INCLUSIVE SMART CITIES AND DIGITAL HEALTH, 2016, 9677 : 27 - 34