FPGA Processor In Memory Architectures (PIMs): Overlay or Overhaul?

Authors
Kabir, Md Arafat [1]
Kabir, Ehsan [1]
Hollis, Joshua [1]
Levy-Mackay, Eli [1]
Panahi, Atiyehsadat [2]
Bakos, Jason [3]
Huang, Miaoqing [1]
Andrews, David [1]
Affiliations
[1] Univ Arkansas, Dept Comp Sci & Comp Engn, Fayetteville, AR 72701 USA
[2] Univ South Carolina, Dept Comp Sci & Comp Engn, Columbia, SC 29208 USA
[3] Cadence Design Syst, Dept Comp Sci & Comp Engn, San Jose, CA USA
Funding
National Science Foundation (USA)
Keywords
Processing-in-Memory; Bit-serial; Overlay; FPGA; Machine Learning; SIMD
DOI
10.1109/FPL60245.2023.00023
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The dominance of machine learning and the ending of Moore's law have renewed interest in Processor in Memory (PIM) architectures. This interest has produced several recent proposals to modify an FPGA's BRAM architecture to form a next-generation PIM reconfigurable fabric [1], [2]. PIM architectures can also be realized within today's FPGAs as overlays, without modifying the underlying FPGA architecture. To date, no study has compared the advantages of the two approaches. In this paper, we present a study that compares two proposed custom architectures with a PIM overlay running on a commodity FPGA. We created PiCaSO, a Processor in/near Memory Scalable and Fast Overlay architecture, as a representative PIM overlay. The results show that the PiCaSO overlay achieves up to 80% of the peak throughput of the custom designs with 2.56x shorter latency and 25% to 43% better BRAM memory utilization efficiency. We then show how several key features of the PiCaSO overlay can be integrated into the custom PIM designs to further improve their throughput by 18%, latency by 19.5%, and memory efficiency by 6.2%.
Pages: 109-115
Page count: 7