FPGA Processor In Memory Architectures (PIMs): Overlay or Overhaul?

Times cited: 0

Authors:

Kabir, Md Arafat [1]

Kabir, Ehsan [1]

Hollis, Joshua [1]

Levy-Mackay, Eli [1]

Panahi, Atiyehsadat [2]

Bakos, Jason [3]

Huang, Miaoqing [1]

Andrews, David [1]

Affiliations:

[1] Univ Arkansas, Dept Comp Sci & Comp Engn, Fayetteville, AR 72701 USA

[2] Univ South Carolina, Dept Comp Sci & Comp Engn, Columbia, SC 29208 USA

[3] Cadence Design Syst, Dept Comp Sci & Comp Engn, San Jose, CA USA

Source:

2023 33RD INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE LOGIC AND APPLICATIONS, FPL | 2023

Funding:

U.S. National Science Foundation

Keywords:

Processing-in-Memory; Bit-serial; Overlay; FPGA; Machine Learning; SIMD

DOI:

10.1109/FPL60245.2023.00023

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

The dominance of machine learning and the end of Moore's law have renewed interest in Processor in Memory (PIM) architectures. This interest has produced several recent proposals to modify an FPGA's BRAM architecture to form a next-generation PIM reconfigurable fabric [1], [2]. PIM architectures can also be realized within today's FPGAs as overlays, without modifying the underlying FPGA architecture. To date, no study has examined the comparative advantages of the two approaches. In this paper, we compare two proposed custom architectures against a PIM overlay running on a commodity FPGA. As a representative PIM overlay, we created PiCaSO, a Processor in/near Memory Scalable and Fast Overlay architecture. The results show that the PiCaSO overlay achieves up to 80% of the peak throughput of the custom designs, with 2.56× shorter latency and 25% to 43% better BRAM utilization efficiency. We then show how several key features of the PiCaSO overlay can be integrated into the custom PIM designs to further improve their throughput by 18%, latency by 19.5%, and memory efficiency by 6.2%.

Pages: 109–115

Number of pages: 7