Heterogeneous Multi-Functional Look-Up-Table-based Processing-in-Memory Architecture for Deep Learning Acceleration

Cited by: 2
Authors
Bavikadi, Sathwika [1 ]
Sutradhar, Purab Ranjan [2 ]
Ganguly, Amlan [2 ]
Dinakarrao, Sai Manoj Pudukotai [1 ]
Affiliations
[1] George Mason Univ, Dept Elect & Comp Engn, Fairfax, VA 22030 USA
[2] Rochester Inst Technol, Dept Comp Engn, Rochester, NY USA
Funding
U.S. National Science Foundation
Keywords
DOI
10.1109/ISQED57927.2023.10129338
CLC Classification Number
TP3 [Computing Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Emerging applications, including deep neural networks (DNNs) and convolutional neural networks (CNNs), employ massive amounts of data for computation and data analysis. Such applications often strain resources and impose large data-movement overheads between memory and compute units. Several architectures, such as Processing-in-Memory (PIM), have been introduced to alleviate the bandwidth bottlenecks and inefficiency of traditional computing architectures. However, existing PIM architectures represent a trade-off among power, performance, area, energy efficiency, and programmability. To achieve the energy-efficiency and flexibility criteria simultaneously in hardware accelerators, we introduce a multi-functional look-up-table (LUT)-based reconfigurable PIM architecture in this work. The proposed design is a many-core architecture in which each core comprises processing elements (PEs): stand-alone processors with programmable functional units built from high-speed reconfigurable LUTs. The proposed LUTs can perform the various operations required for CNN acceleration, including convolution, pooling, and activation. Additionally, the LUTs can produce multiple outputs corresponding to different functionalities simultaneously, without requiring a separate LUT per functionality, which reduces area and power overheads. Furthermore, we design special-function LUTs that provide simultaneous outputs for multiplication and accumulation as well as special activation functions such as hyperbolic tangent and sigmoid. We have evaluated various CNNs, including LeNet, AlexNet, and ResNet-18/34/50. Our experimental results demonstrate that AlexNet implemented on the proposed architecture achieves up to 200x higher energy efficiency and 1.5x higher throughput than a DRAM-based LUT PIM architecture.
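The abstract's central idea of a multi-functional LUT, one table lookup yielding outputs for several different operations at once, can be sketched in a toy software form. This is a hedged illustration only: the function and variable names are invented, the operand width and the choice of operations (multiply for convolution/MAC, max for pooling, sigmoid as a special activation) are assumptions for demonstration, and the paper's actual hardware LUT organization is not reproduced here.

```python
import math

BITS = 4  # assumed small operand width so the table stays tiny (16 x 16 entries)

def build_multifunction_lut(bits=BITS):
    """Precompute, for every (a, b) operand pair, a tuple of outputs:
    the product (usable for MAC/convolution), the max (usable for
    max-pooling), and sigmoid(a - b) (standing in for a special
    activation function). One lookup then serves all three roles."""
    table = {}
    for a in range(1 << bits):
        for b in range(1 << bits):
            table[(a, b)] = (
                a * b,                               # multiply output
                max(a, b),                           # pooling output
                1.0 / (1.0 + math.exp(-(a - b))),    # activation output
            )
    return table

LUT = build_multifunction_lut()

# A single lookup returns all functional outputs simultaneously,
# rather than consulting a separate table per operation.
prod, pooled, act = LUT[(3, 5)]
```

In hardware, the analogous benefit is that one stored table row drives several functional outputs, avoiding a dedicated LUT per operation, which is what the abstract credits for the area and power savings.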
Pages: 445-452
Page count: 8
Related Papers
34 references in total
  • [1] Look-up-Table Based Processing-in-Memory Architecture With Programmable Precision-Scaling for Deep Learning Applications
    Sutradhar, Purab Ranjan
    Bavikadi, Sathwika
    Connolly, Mark
    Prajapati, Savankumar
    Indovina, Mark A.
    Dinakarrao, Sai Manoj Pudukotai
    Ganguly, Amlan
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (02) : 263 - 275
  • [2] Flexible Instruction Set Architecture for Programmable Look-up Table based Processing-in-Memory
    Connolly, Mark
    Sutradhar, Purab Ranjan
    Indovina, Mark
    Ganguly, Amlan
    [J]. 2021 IEEE 39TH INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD 2021), 2021, : 66 - 73
  • [3] Deep Learning Consideration with Novel Approach - Look-Up-Table Based Processing Conjugated Memory
    Otsuka, Kanji
    Sato, Yoichi
    [J]. 2017 12TH INTERNATIONAL MICROSYSTEMS, PACKAGING, ASSEMBLY AND CIRCUITS TECHNOLOGY CONFERENCE (IMPACT), 2017, : 126 - 129
  • [4] Deep Learning Consideration with Novel Approach --- Look-Up-Table Based Processing Conjugated Memory ---
    Otsuka, Kanji
    Sato, Yoichi
    [J]. 2018 INTERNATIONAL CONFERENCE ON ELECTRONICS PACKAGING AND IMAPS ALL ASIA CONFERENCE (ICEP-IAAC), 2018, : 152 - 156
  • [5] RETRANSFORMER: ReRAM-based Processing-in-Memory Architecture for Transformer Acceleration
    Yang, Xiaoxuan
    Yan, Bonan
    Li, Hai
    Chen, Yiran
    [J]. 2020 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER AIDED-DESIGN (ICCAD), 2020,
  • [6] A Ferroelectric FET-Based Processing-in-Memory Architecture for DNN Acceleration
    Long, Yun
    Kim, Daehyun
    Lee, Edward
    Saha, Priyabrata
    Mudassar, Burhan Ahmad
    She, Xueyuan
    Khan, Asif Islam
    Mukhopadhyay, Saibal
    [J]. IEEE JOURNAL ON EXPLORATORY SOLID-STATE COMPUTATIONAL DEVICES AND CIRCUITS, 2019, 5 (02): : 113 - 122
  • [7] PIM-DH: ReRAM-based Processing-in-Memory Architecture for Deep Hashing Acceleration
    Liu, Fangxin
    Zhao, Wenbo
    Chen, Yongbiao
    Wang, Zongwu
    He, Zhezhi
    Yang, Rui
    Tang, Qidong
    Yang, Tao
    Zhuo, Cheng
    Jiang, Li
    [J]. PROCEEDINGS OF THE 59TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC 2022, 2022, : 1087 - 1092
  • [8] ReRAM-Based Processing-in-Memory Architecture for Recurrent Neural Network Acceleration
    Long, Yun
    Na, Taesik
    Mukhopadhyay, Saibal
    [J]. IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2018, 26 (12) : 2781 - 2794
  • [9] FlutPIM: A Look-up Table-based Processing in Memory Architecture with Floating-point Computation Support for Deep Learning Applications
    Sutradhar, Purab Ranjan
    Bavikadi, Sathwika
    Indovina, Mark
    Dinakarrao, Sai Manoj Pudukotai
    Ganguly, Amlan
    [J]. PROCEEDINGS OF THE GREAT LAKES SYMPOSIUM ON VLSI 2023, GLSVLSI 2023, 2023, : 207 - 211
  • [10] Processing-in-Memory Designs Based on Emerging Technology for Efficient Machine Learning Acceleration
    Kim, Bokyung
    Li, Hai Helen
    Chen, Yiran
    [J]. PROCEEDINGS OF THE GREAT LAKES SYMPOSIUM ON VLSI 2024, GLSVLSI 2024, 2024, : 614 - 619