Heterogeneous Multi-Functional Look-Up-Table-based Processing-in-Memory Architecture for Deep Learning Acceleration

Cited by: 2
Authors
Bavikadi, Sathwika [1 ]
Sutradhar, Purab Ranjan [2 ]
Ganguly, Amlan [2 ]
Dinakarrao, Sai Manoj Pudukotai [1 ]
Affiliations
[1] George Mason Univ, Dept Elect & Comp Engn, Fairfax, VA 22030 USA
[2] Rochester Inst Technol, Dept Comp Engn, Rochester, NY USA
Funding
U.S. National Science Foundation
Keywords
DOI
10.1109/ISQED57927.2023.10129338
CLC Classification Number
TP3 [Computing Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Emerging applications, including deep neural networks (DNNs) and convolutional neural networks (CNNs), employ massive amounts of data for computation and data analysis. Such applications often strain resources and impose large data-movement overheads between memory and compute units. Several architectures, such as Processing-in-Memory (PIM), have been introduced to alleviate the bandwidth bottlenecks and inefficiency of traditional computing architectures. However, existing PIM architectures represent a trade-off among power, performance, area, energy efficiency, and programmability. To achieve the energy-efficiency and flexibility criteria simultaneously in hardware accelerators, we introduce a multi-functional look-up-table (LUT)-based reconfigurable PIM architecture in this work. The proposed design is a many-core architecture in which each core comprises processing elements (PEs): stand-alone processors with programmable functional units built from high-speed reconfigurable LUTs. The proposed LUTs can perform the various operations required for CNN acceleration, including convolution, pooling, and activation. Additionally, the LUTs can produce multiple outputs corresponding to different functionalities simultaneously, without requiring a separate LUT per functionality, which reduces area and power overheads. Furthermore, we design special-function LUTs that provide simultaneous outputs for multiplication and accumulation as well as special activation functions such as hyperbolic tangent and sigmoid. We have evaluated various CNNs, including LeNet, AlexNet, and ResNet-18/34/50. Our experimental results demonstrate that AlexNet implemented on the proposed architecture achieves up to 200x higher energy efficiency and 1.5x higher throughput than a DRAM-based LUT PIM architecture.
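The abstract's central idea of a multi-functional LUT, one table lookup yielding outputs for several different operations at once, can be sketched in a toy software form. This is a hedged illustration only: the function and variable names are invented, the operand width and the choice of operations (multiply for convolution/MAC, max for pooling, sigmoid as a special activation) are assumptions for demonstration, and the paper's actual hardware LUT organization is not reproduced here.

```python
import math

BITS = 4  # assumed small operand width so the table stays tiny (16 x 16 entries)

def build_multifunction_lut(bits=BITS):
    """Precompute, for every (a, b) operand pair, a tuple of outputs:
    the product (usable for MAC/convolution), the max (usable for
    max-pooling), and sigmoid(a - b) (standing in for a special
    activation function). One lookup then serves all three roles."""
    table = {}
    for a in range(1 << bits):
        for b in range(1 << bits):
            table[(a, b)] = (
                a * b,                               # multiply output
                max(a, b),                           # pooling output
                1.0 / (1.0 + math.exp(-(a - b))),    # activation output
            )
    return table

LUT = build_multifunction_lut()

# A single lookup returns all functional outputs simultaneously,
# rather than consulting a separate table per operation.
prod, pooled, act = LUT[(3, 5)]
```

In hardware, the analogous benefit is that one stored table row drives several functional outputs, avoiding a dedicated LUT per operation, which is what the abstract credits for the area and power savings.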
Pages: 445-452
Page count: 8
Related Papers
34 references in total
  • [1] Look-up-Table Based Processing-in-Memory Architecture With Programmable Precision-Scaling for Deep Learning Applications
    Sutradhar, Purab Ranjan
    Bavikadi, Sathwika
    Connolly, Mark
    Prajapati, Savankumar
    Indovina, Mark A.
    Dinakarrao, Sai Manoj Pudukotai
    Ganguly, Amlan
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (02) : 263 - 275
  • [2] Flexible Instruction Set Architecture for Programmable Look-up Table based Processing-in-Memory
    Connolly, Mark
    Sutradhar, Purab Ranjan
    Indovina, Mark
    Ganguly, Amlan
    [J]. 2021 IEEE 39TH INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD 2021), 2021, : 66 - 73
  • [3] Deep Learning Consideration with Novel Approach - Look-Up-Table Based Processing Conjugated Memory
    Otsuka, Kanji
    Sato, Yoichi
    [J]. 2017 12TH INTERNATIONAL MICROSYSTEMS, PACKAGING, ASSEMBLY AND CIRCUITS TECHNOLOGY CONFERENCE (IMPACT), 2017, : 126 - 129
  • [4] Deep Learning Consideration with Novel Approach --- Look-Up-Table Based Processing Conjugated Memory ---
    Otsuka, Kanji
    Sato, Yoichi
    [J]. 2018 INTERNATIONAL CONFERENCE ON ELECTRONICS PACKAGING AND IMAPS ALL ASIA CONFERENCE (ICEP-IAAC), 2018, : 152 - 156
  • [5] RETRANSFORMER: ReRAM-based Processing-in-Memory Architecture for Transformer Acceleration
    Yang, Xiaoxuan
    Yan, Bonan
    Li, Hai
    Chen, Yiran
    [J]. 2020 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER AIDED-DESIGN (ICCAD), 2020,
  • [6] A Ferroelectric FET-Based Processing-in-Memory Architecture for DNN Acceleration
    Long, Yun
    Kim, Daehyun
    Lee, Edward
    Saha, Priyabrata
    Mudassar, Burhan Ahmad
    She, Xueyuan
    Khan, Asif Islam
    Mukhopadhyay, Saibal
    [J]. IEEE JOURNAL ON EXPLORATORY SOLID-STATE COMPUTATIONAL DEVICES AND CIRCUITS, 2019, 5 (02): : 113 - 122
  • [7] PIM-DH: ReRAM-based Processing-in-Memory Architecture for Deep Hashing Acceleration
    Liu, Fangxin
    Zhao, Wenbo
    Chen, Yongbiao
    Wang, Zongwu
    He, Zhezhi
    Yang, Rui
    Tang, Qidong
    Yang, Tao
    Zhuo, Cheng
    Jiang, Li
    [J]. PROCEEDINGS OF THE 59TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC 2022, 2022, : 1087 - 1092
  • [8] ReRAM-Based Processing-in-Memory Architecture for Recurrent Neural Network Acceleration
    Long, Yun
    Na, Taesik
    Mukhopadhyay, Saibal
    [J]. IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2018, 26 (12) : 2781 - 2794
  • [9] FlutPIM: A Look-up Table-based Processing in Memory Architecture with Floating-point Computation Support for Deep Learning Applications
    Sutradhar, Purab Ranjan
    Bavikadi, Sathwika
    Indovina, Mark
    Dinakarrao, Sai Manoj Pudukotai
    Ganguly, Amlan
    [J]. PROCEEDINGS OF THE GREAT LAKES SYMPOSIUM ON VLSI 2023, GLSVLSI 2023, 2023, : 207 - 211
  • [10] Processing-in-Memory Designs Based on Emerging Technology for Efficient Machine Learning Acceleration
    Kim, Bokyung
    Li, Hai Helen
    Chen, Yiran
    [J]. PROCEEDINGS OF THE GREAT LAKES SYMPOSIUM ON VLSI 2024, GLSVLSI 2024, 2024, : 614 - 619