Smart Memory: Deep Learning Acceleration in 3D-Stacked Memories

Authors
Rezaei, Seyyed Hossein SeyyedAghaei [1]
Moghaddam, Parham Zilouchian [1]
Modarressi, Mehdi [1]
Affiliation
[1] Univ Tehran, Sch Elect & Comp Engn, Tehran 25529, Iran
Keywords
Network-on-memory; processing-in-memory; 3D-stacked memory; deep learning accelerator
DOI
10.1109/LCA.2023.3287976
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Processing-in-memory (PIM) is the most promising paradigm to address the bandwidth bottleneck in deep neural network (DNN) accelerators. However, the algorithmic and dataflow structure of DNNs still necessitates moving a large amount of data across banks inside the memory device to bring input data and their corresponding model parameters together, which shifts part of the bandwidth bottleneck to the in-memory data communication infrastructure. To alleviate this bottleneck, we present Smart Memory, a highly parallel in-memory DNN accelerator for 3D memories that benefits from a scalable high-bandwidth in-memory network. Whereas existing PIM designs implement the compute units and network-on-chip on the logic die of the underlying 3D memory, in Smart Memory the computation and data transmission tasks are distributed across the memory banks. To this end, each memory bank is equipped with (1) a very simple processing unit to run neural networks, and (2) a circuit-switched router to interconnect memory banks by a 3D network-on-memory. Our evaluation shows a 44% average performance improvement over state-of-the-art in-memory DNN accelerators.
Pages: 137-141
Number of pages: 5