High-Throughput, Area-Efficient, and Variation-Tolerant 3-D In-Memory Compute System for Deep Convolutional Neural Networks

Citations: 15
Authors
Veluri, Hasita [1 ]
Li, Yida [2 ]
Niu, Jessie Xuhua [1 ]
Zamburg, Evgeny [1 ]
Thean, Aaron Voon-Yew [1 ]
Affiliations
[1] Natl Univ Singapore, Elect & Comp Engn, Singapore 117583, Singapore
[2] Southern Univ Sci & Technol, Engn Res Ctr Integrated Circuits Next Generat Com, Minist Educ, Shenzhen 518055, Peoples R China
Funding
National Research Foundation, Singapore;
Keywords
Deep neural nets; DTCO for IoT; in-memory compute; memristors; system design;
DOI
10.1109/JIOT.2021.3058015
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Untethered computing using deep convolutional neural networks (DCNNs) at the edge of IoT with limited resources requires systems that are exceedingly power- and area-efficient. Analog in-memory matrix-matrix multiplications enabled by emerging memories can significantly reduce the energy budget of such systems and result in compact accelerators. In this article, we report a high-throughput RRAM-based DCNN processor that boasts 7.12x area-efficiency (AE) and 6.52x power-efficiency (PE) enhancements over state-of-the-art accelerators. We achieve this by coupling a novel in-memory computing methodology with a staggered-3D memristor array. Our variation-tolerant in-memory compute method, which performs operations on signed floating-point numbers within a single array, leverages charge-domain operations and conductance discretization to reduce peripheral overheads. Voltage pulses applied at the staggered bottom electrodes of the 3D array generate a concurrent input shift and parallelize convolution operations to boost throughput. The high density and low footprint of the 3D array, along with the modified in-memory M2M execution, improve peak AE to 9.1 TOPs/mm² while the elimination of input regeneration improves PE to 10.6 TOPs/W. This work provides a path towards infallible RRAM-based hardware accelerators that are fast, low power, and low area.
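The in-memory matrix multiplication the abstract describes can be illustrated with a minimal numerical model. This sketch is not the paper's charge-domain scheme; it assumes a generic memristor crossbar in which signed weights are stored as a differential pair of discretized conductances, and each column current implements one dot product via Kirchhoff's current law. All function names and device parameters (`g_min`, `g_max`, `levels`) here are hypothetical.

```python
import numpy as np

np.random.seed(0)

def quantize_conductance(w, g_min=1e-6, g_max=1e-4, levels=16):
    """Map weights in [-1, 1] to discrete conductance levels.

    Signed values use a differential pair: the positive part of each weight
    is programmed into one column, the negative part into a paired column.
    Conductances are snapped to a uniform grid of `levels` states, modeling
    the conductance discretization mentioned in the abstract.
    """
    w_pos = np.clip(w, 0, None)   # positive part of each weight
    w_neg = np.clip(-w, 0, None)  # magnitude of the negative part
    step = (g_max - g_min) / (levels - 1)

    def to_g(x):
        g = g_min + x * (g_max - g_min)           # linear weight -> conductance
        return np.round((g - g_min) / step) * step + g_min  # snap to grid
    return to_g(w_pos), to_g(w_neg)

def crossbar_mvm(v, g_pos, g_neg):
    """Analog matrix-vector multiply: I_j = sum_i G[i, j] * V[i].

    The subtraction models differential column sensing, which recovers
    the signed result from the two unsigned conductance arrays.
    """
    return v @ g_pos - v @ g_neg

# Toy example: a 4-input, 3-output layer slice.
w = np.random.uniform(-1, 1, (4, 3))   # signed weights
v = np.random.uniform(0, 0.5, 4)       # input voltages (V)
g_pos, g_neg = quantize_conductance(w)
i_out = crossbar_mvm(v, g_pos, g_neg)  # column currents (A), ~ v @ w scaled
```

Up to quantization error, `i_out` equals `v @ w` scaled by the conductance range, so a full matrix-matrix product is obtained by streaming input vectors; the actual hardware additionally parallelizes convolutions via the staggered bottom-electrode pulsing described above.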
Pages: 9219-9232
Page count: 14