A 5.99 TFLOPS/W Heterogeneous CIM-NPU Architecture for an Energy Efficient Floating-Point DNN Acceleration

Times Cited: 0
Authors
Park, Wonhoon [1 ]
Ryu, Junha [1 ]
Kim, Sangjin [1 ]
Um, Soyeon [1 ]
Jo, Wooyoung [1 ]
Kim, Sangyoeb [1 ]
Yoo, Hoi-Jun [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol, Sch Elect Engn, Daejeon, South Korea
Keywords
computing-in-memory; SRAM; deep neural network; floating-point; cache system; outlier-handling; reconfigurability;
DOI
10.1109/ISCAS46773.2023.10181869
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This work presents an energy-efficient digital computing-in-memory (CIM) processor for floating-point (FP) deep neural network (DNN) acceleration. Previous FP-CIM processors have two limitations: processors with post-alignment show low throughput due to serial operation, while processors with pre-alignment incur truncation error. To resolve these problems, we exploit the statistical observation that, in pre-alignment-based FP operation, outliers can be identified by their shift amount. Because these outliers decrease energy efficiency through long operation cycles, they need to be processed separately. The proposed Hetero-FP-CIM integrates CIM arrays with a shared NPU, so that the dense inliers and the sparse outliers are computed separately. It also includes an efficient weight caching system that avoids copying the entire weights into the shared NPU. The proposed Hetero-FP-CIM is simulated in 28 nm CMOS technology and occupies 2.7 mm². As a result, it achieves 5.99 TFLOPS/W on ImageNet (ResNet50) with bfloat16 representation.
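The abstract's key mechanism — pre-alignment truncation and shift-amount-based outlier detection — can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes a simplified bfloat16-like format (8 significand bits) and integer mantissas, and the names `prealign_sum` and `MANT_BITS` are illustrative only. It shows how aligning all mantissas to the largest exponent before summation truncates small operands, and how an operand whose required shift reaches the mantissa width vanishes entirely — the "outlier" case the Hetero-FP-CIM routes to the shared NPU.

```python
# Hedged sketch of pre-alignment FP accumulation (not the paper's design).
# Assumes a bfloat16-like format: 8 significand bits (7 stored + hidden 1).
MANT_BITS = 8

def prealign_sum(pairs):
    """Sum (exponent, mantissa) pairs after shifting every mantissa
    to the largest exponent, as a pre-alignment CIM array would."""
    max_e = max(e for e, _ in pairs)
    total = 0
    for e, m in pairs:
        total += m >> (max_e - e)  # bits shifted out are truncated
    return max_e, total

def exact_sum(pairs):
    """Reference value with no alignment loss."""
    return sum(m * 2**e for e, m in pairs)

# One large-exponent operand plus two small ones: the small mantissas
# are shifted 10 places right and truncated to zero.
vals = [(10, 0b10000000), (0, 0b11111111), (0, 0b11111111)]
e, m = prealign_sum(vals)
approx = m * 2**e
print(approx, exact_sum(vals))  # 131072 vs. 131582: truncation error

# Shift-amount outlier test: an exponent gap >= MANT_BITS means the
# operand contributes nothing under pre-alignment and would instead
# be handled separately (in the paper, by the shared NPU).
max_e = max(ex for ex, _ in vals)
outliers = [(ex, mx) for ex, mx in vals if max_e - ex >= MANT_BITS]
print(len(outliers))  # 2
```

The gap between `approx` and the exact sum is exactly the truncated small-operand mass, which is why processing the sparse outliers on a separate datapath recovers accuracy without slowing the dense inlier computation.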
Pages: 4