An Efficient CNN Inference Accelerator Based on Intra- and Inter-Channel Feature Map Compression

Cited by: 1
Authors
Xie, Chenjia [1]
Shao, Zhuang [1]
Zhao, Ning [1]
Du, Yuan [1]
Du, Li [1]
Affiliations
[1] Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210023, Peoples R China
Keywords
Deep convolution neural networks; interlayer feature map compression; principal component analysis; DEEP; NETWORKS;
DOI
10.1109/TCSI.2023.3287602
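Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808; 0809;
Abstract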
Deep convolutional neural networks (CNNs) generate intensive inter-layer data during inference, which demands substantial on-chip memory and off-chip bandwidth. To ease this memory constraint, this paper proposes an accelerator that adopts a compression technique to reduce the inter-layer data by removing both intra- and inter-channel redundant information. Principal component analysis (PCA) is used in the compression process to concentrate inter-channel information. Spatial differencing, truncation, and reconfigurable bit-width coding are applied within every feature map to eliminate intra-channel data redundancy. Moreover, a particular data arrangement is introduced to enhance data continuity, which optimizes the PCA and improves compression performance. A CNN accelerator with the proposed compression technique is designed to support on-the-fly compression by pipelining the reconstruction, CNN computation, and compression operations. The prototype accelerator is implemented in 28-nm CMOS technology. It achieves 819.2 GOPS peak throughput and 3.75 TOPS/W energy efficiency at 218.5 mW. Experiments show that the proposed compression technique achieves compression ratios of 21.5% to 43.0% (8-bit mode) and 9.8% to 19.3% (16-bit mode) on state-of-the-art CNNs with a negligible accuracy loss.
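The compression flow named in the abstract (PCA across channels, then truncation and spatial differences inside each map, then a per-map bit width) can be illustrated roughly as follows. This is only a minimal NumPy sketch under stated assumptions, not the paper's implementation: the function names `compress`, `reconstruct`, and `required_bits` are hypothetical, truncation is modeled as uniform quantization, the spatial difference is taken along rows only, and the paper's data arrangement and hardware coding scheme are not reproduced.

```python
import numpy as np

def compress(fmap, k, bits=8):
    """Sketch: fmap is a (C, H, W) float array, k the number of components kept."""
    c, h, w = fmap.shape
    x = fmap.reshape(c, h * w)
    mean = x.mean(axis=1, keepdims=True)
    xc = x - mean
    # Inter-channel step: PCA on the channel covariance matrix.
    cov = xc @ xc.T / (h * w)
    _, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    basis = eigvecs[:, ::-1][:, :k]       # top-k principal directions, shape (C, k)
    comps = (basis.T @ xc).reshape(k, h, w)
    # Truncation, modeled here (assumption) as uniform signed quantization.
    peak = np.abs(comps).max()
    scale = peak / (2 ** (bits - 1) - 1) if peak > 0 else 1.0
    q = np.round(comps / scale).astype(np.int32)
    # Intra-channel step: differences between horizontal neighbors;
    # the first column of each map is stored unchanged.
    diff = q.copy()
    diff[:, :, 1:] = q[:, :, 1:] - q[:, :, :-1]
    return diff, basis, mean, scale

def required_bits(diff):
    """A signed bit width sufficient to store each differenced map."""
    return [int(np.abs(d).max()).bit_length() + 1 for d in diff]

def reconstruct(diff, basis, mean, scale):
    """Invert the differences, rescale, and project back to C channels."""
    k, h, w = diff.shape
    q = np.cumsum(diff, axis=2)
    comps = q.astype(np.float64) * scale
    x = basis @ comps.reshape(k, h * w) + mean
    return x.reshape(basis.shape[0], h, w)
```

In this sketch the stored data shrinks from C maps to k maps plus a small basis and mean, and smooth maps tend to yield small differences that need fewer bits. How much is actually saved depends on the network and on details the abstract does not give.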
Pages: 3625-3638
Number of pages: 14
Related papers
25 items in total
  • [21] SPRINT: A High-Performance, Energy-Efficient, and Scalable Chiplet-Based Accelerator With Photonic Interconnects for CNN Inference
    Li, Yuan
    Louri, Ahmed
    Karanth, Avinash
    [J]. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, 33 (10) : 2332 - 2345
  • [22] Efficient Block Matching Motion Estimation Using Multilevel Intra- and Inter-Subblock Features-Subblock-Based SATD
    Xiong, Bing
    Zhu, Ce
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2009, 19 (07) : 1039 - 1043
  • [23] Fast Depth Map Intra Coding for 3D Video Compression-Based Tensor Feature Extraction and Data Analysis
    Hamout, Hamza
    Elyousfi, Abderrahmane
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2020, 30 (07) : 1933 - 1945
  • [24] Spectrally efficient pilot tone-based compensation of inter-channel cross-phase modulation noise in a WDM coherent transmission using injection locking
    Kan, Takashi
    Sato, Kozo
    Yoshida, Masato
    Hirooka, Toshihiko
    Kasai, Keisuke
    Nakazawa, Masataka
    [J]. OPTICS EXPRESS, 2021, 29 (02) : 1454 - 1469
  • [25] A 65nm Computing-in-Memory-Based CNN Processor with 2.9-to-35.8TOPS/W System Energy Efficiency Using Dynamic-Sparsity Performance-Scaling Architecture and Energy-Efficient Inter/Intra-Macro Data Reuse
    Yue, Jinshan
    Yuan, Zhe
    Feng, Xiaoyu
    He, Yifan
    Zhang, Zhixiao
    Si, Xin
    Liu, Ruhui
    Chang, Meng-Fan
    Li, Xueqing
    Yang, Huazhong
    Liu, Yongpan
    [J]. 2020 IEEE INTERNATIONAL SOLID-STATE CIRCUITS CONFERENCE (ISSCC), 2020, : 234 - +