Memory-Efficient CNN Accelerator Based on Interlayer Feature Map Compression

Cited by: 15
|
Authors
Shao, Zhuang [1 ]
Chen, Xiaoliang [1 ]
Du, Li [1 ]
Chen, Lei [2 ]
Du, Yuan [1 ]
Zhuang, Wei [2 ]
Wei, Huadong [1 ]
Xie, Chenjia [1 ]
Wang, Zhongfeng [1 ]
Affiliations
[1] Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210023, Peoples R China
[2] Beijing Microelectronics Technol Inst, Beijing 100076, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep convolutional neural networks; discrete cosine transform; quantization; interlayer feature map compression;
DOI
10.1109/TCSI.2021.3120312
CLC Classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Subject Classification Codes
0808 ; 0809 ;
Abstract
Existing deep convolutional neural networks (CNNs) generate massive interlayer feature data during network inference. To maintain real-time processing in embedded systems, large on-chip memory is required to buffer the interlayer feature maps. In this paper, we propose an efficient hardware accelerator with an interlayer feature compression technique that significantly reduces the required on-chip memory size and off-chip memory access bandwidth. The accelerator compresses interlayer feature maps by transforming the stored data into the frequency domain using a hardware-implemented 8x8 discrete cosine transform (DCT). The high-frequency components are removed after the DCT through quantization, and sparse matrix compression is utilized to further compress the interlayer feature maps. The on-chip memory allocation scheme supports dynamic configuration of the feature map buffer size and scratch-pad size according to different network-layer requirements. The hardware accelerator combines compression, decompression, and CNN acceleration into one computing stream, achieving minimal compression and processing delay. A prototype accelerator is implemented on an FPGA platform and also synthesized in TSMC 28-nm CMOS technology. It achieves 403 GOPS peak throughput and 1.4x to 3.3x interlayer feature map reduction with light hardware area overhead, making it a promising hardware accelerator for intelligent IoT devices.
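The compression pipeline the abstract describes (8x8 DCT, quantization that zeroes high-frequency coefficients, then sparse storage of the surviving coefficients) can be sketched in NumPy. The diagonal cutoff `keep` and the synthetic feature-map tile below are illustrative assumptions, not the paper's exact quantizer or data.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix (satisfies C @ C.T == I)."""
    k = np.arange(n)[:, None]            # frequency index (rows)
    m = np.arange(n)[None, :]            # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)              # rescale DC row for orthonormality
    return c

def compress_tile(tile, keep=5):
    """2-D DCT of a square tile; zero every coefficient whose
    diagonal index (row + col) is >= `keep` (high frequencies)."""
    c = dct_matrix(tile.shape[0])
    coeffs = c @ tile @ c.T              # separable 2-D DCT
    rows, cols = np.indices(coeffs.shape)
    coeffs[rows + cols >= keep] = 0.0    # crude "quantization" step
    return coeffs

def decompress_tile(coeffs):
    """Inverse 2-D DCT (transpose of the orthonormal forward transform)."""
    c = dct_matrix(coeffs.shape[0])
    return c.T @ coeffs @ c

# Smooth synthetic "feature-map" tile: its energy concentrates at low
# frequencies, so truncating the high-frequency coefficients loses little.
x = np.outer(np.linspace(0.0, 1.0, 8), np.linspace(1.0, 2.0, 8))
z = compress_tile(x)
x_hat = decompress_tile(z)

# Fraction of zero coefficients a sparse-matrix format would skip storing.
sparsity = 1.0 - np.count_nonzero(z) / z.size
rms_err = np.sqrt(np.mean((x - x_hat) ** 2))
```

For this smooth tile most of the 64 coefficients are zeroed while the reconstruction error stays small, which is the trade-off the accelerator exploits: the sparse coefficient tiles, not the raw feature maps, are what get buffered on-chip.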
Pages: 668-681
Page count: 14
Related Papers
50 records
  • [1] An Efficient CNN Inference Accelerator Based on Intra- and Inter-Channel Feature Map Compression
    Xie, Chenjia
    Shao, Zhuang
    Zhao, Ning
    Du, Yuan
    Du, Li
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2023, 70 (09) : 3625 - 3638
  • [2] A Memory-Efficient Edge Inference Accelerator with XOR-based Model Compression
    Lee, Hyunseung
    Hong, Jihoon
    Kim, Soosung
    Lee, Seung Yul
    Lee, Jae W.
    [J]. 2023 60TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC, 2023,
  • [3] A Memory-Efficient CNN Accelerator Using Segmented Logarithmic Quantization and Multi-Cluster Architecture
    Xu, Jiawei
    Huan, Yuxiang
    Huang, Boming
    Chu, Haoming
    Jin, Yi
    Zheng, Li-Rong
    Zou, Zhuo
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2021, 68 (06) : 2142 - 2146
  • [4] Transform-Based Feature Map Compression for CNN Inference
    Shi, Yubo
    Wang, Meiqi
    Chen, Siyi
    Wei, Jinghe
    Wang, Zhongfeng
    [J]. 2021 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2021,
  • [5] Facto-CNN: Memory-Efficient CNN Training with Low-rank Tensor Factorization and Lossy Tensor Compression
    Lee, Seungtae
    Ko, Jonghwan
    Hong, Seokin
    [J]. ASIAN CONFERENCE ON MACHINE LEARNING, VOL 222, 2023, 222
  • [6] Memory-efficient spatial prediction image compression scheme
    Nandi, Anil V.
    Patnaik, L. M.
    Banakar, R. M.
    [J]. IMAGE AND VISION COMPUTING, 2007, 25 (06) : 899 - 906
  • [7] Sparse Bitmap Compression for Memory-Efficient Training on the Edge
    Hosny, Abdelrahman
    Neseem, Marina
    Reda, Sherief
    [J]. 2021 ACM/IEEE 6TH SYMPOSIUM ON EDGE COMPUTING (SEC 2021), 2021, : 14 - 25
  • [8] Adaptive Weight Compression for Memory-Efficient Neural Networks
    Ko, Jong Hwan
    Kim, Duckhwan
    Na, Taesik
    Kung, Jaeha
    Mukhopadhyay, Saibal
    [J]. PROCEEDINGS OF THE 2017 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE), 2017, : 199 - 204
  • [9] CNN Inference Accelerators with Adjustable Feature Map Compression Ratios
    Tsai, Yu-Chih
    Liu, Chung-Yueh
    Wang, Chia-Chun
    Hsu, Tsen-Wei
    Liu, Ren-Shuo
    [J]. 2023 IEEE 41ST INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, ICCD, 2023, : 631 - 634
  • [10] A memory-efficient block-wise MAP decoder architecture
    Kim, S
    Hwang, SY
    Kang, MJ
    [J]. ETRI JOURNAL, 2004, 26 (06) : 615 - 621