Memory-Efficient CNN Accelerator Based on Interlayer Feature Map Compression

Cited by: 15
Authors
Shao, Zhuang [1 ]
Chen, Xiaoliang [1 ]
Du, Li [1 ]
Chen, Lei [2 ]
Du, Yuan [1 ]
Zhuang, Wei [2 ]
Wei, Huadong [1 ]
Xie, Chenjia [1 ]
Wang, Zhongfeng [1 ]
Affiliations
[1] Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210023, Peoples R China
[2] Beijing Microelectronics Technology Inst, Beijing 100076, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep convolutional neural networks; discrete cosine transform; quantization; interlayer feature map compression;
DOI
10.1109/TCSI.2021.3120312
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Classification Code
0808; 0809;
Abstract
Existing deep convolutional neural networks (CNNs) generate massive interlayer feature data during network inference. To maintain real-time processing in embedded systems, large on-chip memory is required to buffer the interlayer feature maps. In this paper, we propose an efficient hardware accelerator with an interlayer feature compression technique that significantly reduces the required on-chip memory size and off-chip memory access bandwidth. The accelerator compresses interlayer feature maps by transforming the stored data into the frequency domain using a hardware-implemented 8x8 discrete cosine transform (DCT). The high-frequency components are removed after the DCT through quantization. Sparse matrix compression is utilized to further compress the interlayer feature maps. The on-chip memory allocation scheme is designed to support dynamic configuration of the feature map buffer size and scratch pad size according to different network-layer requirements. The hardware accelerator combines compression, decompression, and CNN acceleration into one computing stream, achieving minimal compressing and processing delay. A prototype accelerator is implemented on an FPGA platform and also synthesized in TSMC 28-nm CMOS technology. It achieves 403 GOPS peak throughput and 1.4x to 3.3x interlayer feature map reduction while adding only a small hardware area overhead, making it a promising hardware accelerator for intelligent IoT devices.
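For illustration only (not the paper's hardware design), the Python sketch below mirrors the compression flow described in the abstract: an 8x8 tile of an interlayer feature map is transformed with a 2-D DCT, high-frequency content is discarded through uniform quantization, and the surviving coefficients are stored in a sparse (index, value) form. The quantization step `q_step` and the toy tile are assumptions made for the example, not values from the paper.

```python
# Illustrative sketch of DCT-based interlayer feature map compression.
# `q_step` and the toy tile are hypothetical; the paper implements this in hardware.
import numpy as np

def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis matrix (n x n)."""
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] *= np.sqrt(1.0 / n)
    C[1:, :] *= np.sqrt(2.0 / n)
    return C

def compress_tile(tile: np.ndarray, q_step: float = 8.0):
    """Forward 2-D DCT + uniform quantization + sparse (index, value) packing."""
    C = dct_matrix(tile.shape[0])
    coeffs = C @ tile @ C.T                          # 2-D DCT of the 8x8 tile
    q = np.round(coeffs / q_step).astype(np.int32)   # high-frequency terms quantize to 0
    idx = np.flatnonzero(q)                          # keep only nonzero coefficients
    return idx.astype(np.uint8), q.flat[idx]         # sparse representation

def decompress_tile(idx, vals, q_step: float = 8.0, n: int = 8) -> np.ndarray:
    """Rebuild the coefficient grid, dequantize, and apply the inverse 2-D DCT."""
    C = dct_matrix(n)
    q = np.zeros(n * n)
    q[idx] = vals
    return C.T @ (q.reshape(n, n) * q_step) @ C

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tile = rng.normal(0.0, 1.0, (8, 8)) + 4.0        # toy feature-map tile
    idx, vals = compress_tile(tile)
    ratio = tile.size * 4 / (len(vals) * (4 + 1))    # float32 tile vs (int32 value + uint8 index)
    print(f"kept {len(vals)}/64 coefficients, ~{ratio:.1f}x reduction")
    print("max reconstruction error:", np.abs(decompress_tile(idx, vals) - tile).max())
```

In the accelerator itself the transform, quantization, and sparse packing are fused with the CNN computing stream in hardware; the sketch only shows why quantized DCT coefficients of smooth feature maps compress well.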
Pages: 668 - 681
Number of pages: 14
Related Papers
50 records in total
  • [21] Map-based experience replay: a memory-efficient solution to catastrophic forgetting in reinforcement learning
    Hafez, Muhammad Burhan
    Immisch, Tilman
    Weber, Tom
    Wermter, Stefan
    [J]. FRONTIERS IN NEUROROBOTICS, 2023, 17
  • [22] A memory-efficient model-based overdrive
    Pan, H.
    Feng, X.
    Daly, S.
    [J]. IDW '06: PROCEEDINGS OF THE 13TH INTERNATIONAL DISPLAY WORKSHOPS, VOLS 1-3, 2006, : 1981 - 1984
  • [23] A compression-based memory-efficient optimization for out-of-core GPU stencil computation
    Shen, Jingcheng
    Long, Linbo
    Deng, Xin
    Okita, Masao
    Ino, Fumihiko
    [J]. JOURNAL OF SUPERCOMPUTING, 2023, 79 (10): 11055 - 11077
  • [24] Particle State Compression Scheme for Centralized Memory-Efficient Particle Filters
    Tian, Qinglin
    Pan, Yun
    Yan, Xiaolang
    Zheng, Ning
    Huan, Ruohong
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 2577 - 2581
  • [26] Mentor: A Memory-Efficient Sparse-dense Matrix Multiplication Accelerator Based on Column-Wise Product
    Lu, Xiaobo
    Fang, Jianbin
    Peng, Lin
    Huang, Chun
    Du, Zidong
    Zhao, Yongwei
    Wang, Zheng
    [J]. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2024, 21 (04)
  • [27] BHNN: a Memory-Efficient Accelerator for Compressing Deep Neural Networks with Blocked Hashing Techniques
    Zhu, Jingyang
    Qian, Zhiliang
    Tsui, Chi-Ying
    [J]. 2017 22ND ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC), 2017, : 690 - 695
  • [28] Comparison of Curve Representations for Memory-Efficient and High-Precision Map Generation
    Stannartz, Niklas
    Theers, Mario
    Llarena, Adalberto
    Sons, Marc
    Kuhn, Markus
    Bertram, Torsten
    [J]. 2020 IEEE 23RD INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC), 2020,
  • [29] GMMap: Memory-Efficient Continuous Occupancy Map Using Gaussian Mixture Model
    Li, Peter Zhi Xuan
    Karaman, Sertac
    Sze, Vivienne
    [J]. IEEE TRANSACTIONS ON ROBOTICS, 2024, 40 : 1339 - 1355
  • [30] Feature Map Transform Coding for Energy-Efficient CNN Inference
    Chmiel, Brian
    Baskin, Chaim
    Zheltonozhskii, Evgenii
    Banner, Ron
    Yermolin, Yevgeny
    Karbachevsky, Alex
    Bronstein, Alex M.
    Mendelson, Avi
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,