Memory-Efficient CNN Accelerator Based on Interlayer Feature Map Compression

Cited by: 15
Authors
Shao, Zhuang [1 ]
Chen, Xiaoliang [1 ]
Du, Li [1 ]
Chen, Lei [2 ]
Du, Yuan [1 ]
Zhuang, Wei [2 ]
Wei, Huadong [1 ]
Xie, Chenjia [1 ]
Wang, Zhongfeng [1 ]
Affiliations
[1] Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210023, Peoples R China
[2] Beijing Microelectronics Technology Inst, Beijing 100076, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep convolutional neural networks; discrete cosine transform; quantization; interlayer feature map compression;
DOI
10.1109/TCSI.2021.3120312
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Classification Code
0808; 0809;
Abstract
Existing deep convolutional neural networks (CNNs) generate massive interlayer feature data during network inference. To maintain real-time processing in embedded systems, large on-chip memory is required to buffer the interlayer feature maps. In this paper, we propose an efficient hardware accelerator with an interlayer feature compression technique that significantly reduces the required on-chip memory size and off-chip memory access bandwidth. The accelerator compresses interlayer feature maps by transforming the stored data into the frequency domain using a hardware-implemented 8x8 discrete cosine transform (DCT). The high-frequency components are removed after the DCT through quantization. Sparse matrix compression is utilized to further compress the interlayer feature maps. The on-chip memory allocation scheme is designed to support dynamic configuration of the feature map buffer size and scratch pad size according to different network-layer requirements. The hardware accelerator combines compression, decompression, and CNN acceleration into one computing stream, achieving minimal compressing and processing delay. A prototype accelerator is implemented on an FPGA platform and also synthesized in TSMC 28-nm CMOS technology. It achieves 403 GOPS peak throughput and 1.4x to 3.3x interlayer feature map reduction while adding only a small hardware area overhead, making it a promising hardware accelerator for intelligent IoT devices.
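For illustration only (not the paper's hardware design), the Python sketch below mirrors the compression flow described in the abstract: an 8x8 tile of an interlayer feature map is transformed with a 2-D DCT, high-frequency content is discarded through uniform quantization, and the surviving coefficients are stored in a sparse (index, value) form. The quantization step `q_step` and the toy tile are assumptions made for the example, not values from the paper.

```python
# Illustrative sketch of DCT-based interlayer feature map compression.
# `q_step` and the toy tile are hypothetical; the paper implements this in hardware.
import numpy as np

def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis matrix (n x n)."""
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] *= np.sqrt(1.0 / n)
    C[1:, :] *= np.sqrt(2.0 / n)
    return C

def compress_tile(tile: np.ndarray, q_step: float = 8.0):
    """Forward 2-D DCT + uniform quantization + sparse (index, value) packing."""
    C = dct_matrix(tile.shape[0])
    coeffs = C @ tile @ C.T                          # 2-D DCT of the 8x8 tile
    q = np.round(coeffs / q_step).astype(np.int32)   # high-frequency terms quantize to 0
    idx = np.flatnonzero(q)                          # keep only nonzero coefficients
    return idx.astype(np.uint8), q.flat[idx]         # sparse representation

def decompress_tile(idx, vals, q_step: float = 8.0, n: int = 8) -> np.ndarray:
    """Rebuild the coefficient grid, dequantize, and apply the inverse 2-D DCT."""
    C = dct_matrix(n)
    q = np.zeros(n * n)
    q[idx] = vals
    return C.T @ (q.reshape(n, n) * q_step) @ C

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tile = rng.normal(0.0, 1.0, (8, 8)) + 4.0        # toy feature-map tile
    idx, vals = compress_tile(tile)
    ratio = tile.size * 4 / (len(vals) * (4 + 1))    # float32 tile vs (int32 value + uint8 index)
    print(f"kept {len(vals)}/64 coefficients, ~{ratio:.1f}x reduction")
    print("max reconstruction error:", np.abs(decompress_tile(idx, vals) - tile).max())
```

In the accelerator itself the transform, quantization, and sparse packing are fused with the CNN computing stream in hardware; the sketch only shows why quantized DCT coefficients of smooth feature maps compress well.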
Pages: 668 - 681
Number of pages: 14
Related Papers
50 records in total
  • [21] Map-based experience replay: a memory-efficient solution to catastrophic forgetting in reinforcement learning
    Hafez, Muhammad Burhan
    Immisch, Tilman
    Weber, Tom
    Wermter, Stefan
    [J]. FRONTIERS IN NEUROROBOTICS, 2023, 17
  • [22] A memory-efficient model-based overdrive
    Pan, H.
    Feng, X.
    Daly, S.
    [J]. IDW '06: PROCEEDINGS OF THE 13TH INTERNATIONAL DISPLAY WORKSHOPS, VOLS 1-3, 2006, : 1981 - 1984
  • [23] A compression-based memory-efficient optimization for out-of-core GPU stencil computation
    Shen, Jingcheng
    Long, Linbo
    Deng, Xin
    Okita, Masao
    Ino, Fumihiko
    [J]. JOURNAL OF SUPERCOMPUTING, 2023, 79 (10): 11055 - 11077
  • [24] Particle State Compression Scheme for Centralized Memory-Efficient Particle Filters
    Tian, Qinglin
    Pan, Yun
    Yan, Xiaolang
    Zheng, Ning
    Huan, Ruohong
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 2577 - 2581
  • [26] Mentor: A Memory-Efficient Sparse-dense Matrix Multiplication Accelerator Based on Column-Wise Product
    Lu, Xiaobo
    Fang, Jianbin
    Peng, Lin
    Huang, Chun
    Du, Zidong
    Zhao, Yongwei
    Wang, Zheng
    [J]. ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION, 2024, 21 (04)
  • [27] BHNN: a Memory-Efficient Accelerator for Compressing Deep Neural Networks with Blocked Hashing Techniques
    Zhu, Jingyang
    Qian, Zhiliang
    Tsui, Chi-Ying
    [J]. 2017 22ND ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC), 2017, : 690 - 695
  • [28] Comparison of Curve Representations for Memory-Efficient and High-Precision Map Generation
    Stannartz, Niklas
    Theers, Mario
    Llarena, Adalberto
    Sons, Marc
    Kuhn, Markus
    Bertram, Torsten
    [J]. 2020 IEEE 23RD INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC), 2020,
  • [29] GMMap: Memory-Efficient Continuous Occupancy Map Using Gaussian Mixture Model
    Li, Peter Zhi Xuan
    Karaman, Sertac
    Sze, Vivienne
    [J]. IEEE TRANSACTIONS ON ROBOTICS, 2024, 40 : 1339 - 1355
  • [30] Feature Map Transform Coding for Energy-Efficient CNN Inference
    Chmiel, Brian
    Baskin, Chaim
    Zheltonozhskii, Evgenii
    Banner, Ron
    Yermolin, Yevgeny
    Karbachevsky, Alex
    Bronstein, Alex M.
    Mendelson, Avi
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,