Memory-Efficient CNN Accelerator Based on Interlayer Feature Map Compression

Cited by: 15
|
Authors
Shao, Zhuang [1 ]
Chen, Xiaoliang [1 ]
Du, Li [1 ]
Chen, Lei [2 ]
Du, Yuan [1 ]
Zhuang, Wei [2 ]
Wei, Huadong [1 ]
Xie, Chenjia [1 ]
Wang, Zhongfeng [1 ]
Affiliations
[1] Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210023, Peoples R China
[2] Beijing Microelectronics Technol Inst, Beijing 100076, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep convolutional neural networks; discrete cosine transform; quantization; interlayer feature map compression;
DOI
10.1109/TCSI.2021.3120312
CLC Classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Subject Classification Codes
0808 ; 0809 ;
Abstract
Existing deep convolutional neural networks (CNNs) generate massive interlayer feature data during network inference. To maintain real-time processing in embedded systems, large on-chip memory is required to buffer the interlayer feature maps. In this paper, we propose an efficient hardware accelerator with an interlayer feature compression technique that significantly reduces the required on-chip memory size and off-chip memory access bandwidth. The accelerator compresses interlayer feature maps by transforming the stored data into the frequency domain using a hardware-implemented 8x8 discrete cosine transform (DCT). The high-frequency components are removed after the DCT through quantization, and sparse matrix compression is utilized to further compress the interlayer feature maps. The on-chip memory allocation scheme supports dynamic configuration of the feature map buffer size and scratch-pad size according to different network-layer requirements. The hardware accelerator combines compression, decompression, and CNN acceleration into one computing stream, achieving minimal compression and processing delay. A prototype accelerator is implemented on an FPGA platform and also synthesized in TSMC 28-nm CMOS technology. It achieves 403 GOPS peak throughput and 1.4x to 3.3x interlayer feature map reduction with light hardware area overhead, making it a promising hardware accelerator for intelligent IoT devices.
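The compression pipeline the abstract describes (8x8 DCT, quantization that zeroes high-frequency coefficients, then sparse storage of the surviving coefficients) can be sketched in NumPy. The diagonal cutoff `keep` and the synthetic feature-map tile below are illustrative assumptions, not the paper's exact quantizer or data.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix (satisfies C @ C.T == I)."""
    k = np.arange(n)[:, None]            # frequency index (rows)
    m = np.arange(n)[None, :]            # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)              # rescale DC row for orthonormality
    return c

def compress_tile(tile, keep=5):
    """2-D DCT of a square tile; zero every coefficient whose
    diagonal index (row + col) is >= `keep` (high frequencies)."""
    c = dct_matrix(tile.shape[0])
    coeffs = c @ tile @ c.T              # separable 2-D DCT
    rows, cols = np.indices(coeffs.shape)
    coeffs[rows + cols >= keep] = 0.0    # crude "quantization" step
    return coeffs

def decompress_tile(coeffs):
    """Inverse 2-D DCT (transpose of the orthonormal forward transform)."""
    c = dct_matrix(coeffs.shape[0])
    return c.T @ coeffs @ c

# Smooth synthetic "feature-map" tile: its energy concentrates at low
# frequencies, so truncating the high-frequency coefficients loses little.
x = np.outer(np.linspace(0.0, 1.0, 8), np.linspace(1.0, 2.0, 8))
z = compress_tile(x)
x_hat = decompress_tile(z)

# Fraction of zero coefficients a sparse-matrix format would skip storing.
sparsity = 1.0 - np.count_nonzero(z) / z.size
rms_err = np.sqrt(np.mean((x - x_hat) ** 2))
```

For this smooth tile most of the 64 coefficients are zeroed while the reconstruction error stays small, which is the trade-off the accelerator exploits: the sparse coefficient tiles, not the raw feature maps, are what get buffered on-chip.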
Pages: 668-681
Page count: 14
Related Papers
50 records
  • [1] An Efficient CNN Inference Accelerator Based on Intra- and Inter-Channel Feature Map Compression
    Xie, Chenjia
    Shao, Zhuang
    Zhao, Ning
    Du, Yuan
    Du, Li
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, 2023, 70 (09) : 3625 - 3638
  • [2] A Memory-Efficient Edge Inference Accelerator with XOR-based Model Compression
    Lee, Hyunseung
    Hong, Jihoon
    Kim, Soosung
    Lee, Seung Yul
    Lee, Jae W.
    [J]. 2023 60TH ACM/IEEE DESIGN AUTOMATION CONFERENCE, DAC, 2023,
  • [3] A Memory-Efficient CNN Accelerator Using Segmented Logarithmic Quantization and Multi-Cluster Architecture
    Xu, Jiawei
    Huan, Yuxiang
    Huang, Boming
    Chu, Haoming
    Jin, Yi
    Zheng, Li-Rong
    Zou, Zhuo
    [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2021, 68 (06) : 2142 - 2146
  • [4] Transform-Based Feature Map Compression for CNN Inference
    Shi, Yubo
    Wang, Meiqi
    Chen, Siyi
    Wei, Jinghe
    Wang, Zhongfeng
    [J]. 2021 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), 2021,
  • [5] Facto-CNN: Memory-Efficient CNN Training with Low-rank Tensor Factorization and Lossy Tensor Compression
    Lee, Seungtae
    Ko, Jonghwan
    Hong, Seokin
    [J]. ASIAN CONFERENCE ON MACHINE LEARNING, VOL 222, 2023, 222
  • [6] Memory-efficient spatial prediction image compression scheme
    Nandi, Anil V.
    Patnaik, L. M.
    Banakar, R. M.
    [J]. IMAGE AND VISION COMPUTING, 2007, 25 (06) : 899 - 906
  • [7] Sparse Bitmap Compression for Memory-Efficient Training on the Edge
    Hosny, Abdelrahman
    Neseem, Marina
    Reda, Sherief
    [J]. 2021 ACM/IEEE 6TH SYMPOSIUM ON EDGE COMPUTING (SEC 2021), 2021, : 14 - 25
  • [8] Adaptive Weight Compression for Memory-Efficient Neural Networks
    Ko, Jong Hwan
    Kim, Duckhwan
    Na, Taesik
    Kung, Jaeha
    Mukhopadhyay, Saibal
    [J]. PROCEEDINGS OF THE 2017 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE), 2017, : 199 - 204
  • [9] CNN Inference Accelerators with Adjustable Feature Map Compression Ratios
    Tsai, Yu-Chih
    Liu, Chung-Yueh
    Wang, Chia-Chun
    Hsu, Tsen-Wei
    Liu, Ren-Shuo
    [J]. 2023 IEEE 41ST INTERNATIONAL CONFERENCE ON COMPUTER DESIGN, ICCD, 2023, : 631 - 634
  • [10] A memory-efficient block-wise MAP decoder architecture
    Kim, S
    Hwang, SY
    Kang, MJ
    [J]. ETRI JOURNAL, 2004, 26 (06) : 615 - 621