A Memory-Efficient Edge Inference Accelerator with XOR-based Model Compression

Cited by: 0
Authors
Lee, Hyunseung [1]
Hong, Jihoon [1]
Kim, Soosung [1]
Lee, Seung Yul [1]
Lee, Jae W. [1]
Affiliations
[1] Seoul Natl Univ, Seoul 08826, South Korea
DOI
10.1109/DAC56929.2023.10248005
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Model compression is widely adopted for edge inference of neural networks (NNs) to minimize both costly DRAM accesses and memory footprints. Recently, XOR-based model compression has demonstrated promising results in maximizing the compression ratio while minimizing the accuracy drop. However, XOR-based decompression alone produces bit errors and requires auxiliary data for error correction. To minimize model size, and hence DRAM traffic, we propose an enhanced decompression algorithm and a low-cost hardware accelerator for it. Since not all errors are equal, our algorithm selects only the important errors to correct, with no accuracy drop. Compared with the baseline XOR compression scheme, which corrects all errors, the compressed model size of ResNet-18 and VGG-16 is reduced by 23% and 27%, respectively. We also present a low-cost hardware implementation of online XOR decompression and error-correction logic built on Gemmini, an open-source systolic array accelerator, at the cost of only a 0.39% and 0.46% increase in area and power.
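The abstract's core idea — XOR decompression that introduces bit errors, plus selective correction of only the important ones — can be sketched in a toy form. The snippet below is a minimal illustration of the general scheme, not the paper's actual algorithm: the codebook values and the "importance" criterion (correcting only high-nibble bit flips, which perturb a quantized weight the most) are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 8-bit quantized weights (stand-in for an NN layer).
weights = rng.integers(0, 256, size=64, dtype=np.uint8)

# Hypothetical tiny codebook of XOR codewords; the real scheme would
# learn or optimize these, here they are fixed values for illustration.
codebook = np.array([0x00, 0x0F, 0xF0, 0xFF], dtype=np.uint8)

def compress(w):
    """Map each weight to the codeword at minimum Hamming distance.
    The residual (w XOR codeword) is the per-weight bit-error pattern."""
    dists = np.array(
        [[bin(int(x) ^ int(c)).count("1") for c in codebook] for x in w]
    )
    idx = dists.argmin(axis=1)
    errors = w ^ codebook[idx]  # bit errors left by XOR decompression
    return idx.astype(np.uint8), errors

def decompress(idx, corrections):
    """XOR decompression plus whatever error corrections were stored."""
    return codebook[idx] ^ corrections

idx, errors = compress(weights)

# Selective correction: keep only errors in the high nibble, since
# flips in high-order bits change the weight value the most (a crude
# stand-in for the paper's importance criterion). Storing fewer
# corrections is what shrinks the auxiliary data.
important = errors & 0xF0

approx = decompress(idx, important)
# Low-order bits may still differ, but high-order bits are exact.
assert np.all((approx & 0xF0) == (weights & 0xF0))
```

Storing `important` instead of the full `errors` array is where the size reduction comes from: corrections for low-impact bit positions are simply dropped, trading a bounded perturbation of each weight for less auxiliary data.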
Pages: 6