A Memory-Efficient Edge Inference Accelerator with XOR-based Model Compression

Cited by: 0
Authors
Lee, Hyunseung [1]
Hong, Jihoon [1]
Kim, Soosung [1]
Lee, Seung Yul [1]
Lee, Jae W. [1]
Affiliations
[1] Seoul Natl Univ, Seoul 08826, South Korea
DOI
10.1109/DAC56929.2023.10248005
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Model compression is widely adopted for edge inference of neural networks (NNs) to minimize both costly DRAM accesses and memory footprints. Recently, XOR-based model compression has demonstrated promising results in maximizing the compression ratio while minimizing the accuracy drop. However, XOR-based decompression alone produces bit errors and requires auxiliary data for error correction. To minimize model size, and hence DRAM traffic, we propose an enhanced decompression algorithm and a low-cost hardware accelerator for it. Since not all errors are equal, our algorithm selects only the important errors to correct, with no accuracy drop. Compared with the baseline XOR compression scheme, which corrects all errors, the compressed model size of ResNet-18 and VGG-16 is reduced by 23% and 27%, respectively. We also present a low-cost hardware implementation of online XOR decompression and error-correction logic built on Gemmini, an open-source systolic array accelerator, at the cost of only a 0.39% and 0.46% increase in area and power.
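The abstract's core idea — XOR decompression that introduces bit errors, plus selective correction of only the important ones — can be sketched in a toy form. The snippet below is a minimal illustration of the general scheme, not the paper's actual algorithm: the codebook values and the "importance" criterion (correcting only high-nibble bit flips, which perturb a quantized weight the most) are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 8-bit quantized weights (stand-in for an NN layer).
weights = rng.integers(0, 256, size=64, dtype=np.uint8)

# Hypothetical tiny codebook of XOR codewords; the real scheme would
# learn or optimize these, here they are fixed values for illustration.
codebook = np.array([0x00, 0x0F, 0xF0, 0xFF], dtype=np.uint8)

def compress(w):
    """Map each weight to the codeword at minimum Hamming distance.
    The residual (w XOR codeword) is the per-weight bit-error pattern."""
    dists = np.array(
        [[bin(int(x) ^ int(c)).count("1") for c in codebook] for x in w]
    )
    idx = dists.argmin(axis=1)
    errors = w ^ codebook[idx]  # bit errors left by XOR decompression
    return idx.astype(np.uint8), errors

def decompress(idx, corrections):
    """XOR decompression plus whatever error corrections were stored."""
    return codebook[idx] ^ corrections

idx, errors = compress(weights)

# Selective correction: keep only errors in the high nibble, since
# flips in high-order bits change the weight value the most (a crude
# stand-in for the paper's importance criterion). Storing fewer
# corrections is what shrinks the auxiliary data.
important = errors & 0xF0

approx = decompress(idx, important)
# Low-order bits may still differ, but high-order bits are exact.
assert np.all((approx & 0xF0) == (weights & 0xF0))
```

Storing `important` instead of the full `errors` array is where the size reduction comes from: corrections for low-impact bit positions are simply dropped, trading a bounded perturbation of each weight for less auxiliary data.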
Pages: 6