Efficient Deep Learning Inference based on Model Compression

Cited by: 10
Authors
Zhang, Qing [1]
Zhang, Mengru [1]
Wang, Mengdi [1]
Sui, Wanchen [1]
Meng, Chen [1]
Yang, Jun [1]
Kong, Weidan [1]
Cui, Xiaoyuan [1]
Lin, Wei [1]
Affiliations
[1] Alibaba Group, Hangzhou, Zhejiang, People's Republic of China
DOI
10.1109/CVPRW.2018.00221
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Deep neural networks (DNNs) have evolved remarkably over the last decade and achieved great success in many machine learning tasks. As deep learning (DL) methods have evolved, the computational complexity and resource consumption of DL models have continued to grow, which makes efficient deployment challenging, especially on devices with limited memory or in applications with strict latency requirements. In this paper, we introduce a DL inference optimization pipeline that consists of a series of model compression methods, including Tensor Decomposition (TD), Graph Adaptive Pruning (GAP), Intrinsic Sparse Structures (ISS) in Long Short-Term Memory (LSTM), Knowledge Distillation (KD), and low-bit model quantization. We test the inference optimization pipeline with the above methods in several modeling scenarios, and the results are promising: inference becomes more efficient with only marginal loss of model accuracy.
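The abstract only names the pipeline stages, so the following is a minimal, hypothetical PyTorch sketch of one such stage, knowledge distillation, in which a compact student network is trained to match a larger teacher's softened output distribution. It is not the authors' implementation; the layer sizes, temperature T, and weight alpha are arbitrary placeholder choices.

```python
# Hypothetical illustration of knowledge distillation (KD), one stage of a
# model compression pipeline. Not the paper's implementation; all sizes and
# hyperparameters below are placeholder assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft-target term: KL divergence between temperature-softened teacher and
    # student distributions, scaled by T^2 to keep gradient magnitudes stable.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-target term: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy setup: a frozen teacher and a much smaller student; only the student trains.
teacher = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 10)).eval()
student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

x = torch.randn(32, 128)            # dummy batch of features
y = torch.randint(0, 10, (32,))     # dummy class labels
with torch.no_grad():               # teacher only provides soft targets
    teacher_logits = teacher(x)

optimizer.zero_grad()
loss = distillation_loss(student(x), teacher_logits, y)
loss.backward()
optimizer.step()
```

In a full pipeline of the kind described above, such a distilled student would typically be combined with pruning, low-rank decomposition of weight tensors, and post-training low-bit quantization.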
Pages: 1776-1783
Number of pages: 8
Related Papers
50 records in total
  • [21] An efficient hybrid weather prediction model based on deep learning
    Utku, A.
    Can, U.
    INTERNATIONAL JOURNAL OF ENVIRONMENTAL SCIENCE AND TECHNOLOGY, 2023, 20 (10) : 11107 - 11120
  • [23] An Efficient Indoor Localization Based on Deep Attention Learning Model
    Abozeid, A.
    Taloba, A.I.
    Abd El-Aziz, R.M.
    Alwaghid, A.F.
    Salem, M.
    Elhadad, A.
    Computer Systems Science and Engineering, 2023, 46 (02) : 2637 - 2650
  • [24] Energy-efficient deep learning inference on edge devices
    Daghero, Francesco
    Pagliari, Daniele Jahier
    Poncino, Massimo
    HARDWARE ACCELERATOR SYSTEMS FOR ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING, 2021, 122 : 247 - 301
  • [25] EuclidNets: An Alternative Operation for Efficient Inference of Deep Learning Models
    Li, X.
    Parazeres, M.
    Oberman, A.
    Ghaffari, A.
    Asgharian, M.
    Nia, V.P.
    SN Computer Science, 4 (5)
  • [26] FPGA Logic Block Architectures for Efficient Deep Learning Inference
    Eldafrawy, Mohamed
    Boutros, Andrew
    Yazdanshenas, Sadegh
    Betz, Vaughn
    ACM TRANSACTIONS ON RECONFIGURABLE TECHNOLOGY AND SYSTEMS, 2020, 13 (03)
  • [27] Survey of Deep Learning Model Compression and Acceleration
    Gao, H.
    Tian, Y.-L.
    Xu, F.-Y.
    Zhong, S.
    Ruan Jian Xue Bao/Journal of Software, 2021, 32 (01) : 68 - 92
  • [28] Combining deep learning model compression techniques
    Santos Silva, Jose Vitor
    Matos Matos, Leonardo
    Santos, Flavio
    Magalhaes Cerqueira, Helisson Oliveira
    Macedo, Hendrik
    Piedade Prado, Bruno Otavio
    Ferreira da Silva, Gilton Jose
    Bispo, Kalil Araujo
    IEEE LATIN AMERICA TRANSACTIONS, 2022, 20 (03) : 458 - 464
  • [29] A Novel Deep Learning Model Compression Algorithm
    Zhao, Ming
    Li, Meng
    Peng, Sheng-Lung
    Li, Jie
    ELECTRONICS, 2022, 11 (07)
  • [30] Model Compression for Communication Efficient Federated Learning
    Shah, Suhail Mohmad
    Lau, Vincent K. N.
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (09) : 5937 - 5951