Performance Evaluation of Deep Learning Compilers for Edge Inference

Cited by: 21
Authors
Verma, Gaurav [1 ]
Gupta, Yashi [1 ]
Malik, Abid M. [2 ]
Chapman, Barbara [1 ,2 ]
Affiliations
[1] SUNY Stony Brook, Stony Brook, NY 11794 USA
[2] Brookhaven Natl Lab, Upton, NY 11973 USA
Keywords
TensorFlow-TensorRT; TensorFlow Lite; Compilers for DL; Inference at Edge; NEURAL-NETWORK INFERENCE;
DOI
10.1109/IPDPSW52791.2021.00128
CLC Classification Number
TP3 [computing technology, computer technology];
Discipline Classification Code
0812 ;
Abstract
Recently, edge computing has received considerable attention as a promising means to provide Deep Learning (DL) based services. However, due to the limited computation capability of the data processing units (such as CPUs, GPUs, and specialized accelerators) in edge devices, using the devices' limited resources efficiently is a challenge that affects deep learning-based analysis services. This has led to the development of several inference compilers, such as TensorRT, TensorFlow Lite, Relay, and TVM, which optimize DL inference models specifically for edge devices. These compilers operate on the standard DL models available for inference in various frameworks (e.g., PyTorch, TensorFlow, Caffe, and PaddlePaddle) and transform them into corresponding lightweight models. TensorFlow Lite and TensorRT are considered state-of-the-art inference compilers and encompass most of the compiler optimization techniques that have been proposed for edge computing. This paper presents a detailed performance study of TensorFlow Lite (TFLite) and TensorFlow-TensorRT (TF-TRT) using DL models commonly deployed on edge devices, across varying hardware platforms. The work compares throughput, latency, and power consumption. We find that the integrated TF-TRT consistently performs better at high-precision floating point across different DL architectures, especially on GPUs with tensor cores; however, it loses its edge to TFLite for model compression at low precision. TFLite, which is primarily designed for mobile applications, performs better with lightweight DL models than with deep neural network-based models. To the best of our knowledge, this is the first detailed performance comparison of the TF-TRT and TFLite inference compilers.
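The abstract describes these compilers transforming a standard framework model into a lightweight edge model. A minimal sketch of that workflow, assuming TensorFlow 2.x is installed and using a small stand-in Keras model (the paper benchmarks common edge DL models instead), is the TFLite conversion path:

```python
# Minimal sketch of the model transformation the abstract describes:
# a standard Keras model is compiled into a lightweight TensorFlow Lite
# model. The toy model below is a hypothetical stand-in, not one of the
# benchmarked networks from the paper.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Enables post-training quantization: the low-precision model compression
# at which the paper finds TFLite outperforms TF-TRT.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()  # serialized FlatBuffer bytes for the edge device
```

The resulting bytes would typically be written to a `.tflite` file and executed on-device with the TFLite interpreter; the TF-TRT path instead rewrites supported subgraphs of a TensorFlow model into TensorRT engines at the chosen precision.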
Pages: 858 - 865
Page count: 8
Related Papers
50 records in total
  • [21] Deep learning inference time guarantee for aggregated images at the edge of the network
    Genda, Kouichi
    [J]. IEICE COMMUNICATIONS EXPRESS, 2023, 12 (08) : 432 - 437
  • [22] Deep learning inference time guarantee in near future edge computing
    Genda, Kouichi
    [J]. 2023 INTERNATIONAL CONFERENCE ON COMPUTING, NETWORKING AND COMMUNICATIONS, ICNC, 2023, : 220 - 225
  • [23] Digital In-Memory Computing to Accelerate Deep Learning Inference on the Edge
    Perri, Stefania
    Zambelli, Cristian
    Ielmini, Daniele
    Silvano, Cristina
    [J]. 2024 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, IPDPSW 2024, 2024, : 130 - 133
  • [24] Distributed Inference with Deep Learning Models across Heterogeneous Edge Devices
    Hu, Chenghao
    Li, Baochun
    [J]. IEEE CONFERENCE ON COMPUTER COMMUNICATIONS (IEEE INFOCOM 2022), 2022, : 330 - 339
  • [25] Deep Reinforcement Learning for Containerized Edge Intelligence Inference Request Processing in IoT Edge Computing
    Nkenyereye, Lionel
    Baeg, Kang-Jun
    Chung, Wan-Young
    [J]. IEEE TRANSACTIONS ON SERVICES COMPUTING, 2023, 16 (06) : 4328 - 4344
  • [26] An Empirical Study on Common Bugs in Deep Learning Compilers
    Du, Xiaoting
    Zheng, Zheng
    Ma, Lei
    Zhao, Jianjun
    [J]. 2021 IEEE 32ND INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING (ISSRE 2021), 2021, : 184 - 195
  • [27] Adaptive Posit: Parameter aware numerical format for deep learning inference on the edge
    Langroudi, Hamed F.
    Karia, Vedant
    Gustafson, John L.
    Kudithipudi, Dhireesha
    [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020), 2020, : 3123 - 3131
  • [28] Verifiable Deep Learning Inference on Heterogeneous Edge Devices With Trusted Execution Environment
    Liao, Longlong
    Zheng, Yuqiang
    Lu, Hong
    Liu, Xinqi
    Chen, Shuguang
    Yu, Yuanlong
    [J]. IEEE SENSORS JOURNAL, 2024, 24 (17) : 28351 - 28362
  • [29] RLink: Accelerate On-Device Deep Reinforcement Learning with Inference Knowledge at the Edge
    Zeng, Tianyu
    Zhang, Xiaoxi
    Feng, Daipeng
    Duan, Jingpu
    Zhou, Zhi
    Chen, Xu
    [J]. 2023 19TH INTERNATIONAL CONFERENCE ON MOBILITY, SENSING AND NETWORKING, MSN 2023, 2023, : 628 - 635
  • [30] EOP: Efficient Operator Partition for Deep Learning Inference over Edge Servers
    Xu, Yuanjia
    Wu, Heng
    Zhang, Wenbo
    Hu, Yi
    [J]. PROCEEDINGS OF THE 18TH ACM SIGPLAN/SIGOPS INTERNATIONAL CONFERENCE ON VIRTUAL EXECUTION ENVIRONMENTS, VEE 2022, 2022, : 45 - 57