Performance Evaluation of Deep Learning Compilers for Edge Inference

Cited by: 21
Authors
Verma, Gaurav [1 ]
Gupta, Yashi [1 ]
Malik, Abid M. [2 ]
Chapman, Barbara [1 ,2 ]
Affiliations
[1] SUNY Stony Brook, Stony Brook, NY 11794 USA
[2] Brookhaven Natl Lab, Upton, NY 11973 USA
Keywords
TensorFlow-TensorRT; TensorFlow Lite; Compilers for DL; Inference at Edge; NEURAL-NETWORK INFERENCE;
DOI
10.1109/IPDPSW52791.2021.00128
CLC Classification Number
TP3 [computing technology, computer technology];
Discipline Classification Code
0812 ;
Abstract
Recently, edge computing has received considerable attention as a promising means to provide Deep Learning (DL) based services. However, due to the limited computation capability of the data processing units (such as CPUs, GPUs, and specialized accelerators) in edge devices, using the devices' limited resources efficiently is a challenge that affects deep learning-based analysis services. This has led to the development of several inference compilers, such as TensorRT, TensorFlow Lite, Relay, and TVM, which optimize DL inference models specifically for edge devices. These compilers operate on the standard DL models available for inference in various frameworks (e.g., PyTorch, TensorFlow, Caffe, and PaddlePaddle) and transform them into corresponding lightweight models. TensorFlow Lite and TensorRT are considered state-of-the-art inference compilers and encompass most of the compiler optimization techniques that have been proposed for edge computing. This paper presents a detailed performance study of TensorFlow Lite (TFLite) and TensorFlow-TensorRT (TF-TRT) using DL models commonly deployed on edge devices, across varying hardware platforms. The work compares throughput, latency, and power consumption. We find that the integrated TF-TRT consistently performs better at high-precision floating point across different DL architectures, especially on GPUs with tensor cores; however, it loses its edge to TFLite for model compression at low precision. TFLite, which is primarily designed for mobile applications, performs better with lightweight DL models than with deep neural network-based models. To the best of our knowledge, this is the first detailed performance comparison of the TF-TRT and TFLite inference compilers.
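The abstract describes these compilers transforming a standard framework model into a lightweight edge model. A minimal sketch of that workflow, assuming TensorFlow 2.x is installed and using a small stand-in Keras model (the paper benchmarks common edge DL models instead), is the TFLite conversion path:

```python
# Minimal sketch of the model transformation the abstract describes:
# a standard Keras model is compiled into a lightweight TensorFlow Lite
# model. The toy model below is a hypothetical stand-in, not one of the
# benchmarked networks from the paper.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Enables post-training quantization: the low-precision model compression
# at which the paper finds TFLite outperforms TF-TRT.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()  # serialized FlatBuffer bytes for the edge device
```

The resulting bytes would typically be written to a `.tflite` file and executed on-device with the TFLite interpreter; the TF-TRT path instead rewrites supported subgraphs of a TensorFlow model into TensorRT engines at the chosen precision.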
Pages: 858 - 865
Page count: 8
Related Papers
50 records in total
  • [21] Deep learning inference time guarantee for aggregated images at the edge of the network
    Genda, Kouichi
    [J]. IEICE COMMUNICATIONS EXPRESS, 2023, 12 (08) : 432 - 437
  • [22] Deep learning inference time guarantee in near future edge computing
    Genda, Kouichi
    [J]. 2023 INTERNATIONAL CONFERENCE ON COMPUTING, NETWORKING AND COMMUNICATIONS, ICNC, 2023, : 220 - 225
  • [23] Digital In-Memory Computing to Accelerate Deep Learning Inference on the Edge
    Perri, Stefania
    Zambelli, Cristian
    Ielmini, Daniele
    Silvano, Cristina
    [J]. 2024 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, IPDPSW 2024, 2024, : 130 - 133
  • [24] Distributed Inference with Deep Learning Models across Heterogeneous Edge Devices
    Hu, Chenghao
    Li, Baochun
    [J]. IEEE CONFERENCE ON COMPUTER COMMUNICATIONS (IEEE INFOCOM 2022), 2022, : 330 - 339
  • [25] Deep Reinforcement Learning for Containerized Edge Intelligence Inference Request Processing in IoT Edge Computing
    Nkenyereye, Lionel
    Baeg, Kang-Jun
    Chung, Wan-Young
    [J]. IEEE TRANSACTIONS ON SERVICES COMPUTING, 2023, 16 (06) : 4328 - 4344
  • [26] An Empirical Study on Common Bugs in Deep Learning Compilers
    Du, Xiaoting
    Zheng, Zheng
    Ma, Lei
    Zhao, Jianjun
    [J]. 2021 IEEE 32ND INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING (ISSRE 2021), 2021, : 184 - 195
  • [27] Adaptive Posit: Parameter aware numerical format for deep learning inference on the edge
    Langroudi, Hamed F.
    Karia, Vedant
    Gustafson, John L.
    Kudithipudi, Dhireesha
    [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020), 2020, : 3123 - 3131
  • [28] Verifiable Deep Learning Inference on Heterogeneous Edge Devices With Trusted Execution Environment
    Liao, Longlong
    Zheng, Yuqiang
    Lu, Hong
    Liu, Xinqi
    Chen, Shuguang
    Yu, Yuanlong
    [J]. IEEE SENSORS JOURNAL, 2024, 24 (17) : 28351 - 28362
  • [29] RLink: Accelerate On-Device Deep Reinforcement Learning with Inference Knowledge at the Edge
    Zeng, Tianyu
    Zhang, Xiaoxi
    Feng, Daipeng
    Duan, Jingpu
    Zhou, Zhi
    Chen, Xu
    [J]. 2023 19TH INTERNATIONAL CONFERENCE ON MOBILITY, SENSING AND NETWORKING, MSN 2023, 2023, : 628 - 635
  • [30] EOP: Efficient Operator Partition for Deep Learning Inference over Edge Servers
    Xu, Yuanjia
    Wu, Heng
    Zhang, Wenbo
    Hu, Yi
    [J]. PROCEEDINGS OF THE 18TH ACM SIGPLAN/SIGOPS INTERNATIONAL CONFERENCE ON VIRTUAL EXECUTION ENVIRONMENTS, VEE 2022, 2022, : 45 - 57