Deep Learning Performance Characterization on GPUs for Various Quantization Frameworks

Cited by: 3
Authors
Shafique, Muhammad Ali [1 ]
Munir, Arslan [2 ]
Kong, Joonho [3 ]
Affiliations
[1] Kansas State Univ, Dept Elect & Comp Engn, Manhattan, KS 66506 USA
[2] Kansas State Univ, Dept Comp Sci, Manhattan, KS 66506 USA
[3] Kyungpook Natl Univ, Sch Elect & Elect Engn, Daegu 41566, South Korea
Keywords
optimization; deep learning; quantization; performance; TensorRT; automatic mixed precision
DOI
10.3390/ai4040047
CLC classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Deep learning is employed in many applications, such as computer vision, natural language processing, robotics, and recommender systems. Large and complex neural networks lead to high accuracy; however, they adversely affect many aspects of deep learning performance, such as training time, latency, throughput, energy consumption, and memory usage in the training and inference stages. To address these challenges, various optimization techniques and frameworks have been developed to make deep learning models perform efficiently in the training and inference stages. Although optimization techniques such as quantization have been studied thoroughly in the past, less work has been done to study the performance of the frameworks that provide quantization techniques. In this paper, we have used different performance metrics to study the performance of various quantization frameworks, including TensorFlow automatic mixed precision and TensorRT. These performance metrics include training time and memory utilization in the training stage, along with latency and throughput for graphics processing units (GPUs) in the inference stage. We have applied the automatic mixed precision (AMP) technique during the training stage using the TensorFlow framework, while for inference we have utilized the TensorRT framework for the post-training quantization technique using the TensorFlow TensorRT (TF-TRT) application programming interface (API). We performed model profiling for different deep learning models, datasets, image sizes, and batch sizes for both the training and inference stages, the results of which can help developers and researchers to devise and deploy efficient deep learning models for GPUs.
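The abstract describes post-training quantization as applied through the TF-TRT API. As a minimal, dependency-free illustration of the underlying idea (symmetric per-tensor int8 quantization and dequantization), the sketch below is written in plain Python; it is not the TF-TRT API itself, and the function names are hypothetical, chosen only for this example.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: map float weights to
    integers in [-128, 127] using a single scale factor."""
    # Scale so that the largest-magnitude weight maps to +/-127.
    # The `or 1.0` guards against an all-zero tensor (scale of 0).
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

# Example: each dequantized value differs from the original by at most
# about half a quantization step (scale / 2).
w = [0.02, -1.27, 0.635, 0.9]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
```

Frameworks such as TensorRT build on this idea but additionally calibrate activation ranges on sample data and fuse quantized kernels, which is where the latency and throughput gains measured in the paper come from.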
Pages: 926 - 948
Page count: 23
Related papers
50 in total
  • [1] Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs
    Shi, Shaohuai
    Wang, Qiang
    Chu, Xiaowen
    2018 16TH IEEE INT CONF ON DEPENDABLE, AUTONOM AND SECURE COMP, 16TH IEEE INT CONF ON PERVAS INTELLIGENCE AND COMP, 4TH IEEE INT CONF ON BIG DATA INTELLIGENCE AND COMP, 3RD IEEE CYBER SCI AND TECHNOL CONGRESS (DASC/PICOM/DATACOM/CYBERSCITECH), 2018, : 949 - 957
  • [2] Quantization Backdoors to Deep Learning Commercial Frameworks
    Ma, Hua
    Qiu, Huming
    Gao, Yansong
    Zhang, Zhi
    Abuadbba, Alsharif
    Xue, Minhui
    Fu, Anmin
    Zhang, Jiliang
    Al-Sarawi, Said F.
    Abbott, Derek
    IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2024, 21 (03) : 1155 - 1172
  • [3] Performance Characteristics of Virtualized GPUs for Deep Learning
    Michael, Scott
    Teige, Scott
    Li, Junjie
    Lowe, John Michael
    Turner, George
    Henschel, Robert
    PROCEEDINGS OF 2020 3RD IEEE/ACM INTERNATIONAL WORKSHOP ON INTEROPERABILITY OF SUPERCOMPUTING AND CLOUD TECHNOLOGIES (SUPERCOMPCLOUD 2020), 2020, : 14 - 20
  • [4] Deep learning with GPUs
    Jeon, Won
    Ko, Gun
    Lee, Jiwon
    Lee, Hyunwuk
    Ha, Dongho
    Ro, Won Woo
    HARDWARE ACCELERATOR SYSTEMS FOR ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING, 2021, 122 : 167 - 215
  • [5] Various Frameworks and Libraries of Machine Learning and Deep Learning: A Survey
    Wang, Zhaobin
    Liu, Ke
    Li, Jian
    Zhu, Ying
    Zhang, Yaonan
    ARCHIVES OF COMPUTATIONAL METHODS IN ENGINEERING, 2024, 31 (01) : 1 - 24
  • [7] Performance Analysis of CNN Frameworks for GPUs
    Kim, Heehoon
    Nam, Hyoungwook
    Jung, Wookeun
    Lee, Jaejin
    2017 IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE (ISPASS), 2017, : 55 - 64
  • [8] Building a Performance Model for Deep Learning Recommendation Model Training on GPUs
    Lin, Zhongyi
    Feng, Louis
    Ardestani, Ehsan K.
    Lee, Jaewon
    Lundell, John
    Kim, Changkyu
    Kejariwal, Arun
    Owens, John D.
    2022 IEEE 29TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING, DATA, AND ANALYTICS, HIPC, 2022, : 48 - 58
  • [9] Building a Performance Model for Deep Learning Recommendation Model Training on GPUs
    Lin, Zhongyi
    Feng, Louis
    Ardestani, Ehsan K.
    Lee, Jaewon
    Lundell, John
    Kim, Changkyu
    Kejariwal, Arun
    Owens, John D.
    2022 IEEE INTERNATIONAL SYMPOSIUM ON PERFORMANCE ANALYSIS OF SYSTEMS AND SOFTWARE (ISPASS 2022), 2022, : 227 - 229
  • [10] Performance engineering for HEVC transform and quantization kernel on GPUs
    Cobrnic, Mate
    Duspara, Alen
    Dragic, Leon
    Piljic, Igor
    Kovac, Mario
    AUTOMATIKA, 2020, 61 (03) : 325 - 333