Deep Learning Performance Characterization on GPUs for Various Quantization Frameworks

Cited by: 3
Authors
Shafique, Muhammad Ali [1 ]
Munir, Arslan [2 ]
Kong, Joonho [3 ]
Affiliations
[1] Kansas State Univ, Dept Elect & Comp Engn, Manhattan, KS 66506 USA
[2] Kansas State Univ, Dept Comp Sci, Manhattan, KS 66506 USA
[3] Kyungpook Natl Univ, Sch Elect & Elect Engn, Daegu 41566, South Korea
Keywords
optimization; deep learning; quantization; performance; TensorRT; automatic mixed precision;
DOI
10.3390/ai4040047
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep learning is employed in many applications, such as computer vision, natural language processing, robotics, and recommender systems. Large and complex neural networks achieve high accuracy; however, they adversely affect many aspects of deep learning performance, such as training time, latency, throughput, energy consumption, and memory usage in both the training and inference stages. To address these challenges, various optimization techniques and frameworks have been developed for running deep learning models efficiently in the training and inference stages. Although optimization techniques such as quantization have been studied thoroughly in the past, less work has examined the performance of the frameworks that provide these quantization techniques. In this paper, we use several performance metrics to study the performance of various quantization frameworks, including TensorFlow automatic mixed precision and TensorRT. These metrics include training time and memory utilization in the training stage, along with latency and throughput on graphics processing units (GPUs) in the inference stage. We apply the automatic mixed precision (AMP) technique during the training stage using the TensorFlow framework, while for inference we utilize the TensorRT framework for post-training quantization via the TensorFlow-TensorRT (TF-TRT) application programming interface (API). We performed model profiling for different deep learning models, datasets, image sizes, and batch sizes for both the training and inference stages; the results can help developers and researchers devise and deploy efficient deep learning models for GPUs.
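The AMP training and TF-TRT post-training quantization workflow named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual experimental code: the model architecture, input shape, SavedModel paths, and precision mode below are assumptions for demonstration only.

```python
# Sketch: enabling TensorFlow automatic mixed precision (AMP) for training.
import tensorflow as tf

# AMP: compute in float16 where numerically safe, keep float32 master weights.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# Illustrative toy model (not from the paper).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(32,)),
    tf.keras.layers.Dense(64, activation="relu"),
    # Keep the output layer in float32 for numerical stability under AMP.
    tf.keras.layers.Dense(10, dtype="float32"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Inference-side post-training quantization via the TF-TRT API (requires a
# TensorRT installation and a GPU; shown as an untested sketch with
# hypothetical SavedModel directories):
# from tensorflow.python.compiler.tensorrt import trt_convert as trt
# converter = trt.TrtGraphConverterV2(input_saved_model_dir="saved_model_dir")
# converter.convert()
# converter.save("trt_saved_model_dir")
```

Under the `mixed_float16` policy, layer computations run in float16 while variables stay in float32, which is what reduces training time and memory traffic on GPUs with tensor cores.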
Pages: 926-948
Page count: 23
Related Papers
50 records in total
  • [21] A Deep Learning Based Intrusion Detection System on GPUs
    Karatas, Gozde
    Demir, Onder
    Sahingoz, Ozgur Koray
    PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON ELECTRONICS, COMPUTERS AND ARTIFICIAL INTELLIGENCE (ECAI-2019), 2019,
  • [22] Detailed Characterization of Deep Neural Networks on GPUs and FPGAs
    Karki, Aajna
    Keshava, Chethan Palangotu
    Shivakumar, Spoorthi Mysore
    Skow, Joshua
    Hegde, Goutam Madhukeshwar
    Jeon, Hyeran
    12TH WORKSHOP ON GENERAL PURPOSE PROCESSING USING GPUS (GPGPU 12), 2019, : 12 - 21
  • [23] Performance Characterization of Mobile GP-GPUs
    Andargie, Fitsum Assamnew
    Rose, Jonathan
    PROCEEDINGS OF THE 2015 12TH IEEE AFRICON INTERNATIONAL CONFERENCE - GREEN INNOVATION FOR AFRICAN RENAISSANCE (AFRICON), 2015,
  • [24] Performance of Training Sparse Deep Neural Networks on GPUs
    Wang, Jianzong
    Huang, Zhangcheng
    Kong, Lingwei
    Xiao, Jing
    Wang, Pengyu
    Zhang, Lu
    Li, Chao
    2019 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2019,
  • [25] Performance Analysis of Distributed Deep Learning Frameworks in a Multi-GPU Environment
    Kavarakuntla, Tulasi
    Han, Liangxiu
    Lloyd, Huw
    Latham, Annabel
    Akintoye, Samson B.
    20TH INT CONF ON UBIQUITOUS COMP AND COMMUNICAT (IUCC) / 20TH INT CONF ON COMP AND INFORMATION TECHNOLOGY (CIT) / 4TH INT CONF ON DATA SCIENCE AND COMPUTATIONAL INTELLIGENCE (DSCI) / 11TH INT CONF ON SMART COMPUTING, NETWORKING, AND SERV (SMARTCNS), 2021, : 406 - 413
  • [26] Object Storage for Deep Learning Frameworks
    Ozeri, Or
    Ofer, Effi
    Kat, Ronen
    DIDL'18: PROCEEDINGS OF THE SECOND WORKSHOP ON DISTRIBUTED INFRASTRUCTURES FOR DEEP LEARNING, 2018, : 21 - 24
  • [27] A Survey of Deep-learning Frameworks
    Parvat, Aniruddha
    Chavan, Jai
    Kadam, Siddhesh
    Dev, Souradeep
    Pathak, Vidhi
    PROCEEDINGS OF THE 2017 INTERNATIONAL CONFERENCE ON INVENTIVE SYSTEMS AND CONTROL (ICISC 2017), 2017, : 211 - 217
  • [28] Survey on Testing of Deep Learning Frameworks
    Ma, Xiang-Yue
    Du, Xiao-Ting
    Cai, Qing
    Zheng, Yang
    Hu, Zheng
    Zheng, Zheng
    Ruan Jian Xue Bao/Journal of Software, 2024, 35 (08): : 3752 - 3784
  • [29] Scalable Deep Learning-Based Microarchitecture Simulation on GPUs
    Pandey, Santosh
    Li, Lingda
    Flynn, Thomas
    Hoisie, Adolfy
    Liu, Hang
    SC22: INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2022,
  • [30] Fast Training of Deep Learning Models over Multiple GPUs
    Yi, Xiaodong
    Luo, Ziyue
    Meng, Chen
    Wang, Mengdi
    Long, Guoping
    Wu, Chuan
    Yang, Jun
    Lin, Wei
    PROCEEDINGS OF THE 2020 21ST INTERNATIONAL MIDDLEWARE CONFERENCE (MIDDLEWARE '20), 2020, : 105 - 118