Deep Learning Performance Characterization on GPUs for Various Quantization Frameworks

Cited by: 3
Authors
Shafique, Muhammad Ali [1 ]
Munir, Arslan [2 ]
Kong, Joonho [3 ]
Affiliations
[1] Kansas State Univ, Dept Elect & Comp Engn, Manhattan, KS 66506 USA
[2] Kansas State Univ, Dept Comp Sci, Manhattan, KS 66506 USA
[3] Kyungpook Natl Univ, Sch Elect & Elect Engn, Daegu 41566, South Korea
Keywords
optimization; deep learning; quantization; performance; TensorRT; automatic mixed precision;
DOI
10.3390/ai4040047
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Deep learning is employed in many applications, such as computer vision, natural language processing, robotics, and recommender systems. Large and complex neural networks lead to high accuracy; however, they adversely affect many aspects of deep learning performance, such as training time, latency, throughput, energy consumption, and memory usage in the training and inference stages. To address these challenges, various optimization techniques and frameworks have been developed for the efficient performance of deep learning models in the training and inference stages. Although optimization techniques such as quantization have been studied thoroughly in the past, less work has been done to study the performance of frameworks that provide quantization techniques. In this paper, we have used different performance metrics to study the performance of various quantization frameworks, including TensorFlow automatic mixed precision and TensorRT. These performance metrics include training time and memory utilization in the training stage along with latency and throughput for graphics processing units (GPUs) in the inference stage. We have applied the automatic mixed precision (AMP) technique during the training stage using the TensorFlow framework, while for inference we have utilized the TensorRT framework for the post-training quantization technique using the TensorFlow TensorRT (TF-TRT) application programming interface (API). We performed model profiling for different deep learning models, datasets, image sizes, and batch sizes for both the training and inference stages, the results of which can help developers and researchers to devise and deploy efficient deep learning models for GPUs.
Pages: 926-948 (23 pages)
Related Papers
(50 in total)
  • [31] Deep learning on NVIDIA GPUs for QSAR, QSPR and QNAR predictions
    Sattarov, Boris
    Mitrofanov, Artem
    Korotcov, Alexandru
    Tkachenko, Valery
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2017, 254
  • [32] SparseRT: Accelerating Unstructured Sparsity on GPUs for Deep Learning Inference
    Wang, Ziheng
    PACT '20: PROCEEDINGS OF THE ACM INTERNATIONAL CONFERENCE ON PARALLEL ARCHITECTURES AND COMPILATION TECHNIQUES, 2020, : 31 - 42
  • [33] Object Detection on FPGAs and GPUs by Using Accelerated Deep Learning
    Cambay, V. Yusuf
    Ucar, Aysegul
    Arserim, M. Ali
    2019 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA PROCESSING (IDAP 2019), 2019,
  • [34] Comparison Performance of Lymphocyte Classification for Various Datasets using Deep Learning
    Safuan, Syadia Nabilah Mohd
    Tomari, Mohd Razali Md
    Zakaria, Wan Nurshazwani Wan
    Suriani, Nor Surayahani
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (10) : 396 - 402
  • [35] Performance of Various Deep-Learning Networks in the Seed Classification Problem
    Eryigit, Recep
    Tugrul, Bulent
    SYMMETRY-BASEL, 2021, 13 (10):
  • [36] Performance analysis of various training algorithms of deep learning based controller
    Prasad, Bhawesh
    Kumar, Raj
    Singh, Manmohan
    ENGINEERING RESEARCH EXPRESS, 2023, 5 (02):
  • [37] Deep Reinforcement Learning-based Quantization for Federated Learning
    Zheng, Sihui
    Dong, Yuhan
    Chen, Xiang
    2023 IEEE WIRELESS COMMUNICATIONS AND NETWORKING CONFERENCE, WCNC, 2023,
  • [38] Performance Characterization and Optimization of Atomic Operations on AMD GPUs
    Elteir, Marwa
    Lin, Heshan
    Feng, Wu-chun
    2011 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2011, : 234 - 243
  • [39] On the Effect of Quantization on Deep Neural Networks Performance
    Tmamna, Jihene
    Fourati, Rahma
    Ltifi, Hela
    ADVANCES IN COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2024, PART I, 2024, 2165 : 144 - 156
  • [40] Large scale performance analysis of distributed deep learning frameworks for convolutional neural networks
    Aach, Marcel
    Inanc, Eray
    Sarma, Rakesh
    Riedel, Morris
    Lintermann, Andreas
    JOURNAL OF BIG DATA, 10