An Empirical Study of Challenges in Converting Deep Learning Models

Cited by: 9
Authors
Openja, Moses [1 ]
Nikanjam, Amin [1 ]
Yahmed, Ahmed Haj [1 ]
Khomh, Foutse [1 ]
Jiang, Zhen Ming [2 ]
Affiliations
[1] Polytech Montreal, Montreal, PQ, Canada
[2] York Univ, Toronto, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Empirical; Deep Learning; Converting Trained Models; Deploying ML Models; Robustness;
DOI
10.1109/ICSME55016.2022.00010
CLC Number
TP31 [Computer Software];
Subject Classification Codes
081202; 0835;
Abstract
Deep Learning (DL)-based software systems are increasingly being deployed in real-world applications. Usually, DL models are developed and trained using DL frameworks such as TensorFlow and PyTorch. Each framework has its own internal mechanisms/formats to represent and train DL models (deep neural networks), and these formats usually cannot be recognized by other frameworks. Moreover, trained models are usually deployed in environments different from those in which they were developed. To solve this interoperability issue and make DL models compatible with different frameworks/environments, exchange formats such as ONNX and CoreML have been introduced. However, ONNX and CoreML have never been empirically evaluated by the community to reveal their prediction accuracy, performance, and robustness after conversion. Poor accuracy or non-robust behavior of converted models may degrade the quality of deployed DL-based software systems. In this paper, we conduct the first empirical study to assess ONNX and CoreML for converting trained DL models. In our systematic approach, two popular DL frameworks, Keras and PyTorch, are used to train five widely used DL models on three popular datasets. The trained models are then converted to ONNX and CoreML and transferred to the two runtime environments designated for these formats, where they are evaluated. We investigate the prediction accuracy before and after conversion. Our results show that the prediction accuracy of converted models is at the same level as that of the originals. The performance (time cost and memory consumption) of converted models is studied as well. Model size is reduced after conversion, which can lead to optimized deployment of DL-based software. We also study the adversarial robustness of converted models to verify the robustness of deployed DL-based software. Leveraging state-of-the-art adversarial attack approaches, converted models are generally assessed to be as robust as the originals. However, the obtained results show that CoreML models are more vulnerable to adversarial attacks than ONNX models. The general message of our findings is that DL developers should be cautious when deploying converted models, which may 1) perform poorly when switching from one framework to another, 2) be difficult to deploy robustly, or 3) run slowly, leading to poor quality of deployed DL-based software, including DL-based software maintenance tasks such as bug prediction.
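As a rough illustration of the conversion-and-verification workflow the abstract describes, the minimal sketch below exports a toy PyTorch model to ONNX and compares its predictions against the original model in ONNX Runtime. The toy architecture, file name, input shape, and tolerance values are illustrative assumptions and are not the paper's actual models, datasets, or evaluation protocol.

```python
# Minimal sketch (not the paper's experimental setup): export a toy PyTorch
# model to ONNX and check that the converted model's predictions stay close
# to the original's within a small numerical tolerance.
import numpy as np
import torch
import torch.nn as nn
import onnxruntime as ort

# Toy CNN classifier; the study uses five widely used DL models instead.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 30 * 30, 10),
)
model.eval()

dummy_input = torch.randn(1, 3, 32, 32)  # single CIFAR-like image

# Convert the PyTorch model to the ONNX exchange format.
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["logits"])

# Run the converted model in the ONNX Runtime environment.
session = ort.InferenceSession("model.onnx")
onnx_logits = session.run(None, {"input": dummy_input.numpy()})[0]

with torch.no_grad():
    torch_logits = model(dummy_input).numpy()

# Tolerances are illustrative; the study compares prediction accuracy on
# full test sets rather than raw logits on a single input.
np.testing.assert_allclose(torch_logits, onnx_logits, rtol=1e-4, atol=1e-5)
print("Original and ONNX predictions agree within tolerance.")
```

A CoreML conversion would follow the same pattern with a CoreML converter and its runtime in place of `torch.onnx.export` and ONNX Runtime.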
Pages: 13-23
Number of pages: 11
Related Papers (50 in total)
  • [1] Steenhoek, Benjamin; Rahman, Md Mahbubur; Jiles, Richard; Le, Wei. An Empirical Study of Deep Learning Models for Vulnerability Detection. 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023: 2237-2248.
  • [2] Zhang, Tianyi; Gao, Cuiyun; Ma, Lei; Lyu, Michael R.; Kim, Miryung. An Empirical Study of Common Challenges in Developing Deep Learning Applications. 2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE), 2019: 104-115.
  • [3] Wu, Xiongfei; Qin, Liangyu; Yu, Bing; Xie, Xiaofei; Ma, Lei; Xue, Yinxing; Liu, Yang; Zhao, Jianjun. How are Deep Learning Models Similar?: An Empirical Study on Clone Analysis of Deep Learning Software. 2020 IEEE/ACM 28th International Conference on Program Comprehension (ICPC), 2020: 172-183.
  • [4] Morovati, Mohammad Mehdi; Tambon, Florian; Taraghi, Mina; Nikanjam, Amin; Khomh, Foutse. Common challenges of deep reinforcement learning applications development: an empirical study. Empirical Software Engineering, 2024, 29(4).
  • [5] Alahmari, Saeed S.; Goldgof, Dmitry B.; Mouton, Peter R.; Hall, Lawrence O. Challenges for the Repeatability of Deep Learning Models. IEEE Access, 2020, 8: 211860-211868.
  • [6] Velez, Tatiana Castro; Khatchadourian, Raffi; Bagherzadeh, Mehdi; Raja, Anita. Challenges in Migrating Imperative Deep Learning Programs to Graph Execution: An Empirical Study. 2022 Mining Software Repositories Conference (MSR 2022), 2022: 469-481.
  • [7] Zhou, Huaicheng; Mo, Kanghua; Huang, Teng; Li, Yongjin. Empirical study of privacy inference attack against deep reinforcement learning models. Connection Science, 2023, 35(1).
  • [8] Pathak, Ajeet Ram; Pandey, Manjusha; Rautaray, Siddharth. Empirical evaluation of deep learning models for sentiment analysis. Journal of Statistics and Management Systems, 2019, 22(4): 741-752.
  • [9] Balaji, A. Jayanth; Ram, D. S. Harish; Nair, Binoy B. Applicability of Deep Learning Models for Stock Price Forecasting An Empirical Study on BANKEX Data. 8th International Conference on Advances in Computing & Communications (ICACC-2018), 2018, 143: 947-953.
  • [10] Tian, Huan; Zhu, Tianqing; Liu, Wei; Zhou, Wanlei. Image fairness in deep learning: problems, models, and challenges. Neural Computing and Applications, 2022, 34: 12875-12893.