An Empirical Study of Challenges in Converting Deep Learning Models

Cited by: 9
Authors
Openja, Moses [1 ]
Nikanjam, Amin [1 ]
Yahmed, Ahmed Haj [1 ]
Khomh, Foutse [1 ]
Jiang, Zhen Ming [2 ]
Affiliations
[1] Polytech Montreal, Montreal, PQ, Canada
[2] York Univ, Toronto, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Empirical; Deep Learning; Converting Trained Models; Deploying ML Models; Robustness;
DOI
10.1109/ICSME55016.2022.00010
CLC Number
TP31 [Computer Software];
Subject Classification Codes
081202; 0835;
Abstract
Deep Learning (DL)-based software systems are increasingly being deployed in real-world applications. Usually, DL models are developed and trained using DL frameworks such as TensorFlow and PyTorch. Each framework has its own internal mechanisms/formats to represent and train DL models (deep neural networks), and these formats usually cannot be recognized by other frameworks. Moreover, trained models are usually deployed in environments different from those in which they were developed. To solve this interoperability issue and make DL models compatible with different frameworks/environments, exchange formats such as ONNX and CoreML have been introduced. However, ONNX and CoreML have never been empirically evaluated by the community to reveal their prediction accuracy, performance, and robustness after conversion. Poor accuracy or non-robust behavior of converted models may degrade the quality of deployed DL-based software systems. In this paper, we conduct the first empirical study to assess ONNX and CoreML for converting trained DL models. In our systematic approach, two popular DL frameworks, Keras and PyTorch, are used to train five widely used DL models on three popular datasets. The trained models are then converted to ONNX and CoreML and transferred to the two runtime environments designated for these formats, where they are evaluated. We investigate the prediction accuracy before and after conversion. Our results show that the prediction accuracy of converted models is at the same level as that of the originals. The performance (time cost and memory consumption) of converted models is studied as well. Model size is reduced after conversion, which can lead to optimized deployment of DL-based software. We also study the adversarial robustness of converted models to verify the robustness of deployed DL-based software. Leveraging state-of-the-art adversarial attack approaches, converted models are generally assessed to be as robust as the originals. However, the obtained results show that CoreML models are more vulnerable to adversarial attacks than ONNX models. The general message of our findings is that DL developers should be cautious when deploying converted models, which may 1) perform poorly when switching from one framework to another, 2) be difficult to deploy robustly, or 3) run slowly, leading to poor quality of deployed DL-based software, including DL-based software maintenance tasks such as bug prediction.
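As a rough illustration of the conversion-and-verification workflow the abstract describes, the minimal sketch below exports a toy PyTorch model to ONNX and compares its predictions against the original model in ONNX Runtime. The toy architecture, file name, input shape, and tolerance values are illustrative assumptions and are not the paper's actual models, datasets, or evaluation protocol.

```python
# Minimal sketch (not the paper's experimental setup): export a toy PyTorch
# model to ONNX and check that the converted model's predictions stay close
# to the original's within a small numerical tolerance.
import numpy as np
import torch
import torch.nn as nn
import onnxruntime as ort

# Toy CNN classifier; the study uses five widely used DL models instead.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 30 * 30, 10),
)
model.eval()

dummy_input = torch.randn(1, 3, 32, 32)  # single CIFAR-like image

# Convert the PyTorch model to the ONNX exchange format.
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["logits"])

# Run the converted model in the ONNX Runtime environment.
session = ort.InferenceSession("model.onnx")
onnx_logits = session.run(None, {"input": dummy_input.numpy()})[0]

with torch.no_grad():
    torch_logits = model(dummy_input).numpy()

# Tolerances are illustrative; the study compares prediction accuracy on
# full test sets rather than raw logits on a single input.
np.testing.assert_allclose(torch_logits, onnx_logits, rtol=1e-4, atol=1e-5)
print("Original and ONNX predictions agree within tolerance.")
```

A CoreML conversion would follow the same pattern with a CoreML converter and its runtime in place of `torch.onnx.export` and ONNX Runtime.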
Pages: 13-23
Number of pages: 11
Related Papers (50 in total)
  • [1] Steenhoek, Benjamin; Rahman, Md Mahbubur; Jiles, Richard; Le, Wei. An Empirical Study of Deep Learning Models for Vulnerability Detection. 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), 2023: 2237-2248.
  • [2] Zhang, Tianyi; Gao, Cuiyun; Ma, Lei; Lyu, Michael R.; Kim, Miryung. An Empirical Study of Common Challenges in Developing Deep Learning Applications. 2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE), 2019: 104-115.
  • [3] Wu, Xiongfei; Qin, Liangyu; Yu, Bing; Xie, Xiaofei; Ma, Lei; Xue, Yinxing; Liu, Yang; Zhao, Jianjun. How are Deep Learning Models Similar?: An Empirical Study on Clone Analysis of Deep Learning Software. 2020 IEEE/ACM 28th International Conference on Program Comprehension (ICPC), 2020: 172-183.
  • [4] Morovati, Mohammad Mehdi; Tambon, Florian; Taraghi, Mina; Nikanjam, Amin; Khomh, Foutse. Common challenges of deep reinforcement learning applications development: an empirical study. Empirical Software Engineering, 2024, 29(4).
  • [5] Alahmari, Saeed S.; Goldgof, Dmitry B.; Mouton, Peter R.; Hall, Lawrence O. Challenges for the Repeatability of Deep Learning Models. IEEE Access, 2020, 8: 211860-211868.
  • [6] Velez, Tatiana Castro; Khatchadourian, Raffi; Bagherzadeh, Mehdi; Raja, Anita. Challenges in Migrating Imperative Deep Learning Programs to Graph Execution: An Empirical Study. 2022 Mining Software Repositories Conference (MSR 2022), 2022: 469-481.
  • [7] Zhou, Huaicheng; Mo, Kanghua; Huang, Teng; Li, Yongjin. Empirical study of privacy inference attack against deep reinforcement learning models. Connection Science, 2023, 35(1).
  • [8] Pathak, Ajeet Ram; Pandey, Manjusha; Rautaray, Siddharth. Empirical evaluation of deep learning models for sentiment analysis. Journal of Statistics and Management Systems, 2019, 22(4): 741-752.
  • [9] Balaji, A. Jayanth; Ram, D. S. Harish; Nair, Binoy B. Applicability of Deep Learning Models for Stock Price Forecasting An Empirical Study on BANKEX Data. 8th International Conference on Advances in Computing & Communications (ICACC-2018), 2018, 143: 947-953.
  • [10] Tian, Huan; Zhu, Tianqing; Liu, Wei; Zhou, Wanlei. Image fairness in deep learning: problems, models, and challenges. Neural Computing and Applications, 2022, 34: 12875-12893.