Tool Support for Improving Software Quality in Machine Learning Programs

Cited by: 0

Authors:

Cheng, Kwok Sun [1]

Huang, Pei-Chi [1]

Ahn, Tae-Hyuk [2]

Song, Myoungkyu [1]

Affiliations:

[1] Univ Nebraska Omaha, Dept Comp Sci, Omaha, NE 68182 USA

[2] St Louis Univ, Dept Comp Sci, St Louis, MO 63103 USA

Source:

INFORMATION | 2023 / Vol. 14 / Issue 01

Keywords:

software quality; anomaly detection; quality validation; machine learning applications; ARTIFICIAL-INTELLIGENCE AI; CANCER

DOI:

10.3390/info14010053

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Machine learning (ML) techniques discover knowledge from large amounts of data, and ML modeling is becoming essential to software systems in practice. ML research communities have focused on the accuracy and efficiency of ML models, while paying less attention to validating the quality of those models. Validating ML applications is challenging and time-consuming for developers because prediction accuracy relies heavily on the generated models. ML applications are written in a relatively data-driven programming style on top of black-box ML frameworks, so each dataset and the ML application itself must be investigated individually, which makes ML validation tasks take considerable time and effort. To address this limitation, we present MLVal, a novel quality validation technique that increases the reliability of ML models and applications. Our approach helps developers inspect the training data and the generated features for the ML model. Data validation is important and beneficial to software quality because the quality of the input data affects the speed and accuracy of training and inference. Inspired by software debugging and validation practices for reproducing reported bugs, MLVal takes as input an ML application and its training datasets to build the ML models, helping ML application developers easily reproduce and understand anomalies in the ML application. We have implemented MLVal as an Eclipse plugin that allows developers to validate the prediction behavior of their ML applications, the ML model, and the training data within the Eclipse IDE. In our evaluation, we used 23,500 documents from the bioengineering research domain. We assessed the ability of the MLVal validation technique to effectively help ML application developers (1) investigate the connection between the produced features and the labels in the training model, and (2) detect errors early to secure the quality of models through better data. Our approach reduces the engineering effort needed to validate problems, improving the data-centric workflows of ML application development.


Pages: 20

