Tool Support for Improving Software Quality in Machine Learning Programs

Cited by: 0
Authors
Cheng, Kwok Sun [1 ]
Huang, Pei-Chi [1 ]
Ahn, Tae-Hyuk [2 ]
Song, Myoungkyu [1 ]
Affiliations
[1] Univ Nebraska Omaha, Dept Comp Sci, Omaha, NE 68182 USA
[2] St Louis Univ, Dept Comp Sci, St Louis, MO 63103 USA
Keywords
software quality; anomaly detection; quality validation; machine learning applications; ARTIFICIAL-INTELLIGENCE AI; CANCER;
DOI
10.3390/info14010053
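Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract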
Machine learning (ML) techniques discover knowledge from large amounts of data, and ML modeling is becoming essential to software systems in practice. ML research communities have focused on the accuracy and efficiency of ML models, while validating the quality of those models has received less attention. Validating ML applications is a challenging and time-consuming process for developers because prediction accuracy relies heavily on the generated models. ML applications are written in a relatively data-driven programming style on top of black-box ML frameworks, so every dataset and the ML application itself must be investigated individually. As a result, ML validation tasks take a lot of time and effort. To address this limitation, we present MLVal, a novel quality validation technique that increases the reliability of ML models and applications. Our approach helps developers inspect the training data and the features generated for the ML model. Data validation is important and beneficial to software quality because the quality of the input data affects the speed and accuracy of training and inference. Inspired by software debugging and validation practices for reproducing reported bugs, MLVal takes as input an ML application and its training datasets to build the ML models, helping ML application developers easily reproduce and understand anomalies in the ML application. We have implemented an Eclipse plugin for MLVal that allows developers to validate the prediction behavior of their ML applications, the ML model, and the training data within the Eclipse IDE. In our evaluation, we used 23,500 documents in the bioengineering research domain. We assessed the ability of the MLVal validation technique to effectively help ML application developers (1) investigate the connection between the produced features and the labels in the training model, and (2) detect errors early to secure model quality through better data. Our approach reduces the engineering cost of validating problems and improves the data-centric workflows of ML application development.
Pages: 20
Related papers
50 results in total
  • [31] ASMC: Improving Measurement Data Quality with Machine Learning
    Tan, Jun Hao
    Ho, Heng Wah
    2024 35TH ANNUAL SEMI ADVANCED SEMICONDUCTOR MANUFACTURING CONFERENCE, ASMC, 2024,
  • [32] Social software as support in hybrid learning environments: The value of the blog as a tool for reflective learning and peer support
    Hall, Hazel
    Davison, Brian
    LIBRARY & INFORMATION SCIENCE RESEARCH, 2007, 29 (02) : 163 - 187
  • [33] How Machine Learning is Improving US Navy Customer Support
    Powell, Michael
    Rotz, Jamison A.
    O'Malley, Kevin D.
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 13188 - 13195
  • [34] A Software Development Tool for Improving Quality of Service in Distributed Database Systems
    Hababeh, Ismail Omar
    2009 INTERNATIONAL CONFERENCE ON INNOVATIONS IN INFORMATION TECHNOLOGY, 2009, : 126 - 130
  • [35] Analysis of user-feedback as a tool for improving software quality.
    Abookire, SA
    Martin, MT
    Teich, JM
    Kuperman, GJ
    Bates, DW
    JOURNAL OF GENERAL INTERNAL MEDICINE, 2000, 15 : 97 - 97
  • [36] Quality System for Production Software as Tool for Monitoring and Improving Organization KPIs
    Kifor, Vasile Claudiu
    Tudor, Nicolae
    INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL, 2013, 8 (02) : 235 - 246
  • [37] Support Vector Machine: A Machine Learning Approach for Power Quality Application
    Shinde, Pravin
    Patil, Pavan
    Ahmad, Akbar
    Munje, Ravindra
    2019 IEEE 5TH INTERNATIONAL CONFERENCE FOR CONVERGENCE IN TECHNOLOGY (I2CT), 2019,
  • [38] Machine learning as a clinical decision support tool for patients with acromegaly
    Sulu, Cem
    Bektas, Ayyuce Begum
    Sahin, Serdar
    Durcan, Emre
    Kara, Zehra
    Demir, Ahmet Numan
    Ozkaya, Hande Mefkure
    Tanriover, Necmettin
    Comunoglu, Nil
    Kizilkilic, Osman
    Gazioglu, Nurperi
    Gonen, Mehmet
    Kadioglu, Pinar
    PITUITARY, 2022, 25 (03) : 486 - 495
  • [39] Machine learning as a clinical decision support tool for patients with acromegaly
    Cem Sulu
    Ayyüce Begüm Bektaş
    Serdar Şahin
    Emre Durcan
    Zehra Kara
    Ahmet Numan Demir
    Hande Mefkure Özkaya
    Necmettin Tanrıöver
    Nil Çomunoğlu
    Osman Kızılkılıç
    Nurperi Gazioğlu
    Mehmet Gönen
    Pınar Kadıoğlu
    Pituitary, 2022, 25 : 486 - 495
  • [40] An Emulator Software Tool for Improving Learning of DC-DC Converters
    Ferreiro, Alfonso Lago
    Simon, Ana Rey-Alvite
    Casas, Sergio Lamas
    IEEE REVISTA IBEROAMERICANA DE TECNOLOGIAS DEL APRENDIZAJE-IEEE RITA, 2020, 15 (02) : 63 - 69