Debugging Support for Machine Learning Applications in Bioengineering Text Corpora

Cited by: 1
Authors
Cheng, Kwok Sun [1]
Ahn, Tae-Hyuk [2]
Song, Myoungkyu [1]
Affiliations
[1] Univ Nebraska, Omaha, NE 68182 USA
[2] St Louis Univ, St Louis, MO 63103 USA
DOI
10.1109/COMPSAC54236.2022.00166
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Modeling in machine learning (ML) is becoming an essential part of software systems in practice. Validating ML applications is challenging and time-consuming for developers because prediction accuracy relies heavily on the generated models. ML applications are written in a more data-driven programming style on top of black-box ML frameworks. If every dataset and the ML application had to be investigated individually, ML debugging would take a great deal of time and effort. To address this limitation, we present MLDBUG, a novel debugging technique for ML applications that helps developers inspect the training data and the features generated for the ML model. Inspired by how software debugging reproduces reported bugs, MLDBUG takes as input an ML application and its training datasets and builds the ML models, helping developers reproduce and understand anomalies in the ML application. We have implemented MLDBUG as an Eclipse plugin that allows developers to validate the prediction behavior of their ML applications, the ML model, and the training data within the Eclipse IDE. In our evaluation, we used 23,500 documents from the bioengineering research domain. We assessed how effectively MLDBUG helps ML application developers investigate (1) the connection between the produced features and the labels in the training model and (2) the relationship between the training instances and the instances the model predicts.
Pages: 1062-1069
Page count: 8
Related papers
  • [1] Learning To Rank Relevant Documents for Information Retrieval in Bioengineering Text Corpora
    Cheng, Kwok Sun
    Song, Myoungkyu
    [J]. 2021 IEEE 45TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE (COMPSAC 2021), 2021, : 1565 - 1572
  • [2] TopExplorer: Tool Support for Extracting and Visualizing Topic Models in Bioengineering Text Corpora
    Cheng, Kwok Sun
    Wang, Zhipeng
    Huang, Pei-Chi
    Chundi, Parvathi
    Song, Myoungkyu
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ELECTRO INFORMATION TECHNOLOGY (EIT), 2020, : 334 - 343
  • [3] Using machine learning to support debugging with Tarantula
    Briand, Lionel C.
    Labiche, Yvan
    Liu, Xuetao
    [J]. ISSRE 2007: 18TH IEEE INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING, PROCEEDINGS, 2007, : 137 - +
  • [4] Support vector machine active learning with applications to text classification
    Tong, S
    Koller, D
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2002, 2 (01) : 45 - 66
  • [5] Using machine learning to disentangle homonyms in large text corpora
    Roll, Uri
    Correia, Ricardo A.
    Berger-Tal, Oded
    [J]. CONSERVATION BIOLOGY, 2018, 32 (03) : 716 - 724
  • [6] Understanding Large Text Corpora via Sparse Machine Learning
    El Ghaoui, Laurent
    Vu Pham
    Li, Guan-Cheng
    Viet-An Duong
    Srivastava, Ashok
    Bhaduri, Kanishka
    [J]. STATISTICAL ANALYSIS AND DATA MINING, 2013, 6 (03) : 221 - 242
  • [7] DocTable: Table-Oriented Interactive Machine Learning for Text Corpora
    Yarlagadda, Sriram
    Scroggins, David J.
    Cao, Fang
    Devabhaktuni, Yeshwanth
    Buitron, Franklin
    Brown, Eli T.
    [J]. 2021 IEEE WORKSHOP ON MACHINE LEARNING FROM USER INTERACTIONS (MLUI 2021), 2021, : 1 - 11
  • [8] Debugging Machine Learning Pipelines
    Lourenco, Raoni
    Freire, Juliana
    Shasha, Dennis
    [J]. PROCEEDINGS OF THE 3RD INTERNATIONAL WORKSHOP ON DATA MANAGEMENT FOR END-TO-END MACHINE LEARNING, DEEM 2019, 2019,
  • [9] Comparison of extreme learning machine with support vector machine for text classification
    Liu, Y
    Loh, HT
    Tor, SB
    [J]. INNOVATIONS IN APPLIED ARTIFICIAL INTELLIGENCE, 2005, 3533 : 390 - 399
  • [10] MACHINE-READABLE TEXT CORPORA IN FRENCH
    STEIN, A
    [J]. ZEITSCHRIFT FUR FRANZOSISCHE SPRACHE UND LITERATUR, 1995, 105 (01) : 1 - 25