Debugging Machine Learning Pipelines

Cited by: 1
Authors
Lourenco, Raoni [1]
Freire, Juliana [1]
Shasha, Dennis [1]
Affiliations
[1] NYU, New York, NY 10003 USA
Funding
U.S. National Science Foundation;
Keywords
EXPLANATIONS;
DOI
10.1145/3329486.3329489
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Machine learning tasks entail the use of complex computational pipelines to reach quantitative and qualitative conclusions. If some of the activities in a pipeline produce erroneous or uninformative outputs, the pipeline may fail or produce incorrect results. Inferring the root cause of failures and unexpected behavior is challenging, usually requiring much human thought, and is both time consuming and error prone. We propose a new approach that makes use of iteration and provenance to automatically infer the root causes and derive succinct explanations of failures. Through a detailed experimental evaluation, we assess the cost, precision, and recall of our approach compared to the state of the art. Our source code and experimental data will be available for reproducibility and enhancement.
Pages: 10
Related papers
  • [1] Data distribution debugging in machine learning pipelines
    Stefan Grafberger
    Paul Groth
    Julia Stoyanovich
    Sebastian Schelter
    [J]. The VLDB Journal, 2022, 31 : 1103 - 1126
  • [2] Data distribution debugging in machine learning pipelines
    Grafberger, Stefan
    Groth, Paul
    Stoyanovich, Julia
    Schelter, Sebastian
    [J]. VLDB JOURNAL, 2022, 31 (05): : 1103 - 1126
  • [3] On the Democratization of Machine Learning Pipelines
    Carqueja, Alexandre
    Cabral, Bruno
    Fernandes, Joao Paulo
    Lourenco, Nuno
    [J]. 2022 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2022, : 455 - 462
  • [4] Using machine learning to support debugging with Tarantula
    Briand, Lionel C.
    Labiche, Yvan
    Liu, Xuetao
    [J]. ISSRE 2007: 18TH IEEE INTERNATIONAL SYMPOSIUM ON SOFTWARE RELIABILITY ENGINEERING, PROCEEDINGS, 2007, : 137 - +
  • [5] Data pricing in machine learning pipelines
    Zicun Cong
    Xuan Luo
    Jian Pei
    Feida Zhu
    Yong Zhang
    [J]. Knowledge and Information Systems, 2022, 64 : 1417 - 1455
  • [6] Data pricing in machine learning pipelines
    Cong, Zicun
    Luo, Xuan
    Pei, Jian
    Zhu, Feida
    Zhang, Yong
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2022, 64 (06) : 1417 - 1455
  • [7] Fairness in machine learning: definition, testing, debugging, and application
    Xuanqi GAO
    Chao SHEN
    Weipeng JIANG
    Chenhao LIN
    Qian LI
    Qian WANG
    Qi LI
    Xiaohong GUAN
    [J]. Science China (Information Sciences), 2024, 67 (09) - 61
  • [8] Training Data Debugging for the Fairness of Machine Learning Software
    Li, Yanhui
    Meng, Linghan
    Chen, Lin
    Yu, Li
    Wu, Di
    Zhou, Yuming
    Xu, Baowen
    [J]. 2022 ACM/IEEE 44TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2022), 2022, : 2215 - 2227
  • [9] Explanatory and Actionable Debugging for Machine Learning: A TableQA Demonstration
    Cho, Minseok
    Lee, Gyeongbok
    Hwang, Seung-won
    [J]. PROCEEDINGS OF THE 42ND INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '19), 2019, : 1333 - 1336
  • [10] Modeling and debugging engineering decision procedures with machine learning
    Reich, Y
    Medina, MA
    Shieh, TY
    Jacobs, TL
    [J]. JOURNAL OF COMPUTING IN CIVIL ENGINEERING, 1996, 10 (02) : 157 - 166