On Integrating the Data-Science and Machine-Learning Pipelines for Responsible AI

Authors
Esmaelizadeh, Armin [1]
Rorseth, Joel [2]
Yu, Andy [2]
Godfrey, Parke [3]
Golab, Lukasz [2]
Srivastava, Divesh [4]
Szlichta, Jaroslaw [3]
Taghva, Kazem [1]
Affiliations
[1] UNLV, Las Vegas, NV 89154 USA
[2] Univ Waterloo, Waterloo, ON, Canada
[3] York Univ, N York, ON, Canada
[4] AT&T Chief Data Off, Atlanta, GA USA
Keywords
Data Science; Machine Learning Model Diagnostics; Explainable AI
DOI
10.1145/3665601.3669849
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Herein, we advocate for the integration of the pipelines for data science (e.g., extraction, cleaning, and exploration) and machine learning (e.g., training data collection, feature selection, model selection, and parameter tuning), toward responsible and trustworthy artificial intelligence. We argue that the metadata generated by the machine-learning pipeline, which includes model outputs and model accuracy scores, is best managed and analyzed using data-science tools, thereby obtaining actionable insights into model performance, interpretability, and bias. We illustrate via two examples from our recent work as proof of concept: data summarization for model performance diagnostics; and input and output exploration to understand retrieval-augmented language models.
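The idea of analyzing machine-learning metadata with data-science tools can be illustrated with a minimal sketch. It is not the authors' method: the prediction log, its attribute names (`region`, `device`), and the helper functions `accuracy_by_slice` and `worst_slices` are all hypothetical, chosen only to show how a simple group-and-rank summary over model outputs might surface the data slices where a model performs worst.

```python
from collections import defaultdict

# Hypothetical prediction log (an assumption for illustration): one row per
# test example, pairing input attributes with the true and predicted labels.
prediction_log = [
    {"region": "north", "device": "mobile", "label": 1, "pred": 1},
    {"region": "north", "device": "desktop", "label": 0, "pred": 0},
    {"region": "south", "device": "mobile", "label": 1, "pred": 0},
    {"region": "south", "device": "mobile", "label": 0, "pred": 1},
    {"region": "south", "device": "desktop", "label": 1, "pred": 1},
    {"region": "north", "device": "mobile", "label": 0, "pred": 0},
]


def accuracy_by_slice(rows, attribute):
    """Group rows by one attribute and return {value: (accuracy, row_count)}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        key = row[attribute]
        total[key] += 1
        correct[key] += int(row["label"] == row["pred"])
    return {key: (correct[key] / total[key], total[key]) for key in total}


def worst_slices(rows, attributes):
    """Rank every (attribute, value) slice from lowest to highest accuracy.

    Ties in accuracy are broken in favor of the larger slice.
    """
    slices = []
    for attribute in attributes:
        for value, (acc, count) in accuracy_by_slice(rows, attribute).items():
            slices.append((attribute, value, acc, count))
    return sorted(slices, key=lambda s: (s[2], -s[3]))


summary = worst_slices(prediction_log, ["region", "device"])
for attribute, value, acc, count in summary:
    print(f"{attribute}={value}: accuracy={acc:.2f} over {count} rows")
```

In this toy log the lowest-ranked slice is `region=south`, which is the kind of summary an analyst might use as a starting point for diagnosing model performance or bias.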
Pages: 50-53
Page count: 4