Curated Data In - Trustworthy In Silico Models Out: The Impact of Data Quality on the Reliability of Artificial Intelligence Models as Alternatives to Animal Testing

Cited by: 25
Authors
Alves, Vinicius M. [1]
Auerbach, Scott S. [2]
Kleinstreuer, Nicole [3]
Rooney, John P. [4]
Muratov, Eugene N. [5,6]
Rusyn, Ivan [7]
Tropsha, Alexander [5]
Schmitt, Charles [1]
Affiliations
[1] NIEHS, Off Data Sci, Div Natl Toxicol Program DNTP, Durham, NC 27560 USA
[2] NIEHS, Toxinformat Grp, Predict Toxicol Branch, DNTP, Durham, NC 27560 USA
[3] NIEHS, Natl Toxicol Program Interagcy Ctr Evaluat Altern, Sci Directors Off, DNTP, Durham, NC 27560 USA
[4] Integrated Lab Syst LLC, Morrisville, NC USA
[5] Univ N Carolina, UNC Eshelman Sch Pharm, Lab Mol Modeling, Chapel Hill, NC 27599 USA
[6] Univ Fed Paraiba, Dept Pharmaceut Sci, Joao Pessoa, Paraiba, Brazil
[7] Texas A&M Univ, Coll Vet Med & Biomed Sci, Dept Vet Integrat Biosci, College Stn, TX USA
Source
ATLA-ALTERNATIVES TO LABORATORY ANIMALS | 2021 / Vol. 49 / No. 3
Keywords
artificial intelligence; data curation; data quality; data reproducibility; QSAR; PREDICTION; REPRODUCIBILITY; TOXICOLOGY; TOXICITY; STRATEGY; VERIFY; BEWARE; CHEMBL; TRUST
DOI
10.1177/02611929211029635
Chinese Library Classification (CLC)
R-3 [Medical research methods]; R3 [Basic medicine]
Discipline classification code
1001
Abstract
New Approach Methodologies (NAMs) that employ artificial intelligence (AI) for predicting adverse effects of chemicals have generated optimistic expectations as alternatives to animal testing. However, the major underappreciated challenge in developing robust and predictive AI models is the impact of the quality of the input data on the model accuracy. Indeed, poor data reproducibility and quality have been frequently cited as factors contributing to the crisis in biomedical research, as well as similar shortcomings in the fields of toxicology and chemistry. In this article, we review the most recent efforts to improve confidence in the robustness of toxicological data and investigate the impact that data curation has on the confidence in model predictions. We also present two case studies demonstrating the effect of data curation on the performance of AI models for predicting skin sensitisation and skin irritation. We show that, whereas models generated with uncurated data had a 7-24% higher correct classification rate (CCR), the perceived performance was, in fact, inflated owing to the high number of duplicates in the training set. We assert that data curation is a critical step in building computational models, to help ensure that reliable predictions of chemical toxicity are achieved through use of the models.
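The abstract's point about duplicates inflating CCR can be illustrated with a minimal sketch. It assumes binary toxicity labels (1 = positive, 0 = negative). It also assumes each record already carries a canonical structure key, such as an InChIKey computed elsewhere. CCR is taken here as the mean of sensitivity and specificity, which is the usual definition in QSAR work. `deduplicate` and `shared_structures` are hypothetical helpers. Dropping compounds with conflicting labels is one assumed curation policy and is not necessarily the one the authors used.

```python
from collections.abc import Hashable, Iterable, Sequence
from typing import List, Set, Tuple

# Hypothetical record layout: (canonical_structure_key, binary_label).
# The key (e.g. an InChIKey) is assumed to be computed beforehand.
Record = Tuple[Hashable, int]


def ccr(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Correct classification rate: mean of sensitivity and specificity."""
    pairs = list(zip(y_true, y_pred))
    positives = [p for t, p in pairs if t == 1]  # predictions for true positives
    negatives = [p for t, p in pairs if t == 0]  # predictions for true negatives
    if not positives or not negatives:
        raise ValueError("CCR needs at least one positive and one negative example")
    sensitivity = sum(1 for p in positives if p == 1) / len(positives)
    specificity = sum(1 for p in negatives if p == 0) / len(negatives)
    return 0.5 * (sensitivity + specificity)


def deduplicate(records: Iterable[Record]) -> List[Record]:
    """Keep one record per structure; drop structures whose labels disagree."""
    labels_by_key = {}
    for key, label in records:
        labels_by_key.setdefault(key, set()).add(label)
    # Assumed policy: consistent duplicates collapse to one entry,
    # conflicting duplicates are removed entirely.
    return [(key, next(iter(labels))) for key, labels in labels_by_key.items()
            if len(labels) == 1]


def shared_structures(train: Iterable[Record], test: Iterable[Record]) -> Set[Hashable]:
    """Structures present in both splits, which would leak into evaluation."""
    return {key for key, _ in train} & {key for key, _ in test}


if __name__ == "__main__":
    records = [("A", 1), ("A", 1), ("B", 0), ("C", 1), ("C", 0), ("D", 0)]
    print(deduplicate(records))  # [('A', 1), ('B', 0), ('D', 0)]
    print(ccr([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.75
    print(shared_structures([("A", 1), ("B", 0)], [("B", 0), ("E", 1)]))  # {'B'}
```

In this sketch, deduplication runs before any train/test split. If identical structures end up in both the training and the evaluation data, a model can score them by recall rather than by generalisation. That is one plausible way reported CCR becomes inflated.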
Pages: 73–82
Number of pages: 10